[IQUG] [iqug] Longer load times for larger files

Mumy, Mark mark.mumy at sap.com
Tue Oct 8 09:40:55 MST 2019


With IQ 16, a load runs in parallel whether the FROM or USING FILE syntax names a single file or multiple files.

I do agree that it is odd to see a 1m row file load in 2-5 seconds while a file that is 25x larger takes 100x longer, give or take.

To gauge relative performance difference, look at it this way…

  *   Load 100m rows
     *   100 files of 1m rows each (1 load statement)
     *   10 files of 10m each (1 load statement)
     *   1 file with 100m rows (1 load statement)
  *   That gives you a real comparison: the same data loaded three different ways
  *   You could also run these variations to gauge the difference between a single load statement and multiple load statements:
     *   100 files of 1m rows each (100 load statements)
     *   10 files of 10m each (10 load statements)
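
To make the comparison above concrete, here is a small Python sketch that only builds the statement text for each scenario. It is a sketch under assumptions, not a tested IQ script: the column names, file paths, and the `options` string are placeholders, and gen_stat is simply the table name that appears in the load log in this thread. It does not connect to IQ or time anything, so each statement would still need to be run and timed with whatever client you already use.

```python
from typing import List, Sequence


def build_load_statement(table: str, columns: Sequence[str],
                         files: Sequence[str], options: str = "") -> str:
    """Return one LOAD TABLE statement that names every file in `files`.

    `options` stands in for the load options your site already uses
    (delimiters, ESCAPES OFF, and so on); it is appended unchanged.
    """
    if not files:
        raise ValueError("need at least one file")
    column_list = ", ".join(columns)
    # Quote each path and double any embedded single quotes.
    file_list = ", ".join("'" + f.replace("'", "''") + "'" for f in files)
    stmt = f"LOAD TABLE {table} ({column_list}) USING FILE {file_list}"
    return f"{stmt} {options}" if options else stmt


def plan_loads(files: Sequence[str], files_per_statement: int) -> List[List[str]]:
    """Split `files` into groups; each group becomes one load statement."""
    if files_per_statement < 1:
        raise ValueError("files_per_statement must be at least 1")
    return [list(files[i:i + files_per_statement])
            for i in range(0, len(files), files_per_statement)]


if __name__ == "__main__":
    # Hypothetical paths: 100 files of 1m rows each.
    files = [f"/data/part_{i:03d}.dat" for i in range(100)]
    columns = ["c1", "c2"]  # placeholder column names

    one_statement = plan_loads(files, 100)   # 100 files, 1 load statement
    many_statements = plan_loads(files, 1)   # 100 files, 100 load statements

    print(len(one_statement), "statement for all files")
    print(len(many_statements), "statements, one file each")
    print(build_load_statement("gen_stat", columns, files[:2]))
```

Running the same 100m rows through both plans, and through the 10-file and 1-file variants, is what gives the like-for-like comparison.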

I am a big fan of the trickle approach.  If you can generate data in 1m row files and load it sooner, then go with that approach.  Call it a microbatch for lack of a better term.  We do this quite a lot in the sensor/IoT/telco world, where the data is constantly coming in.  Queue to a certain size (or time), then load.  While loading, queue up again.  If there are multiple files, then load them in a single statement.
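
The queue-then-load idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `MicroBatcher`, its thresholds, and the injectable `clock` are all made up for the example, and the actual load step is left to the caller.

```python
import time
from typing import Callable, List, Optional


class MicroBatcher:
    """Collect rows and hand back a batch once it is big enough or old enough."""

    def __init__(self, max_rows: int = 1_000_000,
                 max_age_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_rows = max_rows
        self.max_age_seconds = max_age_seconds
        self._clock = clock          # injectable so the timing can be tested
        self._rows: List[str] = []
        self._started: Optional[float] = None  # time the first row was queued

    def add(self, row: str) -> Optional[List[str]]:
        """Queue one row; return a batch if a threshold was crossed, else None."""
        if self._started is None:
            self._started = self._clock()
        self._rows.append(row)
        too_big = len(self._rows) >= self.max_rows
        too_old = self._clock() - self._started >= self.max_age_seconds
        if too_big or too_old:
            return self.flush()
        return None

    def flush(self) -> List[str]:
        """Return whatever is queued and start a fresh, empty queue."""
        batch, self._rows, self._started = self._rows, [], None
        return batch


if __name__ == "__main__":
    batcher = MicroBatcher(max_rows=3, max_age_seconds=5.0)
    for value in ["a", "b", "c", "d"]:
        batch = batcher.add(value)
        if batch is not None:
            # Here the caller would write the batch to a file and load it.
            print("load", len(batch), "rows")
```

Note that the age check only fires when a row arrives, and that loading while the next batch queues up would need the load to run in a separate thread or process; both are left out to keep the sketch short.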

Make sense?

Mark

Mark Mumy
SAP Platform and Technologies Global Center of Excellence
M +1 347-820-2136 | E mark.mumy at sap.com
My Blogs: https://blogs.sap.com/author/markmumy/


From: "iqug-bounces at iqug.org" <iqug-bounces at iqug.org> on behalf of David Louie <David.Louie at blackrock.com>
Date: Tuesday, October 8, 2019 at 10:58 AM
To: IQ Group <iqug at dssolutions.com>
Subject: [IQUG] [iqug] Longer load times for larger files

We made some changes to the file load size for our 30 TB / 270 billion row table, reducing the frequency of small-file loads in favor of less frequent, larger files.

We have done this before. When we reduced the number of loads by consolidating files, either through concatenation in the load table or by just creating large files, we did see a reduction in overall load on our IQ server.

What we are noticing is that, for this particular table, the files we had fixed at 1 million rows load more efficiently than much larger files (20 to 36 times larger).  I think it’s because the table is so large (30 TB) that loading anything 20 to 36 times the size of the fixed 1 million row files is inefficient.

The reason we have different file sizes is that multiple sources write to this table, and each source decides how large a file to send to IQ.

At this point I want to recommend that we just cap our file sizes at 1 million rows. We will increase the number of loads, but each load will be shorter and we will avoid the heavier load on the server during the larger loads.

Does this make sense?

David


Loading timings


auxiliary_data.20191007.log:10/7/2019 11:28:49  Successfully inserted 1000000 row of gen_stat in 5.134363 seconds
auxiliary_data.20191007.log:10/7/2019 11:29:57  Successfully inserted 1000000 row of gen_stat in 64.551229 seconds
auxiliary_data.20191007.log:10/7/2019 11:30:20  Successfully inserted 1000000 row of gen_stat in 3.23862 seconds
auxiliary_data.20191007.log:10/7/2019 11:30:27  Successfully inserted 1000000 row of gen_stat in 3.652991 seconds
auxiliary_data.20191007.log:10/7/2019 11:31:05  Successfully inserted 1000000 row of gen_stat in 35.782272 seconds
auxiliary_data.20191007.log:10/7/2019 11:31:11  Successfully inserted 1000000 row of gen_stat in 2.616213 seconds
auxiliary_data.20191007.log:10/7/2019 11:32:16  Successfully inserted 1000000 row of gen_stat in 61.51385 seconds
auxiliary_data.20191007.log:10/7/2019 11:32:35  Successfully inserted 1000000 row of gen_stat in 2.888681 seconds
auxiliary_data.20191007.log:10/7/2019 11:32:41  Successfully inserted 1000000 row of gen_stat in 2.665366 seconds
auxiliary_data.20191007.log:10/7/2019 11:36:35  Successfully inserted 1000000 row of gen_stat in 31.066439 seconds
auxiliary_data.20191007.log:10/7/2019 12:54:48  Successfully inserted 23009199 row of gen_stat in 4656.094913 seconds
auxiliary_data.20191007.log:10/7/2019 13:18:11  Successfully inserted 25031477 row of gen_stat in 1216.772118 seconds
auxiliary_data.20191007.log:10/7/2019 14:41:31  Successfully inserted 30510111 row of gen_stat in 4950.109965 seconds
auxiliary_data.20191007.log:10/7/2019 15:14:48  Successfully inserted 26423283 row of gen_stat in 1789.997015 seconds
auxiliary_data.20191007.log:10/7/2019 15:29:50  Successfully inserted 29257762 row of gen_stat in 854.697949 seconds
auxiliary_data.20191007.log:10/7/2019 16:10:32  Successfully inserted 26753405 row of gen_stat in 2403.363381 seconds
auxiliary_data.20191007.log:10/7/2019 16:43:05  Successfully inserted 31472001 row of gen_stat in 1707.108822 seconds
auxiliary_data.20191007.log:10/7/2019 17:16:02  Successfully inserted 35815949 row of gen_stat in 1883.786133 seconds
auxiliary_data.20191007.log:10/7/2019 17:39:15  Successfully inserted 32102350 row of gen_stat in 1313.394245 seconds
auxiliary_data.20191007.log:10/7/2019 17:59:51  Successfully inserted 29479916 row of gen_stat in 1186.562292 seconds
auxiliary_data.20191007.log:10/7/2019 19:05:58  Successfully inserted 26585169 row of gen_stat in 3881.658904 seconds
auxiliary_data.20191007.log:10/7/2019 19:25:22  Successfully inserted 22415291 row of gen_stat in 991.608469 seconds
auxiliary_data.20191007.log:10/7/2019 19:52:58  Successfully inserted 23670737 row of gen_stat in 1625.757705 seconds
auxiliary_data.20191007.log:10/7/2019 20:14:24  Successfully inserted 23558831 row of gen_stat in 1251.106106 seconds
auxiliary_data.20191007.log:10/7/2019 21:08:46  Successfully inserted 23013321 row of gen_stat in 3206.305238 seconds
auxiliary_data.20191007.log:10/7/2019 22:14:37  Successfully inserted 21923090 row of gen_stat in 3833.625858 seconds
auxiliary_data.20191007.log:10/7/2019 22:55:43  Successfully inserted 24264485 row of gen_stat in 1923.370118 seconds
auxiliary_data.20191007.log:10/7/2019 23:23:26  Successfully inserted 36374625 row of gen_stat in 1500.935426 seconds
auxiliary_data.20191007.log:10/7/2019 23:39:57  Successfully inserted 22965001 row of gen_stat in 940.567729 seconds
auxiliary_data.20191007.log:10/7/2019 23:51:11  Successfully inserted 16731483 row of gen_stat in 562.661122 seconds
auxiliary_data.20191008.log:10/8/2019 0:08:58  Successfully inserted 16745384 row of gen_stat in 1043.87225 seconds
auxiliary_data.20191008.log:10/8/2019 0:23:23  Successfully inserted 15975122 row of gen_stat in 759.971957 seconds
auxiliary_data.20191008.log:10/8/2019 0:50:38  Successfully inserted 21113760 row of gen_stat in 1601.839543 seconds
auxiliary_data.20191008.log:10/8/2019 1:34:07  Successfully inserted 24944490 row of gen_stat in 2566.978859 seconds
auxiliary_data.20191008.log:10/8/2019 2:00:28  Successfully inserted 19789594 row of gen_stat in 1541.349109 seconds
auxiliary_data.20191008.log:10/8/2019 2:15:42  Successfully inserted 19249616 row of gen_stat in 882.372392 seconds
auxiliary_data.20191008.log:10/8/2019 2:16:13  Successfully inserted 1000000 row of gen_stat in 24.020464 seconds
auxiliary_data.20191008.log:10/8/2019 2:16:54  Successfully inserted 1000000 row of gen_stat in 35.055039 seconds
auxiliary_data.20191008.log:10/8/2019 2:17:20  Successfully inserted 1000000 row of gen_stat in 15.729922 seconds
auxiliary_data.20191008.log:10/8/2019 2:17:26  Successfully inserted 1000000 row of gen_stat in 3.178688 seconds
auxiliary_data.20191008.log:10/8/2019 2:17:32  Successfully inserted 1000000 row of gen_stat in 3.382881 seconds
auxiliary_data.20191008.log:10/8/2019 2:17:39  Successfully inserted 1000000 row of gen_stat in 4.898126 seconds
auxiliary_data.20191008.log:10/8/2019 2:17:43  Successfully inserted 1000000 row of gen_stat in 3.093337 seconds
auxiliary_data.20191008.log:10/8/2019 2:17:49  Successfully inserted 1000000 row of gen_stat in 3.119656 seconds
auxiliary_data.20191008.log:10/8/2019 2:17:55  Successfully inserted 1000000 row of gen_stat in 2.661561 seconds
auxiliary_data.20191008.log:10/8/2019 2:18:07  Successfully inserted 1000000 row of gen_stat in 8.343856 seconds
auxiliary_data.20191008.log:10/8/2019 2:19:11  Successfully inserted 1000000 row of gen_stat in 2.659806 seconds
auxiliary_data.20191008.log:10/8/2019 2:19:18  Successfully inserted 1000000 row of gen_stat in 3.695008 seconds
auxiliary_data.20191008.log:10/8/2019 2:19:24  Successfully inserted 1000000 row of gen_stat in 3.428878 seconds
auxiliary_data.20191008.log:10/8/2019 2:19:29  Successfully inserted 1000000 row of gen_stat in 2.594805 seconds
auxiliary_data.20191008.log:10/8/2019 2:20:00  Successfully inserted 1000000 row of gen_stat in 26.374229 seconds
auxiliary_data.20191008.log:10/8/2019 2:20:57  Successfully inserted 1000000 row of gen_stat in 41.625754 seconds
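
A rough way to read the timings above is rows per second for the 1 million row loads versus the larger ones. The Python sketch below does that for four lines copied from the log. The regular expression assumes the message format shown above, and treating anything over 1,000,000 rows as a "large" load is an arbitrary cutoff chosen for the example.

```python
import re
from typing import Iterable, List, Tuple

# Matches the "Successfully inserted N row of T in S seconds" message.
LINE = re.compile(r"Successfully inserted (\d+) row of (\w+) in ([\d.]+) seconds")

# Four lines copied verbatim from the log above.
SAMPLE = [
    "auxiliary_data.20191007.log:10/7/2019 11:28:49  Successfully inserted 1000000 row of gen_stat in 5.134363 seconds",
    "auxiliary_data.20191007.log:10/7/2019 11:30:20  Successfully inserted 1000000 row of gen_stat in 3.23862 seconds",
    "auxiliary_data.20191007.log:10/7/2019 12:54:48  Successfully inserted 23009199 row of gen_stat in 4656.094913 seconds",
    "auxiliary_data.20191007.log:10/7/2019 15:29:50  Successfully inserted 29257762 row of gen_stat in 854.697949 seconds",
]


def parse_timings(lines: Iterable[str]) -> List[Tuple[int, float]]:
    """Return (rows, seconds) for every line that matches the load message."""
    timings = []
    for line in lines:
        match = LINE.search(line)
        if match:
            timings.append((int(match.group(1)), float(match.group(3))))
    return timings


def rows_per_second(timings: List[Tuple[int, float]]) -> float:
    """Total rows divided by total elapsed seconds across the given loads."""
    total_rows = sum(rows for rows, _ in timings)
    total_seconds = sum(seconds for _, seconds in timings)
    return total_rows / total_seconds


if __name__ == "__main__":
    timings = parse_timings(SAMPLE)
    small = [t for t in timings if t[0] <= 1_000_000]
    large = [t for t in timings if t[0] > 1_000_000]
    print(f"1m row loads: {rows_per_second(small):,.0f} rows/s")
    print(f"larger loads: {rows_per_second(large):,.0f} rows/s")
```

Feeding the full log files through `parse_timings` gives the complete picture. The elapsed times also include whatever else the server was doing at the moment, so the figures are approximate.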
