[IQUG] [iqug] Sharding my largest IQ table approach

Ron Watkins rwatkins at dssolutions.com
Wed Oct 9 13:08:01 MST 2019


Making a multiple step query where you query individual table segments with the results being added to a temp table, then you can do basic selects, groups or distincts from the temp table.

Ron

 

From: iqug-bounces at iqug.org [mailto:iqug-bounces at iqug.org] On Behalf Of Louie, David
Sent: Wednesday, October 9, 2019 9:52 AM
To: Mumy, Mark; Mark Mumy
Cc: iqug at dssolutions.com
Subject: Re: [IQUG] [iqug] Sharding my largest IQ table approach

 

Hi Mark,

 

I just tested selects on the union all view and seeing for the most part all select (singleton) are running around the same time as the table selects.

 

As you predicted running joins which involve the view and other tables are not running at all.   The join version to the actual table took secs and the joins to the view hangs for 1 hour and I killed it as it’s apparent it’s not going to work.

 

Is there any recourse to this other than to not join to the view?

 

Thanks

David

 

 

From: Mumy, Mark [mailto:mark.mumy at sap.com] 
Sent: Thursday, August 15, 2019 4:56 PM
To: Louie, David <David.Louie at blackrock.com>; Mark Mumy <markdmumy at gmail.com>
Cc: iqug at dssolutions.com
Subject: Re: [IQUG] [iqug] Sharding my largest IQ table approach

 

External Email: Use caution with links and attachments

OK, so that’s good then.  You eliminate most of the complexities with sharding.  You could still get into some performance trouble, but it is much rarer when you don’t have joins.

 

Mark

 

Mark Mumy

SAP Platform and Technologies Global Center of Excellence

M +1 347-820-2136 | E  <mailto:mark.mumy at sap.com> mark.mumy at sap.com

My Blogs:  <https://urldefense.proofpoint.com/v2/url?u=https-3A__blogs.sap.com_author_markmumy_&d=DwMGaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=hvVkP5y30w3_fHBwXsoZ2pqbwQXietniE01Z666b0tE&m=SmTPBmBfWc2phwaZexVRx5mjJlXZUGwIUQse6bC5Fk0&s=IqPYxfejrKwu-6zNNBjCBuaarox0HU11FK1_PKP9S_M&e=> https://blogs.sap.com/author/markmumy/

                      

 <https://urldefense.proofpoint.com/v2/url?u=https-3A__sap.na.pgiconnect.com_I825063&d=DwMGaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=hvVkP5y30w3_fHBwXsoZ2pqbwQXietniE01Z666b0tE&m=SmTPBmBfWc2phwaZexVRx5mjJlXZUGwIUQse6bC5Fk0&s=VDmJuosymKK0Q_sKFRzuzaW_H6-dJdxItvNY4D2Da-U&e=> https://sap.na.pgiconnect.com/I825063

Conference tel: 18663127353,,8035340905#

 

From: David Louie <David.Louie at blackrock.com>
Date: Thursday, August 15, 2019 at 3:50 PM
To: Mark Mumy <mark.mumy at sap.com>, "markdmumy at gmail.com" <markdmumy at gmail.com>
Cc: IQ Group <iqug at dssolutions.com>
Subject: RE: [IQUG] [iqug] Sharding my largest IQ table approach

 

I see Mark.  Fortunately the table we want to shard will never require a join.

 

However we will look into the 16.1 zone map feature.

 

Thanks

David

 

 

From: Mumy, Mark [mailto:mark.mumy at sap.com] 
Sent: Thursday, August 15, 2019 4:39 PM
To: Louie, David <David.Louie at blackrock.com>; Mark Mumy <markdmumy at gmail.com>
Cc: iqug at dssolutions.com
Subject: Re: [IQUG] [iqug] Sharding my largest IQ table approach

 

External Email: Use caution with links and attachments

Sharding is simply taking the large object and splitting it up into smaller ones.  Doesn’t matter if it is 2 or 200.  These shards are stiched back together, as you describe, via a UNION ALL view.

 

Suppose you need to join a table to that view:  select * from small_table, shard_view where small_table.field = shard_view.field

 

Fairly simple query.  The only way to perform this, though, is to join the small_table to each and every table in the shard_view.  OK, not so bad as IQ tries to do this in parallel.

 

But what if I change the query slightly:

select *, count(*) from small_table, shard_view where small_table.field = shard_view.field

group by small_table.field2

 

Now, IQ has to group the sharded data by a field that the sharded data is not sharded by.  This is an immensely more complex query that will take a lot of time, compute, and store resources.

 

No matter what, sharding can lead to poor performance.  With my example, though, we can help alleviate some of that impact by making sure that nearly all data is in a single table.

 

Ultimately, though, using IQ 16.1 SP03 or SP04 would allow for the best flexibility as you wouldn’t be forced to shard the table.  You could possibly use the zone map feature and reduce the HG index count to keep load performance up without losing performance.

 

Mark

 

Mark Mumy

SAP Platform and Technologies Global Center of Excellence

M +1 347-820-2136 | E  <mailto:mark.mumy at sap.com> mark.mumy at sap.com

My Blogs:  <https://urldefense.proofpoint.com/v2/url?u=https-3A__blogs.sap.com_author_markmumy_&d=DwMGaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=hvVkP5y30w3_fHBwXsoZ2pqbwQXietniE01Z666b0tE&m=zOseCgizu2EhwiBzcMUTPHQlGNDNmxZePytUl-NzqNQ&s=iuMdUfSJkLiE3U0aa6Gqzlh46m2LRffgPCb1DNR_qmI&e=> https://blogs.sap.com/author/markmumy/

                      

 <https://urldefense.proofpoint.com/v2/url?u=https-3A__sap.na.pgiconnect.com_I825063&d=DwMGaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=hvVkP5y30w3_fHBwXsoZ2pqbwQXietniE01Z666b0tE&m=zOseCgizu2EhwiBzcMUTPHQlGNDNmxZePytUl-NzqNQ&s=vLBGv7k2k8IC1cw6b5eywwTN4feyopEP675HQ1mY2pk&e=> https://sap.na.pgiconnect.com/I825063

Conference tel: 18663127353,,8035340905#

 

From: David Louie <David.Louie at blackrock.com>
Date: Thursday, August 15, 2019 at 3:27 PM
To: Mark Mumy <mark.mumy at sap.com>, "markdmumy at gmail.com" <markdmumy at gmail.com>
Cc: IQ Group <iqug at dssolutions.com>
Subject: RE: [IQUG] [iqug] Sharding my largest IQ table approach

 

Thanks Mark.

 

What is the difference between ‘sharding’ and the current and history approach which I originally suggested.

 

Would sharding imply more than 2 tables in the view which brings them all together and hence would be more efficient for queries vs sharding across more than 2 tables?

 

-David

 

 

From: Mumy, Mark [mailto:mark.mumy at sap.com] 
Sent: Thursday, August 15, 2019 4:08 PM
To: Louie, David <David.Louie at blackrock.com>; Mark Mumy <markdmumy at gmail.com>
Cc: iqug at dssolutions.com
Subject: Re: [IQUG] [iqug] Sharding my largest IQ table approach

 

External Email: Use caution with links and attachments

I would skip 16.0 and go to 16.1.  16.0 goes end of life quite soon, so that in the next 6 months you will be doing another upgrade.

 

According to the IQ PAM <https://urldefense.proofpoint.com/v2/url?u=https-3A__apps.support.sap.com_sap-28bD1lbiZjPTAwMQ-3D-3D-29_support_pam_pam.html-3Fsmpsrv-3Dhttps-253A-252F-252Fwebsmp102.sap-2Dag.de-23pvnr-3D67838200100900005877-26pt-3Dg-257Cd-26ainstnr-3D67838200104900016656-26ts-3D0&d=DwMGaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=hvVkP5y30w3_fHBwXsoZ2pqbwQXietniE01Z666b0tE&m=574XgbVqNTENu3ZFuUIvW2zjimZuqYsJzBl1BPGoxIE&s=DCuRg7RBQ6WoC9-e0IXLeTJLAWpGgjTG8Pwkb2FbjD0&e=> :


End of Mainstream Maintenance:

2020-11-30

 

 

IQ 16.1 SP03 and later has a feature called Zone Maps.  One of the reasons that your loads get slower is because of HG indexes.  We’ve made strides with these over the years, but they can still get in the way.  We introduced Zone Maps as a way to help with large tables to the point where HG indexes may not be needed.  If you reduce the number of HG indexes, you reduce the need to shard the table.

 

Sharding will always come with a performance hit as the data has to be stitched back together at runtime for each user.  With some users and queries, the impact may be negligible.  For others, it may be a significant hit.

 

Another approach would be to do what you propose with a current and history table.  The difference being this.  Data is always loaded into CURRENT.  And there is a view that joins CURRENT and HISTORY.  The difference is that at some set interval, maybe hourly or daily, the CURRENT table data is moved into HISTORY and then removed from CURRENT.  This way most query operations are focused on the data in HISTORY and CURRENT is small so that loads are fast.  It does require a maintenance window to move the data, though.

 

Mark

 

Mark Mumy

SAP Platform and Technologies Global Center of Excellence

M +1 347-820-2136 | E  <mailto:mark.mumy at sap.com> mark.mumy at sap.com

My Blogs:  <https://urldefense.proofpoint.com/v2/url?u=https-3A__blogs.sap.com_author_markmumy_&d=DwMGaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=hvVkP5y30w3_fHBwXsoZ2pqbwQXietniE01Z666b0tE&m=574XgbVqNTENu3ZFuUIvW2zjimZuqYsJzBl1BPGoxIE&s=A9xlJjchiukY6iSl8O5VtAybhfWTSBev-HP6pDAcdVg&e=> https://blogs.sap.com/author/markmumy/

                      

 <https://urldefense.proofpoint.com/v2/url?u=https-3A__sap.na.pgiconnect.com_I825063&d=DwMGaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=hvVkP5y30w3_fHBwXsoZ2pqbwQXietniE01Z666b0tE&m=574XgbVqNTENu3ZFuUIvW2zjimZuqYsJzBl1BPGoxIE&s=K9CrsVNCrAoJZUOfr4FdhOhwKTFpJqd4azP8cjs_1Zw&e=> https://sap.na.pgiconnect.com/I825063

Conference tel: 18663127353,,8035340905#

 

From: "iqug-bounces at iqug.org" <iqug-bounces at iqug.org> on behalf of David Louie <David.Louie at blackrock.com>
Date: Thursday, August 15, 2019 at 2:51 PM
To: "markdmumy at gmail.com" <markdmumy at gmail.com>
Cc: IQ Group <iqug at dssolutions.com>
Subject: Re: [IQUG] [iqug] Sharding my largest IQ table approach

 

Hi Mark,

 

IQ 15.4 (but going to 16.0 next month).

 

Multiple users can see the data some with larger look back periods than others.  Adhoc so it can be anything.

 

We have data going back to 2000.

 

David

 

From: Mark Mumy [mailto:markdmumy at gmail.com] 
Sent: Thursday, August 15, 2019 11:34 AM
To: Louie, David <David.Louie at blackrock.com>
Cc: iqug at dssolutions.com
Subject: Re: [IQUG] [iqug] Sharding my largest IQ table approach

 

External Email: Use caution with links and attachments

How do you plan on using the data?   A user can see it all?  Only for a period of time, like 1 year at a time? 

 

Which IQ version?

Mark 

 

========================

 

Sent from my mobile device


On Aug 15, 2019, at 10:15, Louie, David <David.Louie at blackrock.com> wrote:

Hi Folks,

 

We need to bite the bullet and shard our largest IQ table ( 248 billion rows, 30 TB) as load table times have gotten painfully slow.

 

So we will create a <table>_history and <table>_current,  our loader will load into the <table>_current> and we will create a new view <table> which does <table_history> UNION ALL <table>_current.

 

The <table>_history will be the copy of the entire table so we will start with an empty <table>_current (hence reaping the benefits of the smaller footprint and faster loads)

 

My question is from a view creation and IQ optimizer/performance perspective is this approach fine or would you recommend breaking the <table>_history into additional shards ( <table>_year for example)?

 

Thanks

David

 

 

This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/compliance/email-disclaimers for further information.  Please refer to http://www.blackrock.com/corporate/compliance/privacy-policy for more information about BlackRock’s Privacy Policy.

For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/about-us/contacts-locations.

© 2019 BlackRock, Inc. All rights reserved.

_______________________________________________
IQUG mailing list
IQUG at iqug.org
http://iqug.org/mailman/listinfo/iqug <https://urldefense.proofpoint.com/v2/url?u=http-3A__iqug.org_mailman_listinfo_iqug&d=DwMFaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=hvVkP5y30w3_fHBwXsoZ2pqbwQXietniE01Z666b0tE&m=jPeuuWL3l5KwKBQa6_y5kLsluSbOaKnp4Jkz56UWBac&s=US3yxUzi7QKYWZdu4WHisu2OSSvL4F4o0LcaBX9_asY&e=> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://iqug.org/pipermail/iqug/attachments/20191009/8e1ecfeb/attachment-0001.html>


More information about the IQUG mailing list