[IQUG] Slower and slower performances at IQ version 16.0 while many more concurrent jobs were run in parallel

Steve Shen sshen at sscinc.com
Mon Sep 10 09:26:29 MST 2018


Hi Mark,

Thank you very much.

Regards
Steve Shen
From: Mumy, Mark [mailto:mark.mumy at sap.com]
Sent: Monday, September 10, 2018 11:56 AM
To: Steve Shen; Baker, Chris; 'iqug at iqug.org'
Cc: 'Mark Mumy'; 'Ron Watkins'
Subject: Re: Slower and slower performances at IQ version 16.0 while many more concurrent jobs were run in parallel

Max_Plans_Cached controls how many query plans a connection can keep around.

Using the proper manuals….

https://help.sap.com/viewer/a894a54d84f21015b142ffe773888f8c/16.1.3.0/en-US/3bd7ffa56c5f10149bd9de084889e1b1.html?q=max_plans_cached

Over time, a given connection may acquire cached plans that can optimize performance for the execution of a number of services. The number of cached plans can be controlled using the max_plans_cached option.

https://help.sap.com/viewer/a898e08b84f21015969fa437e89860c8/16.1.3.0/en-US/3bc695906c5f10148aaabe9d8f52d0e5.html?q=max_plans_cached
The maximum number of execution plans to be stored in a cache. This property corresponds to the max_plans_cached option.
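For a quick check of what a server and a connection are actually using (a sketch; SYSOPTIONS is the standard catalog view, and the connection property is the one from the page above):

select user_name, "option", setting
  from SYS.SYSOPTIONS
 where "option" = 'max_plans_cached';
-- value in effect for the current connection:
select connection_property('max_plans_cached');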

Mark

Mark Mumy
Strategic Technology Incubation Group
SAP Global Center of Excellence |  SAP
M +1 347-820-2136 | E mark.mumy at sap.com
My Blogs: https://blogs.sap.com/author/markmumy/

https://sap.na.pgiconnect.com/I825063
Conference tel: 18663127353,,8035340905#

From: Steve Shen <sshen at sscinc.com>
Date: Monday, September 10, 2018 at 10:46 AM
To: Mark Mumy <mark.mumy at sap.com>, Chris Baker <c.baker at sap.com>, "iqug at iqug.org" <iqug at iqug.org>
Cc: "markdmumy at gmail.com" <markdmumy at gmail.com>, Ron Watkins <rwatkins at dssolutions.com>
Subject: RE: Slower and slower performances at IQ version 16.0 while many more concurrent jobs were run in parallel

It’s confusing that there is a PUBLIC database option called max_plans_cached:
PUBLIC
         max_plans_cached
         40

“max_plans_cached” is also a connection property:

http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1604/doc/html/saiq-connection-properties.html

Regards,
Steve Shen
From: Steve Shen
Sent: Monday, September 10, 2018 11:28 AM
To: 'Mumy, Mark'; Baker, Chris; iqug at iqug.org
Cc: 'Mark Mumy'; 'Ron Watkins'
Subject: RE: Slower and slower performances at IQ version 16.0 while many more concurrent jobs were run in parallel

I suspect that the performance bottleneck is caused by the following database option:

max_plans_cached: the current value is 20.

This number seemed too low.  I could reset the value to 40 successfully in the following way, but it is a static change and the IQ server will need to be bounced:

sybase at ykt1eiquat37: /siqsoftware/sybase/release ==> isql -S EIQCGSUAT -U DBA
Password:
1> set option PUBLIC.max_plans_cached=40
2> go
1>

Why couldn’t I reset it the following way?

sybase at ykt1eiquat37: /siqsoftware/sybase/release ==> isql -S EIQCGSUAT -U DBA
Password:
1> call sa_server_option ('max_plans_cached', '40')
2> go
Msg 155, Level 15, State 0:
SQL Anywhere Error -200: Invalid option 'max_plans_cached' -- no PUBLIC setting
exists
1> call sa_server_option ('PUBLIC.max_plans_cached', '40')
2> go
Msg 155, Level 15, State 0:
SQL Anywhere Error -200: Invalid option 'PUBLIC.max_plans_cached' -- no PUBLIC
setting exists

Can I change it dynamically without rebooting the IQ server?
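(Noting for the archive: sa_server_option changes server-level options, not database options, which is consistent with the "no PUBLIC setting exists" errors above. A minimal sketch of the SET OPTION forms, assuming standard SQL Anywhere option semantics:)

-- PUBLIC default; new connections pick this up without a server reboot
-- for most options, though some IQ options only take effect at restart:
set option PUBLIC.max_plans_cached = 40;
-- current connection only, leaving the PUBLIC default alone (assuming
-- this option permits a temporary scope):
set temporary option max_plans_cached = 40;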

Thanks and regards,
Steve Shen
From: Mumy, Mark [mailto:mark.mumy at sap.com]
Sent: Sunday, September 09, 2018 5:36 PM
To: Steve Shen; Baker, Chris; iqug at iqug.org
Cc: 'Mark Mumy'; 'Ron Watkins'
Subject: Re: Slower and slower performances at IQ version 16.0 while many more concurrent jobs were run in parallel

And ASE is not IQ.  So what works well for ASE may not work well with IQ.  In fact, I am fairly certain it won’t.  ASE uses significantly smaller IOs than IQ and pushes a lot less data than IQ does.

I go back to your original point.  You said that this worked under IQ 15.4 on raw devices.  That is two big changes.  To determine what the issue is, you either need to drop back to a version that worked well (and keep using filesystems) or revert to raw devices and test the current version of IQ.

Mark

Mark Mumy
Strategic Technology Incubation Group
SAP Global Center of Excellence |  SAP
M +1 347-820-2136 | E mark.mumy at sap.com
My Blogs: https://blogs.sap.com/author/markmumy/

https://sap.na.pgiconnect.com/I825063
Conference tel: 18663127353,,8035340905#

From: Steve Shen <sshen at sscinc.com>
Date: Saturday, September 8, 2018 at 4:22 PM
To: Mark Mumy <mark.mumy at sap.com>, Chris Baker <c.baker at sap.com>, "iqug at iqug.org" <iqug at iqug.org>
Cc: "markdmumy at gmail.com" <markdmumy at gmail.com>, Ron Watkins <rwatkins at dssolutions.com>
Subject: RE: Slower and slower performances at IQ version 16.0 while many more concurrent jobs were run in parallel

Hi Mark,

I missed reading your reply email until today.  Thank you very much.

It’s very difficult to convince senior management not to use file systems, because SSD-based file systems hosting ASE production servers on Linux seemed to work very well.  We used to have poor performance issues while using SAN-based file systems on Solaris 10, but that is no longer the case using SSD-based file systems on Linux, kernel 2.6 or 2.7.

I am very sure that Red Hat 7.4 using xfs file systems was using Direct I/O, because we did specify ASE devices with “directio=true” and the ASE log file did indicate that DIRECTIO was in effect.
The following is one example from a given ASE server:

00:0000:00000:00002:2018/09/07 23:08:07.90 kernel  Virtual device 86 started using asynchronous (with DIRECTIO) i/o.
00:0000:00000:00002:2018/09/07 23:08:07.90 kernel  Virtual device 87 started using asynchronous (with DIRECTIO) i/o.
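For completeness, this is how directio gets onto an ASE device (a sketch with a hypothetical device name and path; directio can be requested at disk init time or flipped later with sp_deviceattr):

disk init name = 'data_dev1',
    physname = '/sybase/devices/data_dev1.dat',
    size = '4G',
    directio = true
go
-- or on an existing device (the change applies once the device is reopened):
sp_deviceattr 'data_dev1', 'directio', 'true'
go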

Besides, DIRECTIO was automatically taken care of by the latest system patch of IQ version 16.0 on Red Hat 7.4.

I will speak to UNIX SAs next week on your recommendations below:
“
You could also try rebuilding the filesystem with larger block sizes.  Use a block size that matches the IQ block size (not page size).
mkfs.xfs -b size=8192 /dev/sdXXXXXX
•         Used 8192, assuming you had the default IQ layout of a 128k page, which forces an 8k block
•         Adjust if your page size is higher

Measure filesystem fragmentation to see if this is the cause:
xfs_db -c frag -r /dev/sdXXXXXX

Expanding Chris’ mount options:
•         nodiratime,relatime,noatime
“

What frustrated me was that the UNIX SA managers sometimes could not tell me the block size on the SAN or SSD.  They told me that those SSD disks looked like one big disk to them, and that they had no way to dedicate some SSD disks to the data store and other SSD disks to the temp store.  They claimed that having one file system vs. multiple file systems made no difference in terms of performance.  We used to segregate the ASE data file system from the log file system while using SAN disks, but nowadays it is just one file system for data and temporary database devices using SSD!  Were these statements really true?

Most of the file systems mentioned in Linux or Solaris technical documents were based on traditional local disks or SAN disks.  They might not be applicable to SSDs, but I am not an expert in SSDs.  I am waiting for your feedback: does it really matter whether I have one big file system with multiple data and temp store files on it vs. multiple file systems for the data and temp store files in an SSD storage environment?  For example, I could have one big file system holding 1 SYSTEM_MAIN, 3 SYSTEM_TEMP and 10 USER_MAIN data store files, or I could create 14 file systems.  Would 14 file systems definitely perform better than one file system on SSD storage devices?  BTW, the SSDs were shared between ASE servers and IQ servers.
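One way to settle the one-vs-many filesystem question is to measure it directly.  A rough sketch with fio (assuming fio is installed; the directory, size and job count are hypothetical): run the same concurrent direct-I/O load against one big file system and against separate file systems, and compare throughput:

fio --name=iqsim --directory=/iqdata/fs1 --rw=randread \
    --bs=128k --direct=1 --numjobs=20 --size=2g \
    --runtime=60 --time_based --group_reporting

The 128k block size mirrors the default IQ page size discussed elsewhere in this thread, and --direct=1 bypasses the filesystem cache the way IQ's direct I/O does.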

Regards,
Steve Shen
From: Mumy, Mark [mailto:mark.mumy at sap.com]
Sent: Friday, September 07, 2018 12:54 PM
To: Baker, Chris; Steve Shen; iqug at iqug.org
Cc: 'Mark Mumy'; 'Ron Watkins'
Subject: Re: Slower and slower performances at IQ version 16.0 while many more concurrent jobs were run in parallel

Just to add to Chris’ comments….

You have to make sure that the filesystem supports direct IO and that IQ is able to open the devices with direct IO.  This “should” bypass the cache.  It sounds like either the filesystem block size is breaking the IOs up or the filesystem cache is being used and getting in the way of things.

As a sanity check, can you run this same test on IQ with raw devices?  If you can do that, it will tell us where the issue is.  We think it is the filesystem, and that will certainly show if it is.  Without a test on raw devices it is total speculation.

You could also try rebuilding the filesystem with larger block sizes.  Use a block size that matches the IQ block size (not page size).
mkfs.xfs -b size=8192 /dev/sdXXXXXX
•         Used 8192, assuming you had the default IQ layout of a 128k page, which forces an 8k block
•         Adjust if your page size is higher

Measure filesystem fragmentation to see if this is the cause:
xfs_db -c frag -r /dev/sdXXXXXX

Expanding Chris’ mount options (an fstab sketch follows below):
•         nodiratime,relatime,noatime
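(A sketch of how those options would look in /etc/fstab, with a hypothetical device and mount point; note that on current kernels noatime already implies nodiratime:)

/dev/sdXXXXXX  /iqdata  xfs  defaults,noatime,nodiratime  0 0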

Also, tell the senior manager that “supported” is not the same as performant.  We support single row inserts, too.  Would they want to change all your bulk loading to single INSERT statements?  No!  While supported, it will not perform at the same level.

Lastly…. How is the filesystem made up?  Does it point to just 1 LUN?  Multiple LUNs on the SAN?  Was it created as concatenated or striped?  All of this can severely alter the performance of the filesystem.

Mark

Mark Mumy
Strategic Technology Incubation Group
SAP Global Center of Excellence |  SAP
M +1 347-820-2136 | E mark.mumy at sap.com
My Blogs: https://blogs.sap.com/author/markmumy/

https://sap.na.pgiconnect.com/I825063
Conference tel: 18663127353,,8035340905#

From: Chris Baker <c.baker at sap.com>
Date: Friday, September 7, 2018 at 11:35 AM
To: Steve Shen <sshen at sscinc.com>, "iqug at iqug.org" <iqug at iqug.org>
Cc: "markdmumy at gmail.com" <markdmumy at gmail.com>, Mark Mumy <mark.mumy at sap.com>, Ron Watkins <rwatkins at dssolutions.com>
Subject: RE: Slower and slower performances at IQ version 16.0 while many more concurrent jobs were run in parallel

IMHO and AFAIK this is one of the performance penalties of using a file system.  The filesystem often has a smaller ‘segment size’ than the IQ block/page size.  The net effect is to break up a single IQ I/O into multiple filesystem I/Os.

You have not stated how the filesystems are mounted or what type they are, or whether you have large enough IQ main and temp caches as well – which can help.

There is not much you can do about fs performance, but you can look at the Linux elevator scheduler for the fs block device (if it is not on a SAN or NFS, etc. – sometimes the scheduler is not available for some types of block devices) and set it to ‘noop’ or ‘deadline’ to see if it helps.
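(A sketch of checking and switching the scheduler through sysfs, with a hypothetical device name; the currently active scheduler is shown in brackets:)

cat /sys/block/sdX/queue/scheduler
echo deadline > /sys/block/sdX/queue/scheduler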

For EXT3/4 filesystems, you can also look at turning off journaling, which can otherwise slow filesystem I/O down considerably.  For EXT4, also consider mounting the filesystem with the additional ‘noatime,nodiratime’ options.  There may be other options to turn off journaling as well, which some of the other respondents can provide.
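(A sketch of both knobs for EXT4, assuming the filesystem can be unmounted first and the device/mount point are hypothetical; tune2fs from e2fsprogs removes the journal feature:)

umount /iqdata
tune2fs -O ^has_journal /dev/sdX
mount -o noatime,nodiratime /dev/sdX /iqdata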

Chris

Chris Baker | Database Engineering Evangelist | CERT | PI HANA Platform Data Management | SAP
T +1 416-226-7033 | M +1 647-224-2033 | TF +1 866-716-8860
SAP Canada Inc. 445 Wes Graham Way, Waterloo, N2L 6R2
c.baker at sap.com | www.sap.com

https://sap.na.pgiconnect.com/I826572
Conference tel: 1-866-312-7353,,9648565377#

From: Steve Shen <sshen at sscinc.com>
Sent: Friday, September 7, 2018 12:23 PM
To: iqug at iqug.org
Cc: 'Mark Mumy' <markdmumy at gmail.com>; Mumy, Mark <mark.mumy at sap.com>; Baker, Chris <c.baker at sap.com>; 'Ron Watkins' <rwatkins at dssolutions.com>; Steve Shen <sshen at sscinc.com>
Subject: Slower and slower performances at IQ version 16.0 while many more concurrent jobs were run in parallel

Today’s topic:
                Changing from fast to slower and slower IQ 16.0 performance at Linux Red Hat 7.4 using file systems

Let me share my IQ environments with you:


  1.  The O/S is Red Hat version 7.4.
  2.  The IQ version is “SAP IQ/16.0.110.2805/11355/P/sp11.20/Enterprise Linux64 - x86_64 - 2.6.18-194.el5/64bit/2018-06-15 02:57:16”.
  3.  The senior manager overrode my concerns about the potential performance penalty and decided to use file systems, because the SAP manuals document that file systems are supported by IQ.
  4.  The file systems used SSDs.

My clients were complaining about the following:
1.  The average reporting job duration for similar SQL statements was 395 seconds while 5 concurrent jobs were run in parallel.
2.  It became 475 seconds while 11 concurrent jobs were run in parallel.
3.  It became 666 seconds while 20 concurrent jobs were run in parallel.
4.  Many more concurrent parallel runs will take place in the next two weeks!

In other words, IQ ran very fast when only a few jobs were running concurrently, but performance declined heavily as many more concurrent jobs were run in parallel!

My key clients could not accept this kind of declining performance at IQ version 16.0.

I have turned off both OS_File_Cache_Buffering and OS_File_Cache_Buffering_Tempdb.  I also set IQ_USE_DIRECTIO=1 even though it’s not necessary on Red Hat 7.4.
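(For reference, a sketch of the statements behind those settings, assuming the usual IQ option syntax; both options take effect at database restart, and IQ_USE_DIRECTIO is an environment variable set before the server starts:)

set option PUBLIC.OS_File_Cache_Buffering = 'OFF';
set option PUBLIC.OS_File_Cache_Buffering_Tempdb = 'OFF';

# in the server start environment:
export IQ_USE_DIRECTIO=1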

What else can I do to make this first Linux IQ instance on file systems perform much better with many concurrent parallel runs?

Thank you.

Regards,

Steve Shen
