oracle 执行计划改变导致数据库负载过高解决办法

作者:袖梨 2022-06-29

数据库主机负载
这里明显表现系统load 偏高,而且还在上升中;top的进程中,占用cpu都计划100%
top - 16:25:39 up 123 days,  1:42,  4 users,  load average: 46.19, 45.08, 43.93
Tasks: 1469 total,  28 running, 1439 sleeping,   0 stopped,   2 zombie
Cpu(s): 45.9%us,  1.1%sy,  0.0%ni, 47.1%id,  5.2%wa,  0.1%hi,  0.6%si,  0.0%st
Mem:  264253752k total, 262605260k used,  1648492k free,   413408k buffers
Swap: 33554424k total,   458684k used, 33095740k free, 67110504k cached
 
  PID USERPR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
 2622 oracle    18   0  150g  34m  28m R 100.0  0.0  11:58.21 oracleq9db1 (LOCAL=NO)   
15881 oracle    21   0  150g  35m  28m R 100.0  0.0  72:14.20 oracleq9db1 (LOCAL=NO)   
17214 oracle    19   0  150g  38m  32m R 100.0  0.0  20:47.44 oracleq9db1 (LOCAL=NO)   
17705 oracle    21   0  150g  34m  28m R 100.0  0.0  27:33.00 oracleq9db1 (LOCAL=NO)   
 6110 oracle    19   0  150g  34m  28m R 99.8  0.0  12:33.34 oracleq9db1 (LOCAL=NO)    
 6876 oracle    19   0  150g  34m  27m R 99.5  0.0  12:43.90 oracleq9db1 (LOCAL=NO)    
17205 oracle    23   0  150g  34m  27m R 99.5  0.0  21:37.18 oracleq9db1 (LOCAL=NO)    
24629 oracle    20   0  150g  35m  29m R 99.5  0.0  28:10.62 oracleq9db1 (LOCAL=NO)    
26959 oracle    19   0  150g  34m  27m R 99.5  0.0  47:17.87 oracleq9db1 (LOCAL=NO)    
 7655 oracle    18   0  150g  30m  25m R 98.5  0.0   2:28.45 oracleq9db1 (LOCAL=NO)    
16377 oracle    18   0  150g  34m  28m R 98.5  0.0  36:07.11 oracleq9db1 (LOCAL=NO)    
24637 oracle    18   0  150g  37m  30m R 98.2  0.0  26:39.15 oracleq9db1 (LOCAL=NO)    
 6106 oracle    21   0  150g  40m  33m R 97.2  0.0  11:37.75 oracleq9db1 (LOCAL=NO)    
28785 oracle    18   0  150g  34m  28m R 96.9  0.0  24:03.29 oracleq9db1 (LOCAL=NO)    
24278 oracle    17   0  150g  31m  26m S 96.5  0.0   3:15.51 oracleq9db1 (LOCAL=NO)    
24283 oracle    17   0  150g  33m  28m S 96.5  0.0   6:25.26 oracleq9db1 (LOCAL=NO)    
 7098 oracle    18   0  150g  32m  27m R 94.6  0.0   2:20.22 oracleq9db1 (LOCAL=NO)    
 6874 oracle    17   0  150g  34m  28m R 87.0  0.0  12:02.92 oracleq9db1 (LOCAL=NO)    
18206 oracle    16   0  150g  34m  27m R 86.1  0.0  16:11.28 oracleq9db1 (LOCAL=NO)    
 7096 oracle    17   0  150g  29m  24m R 85.4  0.0   3:01.72 oracleq9db1 (LOCAL=NO)    
数据库等待事件
 SID EVENT
---------- ----------------------------------------------------------------
 183 gc cr request
 185 latch: cache buffers chains
 239 db file sequential read
 292 gc cr request
 406 gc cr request
 410 db file sequential read
 463 gc current request
 572 gc buffer busy acquire
 575 gc buffer busy acquire
 577 latch: cache buffers chains
 629 db file sequential read
 747 gc cr request
 919 latch: cache buffers chains
 974 gc cr request
1033 log file sync
1141 db file parallel write
1153 gc cr request
1199 db file sequential read
1378 db file sequential read
1495 gc cr request
1540 db file parallel write
1547 gc buffer busy acquire
1662 gc cr request
1715 gc buffer busy acquire
1770 SQL*Net message to client
1830 latch: cache buffers chains
1884 gc cr request
2113 db file sequential read
2173 db file sequential read
2229 rdbms ipc reply
2292 db file sequential read
2341 db file sequential read
2348 gc cr request
2460 gc cr request
2632 gc cr request
2684 gc cr request
2687 db file sequential read
2749 db file sequential read
2913 gc cr request
2967 db file sequential read
3038 gc cr request
3087 SQL*Net message to client
3089 gc cr request
3194 db file sequential read
3195 db file sequential read
3309 latch: cache buffers chains
3371 gc cr request
3485 gc cr request
3535 gc cr request
可以这里有很多gc cr request等待和cache buffers chains等待,第一反应就是很可能系统某条sql执行计划不正确导致逻辑读剧增.

分析awr报告
故障时候的逻辑读 top sql可以看出来cdwjdd67x27mh的sql逻辑读异常大

awr1
分析正常时间点awr报告中cdwjdd67x27mh逻辑读情况
awr2
通过对比可以发现两个awr中,cdwjdd67x27mh 语句执行次数差不多,但是单次逻辑读从800多突变为34000多,增加了40多倍

cdwjdd67x27mh 语句分析
执行计划
17:36:08 sys@Q9DB>select * from table(dbms_xplan.display_awr('cdwjdd67x27mh'));
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID cdwjdd67x27mh
--------------------
select * from ( select          *                               from                            GE_TAOBAO_BILL           o
                 WHERE 1=1                       and
o.CREATED_TIME >= :1            and o.CREATED_TIME < :2                   
                                and o.REC_SITE_ID = :3                               
             and o.STATUS_ID = :4            
             and o.SERVICE_TYPE = :5 ) where rownum
<= 36000
 
Plan hash value: 647855111
 
----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                      | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                           |       |       |     5 (100)|          |        |
|   1 |  COUNT STOPKEY                       |                           |       |       |            |          |        |
|   2 |   FILTER                             |                           |       |       |            |          |        |
|   3 |    PARTITION RANGE ITERATOR          |                           |     1 |   455 |     5   (0)| 00:00:01 |   KEY |   KEY |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID| GE_TAOBAO_BILL            |     1 |   455 |     5   (0)| 00:00:01 |   KEY |   KEY |
|   5 |      INDEX RANGE SCAN                | IDX_TAOBAO_BILL_CR_ISDP_S |     1 |       |     4   (0)| 00:00:01 |   KEY |   KEY |
----------------------------------------------------------------------------------------------------------------------------------
 
SQL_ID cdwjdd67x27mh
--------------------
select * from ( select          *                               from                            GE_TAOBAO_BILL           o
                 WHERE 1=1                       and
o.CREATED_TIME >= :1            and o.CREATED_TIME < :2                   
                                and o.REC_SITE_ID = :3                               
             and o.STATUS_ID = :4  
             and o.SERVICE_TYPE = :5 ) where rownum
<= 36000
 
Plan hash value: 2979024279
 
---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                          |       |       |    13 (100)|          |       ||
|   1 |  COUNT STOPKEY                       |                          |       |       |            |          |       ||
|   2 |   FILTER                             |                          |       |       |            |          |       ||
|   3 |    PARTITION RANGE ITERATOR          |                          |     1 |   455 |    13   (0)| 00:00:01 |   KEY |   KEY |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID| GE_TAOBAO_BILL           |     1 |   455 |    13   (0)| 00:00:01 |   KEY |   KEY |
|   5 |      INDEX RANGE SCAN                | IDX_TAOBAO_BILL_CR_REC_S |     4 |       |     9   (0)| 00:00:01 |   KEY |   KEY |
---------------------------------------------------------------------------------------------------------------------------------
这里可以发现cdwjdd67x27mh在数据库中有两个执行计划

通过awr数据,看执行计划的变化情况
10:50:29 sys@Q9DB>select a.INSTANCE_NUMBER,a.snap_id,a.sql_id,a.plan_hash_value,b.begin_interval_time
10:50:30   2  from dba_hist_sqlstat a, dba_hist_snapshot b
10:50:30   3  where sql_id ='cdwjdd67x27mh'
10:50:30   4  and a.snap_id = b.snap_id
10:50:30   5  order by instance_number,begin_interval_time  desc;
 
INSTANCE_NUMBER    SNAP_ID SQL_ID        PLAN_HASH_VALUE BEGIN_INTERVAL_TIME
--------------- ---------- ------------- --------------- ----------------------------------------
              1      20719 cdwjdd67x27mh      2979024279 12-MAY-14 05.00.12.753 PM
              1      20719 cdwjdd67x27mh       647855111 12-MAY-14 05.00.12.753 PM
              1      20719 cdwjdd67x27mh      2979024279 12-MAY-14 05.00.12.702 PM
              1      20719 cdwjdd67x27mh       647855111 12-MAY-14 05.00.12.702 PM
              1      20718 cdwjdd67x27mh       647855111 12-MAY-14 04.00.24.197 PM
              1      20718 cdwjdd67x27mh      2979024279 12-MAY-14 04.00.24.197 PM
              1      20718 cdwjdd67x27mh       647855111 12-MAY-14 04.00.24.172 PM
              1      20718 cdwjdd67x27mh      2979024279 12-MAY-14 04.00.24.172 PM
              1      20717 cdwjdd67x27mh       647855111 12-MAY-14 03.11.22.251 PM
              1      20717 cdwjdd67x27mh      2979024279 12-MAY-14 03.11.22.251 PM
              1      20717 cdwjdd67x27mh       647855111 12-MAY-14 03.11.22.188 PM
              1      20717 cdwjdd67x27mh      2979024279 12-MAY-14 03.11.22.188 PM
              ………………
              1      20696 cdwjdd67x27mh      2979024279 11-MAY-14 07.00.07.142 PM
              1      20696 cdwjdd67x27mh      2979024279 11-MAY-14 07.00.07.105 PM
              1      20695 cdwjdd67x27mh      2979024279 11-MAY-14 06.00.12.771 PM
              1      20695 cdwjdd67x27mh      2979024279 11-MAY-14 06.00.12.707 PM
              1      20694 cdwjdd67x27mh      2979024279 11-MAY-14 05.00.48.249 PM
              1      20694 cdwjdd67x27mh      2979024279 11-MAY-14 05.00.48.170 PM
              1      20693 cdwjdd67x27mh      2979024279 11-MAY-14 04.00.37.841 PM
              …………
              2      20719 cdwjdd67x27mh      2979024279 12-MAY-14 05.00.12.753 PM
              2      20719 cdwjdd67x27mh      2979024279 12-MAY-14 05.00.12.702 PM
              2      20718 cdwjdd67x27mh      2979024279 12-MAY-14 04.00.24.197 PM
              2      20718 cdwjdd67x27mh      2979024279 12-MAY-14 04.00.24.172 PM
              2      20717 cdwjdd67x27mh      2979024279 12-MAY-14 03.11.22.251 PM
              2      20717 cdwjdd67x27mh      2979024279 12-MAY-14 03.11.22.188 PM
这里可以清晰看到,执行计划在1节点中发生震荡(两个执行计划都有选择),仔细看两个执行计划,会发现就是查询的时候所使用的index不同而已,继续分析两个index
IDX_TAOBAO_BILL_CR_ISDP_S      CREATED_TIME                            1
IDX_TAOBAO_BILL_CR_ISDP_S      IS_DISPART                              2
IDX_TAOBAO_BILL_CR_ISDP_S      STATUS_ID                               3
 
IDX_TAOBAO_BILL_CR_REC_S       REC_SITE_ID                             1
IDX_TAOBAO_BILL_CR_REC_S       CREATED_TIME                            2
IDX_TAOBAO_BILL_CR_REC_S       STATUS_ID                               3
分析数据分布情况(其他列省略)
TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT LAST_ANAL
------------------------------ ------------------------------ ------------ ---------
GE_TAOBAO_BILL                 CREATED_TIME                      287080448 05-MAY-14
GE_TAOBAO_BILL                 REC_SITE_ID                           13176 05-MAY-14
GE_TAOBAO_BILL                 IS_DISPART                                2 05-MAY-14
GE_TAOBAO_BILL                 STATUS_ID                                 7 05-MAY-14
这里可以看到数据库在正常的时候使用IDX_TAOBAO_BILL_CR_REC_S没有问题,但是在某些情况下选择使用IDX_TAOBAO_BILL_CR_ISDP_S这个就有问题,导致逻辑读过高,其实对于该sql语句,是因为IDX_TAOBAO_BILL_CR_REC_S 索引不合理导致,如果创建CREATED_TIME,REC_SITE_ID,STATUS_ID,就不会因为传输的值范围不同而使用IDX_TAOBAO_BILL_CR_ISDP_S的情况。针对该情况,因为表非常大,短时间内无法修改index,只能考虑使用 sql profile固定执行计划

sql profile固定计划
SQL>@coe.sql cdwjdd67x27mh
 
Parameter 1:
SQL_ID (required)
 
 
 
PLAN_HASH_VALUE AVG_ET_SECS
--------------- -----------
     2979024279        .011
      647855111       5.164
 
Parameter 2:
PLAN_HASH_VALUE (required)
 
Enter value for 2: 2979024279
 
Values passed:
~~~~~~~~~~~~~
SQL_ID         : "cdwjdd67x27mh"
PLAN_HASH_VALUE: "2979024279"
 
 
Execute coe_xfr_sql_profile_cdwjdd67x27mh_2979024279.sql
on TARGET system in order to create a custom SQL Profile
with plan 2979024279 linked to adjusted sql_text.
 
 
COE_XFR_SQL_PROFILE completed.
SQL>@coe_xfr_sql_profile_cdwjdd67x27mh_2979024279.sql
SQL>DECLARE
  2  sql_txt CLOB;
  3  h       SYS.SQLPROF_ATTR;
  4  BEGIN
  5  sql_txt := q'[
  6  select * from ( select
  7                  *
  8 
  9                  from
 10 
 11                  GE_TAOBAO_BILL
 12 
 13                  o
 14 
 15                   WHERE 1=1
 16 
 17 
 18 
 19 
 20 
 21 
 22 
 23 
 24 
 25 
 26 
 27                                  and o.CREATED_TIME >= :1
 28 
 29 
 30                                  and o.CREATED_TIME < :2
 31 
 32 
 33                                  and o.REC_SITE_ID = :3
 34 
 35 
 36 
 37 
 38 
 39 
 40 
 41 
 42 
 43 
 44 
 45 
 46 
 47 
 48 
 49                                  and o.STATUS_ID = :4
 50 
 51 
 52 
 53 
 54 
 55 
 56 
 57 
 58 
 59 
 60 
 61 
 62 
 63 
 64                                  and o.SERVICE_TYPE = :5 ) where rownum <= 36000
 65  ]';
 66  h := SYS.SQLPROF_ATTR(
 67  q'[BEGIN_OUTLINE_DATA]',
 68  q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
 69  q'[OPTIMIZER_FEATURES_ENABLE('11.2.0.3')]',
 70  q'[DB_VERSION('11.2.0.3')]',
 71  q'[FIRST_ROWS]',
 72  q'[OUTLINE_LEAF(@"SEL$F5BB74E1")]',
 73  q'[MERGE(@"SEL$2")]',
 74  q'[OUTLINE(@"SEL$1")]',
 75  q'[OUTLINE(@"SEL$2")]',
 76  q'[INDEX_RS_ASC(@"SEL$F5BB74E1" "O"@"SEL$2" ("GE_TAOBAO_BILL"."REC_SITE_ID" "GE_TAOBAO_BILL"."CREATED_TIME" "GE_TAOBAO_BILL"."STATUS_ID"))]',
 77  q'[END_OUTLINE_DATA]');
 78  :signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
 79  DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
 80  sql_text    => sql_txt,
 81  profile     => h,
 82  name        => 'coe_cdwjdd67x27mh_2979024279',
 83  description => 'coe cdwjdd67x27mh 2979024279 '||:signature||'',
 84  category    => 'DEFAULT',
 85  validate    => TRUE,
 86  replace     => TRUE,
 87  force_match => FALSE /* TRUE:FORCE (match even when different literals in SQL). FALSE:EXACT (similar to CURSOR_SHARING) */ );
 88  END;
 89  /
 
PL/SQL procedure successfully completed.
 
SQL>WHENEVER SQLERROR CONTINUE
SQL>SET ECHO OFF;
 
            SIGNATURE
---------------------
 18414135509058398362
 
 
... manual custom SQL Profile has been created
 
 
COE_XFR_SQL_PROFILE_cdwjdd67x27mh_2979024279 completed
固定执行计划后,系统负载恢复正常
load恢复正常,单个进程占用cpu 也正常
top - 18:25:29 up 123 days,  4:25,  3 users,  load average: 17.59, 16.72, 16.10
Tasks: 1559 total,   6 running, 1551 sleeping,   0 stopped,   2 zombie
Cpu(s):  6.2%us,  1.2%sy,  0.0%ni, 84.3%id,  7.4%wa,  0.1%hi,  0.9%si,  0.0%st
Mem:  264253752k total, 262395300k used,  1858452k free,   305928k buffers
Swap: 33554424k total,   467420k used, 33087004k free, 66811412k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                 
 3010 oracle    15   0  150g  26m  21m S 23.6  0.0   0:08.60 oracleq9db2 (LOCAL=NO)  
26936 oracle    15   0  150g  33m  28m S 22.9  0.0   0:12.53 oracleq9db2 (LOCAL=NO)  
 9691 oracle    15   0  150g  34m  28m D 21.0  0.0   0:48.73 oracleq9db2 (LOCAL=NO)  
 9694 oracle    15   0  150g  34m  28m S 17.4  0.0   0:45.74 oracleq9db2 (LOCAL=NO)  
14366 oracle    15   0  150g  33m  28m R 17.4  0.0   0:25.11 oracleq9db2 (LOCAL=NO)  
 2471 oracle    -2   0  150g 180m  37m S 16.7  0.1  10795:55 ora_lms3_q9db2          
 2463 oracle    -2   0  150g 180m  37m S 15.7  0.1  10648:38 ora_lms1_q9db2          
 2459 oracle    -2   0  150g 180m  37m S 14.8  0.1  10726:42 ora_lms0_q9db2          
 2467 oracle    -2   0  150g 180m  36m S 14.8  0.1  10980:33 ora_lms2_q9db2          
13324 oracle    15   0  150g  31m  26m S 14.1  0.0   0:22.75 oracleq9db2 (LOCAL=NO)  
16740 oracle    15   0  150g  34m  28m R 13.4  0.0   0:43.85 oracleq9db2 (LOCAL=NO)  
 1908 oracle    15   0  150g  27m  22m S 11.8  0.0   0:03.95 oracleq9db2 (LOCAL=NO)  
 9689 oracle    15   0  150g  33m  27m S 11.8  0.0   0:46.01 oracleq9db2 (LOCAL=NO)  
19410 oracle    15   0  150g  31m  26m S 11.8  0.0   0:43.18 oracleq9db2 (LOCAL=NO)  
14102 oracle    15   0  150g  31m  25m S 11.5  0.0   2:17.23 oracleq9db2 (LOCAL=NO)  
 1914 oracle    15   0  150g  29m  24m S 11.1  0.0   0:04.77 oracleq9db2 (LOCAL=NO)  
31106 oracle    15   0  150g  34m  28m S  9.8  0.0   1:24.85 oracleq9db2 (LOCAL=NO)  
31139 oracle    15   0  150g  30m  24m S  9.8  0.0   1:21.75 oracleq9db2 (LOCAL=NO)  
 2498 oracle    15   0  150g  42m  35m S  7.9  0.0   3838:43 ora_lgwr_q9db2          
28108 oracle    15   0  150g  36m  29m S  7.9  0.0   0:18.19 oracleq9db2 (LOCAL=NO)  
 2392 oracle    15   0  150g  35m  17m S  7.5  0.0   5304:57 ora_lmd0_q9db2    

 

联系:手机(13429648788) QQ(107644445)

链接:http://www.xifenfei.com/5309.html

标题:执行计划改变导致数据库负载过高

作者:惜分飞©版权所有[文章允许转载,但必须以链接方式注明源地址,否则追究法律责任.]

 

相关文章

精彩推荐