前几日有11.2.0.3套库出现了性能问题,这里简单记录, 当时只是表现几个会话的同一条INSERT SQL语句出现了较高的latch: undo global data 等待,latch miss是 ktudba: KSLBEGIN ,同时还有BBS 等待。
SQL> @snapper ash 5 1 all
Sampling SID all with interval 5 seconds, taking 1 snapshots...
-- Session Snapper v4.22 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)
---------------------------------------------------------------------------------------------------------------
ActSes %Thread | INST | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS
---------------------------------------------------------------------------------------------------------------
44.00 (4400%) | 1 | 4rwaxk5gtg1t7 | 2 | latch: undo global data | Other
38.60 (3860%) | 1 | 4rwaxk5gtg1t7 | 0 | latch: undo global data | Other
14.00 (1400%) | 1 | 4rwaxk5gtg1t7 | 2 | buffer busy waits | Concurrency
12.40 (1240%) | 1 | 4rwaxk5gtg1t7 | 0 | buffer busy waits | Concurrency
6.00 (600%) | 1 | 4rwaxk5gtg1t7 | 2 | ON CPU | ON CPU
5.60 (560%) | 1 | 4dxyn82n0zspx | 0 | ON CPU | ON CPU
4.80 (480%) | 1 | 4rwaxk5gtg1t7 | 0 | ON CPU | ON CPU
3.40 (340%) | 1 | 4dxyn82n0zspx | 0 | latch: cache buffers chains | Concurrency
3.20 (320%) | 1 | | | ON CPU | ON CPU
1.40 (140%) | 1 | | | log file sync | Commit
-- End of ASH snap 1, end=2016-06-28 10:14:13, seconds=6, samples_taken=5, AAS=177.6
SQL> @sqlid 4rwaxk5gtg1t7 %
Show SQL text, child cursors and execution stats for SQLID 4rwaxk5gtg1t7 child %
HASH_VALUE PLAN_HASH_VALUE CH# SQL_TEXT
---------- --------------- ---- ----------------------------------------------------------------------------
1603766055 0 0 INSERT INTO T (REGION, TELNUM, ...... UDPTYPE) VALUES ( :REGION, :TELNUM, .... :UDPTYPE)
CH# PARENT_HANDLE OBJECT_HANDLE PLAN_HASH PARSES H_PARSES EXECUTIONS FETCHES ROWS_PROCESSED ROWS_PER_FETCH CPU_SEC CPU_SEC_EXEC ELA_SEC LIOS PIOS SORTS USERS_EXECUTING
---- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- -------------- -------------- ---------- ------------ ---------- ---------- ---------- ---------- ---------------
0 C0000014D7859A70 C0000014D786C268 0 58 26 12772 0 12772 9.32 .000729721 28.339461 246717 4523 0 0
TIP:
这里使用了Poder的脚本, 从AWR当时也是latch: undo global data 为数据库TOP 1 wait event, 查看了该语句就是一个insert values 简单语句,执行频率从AWR 得到的是半小时27000次,执行速度还可以,但是为什么会这么多的latch undo 呢?
LATCH: UNDO GLOBAL DATA
This latch serializes the access to the Undo (aka Rollback) segment information in the SGA.
Every time a session wants to know about the state of the Undo Segments, it has to get this latch.
该事件确认是与undo 相关,通常会伴有enq: US – contention 或 latch undo 的location是ktusm_stealext_2或ktusm_stealext: KSLBEGIN,对于stealext是因查找可用的segment 未发现offline 或过期的从未过期的窃取空间的调用, 那样增加UNDO表空间就可以得到缓解。
下面确认当时的情况是不是这种现象。
SQL> select * from v$latch_misses where SLEEP_COUNT>0 and parent_name like 'undo global data%' order by sleep_count;
PARENT_NAME WHERE NWFAIL_COUNT SLEEP_COUNT WTR_SLP_COUNT LONGHOLD_COUNT LOCATION
---------------------- ------------------------ ------------ ----------- ------------- -------------- ----------------------
undo global data kturar: KSLBEGIN 0 1 11 0 kturar: KSLBEGIN
undo global data ktugnb: KSLBEGIN 0 1 4 0 ktugnb: KSLBEGIN
undo global data ktusmasp: ktugd_tuux 0 3 28 0 ktusmasp: ktugd_tuux
undo global data kturdc: KSLBEGIN 0 3 0 0 kturdc: KSLBEGIN
undo global data ktur set recov bit 0 4 26 0 ktur set recov bit
undo global data ktusmofxu_1: kslgetl 0 8 0 0 ktusmofxu_1: kslgetl
undo global data ktuswr_3 0 36 4 0 ktuswr_3
undo global data ktudax:child 0 36 16 0 ktudax:child
undo global data ktubnd:child 0 106 108 0 ktubnd:child
undo global data ktufrbs_2 0 571 1292 0 ktufrbs_2
undo global data ktudnx:child 0 1125 1044 0 ktudnx:child
undo global data ktucof: at start 0 2003 0 0 ktucof: at start
undo global data ktusmupst: KSLBEGIN 0 2118 9203 0 ktusmupst: KSLBEGIN
undo global data kturax 0 5107 0 0 kturax
undo global data ktudba: KSLBEGIN 0 43637217 43636922 0 ktudba: KSLBEGIN
NOTE:
这里latch: undo global data 的location是ktudba: KSLBEGIN 和上面所说的情况不太一致, 后来我也尝试增加了一些UNDO 结果一样问题没有什么缓解。 要先搞明白ktudba: KSLBEGIN是什么?
KSLBEGIN 是一种get latch时的非常快的宏,在取得latch后调用KSLEND;
ktudba 像是一种从usn 转换data block address的宏
@ Latch Miss Sources
@ ——————
@ undo global data ktudba: KSLBEGIN 0 117,271,003 115,396,045
@ .
@ .
@ ktu.h:
@ ——
@ #define ktudba(usn, dbap, scnp, arspp)
@ ktuGetUsegDba((usn), (dbap), (scnp), (arspp),
@ NULLP(ub2), NULLP(ksqn), NULLP(uword))
@ .
@ ktuGetUsegDba – Kernel Transaction Undo convert from usn to DBA
检查当时的数据库参数
SQL> @p undo
NAME VALUE
---------------------------------------- ----------------------------------------
_lm_spare_undo 0
_flush_undo_after_tx_recovery TRUE
_gc_undo_affinity FALSE
_gc_undo_block_disk_reads TRUE
_gc_tsn_undo_affinity TRUE
_undo_block_compression TRUE
undo_management AUTO
undo_tablespace undotbs1
_collect_undo_stats TRUE
_undo_debug_mode 0
_verify_undo_quota FALSE
_in_memory_undo TRUE <<<<<<<<<<<<<<<
_undo_autotune FALSE <<<<<<<<<<<<<<<
_highthreshold_undoretention 4294967294 <<<<<<<<<<<<<<<
undo_retention 7200 <<<<<<<<<<<<<<<
_undo_debug_usage 0
_disable_undo_tablespace_alerts FALSE
_smon_undo_seg_rescan_limit 10
_optimizer_undo_cost_change 11.2.0.3
_optimizer_undo_changes FALSE
_enable_default_undo_threshold TRUE
Note:
As the “_undo_autotune” instance parameter is set to FALSE, the undo retention is not automatically calculated based on the instance activity
UNDO 问题通常在_undo_autotune= false 和_highthreshold_undoretention value large时问题会较明显, 下面检查undo segment的使用情况.
SQL> col status format a20
SQL> col cnt format 999,999,999 head "How Many?"
SQL>
SQL> select status, count(*) cnt from dba_rollback_segs
2 group by status
3 /
STATUS How Many?
-------------------- ------------
OFFLINE 2,053
ONLINE 1,126
SQL> select tablespace_name,status,sum(bytes)/1024/1024 mb from DBA_UNDO_EXTENTS group by tablespace_name,status;
TABLESPACE_NAME STATUS MB
------------------------------ -------------------- ----------
UNDOTBS1 ACTIVE 34585.125 <<<<<<<<<<<<<<
UNDOTBS1 EXPIRED 13843.75
UNDOTBS1 UNEXPIRED 12030.375
UNDOTBS2 ACTIVE 72.5625
UNDOTBS2 EXPIRED 3000.25
UNDOTBS2 UNEXPIRED 7369.5625
SQL> @a
A-Script: Display active sessions...
COUNT(*) SQL_ID STATE EVENT
---------- ------------- ------- ----------------------------------------------------------------
63 4rwaxk5gtg1t7 WAITING latch: undo global data <<<<<<<<<<<<
14 4rwaxk5gtg1t7 ON CPU On CPU / runqueue
13 4rwaxk5gtg1t7 WAITING buffer busy waits
12 4dxyn82n0zspx WAITING latch: cache buffers chains
7 4dxyn82n0zspx ON CPU On CPU / runqueue
2 0cgk3a3kjpg8m WAITING db file sequential read
2 5upb68p4xd0h4 WAITING SQL*Net message from dblink
1 4rwaxk5gtg1t7 WAITING latch: cache buffers chains
1 5dt8ha6p0uuc4 WAITING read by other session
Note:
当时的actve undo seg 使用30多G引起注意,这里oltp类型库,平时都是些小事务,当时的业务量也不高, 查询undo是被哪个用了
SQL>select b.sid,--会话编号
b.SERIAL#,
b.USERNAME,
b.MACHINE,
b.sql_id,
a.START_TIME,--事务启动时间
a.USED_UBLK, --使用的UNDO块数
a.USED_UREC, --使用的UNDO记录条数
a.START_UBAFIL, --使用的UNDO文件号
a.START_UBABLK --使用的UNDO起始块号
from v$transaction a, v$session b
where a.ses_addr = b.saddr;
SID SERIAL# USERNAME MACHINE SQL_ID START_TIME USED_UBLK USED_UREC START_UBAFIL START_UBABLK
------ ---------- --------- ---------- ------------- -------------------- ---------- ---------- ------------ ------------
10881 55703 ANBOB kinjk1 gd29uxmzbmj2g 06/28/16 11:48:30 1 9 250 1504704
9065 60993 S111008 HBYDYKWIN 4abmpbbmt8fv5 06/28/16 11:43:50 1 1 248 1573967
7626 52347 ANBOB qmweb81 06/28/16 10:48:00 1 1 245 84451
11357 46961 COMMON qdyya1 0jfw9hdzvvk9t 06/28/16 11:48:14 107 2416 586 6550
6731 20111 ANBOB qdtza1 0pymq63h93cr7 06/27/16 14:56:29 2290118 26084282 248 650766 <<<<<< 16101 36593 ANBOB qmweb54 06/28/16 09:45:03 1 1 250 620376 12821 24875 ANBOB qmyy31401 fd72s60zy0dk9 06/28/16 11:48:30 1 17 250 1504600 20855 2013 ANBOB qmyy31401 06/28/16 11:48:32 1 1 245 1249799 4705 53047 ANBOB kmyy17 0cgk3a3kjpg8m 06/28/16 11:48:32 2 49 248 700088 15569 32497 SONGZHZH HBYDYKWIN 06/28/16 10:46:57 1 1 248 237094 10910 55127 SHILQ HBYDYKWIN 06/28/16 11:07:14 1 1 250 1285765 2986 32313 ANBOB qmweb53 06/28/16 11:37:08 1 1 250 354863 50 rows selected. SQL> @usid 6731
USERNAME SID AUDSID OSUSER MACHINE PROGRAM SPID OPID CPID SQL_ID HASH_VALUE LASTCALL STATUS SADDR PADDR TADDR LOGON_TIME
----------------------- -------------- ----------- ---------------- ------------------ -------------------- -------------- ------ ------------------------ ------------- ----------- ---------- -------- ---------------- ---------------- ---------------- -----------------
ANBOB '6731,20111' 936346265 grid qdtza1 (TNS V1-V3) 11564 4478 17667 0pymq63h93cr7 3767644903 75139 ACTIVE C000001604A7DFB8 C0000016138BF9C0 C0000015C3F1EA20 20160627 14:56:29
SQL> @xi 0pymq63h93cr7 %
eXplain the execution plan for sqlid 0pymq63h93cr7 child %...
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 0pymq63h93cr7, child number 0
DELETE FROM "ANBOB"."T" "A1"
Note:
现在已经明确是sid=6731的会话做了一个delete 全表的操作, 该表正是上面insert的表,这个事务目前使用了2290118个undo block,
当然后期我们确认了是个人用户通过DBLINK 执行的该表删除的误操作,确认后杀掉了该会话,估算了回滚剩余时间。上面的latch: undo的等待事件消失。
SET SERVEROUT ON
SQL> declare
2 l_start number;
3 l_end number;
4 begin
5 select sum(ktuxesiz) into l_start from x$ktuxe where KTUXECFL ='DEAD';
6 dbms_lock.sleep(60);
7 select sum(ktuxesiz) into l_end from x$ktuxe where KTUXECFL ='DEAD';
8 dbms_output.put_line('time est Day:'|| round(l_end/(l_start -l_end)/60/24,2));
9 end;
10 /
PL/SQL procedure successfully completed.
SQL> set serveroutput on
SQL> /
time est Day:.04
SQL> select tablespace_name,status,sum(bytes)/1024/1024 mb from DBA_UNDO_EXTENTS group by tablespace_name,status;
TABLESPACE_NAME STATUS MB
------------------------------ --------- ----------
UNDOTBS1 ACTIVE 98.0625
UNDOTBS1 EXPIRED 14017.8125
UNDOTBS1 UNEXPIRED 45236.3125
UNDOTBS2 ACTIVE 78.5625
UNDOTBS2 EXPIRED 3022
UNDOTBS2 UNEXPIRED 6706.8125
迷雾城堡免广告 最新版v0.1.30
迷雾城堡免广告是一款非常好玩的模拟建造类手游,玩家无需看广告
鉴车大师免广告 安卓版v1.2.2
鉴车大师免广告是一款非常好玩的模拟类手游,玩家在游戏中不用看
从前有条街 安卓最新版v1.5
从前有条街是一款非常好玩的模拟经营类手游,玩家在游戏中将会进
我的世界源之界冰火魔龙 最新版v阿夜整合
我的世界源之界冰火魔龙模组整合包是一款像素风格的沙河模拟生存
假面骑士创骑腰带模拟器 安卓版v6
假面骑士创骑腰带模拟器是一个专为喜欢假面骑士的用户打造的变身