这是一个网友的问题,10201的windows环境,非归档,无备份,数据库open的时候报常见的ORA-00600 4193错误。
这个错误很常见,我们也分析过多次,这里再次来看下网友这里的情况。其中alert log如下:
Fri Nov 07 13:58:18 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Nov 07 13:58:18 2014
SMON: enabling cache recovery
Fri Nov 07 13:58:19 2014
Errors in file e:oracleproduct10.2.0adminorcludumporcl_ora_7420.trc:
ORA-00600: 内部错误代码, 参数: [4193], [65], [71], [], [], [], [], []
Fri Nov 07 13:58:21 2014
Doing block recovery for file 1 block 415
Block recovery from logseq 6910, block 3 to scn 262520099
Fri Nov 07 13:58:21 2014
Recovery of Online Redo Log: Thread 1 Group 3 Seq 6910 Reading mem 0
Mem# 0 errs 0: E:ORACLEPRODUCT10.2.0ORADATAORCLREDO03.LOG
Block recovery stopped at EOT rba 6910.5.16
Block recovery completed at rba 6910.5.16, scn 0.262520098
Doing block recovery for file 1 block 9
Block recovery from logseq 6910, block 3 to scn 262520097
Fri Nov 07 13:58:21 2014
Recovery of Online Redo Log: Thread 1 Group 3 Seq 6910 Reading mem 0
Mem# 0 errs 0: E:ORACLEPRODUCT10.2.0ORADATAORCLREDO03.LOG
Block recovery completed at rba 6910.5.16, scn 0.262520098
Fri Nov 07 13:58:21 2014
Errors in file e:oracleproduct10.2.0adminorcludumporcl_ora_7420.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00607: 当更改数据块时出现内部错误
ORA-00600: 内部错误代码, 参数: [4193], [65], [71], [], [], [], [], []
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
对于ORA-00600 4193错误,Oracle docs是这样解释的:
ERROR:
ORA-600 [4193] [a] [b]
VERSIONS:
versions 6.0 to 10.1
DESCRIPTION:
A mismatch has been detected between Redo records and Rollback (Undo)
records.
We are validating the Undo block sequence number in the undo block against
the Redo block sequence number relating to the change being applied.
This error is reported when this validation fails.
通过查看网友传的alert log信息,发现这哥们进行了大量的操作,几乎把所有的恢复动作都试了一遍。如下:
System parameters with non-default values:
processes = 300
sessions = 335
event = 10513 trace name context forever,level 2 : 10512 trace name context forever,level 1: 10511 trace name context forever,level 2: 10510 trace name context forever,level 1
sga_max_size = 1619001344
__shared_pool_size = 150994944
__large_pool_size = 8388608
__java_pool_size = 8388608
__streams_pool_size = 0
sga_target = 1619001344
control_files = E:ORACLEPRODUCT10.2.0ORADATAORCLCONTROL01.CTL, E:ORACLEPRODUCT10.2.0ORADATAORCLCONTROL02.CTL, E:ORACLEPRODUCT10.2.0ORADATAORCLCONTROL03.CTL
db_block_size = 8192
__db_cache_size = 1442840576
compatible = 10.2.0.1.0
db_file_multiblock_read_count= 16
db_recovery_file_dest = E:oracleproduct10.2.0/flash_recovery_area
db_recovery_file_dest_size= 2147483648
_allow_resetlogs_corruption= TRUE
undo_management = MANUAL
undo_tablespace = UNDOTBS1
_smu_debug_mode = 4
remote_login_passwordfile= EXCLUSIVE
db_domain =
dispatchers = (PROTOCOL=TCP) (SERVICE=orclXDB)
job_queue_processes = 10
audit_file_dest = E:ORACLEPRODUCT10.2.0ADMINORCLADUMP
background_dump_dest = E:ORACLEPRODUCT10.2.0ADMINORCLBDUMP
user_dump_dest = E:ORACLEPRODUCT10.2.0ADMINORCLUDUMP
core_dump_dest = E:ORACLEPRODUCT10.2.0ADMINORCLCDUMP
db_name = orcl
open_cursors = 300
pga_aggregate_target = 203423744
我们可以看到他使用了event 来屏蔽smon的回滚段,使用了隐含参数强制打开,使用了undo参数来修改表空间。据说还使用了强制
offline 回滚段的一些参数。
实际上,针对这个问题,我们先不管怎么解决,首先我们需要分析为什么Oracle这里会报这个错误?
首先有几个问题:
1) Oracle 在open的过程之中是执行什么sql报错的?
2) 为什么会报错
3)如果强制屏蔽回滚段是否有影响?
对于第一个问题,很简单,我们搜索trace就很容易定位到是这个SQL在执行时报错的:
ORA-00600: 内部错误代码, 参数: [4193], [65], [71], [], [], [], [], []
Current SQL statement for this session:
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
进一步搜索,我们还可以定位到Oracle在执行这个递归SQL时,是在操作什么回滚段的时候报错的(这里是US#表示回滚段编号):
Cursor#5(050D0F84) state=BOUND curiob=60F2AE90
curflg=d fl2=0 par=050D0E84 ses=7AB23D28
sqltxt(7A734404)=update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
hash=9caba1288112094d5553173dd30cd6c3
parent=0CF83F1C maxchild=01 plk=0EEB3834 ppn=n
cursor instantiation=60F2AE90
child#0(7A7342C0) pcs=0CF84120
clk=0EED3A90 ci=0CF83774 pn=0E493BC4 ctx=0C686938
kgsccflg=0 llk[60F2AE94,60F2AE94] idx=0
xscflg=e0100666 fl2=d100400 fl3=4022218c fl4=0
Bind bytecodes
Opcode = 5 Bind Rpi Scalar Sql In (not out) Nocopy
Offsi = 36, Offsi = 0
......
Opcode = 1 Unoptimized
Offsi = 36, Offsi = 240
kkscoacd
Bind#0
oacdty=01 mxl=32(09) mxlc=00 mal=00 scl=00 pre=00
oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
kxsbbbfp=7a7346ba bln=32 avl=09 flg=09
value="_SYSSMU1$"
Bind#1
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=60f26c6c bln=24 avl=02 flg=05
value=2
..........
value=1
Bind#12
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=60f26c90 bln=22 avl=02 flg=05
value=1
Frames pfr 60F27024 siz=4832 efr 60F2706C siz=4820
Cursor frame dump
enxt: 3.0x00000710 enxt: 2.0x00000104 enxt: 1.0x00000ac0
pnxt: 2.0x00000004 pnxt: 1.0x00000008
kxscphp 05471F28 siz=1000 inu=0 nps=416
kxscehp 05472450 siz=1000 inu=0 nps=744
----------------------------------------
很明显,我们可以看到,本质上Oracle是在对回滚段_SYSSMU1$ 进行update时出现问题了。
现在我们来回答第2个问题,Oracle为什么会报错呢? 我们再来看下这个ORA-00600错误:
ORA-00600: 内部错误代码, 参数: [4193], [65], [71], [], [], [], [], []
根据文档的解释,这个错误的意思是redo record的seq和undo record的seq不匹配导致。
那么这里的65和71到底是什么意思呢 ? 这个错误的格式是这样: ORA-600 [4193] [a] [b]
a 即 65,表示undo record seq
b 即 71,表示redo record seq.
那么Oracle这里为什么会得出一个不一致的结果呢?很明显,65 是不等于71的。
从网友提供的Trace文件,我们可以看到这样一段信息:
ktudb redo: siz: 252 spc: 6372 flg: 0x0012 seq: 0x0047 rec: 0x08
xid: 0x0000.026.00000052
ktubl redo: slt: 38 rci: 0 opc: 11.1 objn: 15 objd: 15 tsn: 0
Undo type: Regular undo Begin trans Last buffer split: No
Temp Object: No
Tablespace Undo: No
0x00000000 prev ctl uba: 0x0040019f.0047.07
prev ctl max cmt scn: 0x0000.0f92fecd prev tx cmt scn: 0x0000.0f92fecf
很明显,这是redo相关信息,这里的seq 为0×0047,转换之后即为71. 原来,这就是ora-00600错误的b 值的来源.
那么ora-00600错误的a值,65又是哪儿来的呢 ? 从前面的UBA信息,我们可以知道,这个事务对应的undo block是:0x0040019f
那么我们来看下这个undo block中的内容是什么样的? 搜索block地址,我们发现这是一个system的block,显然是system 回滚段
的block,如下:
Block after image is corrupt:
buffer tsn: 0 rdba: 0x0040019f (1/415)
scn: 0x0000.0f468845 seq: 0x01 flg: 0x04 tail: 0x88450201
frmt: 0x02 chkval: 0x911f type: 0x02=KTU UNDO BLOCK
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x08DB6000 to 0x08DB8000
8DB6000 0000A202 0040019F 0F468845 04010000 [[email protected].....]
.........
8DB7FF0 2E3004C1 332E02C1 323202C1 88450201 [..0....3..22..E.]
********************************************************************************
UNDO BLK:
xid: 0x0000.043.00000047 seq: 0x41 cnt: 0x22 irb: 0x22 icl: 0x0 flg: 0x0000
Rec Offset Rec Offset Rec Offset Rec Offset Rec Offset
---------------------------------------------------------------------------
0x01 0x1f2c 0x02 0x1e30 0x03 0x1d34 0x04 0x1c38 0x05 0x1b3c
0x06 0x1a40 0x07 0x1944 0x08 0x1848 0x09 0x174c 0x0a 0x1650
0x0b 0x1554 0x0c 0x1458 0x0d 0x135c 0x0e 0x1260 0x0f 0x1164
0x10 0x1068 0x11 0x0f6c 0x12 0x0e70 0x13 0x0d74 0x14 0x0c78
0x15 0x0b7c 0x16 0x0a80 0x17 0x09c4 0x18 0x0908 0x19 0x084c
0x1a 0x0790 0x1b 0x06d4 0x1c 0x0618 0x1d 0x055c 0x1e 0x04a0
0x1f 0x03e4 0x20 0x02e8 0x21 0x01ec 0x22 0x00f0
......
.......
*-----------------------------
* Rec #0x7 slt: 0x22 objn: 15(0x0000000f) objd: 15 tblspc: 0(0x00000000)
* Layer: 11 (Row) opc: 1 rci 0x00
Undo type: Regular undo Begin trans Last buffer split: No
Temp Object: No
Tablespace Undo: No
rdba: 0x00000000
*-----------------------------
uba: 0x0040019f.0041.06 ctl max scn: 0x0000.0f3d6a95 prv tx scn: 0x0000.0f3d6a97
txn start scn: scn: 0x0000.0f44c775 logon user: 0
prev brb: 4194714 prev bcl: 0
KDO undo record:
KTB Redo
op: 0x04 ver: 0x01
op: L itl: xid: 0x0000.031.00000047 uba: 0x0040019f.0041.06
flg: C--- lkc: 0 scn: 0x0000.0f44c774
KDO Op code: URP row dependencies Disabled
xtype: XA flags: 0x00000000 bdba: 0x0040006a hdba: 0x00400069
itli: 1 ispac: 0 maxfr: 4863
tabn: 0 slot: 3(0x3) flag: 0x2c lock: 0 ckix: 93
ncol: 17 nnew: 12 size: 0
col 1: [ 9] 5f 53 59 53 53 4d 55 33 24
col 2: [ 2] c1 02
col 3: [ 2] c1 03
col 4: [ 2] c1 2a
col 5: [ 6] c5 03 39 11 3a 05
col 6: [ 1] 80
col 7: [ 4] c3 13 4c 39
col 8: [ 4] c3 04 52 5b
col 9: [ 1] 80
col 10: [ 2] c1 03
col 11: [ 2] c1 02
col 16: [ 2] c1 02
我们根据前面的UBA:uba: 0x0040019f.0047.07 定位到该事务的信息应该是undo block的第7个record中,当我们定位到第7个record时,
我们可以看到,这里的seq其实却是41,转换为10进制就是65. 这也就是ORA-00600这个错误的由来.
最后我们来回答第3个问题,那么就如果强制屏蔽回滚段是否有影响呢?很明显,网友这里的对象是obj 15,这是一个核心对象。
强制的屏蔽回滚段肯定是不妥的。其实处理方法有很多种,针对类似的问题我已经在道森Oracle培训的特殊恢复课程中讲过多次了。