Oracle: Handling an ORA-600 [kafspa:columnBuffer1] Error Left Over After Recovering a Database from a Storage Failure

Author: 袖梨 2022-06-29


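Some background first: this database had suffered a storage failure. The hardware vendor rebuilt the RAID, and I then recovered the data. After the application vendor validated the recovered data and backfilled what was missing, the database was migrated to another machine and put into production. Nobody had looked at it closely since. A recent health check turned up ORA-600 [kafspa:columnBuffer1] errors, which were resolved by deleting the corrupt row.
Database alert log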
Mon Aug 10 00:00:21 2015
LNS: Standby redo logfile selected for thread 1 sequence 617 for destination LOG_ARCHIVE_DEST_2
Mon Aug 10 00:00:33 2015
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_j002_6900.trc  (incident=146517):
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\incident\incdir_146517\xff_j002_6900_i146517.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_j002_6900.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_280"
ORA-20011: Approximate NDV failed:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.DBMS_STATS", line 31228
Analysis of the trace file shows
*** 2015-07-19 06:00:30.231
*** SESSION ID:(578.751) 2015-07-19 06:00:30.231
*** CLIENT ID:() 2015-07-19 06:00:30.231
*** SERVICE NAME:(SYS$USERS) 2015-07-19 06:00:30.231
*** MODULE NAME:(DBMS_SCHEDULER) 2015-07-19 06:00:30.231
*** ACTION NAME:(ORA$AT_OS_OPT_SY_220) 2015-07-19 06:00:30.231
 
Dump continued from file: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_j001_4444.trc
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
 
========= Dump for incident 146142 (ORA 600 [kafspa:columnBuffer1]) ========
 
*** 2015-07-19 06:00:30.231
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=g0q33k8qtbcpd) -----
/* SQL Analyze(1) */ select /*+  full(t)    no_parallel(t) no_parallel_index(t)
 dbms_stats cursor_sharing_exact use_weak_name_resl
dynamic_sampling(0) no_monitoring no_substrb_pad
…………
to_char(substrb(dump(max("LIST_NO"),16,0,32),1,120)) from "CHF"."T_XIFENFEI" t
…………
Gathering statistics on the table manually
SQL> EXEC DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI',CASCADE=>TRUE);
BEGIN DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI',CASCADE=>TRUE); END;

*
ERROR at line 1:
ORA-20011: Approximate NDV failed: ORA-00600: internal error code, arguments:
[kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.DBMS_STATS", line 24232
ORA-06512: at "SYS.DBMS_STATS", line 24332
ORA-06512: at line 1
 
 
SQL> desc "CHF"."T_XIFENFEI"
 Name                                      Null?    Type
 ----------------------------------------- -------- -----------------
 
 VISIT_DATE                                         DATE
…………
 GETDRUG_FLAG                                       VARCHAR2(2)
…………
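From the alert log, the trace file, and the manual statistics run above, the problem can be narrowed down: the automatic statistics job fails with this error while gathering statistics on "CHF"."T_XIFENFEI". A search of MOS shows that this class of problem is mainly caused by a VARCHAR2 column holding data longer than the column's defined length.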
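Before comparing against the stored data, it helps to confirm what length the dictionary records for the suspect column. The query below is a minimal sketch against the standard DBA_TAB_COLUMNS view; the column name GETDRUG_FLAG is taken from the exp output later in this post, and the query reads only the dictionary, so it does not touch the damaged row.

```sql
-- Declared length of the suspect column as recorded in the data dictionary.
-- DATA_LENGTH is in bytes; CHAR_USED shows whether the column was declared
-- with BYTE (B) or CHAR (C) length semantics.
SELECT column_name,
       data_type,
       data_length,
       char_used
  FROM dba_tab_columns
 WHERE owner       = 'CHF'
   AND table_name  = 'T_XIFENFEI'
   AND column_name = 'GETDRUG_FLAG';
```

If the stored value is longer than DATA_LENGTH, the row violates the column definition, which matches the cause described by MOS.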

Verifying Oracle's explanation
C:\Users\Administrator>exp "'/ as sysdba'" tables="CHF"."T_XIFENFEI" file=y:/1.dmp log=y:/1.log

Export: Release 11.2.0.4.0 - Production on Thu Aug 13 11:03:22 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
Current user changed to CHF
. . exporting table                    T_XIFENFEI
EXP-00015: error on row 1339552 of table T_XIFENFEI, column GETDRUG_FLAG, datatype 1
EXP-00001: data field truncation - column length=2, buffer size=2 actual size=17Errors in file :
OCI-21500: internal error code, arguments: [kghfrempty:ds], [0x00652FCC8], [], [], [], [], [], []
 
 
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
kgerinv_internal()+  CALL???  skgudmp()            000000000 006447680 000000000
 
139                                                006447680
kgerinv()+49         CALL???  kgerinv_internal()   000000001 000676B4D 0064985B0
 
                                                   000000000
kgerin()+49          CALL???  kgerinv()            000000018 000799612 000072000
 
                                                   000000000
kghnerror()+294      CALL???  kgerin()             006447680 00645092C 006447680
 
                                                   000000001
kghfrempty()+639     CALL???  kghnerror()          0000001F0 000000000
                                                   BE019800000000 7E01960000
kghgex()+1433        CALL???  kghfrempty()+368     000000000 00652CAD8 000000000
 
                                                   000000000
kghfnd()+808         CALL???  kghgex()             001004000 000000000 001BEDD10
 
                                                   001A7131C
kghalo()+610         CALL???  kghfnd()             00012C450 00012C4A0 000000000
 
                                                   006446FD0
kghgex()+445         CALL???  kghalo()             006494848 000000000 001BEDD10
 
                                                   00190A575
kghfnd()+808         CALL???  kghgex()             000000001 0000001A0 000000000
 
                                                   006493D68
kghalo()+610         CALL???  kghfnd()             000000000 006447680 0FFFFFFFF
 
                                                   006447680
kpuhhalo()+358       CALL???  kghalo()             000000000 000000178 07FFFFFFF
 
                                                   000000001
kpuertb_reallocTemp  CALL???  kpuhhalo()           00652C498 000003E84 001C0EA44
 
Buf()+192                                          000000000
kpuex_reallocTempBu  CALL???  kpuertb_reallocTemp  000004007 0018BA3BF 00012CAB0
 
f()+67                        Buf()                001AB296F
kpudefn()+347        CALL???  kpuex_reallocTempBu  00012CC38 001004000 001BEDD44
 
                              f()                  000000004
kpudfn()+1506        CALL???  kpudefn()            00012F3D0 000000004 006520044
 
                                                   000000000
OCIDefineByPos()+10  CALL???  kpudfn()             004327570 000000000 00012F3D0
 
2                                                  000000004
00000001400116E5     CALL???  OCIDefineByPos()     1043B9300 0043B92C0 0044002B8
 
                                                   004401394
000000014004AFC7     CALL???  00000001400113BA     00012F380 00012F0E0 000000068
 
                                                   14004B2B6
000000014001E784     CALL???  000000014004A37E     000013F30 140095A71 140097520
 
                                                   14009F540
00000001400027A7     CALL???  000000014001E39F     14009F838 00012FB5C 140097520
 
                                                   14009F540
000000014000102C     CALL???  0000000140001E2C     000000005 004327570
                                                   1D0D5749D21764D 000000000
000000014006BEF0     CALL???  000000014000100E     000130000 1AFBFE2D0D8
                                                   000000000 000000000
000000007748652D     CALL???  000000014006BDD0     000000000 000000000 000000000
 
                                                   000000000
00000000775BC521     CALL???  0000000077486520     000000000 000000000 000000000
 
                                                   000000000
 
 
call stack performance statistics:
total                  : 0.778000 sec
setup                  : 0.350000 sec
stack unwind           : 0.099000 sec
symbol translation     : 0.021000 sec
printing the call stack: 0.304000 sec
printing frame data    : 0.000000 sec
printing argument data : 0.000000 sec
 
 
----- End of Call Stack Trace -----
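The exp run confirms that the GETDRUG_FLAG column holds bad data: the column is defined with length 2, yet the actual data length is 17, which clearly does not match.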

Locating the bad rowid with PL/SQL
set serveroutput on
DECLARE
  TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
  -- The index hint is intended to make the rowids come from the primary key
  -- index rather than from the table rows themselves.
  CURSOR c1 IS select /*+index(t PK_T_XIFENFEI_BAK_NEW)*/ rowid from CHF.T_XIFENFEI t;
  r RowIDTab;
  rows  NATURAL := 20000;   -- rowids fetched per batch
  bad_rows number := 0 ;
  errors number;
  error_code number;
  myrowid rowid;
BEGIN
  OPEN c1;
  LOOP
    FETCH  c1 BULK COLLECT INTO r LIMIT rows;
    EXIT WHEN r.count=0;
    BEGIN
      -- Copy each row by rowid; SAVE EXCEPTIONS lets the batch continue
      -- past rows that fail and records them in SQL%BULK_EXCEPTIONS.
      FORALL i IN r.FIRST..r.LAST SAVE EXCEPTIONS
        insert into CHF.T_XIFENFEI_new
        select /*+ ROWID(A) */ *
        from CHF.T_XIFENFEI A where rowid = r(i);
    EXCEPTION
    when OTHERS then
      BEGIN
        -- Log the rowid and error code of every row that could not be copied.
        errors := SQL%BULK_EXCEPTIONS.COUNT;
        FOR err1 IN 1..errors LOOP
          error_code := SQL%BULK_EXCEPTIONS(err1).ERROR_CODE;
          myrowid := r(SQL%BULK_EXCEPTIONS(err1).ERROR_INDEX);
          bad_rows := bad_rows + 1;
          insert into system.had_rows values('CHF.T_XIFENFEI',myrowid, error_code);
        END LOOP;
      END;
    END;
    commit;
  END LOOP;
  commit;
  CLOSE c1;
  dbms_output.put_line('Total Bad Rows: '||bad_rows);
END;
/
Total Bad Rows: 1
 
PL/SQL procedure successfully completed.
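The PL/SQL block above inserts into two tables, CHF.T_XIFENFEI_new and system.had_rows, that must exist before it runs. The post does not show how they were created, so the DDL below is only a sketch: the column name row_id comes from the query that follows, while table_name, error_code, and the column sizes are assumptions chosen to match the three values the block inserts.

```sql
-- Empty copy of the source table; WHERE 1 = 0 copies the column
-- definitions without copying any rows.
CREATE TABLE chf.t_xifenfei_new AS
SELECT * FROM chf.t_xifenfei WHERE 1 = 0;

-- Log table for rows that could not be copied.
-- table_name and error_code are assumed names; row_id matches the
-- column queried in this post.
CREATE TABLE system.had_rows (
  table_name VARCHAR2(100),
  row_id     ROWID,
  error_code NUMBER
);
```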
 
 
SQL> SELECT row_id FROM  system.had_rows ;
 
 ROW_ID
 ------------------
 
 AAAT8wAAEAAAM29AAX
 
SQL> select * from  CHF.T_XIFENFEI WHERE ROWID='AAAT8wAAEAAAM29AAX';
select * from  CHF.T_XIFENFEI WHERE ROWID='AAAT8wAAEAAAM29AAX'
                        *
ERROR at line 1:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [2883], [1], [], [], [], [], [], [], [], [], []
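This pins the problem to the row with that rowid. After consulting the business side, we confirmed that the row could be deleted (it cannot be read anyway, so keeping it would serve no purpose).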
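Before deleting the row, it can be useful to record where it physically lives, for example to check later whether other rows in the same block are affected. The following sketch uses the standard DBMS_ROWID package to break the rowid reported above into its components. It works on the rowid literal only and never reads the damaged row.

```sql
-- Decode the bad rowid into object, file, block and row slot.
SELECT DBMS_ROWID.ROWID_OBJECT(r)       AS data_object_id,
       DBMS_ROWID.ROWID_RELATIVE_FNO(r) AS relative_fno,
       DBMS_ROWID.ROWID_BLOCK_NUMBER(r) AS block_no,
       DBMS_ROWID.ROWID_ROW_NUMBER(r)   AS row_no
  FROM (SELECT CHARTOROWID('AAAT8wAAEAAAM29AAX') AS r FROM dual);
```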

Deleting the corrupt row
SQL> delete from CHF.T_XIFENFEI WHERE ROWID='AAAT8wAAEAAAM29AAX';

1 row deleted.
 
SQL> commit;
 
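Commit complete.
Gathering statistics again
SQL> EXEC DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI',CASCADE=>TRUE);

PL/SQL procedure successfully completed.
With the corrupt row removed, the database gathers statistics normally and no longer reports ORA-00600 [kafspa:columnBuffer1]. The problem is resolved cleanly.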
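One way to double-check that the statistics were actually refreshed is to look at the dictionary views that DBMS_STATS populates. This is a sketch using the standard DBA_TABLES and DBA_TAB_COL_STATISTICS views; it is not part of the original troubleshooting session.

```sql
-- Table-level statistics: LAST_ANALYZED should show the time of the
-- successful gather.
SELECT table_name, num_rows, last_analyzed
  FROM dba_tables
 WHERE owner = 'CHF'
   AND table_name = 'T_XIFENFEI';

-- Column-level statistics for the column that previously failed.
SELECT column_name, num_distinct, last_analyzed
  FROM dba_tab_col_statistics
 WHERE owner = 'CHF'
   AND table_name = 'T_XIFENFEI'
   AND column_name = 'GETDRUG_FLAG';
```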

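A few additional observations
1. analyze table "CHF"."T_XIFENFEI" estimate statistics; completes normally, while dbms_stats fails (dbms_stats effectively scans every column, whereas analyze presumably does not).
2. While ORA-00600 [kafspa:columnBuffer1] is being raised, CTAS still succeeds but a conventional insert does not (CTAS behaves like an append operation). In some situations append should therefore be used with caution, especially when logical block corruption is present.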
