数据库版本和平台信息
数据库版本为10.2.0.1版本,而且是32位的win 2003 sp2之上
ORACLE V10.2.0.1.0 - Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows Server 2003 Version V5.2 Service Pack 2
CPU : 2 - type 586, 1 Physical Cores
Process Affinity : 0x00000000
Memory (Avail/Total): Ph:2608M/3990M, Ph+PgF:4511M/5871M, VA:1242M/2047M
Instance name: orcl
数据库报大量ORA-600[kjhn_post_ha_alert0-862]错误
数据库的mmon进程报大量ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []错误
Wed Jun 03 21:50:40 2015
Restarting dead background process MMON
MMON started with pid=11, OS id=3804
Wed Jun 03 21:50:43 2015
Errors in file e:oracleproduct10.2.0adminorclbdumporcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
Wed Jun 03 21:50:49 2015
Errors in file e:oracleproduct10.2.0adminorclbdumporcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
Wed Jun 03 21:55:44 2015
Errors in file e:oracleproduct10.2.0adminorclbdumporcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
Wed Jun 03 21:55:49 2015
Errors in file e:oracleproduct10.2.0adminorclbdumporcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
Wed Jun 03 22:00:40 2015
Thread 1 advanced to log sequence 476
Current log# 1 seq# 476 mem# 0: E:ORACLEPRODUCT10.2.0ORADATAORCLREDO01.LOG
Wed Jun 03 22:00:44 2015
Errors in file e:oracleproduct10.2.0adminorclbdumporcl_mmon_3804.trc:
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
查询对应trace文件发现
ORA-00600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [], [] , [], [], [], []
Current SQL statement for this session:
BEGIN :success := dbms_ha_alerts_prvt.check_ha_resources; END;
人工执行该过程
SQL> var success varchar2
SQL> begin
2 :success := sys.dbms_ha_alerts_prvt.check_ha_resources;
3 end;
4 /
PL/SQL procedure successfully completed.
SQL> print success
SUCCESS
--------------------------------
N
通过查询相关资料得到如下说明
@ This check is triggered with FAN enabled at this instance and it seems to be
@ associated with a startup action. From the procedure itself which is called
@ this is a run-once MMON (startup) action which supports instance down
@ notification reliability. It does the folowing a) registers the current
@ instance incarnation in recent_resource_incarnations$ if it's not already
@ there b) deletes recent_resource_incarnations$ records that don't apply to
@ this database. They may, e.g., have been copied from seed db or from a former
@ DataGuard primary c) scans recent_resource_incarnations$ for instance
@ incarnations that are no longer alive, and submits instance down alerts for
@ them . If all is good then return 'Y' else 'N' (or error) if there is a
@ failure. That failure is to get back to MMON, so that it may retry this
@ action later. In the local instance I get a 'Y' but in the customer's system
@ it fails with a 'N' which seems related to the ORA-600 assert.
@ This function is kjhn_post_ha_alert0() which is internal and does the real work of
@ posting HA alerts. It is used by both kjhn_post_ha_alert and
@ kjn_post_ha_alert_plsql. Its parameters are basically the same as those of
@ kjhn_post_ha_alert,other than the fact that it uses individual parameters
@ rather than the more easily extensible structure. Also the parameters passed
@ to it are the instance_name and the host_name which is the kernelized
@ implementation for posting HA alerts. Without actually having the arguments
@ the guess is that either the host_name or the instance_name raised in the
@ assert is null which triggered it.
mmon进程尝试调用相关程序,然后无法得出正确值,返回N,然后会一直尝试,如果不能得到返回Y,就会一直报ORA-600,错误.通过上述的三种情况来说,都和recent_resource_incarnations$表有关系.
该故障原因是由于:mmon在调用kjhn_post_ha_alert0函数在执行的时候,如果发现参数host_name或者instance_name为null,就会报该错误出来.
处理方法
This problem has been documented as Bug 5173066 REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS.
The bug is fixed in 11.1.0.6. A workaround is available for the problem.
该bug在11.1.0.6中得以修复
To implement the workaround, please execute the following steps as the SYS user:
1. Collect the following information and spool it to a file for your records.
a. output of select * from v$instance
b. show parameter instance_name
c. set pages 1000
d. select * from recent_resource_incarnations$
2. Create a backup table of recent_resource_incarnations$.
SQL> create table recent_resource_inc$bk as select * from recent_resource_incarnations$;
3. Truncate recent_resource_incarnations$. Be sure to do this while the instance is up and running.
Do not issue this statement if a shutdown is pending.
SQL> truncate table recent_resource_incarnations$;
4. Perform a clean shutdown, followed by a startup.
具体参考:
ORA-600 [kjhn_post_ha_alert0-862] Continuously Repeated in the Alert Log (Doc ID 401640.1)
Bug 5173066 : REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS
Bug 5173066 : REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS
Hdr: 5173066 10.2.0.1.0 RDBMS 10.2.0.1.0 RAC PRODID-5 PORTID-207 ORA-600
Abstract: REPEATED ORA-600 [KJHN_POST_HA_ALERT0-862] FROM MMON PROCESS
*** RSERNA 04/19/06 12:06 pm ***
TAR:
----
@SR:5322693.992
PROBLEM:
--------
Tue Apr 18 18:17:58 2006
Completed: alter database open
Tue Apr 18 18:18:33 2006
Errors in file c:oracleproduct10.2.0adminsfabdumpsfa_mmon_1384.trc:
ORA-600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [],
[], [], [], [], []
Tue Apr 18 18:18:59 2006
Errors in file c:oracleproduct10.2.0adminsfabdumpsfa_mmon_1384.trc:
ORA-600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [],
[], [], [], [], []
Tue Apr 18 18:20:31 2006
Errors in file c:oracleproduct10.2.0adminsfabdumpsfa_mmon_1384.trc:
ORA-600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [],
[], [], [], [], []
Tue Apr 18 18:20:35 2006
Errors in file c:oracleproduct10.2.0adminsfabdumpsfa_mmon_1384.trc:
ORA-600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [],
[], [], [], [], []
Tue Apr 18 18:25:32 2006
Errors in file c:oracleproduct10.2.0adminsfabdumpsfa_mmon_1384.trc:
ORA-600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [],
[], [], [], [], []
Tue Apr 18 18:25:35 2006
Errors in file c:oracleproduct10.2.0adminsfabdumpsfa_mmon_1384.trc:
ORA-600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [],
[], [], [], [], []
generating large trace files.
DIAGNOSTIC ANALYSIS:
--------------------
Customer already has several laptops with the same scenario.
Personal Oracle Database 10g Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows XP Version V5.1 Service Pack 2
CPU : 1 - type 586
Process Affinity : 0x00000000
Memory (Avail/Total): Ph:728M/1271M, Ph+PgF:2550M/3033M, VA:1746M/2047M
Instance name: sfa
*** SERVICE NAME:(SYS$BACKGROUND) 2006-04-18 18:18:33.736
*** SESSION ID:(106.1) 2006-04-18 18:18:33.736
*** 2006-04-18 18:18:33.736
ksedmp: internal or fatal error
ORA-600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [],
[], [], [], [], []
Current SQL statement for this session:
BEGIN :success := dbms_ha_alerts_prvt.check_ha_resources; END;
----- PL/SQL Call Stack -----
object line object
handle number name
6959CF70 418 package body SYS.DBMS_HA_ALERTS_PRVT
6959CF70 552 package body SYS.DBMS_HA_ALERTS_PRVT
6959CF70 305 package body SYS.DBMS_HA_ALERTS_PRVT
692627B4 1 anonymous block
Executing the procedure manually gives:
SQL> var success varchar2
SQL> begin
2 :success := sys.dbms_ha_alerts_prvt.check_ha_resources;
3 end;
4 /
PL/SQL procedure successfully completed.
SQL> print success
SUCCESS
--------------------------------
N
WORKAROUND:
-----------
None.
No documentation found on what the procedure does, but
assuming the success='N' is why the errors are occuring.
RELATED BUGS:
-------------
REPRODUCIBILITY:
----------------
As soon as database starts.
TEST CASE:
----------
None.
STACK TRACE:
------------
ksedst ksedmp ksfdmp kgerinv kgeasnmierr kjhn_post_ha_alert spefcmpa
spefmccallstd pextproc peftrusted psdexsp rpiswu2 psdextp pefccal
pefcal pevm_FCAL pfrinstr_FCAL pfrrun_no_tool pfrrun plsql_run peicnt
kkxexe opiexe kpoal8 opiodr kpoodr xupirtrc upirtrc kpurcsc
kpuexecv8 kpuexec OCIStmtExecute kjhn_mmon_action 459
kjhn_check_ha_reso urces kebm_ronce_dispatc her kebm_ronce_execute
ksb_run_timeout_an d_intr_action kebm_open_action ksbabs kebm_mmon_main
ksbrdp opirip VInfreq__opidrv sou2o opimai_real opimai
SUPPORTING INFORMATION:
-----------------------
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
DIAL-IN INFORMATION:
--------------------
IMPACT DATE:
------------
*** RSERNA 04/19/06 12:44 pm *** (CHG: Sta->16)
*** RSERNA 04/19/06 12:44 pm ***
@ RDA and alert_traces have been uploaded.
*** SSRAO 04/19/06 09:03 pm *** (CHG: Asg->SSRAO)
*** SSRAO 05/09/06 01:49 am ***
@ BDE Screening in progress. Thanks
*** SSRAO 05/09/06 01:51 am ***
@ Reviewing code for the assert. Thanks
*** SSRAO 05/09/06 01:54 am ***
@ The assert condition is
@ --
@ KSEORAASSERTNM(instance_name && *instance_name &&
@ host_name && *host_name,
@ OERINM("kjhn_post_ha_alert0-862"));
@ -- which is the instance_name and the host_name are set within the code. This
@ function is kjhn_post_ha_alert0() which is internal and does the real work of
@ posting HA alerts. It is used by both kjhn_post_ha_alert and
@ kjn_post_ha_alert_plsql. Its parameters are basically the same as those of
@ kjhn_post_ha_alert,other than the fact that it uses individual parameters
@ rather than the more easily extensible structure. Also the parameters passed
@ to it are the instance_name and the host_name which is the kernelized
@ implementation for posting HA alerts. Without actually having the arguments
@ the guess is that either the host_name or the instance_name raised in the
@ assert is null which triggered it. Still reviewing. Thanks
*** SSRAO 05/09/06 01:54 am *** (CHG: SubComp->RAC)
*** SSRAO 05/09/06 02:21 am ***
@ From he stack it seems we have
@ --
@ kjhn_mmon_action kjhn_check_ha_resources kebm_ronce_dispatcher
@ kebm_ronce_execute ksb_run_timeout_and_intr_action
@ --
@ which is called from within the MMON routine for checking the HA resources.
@ Statement executed through PLSQL is
@ --
@ BEGIN :success := dbms_ha_alerts_prvt.check_ha_resources; END;
@ ..
@ 04/13/06 13:52:27 >ERROR: exception at
@ dbms_ha_alerts_prvt.check_ha_resources637: SQLCODE -600,ORA-600: internal
@ error
@ code, arguments: [kjhn_post_ha_alert0-862], [], [], [], [], [], [], []
@ 04/13/06 13:52:27 >parameter dump for dbms_ha_alerts_prvt.check_ha_resources
@ 04/13/06 13:52:27 > - local_db_unique_name (SFA)
@ 04/13/06 13:52:27 > - local_db_domain (==N/A==)
@ 04/13/06 13:52:27 > - rows deleted (0)
@ --
@ .
@ This check is triggered with FAN enabled at this instance and it seems to be
@ associated with a startup action. From the procedure itself which is called
@ this is a run-once MMON (startup) action which supports instance down
@ notification reliability. It does the folowing a) registers the current
@ instance incarnation in recent_resource_incarnations$ if it's not already
@ there b) deletes recent_resource_incarnations$ records that don't apply to
@ this database. They may, e.g., have been copied from seed db or from a former
@ DataGuard primary c) scans recent_resource_incarnations$ for instance
@ incarnations that are no longer alive, and submits instance down alerts for
@ them . If all is good then return 'Y' else 'N' (or error) if there is a
@ failure. That failure is to get back to MMON, so that it may retry this
@ action later. In the local instance I get a 'Y' but in the customer's system
@ it fails with a 'N' which seems related to the ORA-600 assert. Still
@ reviewing. Thanks
*** SSRAO 05/09/06 02:31 am ***
@ Tracking this trace output
@ --
@ 04/13/06 13:52:27 >parameter dump for
@ dbms_ha_alerts_prvt.check_ha_resources
@ 04/13/06 13:52:27 > - local_db_unique_name (SFA)
@ 04/13/06 13:52:27 > - local_db_domain (==N/A==)
@ 04/13/06 13:52:27 > - rows deleted (0)
@ -- we have
@ - the db_unique_name and domain printed
@ - The SQL
@ --
@ DELETE FROM recent_resource_incarnations$
@ WHERE db_unique_name <> local_db_unique_name
@ OR db_domain <> local_db_domain;
@ -- executed to delete rri$ records for other databases (e.g. seed db, former
@ DataGuard primary). which ptints (0) rows so nothing was deleted here . For
@ all suspected dead instances we have post_ha_alert() called with the
@ arguments
@ --
@ post_ha_alert(reason_id => dbms_server_alert.
@ RSN_FAN_INSTANCE_DOWN, same_transaction =>
@ TRUE,
@ clear_old_alert => TRUE,
@ database_unique_name => local_db_unique_name,
@ database_domain => alert_db_domain,
@ instance_name => instance.instance_name,
@ host_name => instance.host_name,
@ event_reason => 'unknown',
@ event_time => instance.shutdown_time);
@ --
@ - which calls the assert. So far we know the dbunique_name is SFA and the
@ domain is not set or N/A and hence '' by default. Thanks
*** SSRAO 05/09/06 02:40 am ***
@ The PLSQL code calls kjhn_post_ha_alert_plsql() which in turn calls
@ kjhn_post_ha_alert0() which causes the assert so the parameters are being
@ passed and called from the PLSQL function , we need to track these parameters
@ for the failure to be able to isolate what is causing the assert. Still
@ reviewing the code for tracing capabilities. Thanks
*** SSRAO 05/09/06 05:33 pm ***
@ It seems that v$instance is returning a null hostname and which also
@ indicates why this package is returning a "F" which indicates a failure from
@ their side. Also from the traces this might be owned by the HA side but these
@ alerts are irrespective of whether RAC is present or not.
@ Reference from kqfv.h we have the hostname come from KSUXSHST from
@ x$ksuxsinst which in turn is populated from the callback of ksuxsrow().
@ This calls slkmnm() which in turn calls gethostbynam() to populate this
@ information so the things we need to verify here are
@ a) output of select * from v$instance
@ b) show parameter instance_name
@ .
@ Thanks
*** SSRAO 05/09/06 05:34 pm *** (CHG: Sta->10)
*** SSRAO 05/09/06 05:35 pm ***
@ Note that there might be two issues at play here
@ a) The hostname is not being set or the instance_name is not being set
@ b) The assert is not documented and should it depend on this condition
@ Depending on the outcome of the provided data we might fork this into two
@ separate bugs as they deal with different areas of the code. Thanks
*** RSERNA 05/10/06 06:53 am *** (CHG: Sta->16)
*** RSERNA 05/10/06 06:53 am ***
@ SQL> select * from v$instance;
@ .
@ INSTANCE_NUMBER INSTANCE_NAME
@
@ --------------- ----------------
@
@ .
@ HOST_NAME
@
@ ----------------------------------------------------------------
@
@ .
@ VERSION STARTUP_T STATUS PAR THREAD# ARCHIVE
@ LOG_SWITCH_WAIT
@ ----------------- --------- ------------ --- ---------- -------
@ ---------------
@ .
@ LOGINS SHU DATABASE_STATUS INSTANCE_ROLE ACTIVE_ST BLO
@
@ ---------- --- ----------------- ------------------ --------- ---
@
@ .
@ 1 sfa
@
@ W8KKWR81
@
@ 10.2.0.1.0 10-MAY-06 OPEN NO 1 STOPPED
@
@ ALLOWED NO ACTIVE PRIMARY_INSTANCE NORMAL NO
@
@ .
@
@
@ .
@ .
@ SQL> show parameter instance_name
@ NAME TYPE VALUE
@
@ ------------------------------------ -----------
@ ------------------------------
@ instance_name string sfa
@
*** SSRAO 05/10/06 09:41 am ***
@ I do not seem to understand this as both the hostname and the instance_name
@ is populated correctly with v$instance and hence the callback ksuxsrow() is
@ executing correctly to provide this information. Going to back into the
@ package to see if I can find additional clues to capture some of the
@ parameters passed.
@ Is this a test setup ? Is the customer willing to take some diagnostic .plb
@ scripts to execute and provide us with additional data ? Thanks
*** SSRAO 05/10/06 09:41 am *** (CHG: Sta->10)
*** RSERNA 05/10/06 10:34 am *** (CHG: Sta->16)
*** RSERNA 05/10/06 10:34 am ***
@ Yes customer is willing to take a diagnostic script or patch.
*** SSRAO 05/15/06 11:41 am ***
@ Working on a diagnostic. Thanks
*** SSRAO 05/15/06 01:19 pm ***
@ Reviewing the procedure check_ha_resources also
@ - recent_resource_incarnations$ table is populated from v$instance
@ - This is another table which contains the instance and the host_name
@ - This could be another source from where the error is being reported
@ - Please provide the output of
@ --
@ set pages 1000
@ select * from recent_resource_incarnations$
@ --
@ - check_ha_resources() does
@ - a) registers the current instance incarnation in
@ recent_resource_incarnations$ if it's not already there b) deletes
@ recent_resource_incarnations$ records that don't apply to this database. They
@ may, e.g. have been copied from seed db or from a former DataGuard primary
@ c) Scans recent_resource_incarnations$ for instance incarnations that are no
@ longer alive, and submits instance down alerts for them. Breaking this down
@ into function code we have
@ - INSERT INTO recent_resource_incarnations$ as select * from v$instance
@ - DELETE FROM recent_resource_incarnations$
@ WHERE db_unique_name <> local_db_unique_name
@ OR db_domain <> local_db_domain;
@ - Cursor dead_instances defines this as join from
@ recent_resource_incarnations$ so it appears that there is already something
@ out there which is causing these problems. Thanks
*** SSRAO 05/15/06 01:19 pm *** (CHG: Sta->10)
*** RSERNA 05/16/06 06:44 am *** (CHG: Sta->16)
*** RSERNA 05/16/06 06:44 am ***
@ File: recent_resource_incarnations.out uploaded.
@ .
@ Had customer execute:
@ c:hostname
@ W8KKWR81
*** SSRAO 05/16/06 07:41 am ***
@ From the provide traces this seems to be more clearer of why we are seeing
@ this. The output reveals entries like
@ --
@ INSTANCE 1
@ sfa
@ SFA
@ ==N/A==
@ sfa
@ .
@ .
@ ==N/A==
@ -- where a normal entry is
@ --
@ INSTANCE 1
@ sfa
@ SFA
@ ==N/A==
@ sfa
@ W8KKWR81
@ W8KKWR81
@ ==N/A==
@ 09-JAN-06 04.09.36.000000000 AM
@ -- so the hostname is missing for the above entries which is causing the
@ problem. For this table itself it is supposed to insert an entry when an
@ instance comes up and the cleanup is supposed to happen when it goes down or
@ another instance detects this instance going down (if RAC). Since this is a
@ single instance configuration the instance inserts this entry during startup
@ and then deletes the same when the instance is being shutdown or on
@ subsequent startup notes the state change. There are 57 rows in this table
@ which in itself does not look right . All the entries except the one with the
@ instance startup time around "10-JAN-06 08.17.00.000000000 PM" ar normal.
@ This table is populated using v$instance so there was some problem around
@ that time when the hostname was not listed which has caused this to manifest
@ .
@ Summary so far
@ - hostname for entry "10-JAN-06 08.17.00.000000000 PM" is missing
@ - Multiple entries in "recent_resource_incarnations$" which need cleanup
@ - Dead instances are constructed using this list which is why the assert
@ - post_instance_up() in MMON registers the current instance incarnation in
@ recent_resource_incarnations$ if it's not already there and scans the same
@ for instance incarnations that are no longer alive, and submits instance down
@ alerts for them
@ - check_ha_resources() - registers the db in recent_resource_incarnations
@ (RRI) or deletes recent_resource_incarnations$ records that don't apply
@ to this database. They may, e.g., have been copied from seed db or from a
@ former DataGuard primary and scans RRI for instance incarnations that
@ are no longer alive, and submits instance down alerts for them
@ - clear_instance_resources() - It clears all incarnations for the given
@ instance that started before the down event.
@ - The only deletes I found for this table are
@ --
@ DELETE FROM recent_resource_incarnations$
@ WHERE resource_type = 'INSTANCE'
@ AND resource_name = clear_instance_resources.instance_name
@ AND db_unique_name = database_unique_name
@ AND db_domain = NVL(database_domain, '==N/A==')
@ AND startup_time < SYS_EXTRACT_UTC(event_time);
@ ...
@ DELETE FROM recent_resource_incarnations$
@ WHERE db_unique_name <> local_db_unique_name
@ OR db_domain <> local_db_domain;
@ --
@ - Entries in this table are from "09-JAN-06 04.09.36.000000000 AM" to
@ "16-MAY-06 01.13.51.000000000 PM" with no fixed pattern in between.
@ - clear_instance_resources() is supposed to clean the entries out from the
@ description but that is not happening. This is called from
@ kjhn_clear_instance_resources() in kjhn.c which executes the following using
@ OCI
@ --
@ "BEGIN"
@ " dbms_ha_alerts_prvt.clear_instance_resources("
@ " :dbdomain, :dbuniquename, :instance_name, :event_time);"
@ "END;"
@ --
@ - I did not spot any OCI errors but that is what might be happening. Since
@ this in information for an alert this table can be purged (backup please) but
@ this seems to be a re-curring case.
@ - This is called from kjhn_post_ha_alert0()
@ --
@ if ( reason_id == KELT_FAN_INSTANCE_DOWN )
@ {
@ kjhn_clear_instance_resources(database_domain, database_unique_name,
@ instance_name, event_time);
@ }
@ --
*** SSRAO 05/16/06 07:45 am ***
@ The interesting thing is that the block which executes the code for ha_alerts
@ and which returns "kjhn_post_ha_alert0-862" ORA-600 be much before the
@ location where the assert is being processed. Since the code is in a KSETRY
@ block when the ORA-600 is reported is does not process
@ kjhn_clear_instance_resource() as that is the last funcction in the KSETRY
@ block which explains why the superfluos entries for all recent days as its
@ unable to clean it up when it gets posted by FAN or the reason_id of
@ KELT_FAN_INSTANCE_DOWN. The root cause of this still looks like the entry of
@ "09-JAN-06 04.09.36.000000000 AM" where the hostname is null. This is
@ generated from gv$instance at that time and populated
@ .
@ Questions
@ - Do we have the alert log dating that timeframe ?
@ - Is there something specific about 9th Jan which the customer can tell us as
@ the problems with the hostname have been fixed since then. Thanks
*** SSRAO 05/16/06 07:45 am *** (CHG: Sta->10)
*** SSRAO 05/16/06 07:47 am ***
@ Also if the customer wishes to get rid of the ORA-600 the following steps
@ as "sys" would help.
@ --
@ - create table recent_resource_incarnation$_backup as select * from
@ recent_resource_incarnation$;
@ - truncate table recent_resource_incarnation$;
@ - Please perform the truncate while this instance is up and running and not
@ going down and perform a clean shutdown after this
@ - This should stop the ORA-600's and subsequently the instance should have
@ clean shutdowns with no errors. Please provide the infomation requested above
@ too . Thanks
*** RSERNA 05/16/06 08:51 am *** (CHG: Sta->16)
*** RSERNA 05/16/06 08:51 am ***
@ THANKS. I will give customer workaround because they definitely want traces
@ to stop since they are large.
@ i have uploaded alert log: alert_sfa_latest.log
@ which covers Nov 2005 to April 2006.
@ Something definitely happened in January.
@ From customer
@ "We build laptops from an image. The database on the image was created on
@ November. But the laptop was imaged only in January."
@ .
@ I'm not sure how they do the image.
*** SSRAO 05/16/06 01:07 pm ***
@ Its hard to decipher this from the alert as the timeline is as follows
@ - Sun Jan 08 23:11:02 2006 - shutdown
@ - Tue Jan 10 15:17:00 2006 - startup
@ - Nothing in the timeframe listed
@ - Tue Jan 10 15:18:31 2006 - last startup
@ - Wed Jan 11 14:27:06 2006 - startup so unclean shutdown
@ - Wed Jan 11 14:29:36 2006 - ORA-600 reported
@ - We need to know what happened between Jan 8th and the 11th to be able to
@ conclude why v$instance did not return the valid hostname. If they imaged
@ their laptop then is this imaged with no hostname initially set when the
@ Oracle service came up. Also note the source is v$instance which pulls it
@ from x$ksuxsinst which calls slkmnm()->gethostbyname() which is a common
@ routine. The solution is to find out why the hostname was null on the
@ "09-JAN-06 04.09.36.000000000 AM" which is not in the alert log too. Thanks
*** SSRAO 05/16/06 01:07 pm *** (CHG: Sta->10)
*** RSERNA 05/16/06 01:42 pm ***
@ So far so good. They tried the suggested workarounds and no more errors.
@ Customer stated "We used ghost image from microsoft to image all the laptops.
@ We installed oracle on the host image and then slapped it on to other
@ laptops from the host image."
@ .
@ All the laptops they have imaged in this way are experiencing the problem.
@ So the problem may be on the original host. I will find out more.
*** RSERNA 05/19/06 11:47 am *** (CHG: Sta->16)
*** RSERNA 05/19/06 11:47 am ***
@ Customer's latest update
@ "I just got a new laptop that was created with the original image.
@ I checked the table recent_resource_incarnations$ and it has
@ the correct value for the host_name column. "
@ So it appears that the original image used to image all the laptops doesn't
@ always lead to the errors. Some laptops are fine, others are not.
@ .
@ They have also said "I checked with our desktop administrator and he told we
@ can't query the database on our image directly"
@ I'm not sure why.
@ .
@ How else can we diagnose this?
*** RSERNA 05/19/06 12:11 pm ***
@ I asked customer why they could not query the db directly on the image. Their
@ reply was "Our helpdesk administrator is saying that, taking a look
@ at copy of a brand new laptop is similar to seeing the image copy, which
@ makes sense. Image copy copies everything, that is how we have built our 300
@ laptops for our sales force. ". Doesn't really answer the question, but
@ thought I'd pass it along. Anyway, the same Image Copy is used on all the
@ laptops, some had the problems, some did not.
*** SSRAO 05/19/06 12:35 pm ***
@ We might be at a deadend for this. Summarizing the problem so we know what we
@ are looking for
@ .
@ - The routines called are for checking any instances are dead and for sending
@ the appropriate notification
@ - This scans the recent_resource_incarnations$ table for the details
@ - recent_resource_incarnations$ is populated from gv$instance
@ - The first instance (if RAC) which sees this clears the entry
@ - The notification is sent by this instance as well and this is cleared
@ - If the hostname is not populated we see this ORA-600
@ - The point which is not clear is how did gv$instance not have the hostname
@ - gv$instance -> x$ksuxsinst -> slkmnm()-> gethostbyname()
@ - gethostbyname() is generic call which gets the hostname from the OS
@ - I do not see much from gethostbyname() or failure codes from it
@ - The database alert log does not have enough information at that point
@ - The imaging activity is external to the database
@ - Clearing this table would prevent the assert
@ - Root cause is why the null hostname came into the table and this cannot be
@ deciphered with the information provided.
@ .
@ The reason I'm passing this bug is when there is an unclean shutdown the
@ cursor dead_instances is scanned and picked up from where a stale entry with
@ a bad hostname will cause every subsequent startup to assert with no cleanup.
@ I believe that this aberrational condition could be handled with cleanup of
@ the record and raising the ORA-600 , nonetheless not being able to send a
@ notification is not probbaly as serious to raise so many asserts.
@ The hostname issue cannot be debugged further without further details of what
@ happened on that day. BDE Screening complete
*** SSRAO 05/19/06 12:35 pm *** (CHG: Sta->11)
*** BUGPATCH 05/19/06 12:35 pm *** (CHG: Asg->RPARK)
*** BUGPATCH 05/19/06 12:35 pm ***
@ Assigned to RDBMS/RAC area queue owner
@ by SSRAO via the ST Bug Assignment Tool
*** TAKIBA 05/21/06 08:52 pm ***
@ Overflow queue screening.
@ .
@ I think it's better to investigate from viewpoint of PL/SQL.
@ The host_name was already corrupted on the table
@ recent_resource_incarnations$. Therefore, there are 2 possibilities.
@ .
@ 1. v$instance gave the empty host_name
@ .
@ 2. v$instance gave a correct host_name, but Pl/SQL function
@ rdbms/src/server/ccl/crs/prvtkjhn.sql: check_ha_resources
@ corrupted the value.
@ .
@ [quoted from rdbms/src/server/ccl/crs/prvtkjhn.sql: check_ha_resources
@ L562]
@ ============================================================
@ INSERT
@ INTO recent_resource_incarnations$
@ ( resource_type, resource_id, resource_name, db_unique_name,
@ db_domain, instance_name, host_name, startup_time, location,
@ incarnation )
@ SELECT 'INSTANCE', instance_number, instance_name,
@ local_db_unique_name,
@ local_db_domain,
@ instance_name, host_name,
@ SYS_EXTRACT_UTC(
@ instance_startup_timestamp_tz(vi.startup_time)),
@ host_name, '==N/A=='
@ FROM v$instance vi;
@ ============================================================
*** TAKIBA 05/21/06 08:55 pm *** (CHG: Asg->PLSREP Prod->11 Comp->PLSQL SubComp->)
*** TAKIBA 05/21/06 08:55 pm ***
@ I guess they can easily route the bug to port-specific (v$instance
@ issue probably in slkmnm()) or SQL execution.
*** AMANGAL 05/22/06 10:53 am *** (CHG: Asg->RDBMSREP Prod->5 Comp->RDBMS)
*** AMANGAL 05/22/06 10:53 am *** (CHG: Asg->RPARK)
*** AMANGAL 05/22/06 10:55 am ***
@ This needs to be investigated by the owners of check_ha_resources .
*** TAKIBA 05/28/06 04:34 am *** (CHG: SubComp->RAC)
*** TAKIBA 05/28/06 04:34 am ***
@ Put the subcomponent RAC for now.
*** BUGPATCH 06/05/06 10:21 pm *** (CHG: Asg->TAKIBA)
*** BUGPATCH 06/05/06 10:21 pm ***
@ Assigned from the area queue to TAKIBA
via the ST Bug Assignment Tool
*** TAKIBA 06/07/06 02:46 am ***
@ I describe the 2 possibilities again.
@ .
@ The host_name was already corrupted on the table
@ recent_resource_incarnations$. Therefore, there are 2 possibilities.
@ .
@ 1. v$instance gave the empty host_name
@ .
@ If GetComputerName() returned an error, slkmnm() can return a
@ NULL string.
@ .
@ [quoted from rdbms/src/common/osds/slkmnm.c@@/RDBMS_10.2.0.1.0_NT_RELEASE
@ slkmnm() L150]
@ ============================================================
@ len = mnm_l - 1;
@ if (GetComputerName(mnm, &len) == FALSE)
@ {
@ *mnm = ' ';
@ return(0);
@ }
@ else
@ return(len);
@ ============================================================
@ .
@ 2. v$instance gave a correct host_name, but Pl/SQL function
@ rdbms/src/server/ccl/crs/prvtkjhn.sql: check_ha_resources
@ corrupted the value.
@ .
@ [quoted from rdbms/src/server/ccl/crs/prvtkjhn.sql: check_ha_resources
@ L562]
@ ============================================================
@ INSERT
@ INTO recent_resource_incarnations$
@ ( resource_type, resource_id, resource_name, db_unique_name,
@ db_domain, instance_name, host_name, startup_time, location,
@ incarnation )
@ SELECT 'INSTANCE', instance_number, instance_name,
@ local_db_unique_name,
@ local_db_domain,
@ instance_name, host_name,
@ SYS_EXTRACT_UTC(
@ instance_startup_timestamp_tz(vi.startup_time)),
@ host_name, '==N/A=='
@ FROM v$instance vi;
@ ============================================================
*** TAKIBA 06/07/06 10:04 pm *** (CHG: Asg->NKOREHIS)
*** TAKIBA 06/07/06 10:04 pm ***
@ Assigning to Korehisa-san, to investigate from viewpoint of
@ Windows RAC.
*** NKOREHIS 06/13/06 11:20 pm *** (CHG: DevPri->2)
*** NKOREHIS 06/13/06 11:20 pm *** (CHG: Confirmed Flag->Y)
*** KEOCHI 07/30/06 02:39 pm ***
@ Sorry for rushing you..
@ My customer hits similar problem, and then BUG:5403213 for my customer is
@ marked as duplicate of this bug.
@ How the work is going? Thanks.
*** NKOREHIS 07/31/06 11:24 pm ***
@ slkmnm calls GetComputerName WIN32API. GetComputerName could fail if
@ buffer is too small to store the computer name.
@ .
@ on my local env, it is called with the following stack when select
@ from v$instance.
@ .
@ > oracommon10.dll!slkmnm(slerc * se=0x0b58d5e4,
@ unsigned char * mnm=0x0abd7ebc,
@ unsigned int mnm_l=0x00000040)
@ oracle.exe!_ksuxsrow() + 0x5a
@ oracle.exe!_qerfxFetch() + 0x3da
@ oracle.exe!_rwsfcd() + 0x5f
@ oracle.exe!_qerjotFetch() + 0xf3
@ oracle.exe!_rwsfcd() + 0x5f
@ oracle.exe!_qerjotFetch() + 0xf3
@ oracle.exe!_opifch2() + 0xc20
@ oracle.exe!_kpoal8() + 0xcf0
@ oracle.exe!_opiodr() + 0x44e
@ oracommon10.dll!_ttcpip() + 0x4fc
@ oracle.exe!_opitsk() + 0x3f9
@ oracle.exe!_opiino() + 0x444
@ oracle.exe!_opiodr() + 0x44e
@ oracle.exe!_opidrv() + 0x338
@ oracle.exe!_sou2o() + 0x32
@ oracle.exe!_opimai() + 0x171
@ oracle.exe!_opimai() + 0x61
@ oracle.exe!_OracleThreadStart@4() + 0x2c9
@ kernel32.dll!_BaseThreadStart@8() + 0x37
@ .
@ in ksuxsrow, call to slkmnm is made as follows.
@ .
@ DISCARD slkmnm(&se, xsc->ksuxshst, sizeof(xsc->ksuxshst)); /*
@ host name */
@ .
@ ksuxshst is defined as follows.
@ .
@ text ksuxshst[SKGP_HOSTNAME_LEN]; /* host
@ machine name */
@ .
@ in skgp0.h, SKGP_HOSTNAME_LEN is defined as follows.
@ .
@ #define SKGP_HOSTNAME_LEN SKGP_HOSTNAME_DEFAULT
@ /* operating system name
@ buffer size */
@ .
@ SKGP_HOSTNAME_DEFAULT is 64 as defined in skgp.h
@ .
@ #define SKGP_HOSTNAME_DEFAULT 64 /* host name
@ buffer size */
@ .
@ So, GetComputerName should not be failed due to the buffer size in this
@ case.
*** NKOREHIS 07/31/06 11:29 pm *** (CHG: Sta->30)
*** NKOREHIS 07/31/06 11:29 pm ***
@ As the slkmnm does not return any errors though GetComputerName fails,
@ we might need figure out what fails the function using debug patch.
@ Please let me know if the ct would agree to apply the debug patch.
*** RSERNA 08/01/06 06:19 am *** (CHG: Sta->11)
*** RSERNA 08/01/06 06:19 am ***
@ Yes, customer is willing to apply a debug patch.
*** NKOREHIS 08/10/06 04:55 am ***
@ working on the debug patch.
*** NKOREHIS 08/10/06 09:16 pm *** (CHG: Sta->30)
*** NKOREHIS 08/10/06 09:16 pm ***
@ the debug patch is available at bugftp:/upload/bug5173066/5173066.zip.
@ .
@ 1. How is the diagnostic information activated and de-activated:
@ Once installed the patch, it is automatically activated.
@ 2. What new information is being traced and when is it triggered?
@ Error information from GetComputerName will be traced in the
@ alert log.
@ 3. What is the performance impact of the new diagnostic tracing?
@ No performance impact is expected.
@ 4. What is expected from the generated diagnostic information and
@ where will this lead.
@ The patch is provided to confirm if GetComputerName is failed
@ in slkmnm. If so, it would show what is the OS error from the
@ call. Please provide alert log once the error reproduced.
*** RSERNA 08/18/06 07:15 am *** (CHG: Sta->11)
*** RSERNA 08/18/06 07:15 am ***
@ From customer:
@ "When i tried to start the Oracle database service, OracleDBService, it did
@ not start. It failed with a generic windows error,Couldn't start the service.
@ "
@ .
@ They followed the README that came with the patch and could not even start
@ services. Backing out the patch, everything was fine again.
*** NKOREHIS 08/22/06 04:06 am *** (CHG: Sta->30)
*** NKOREHIS 08/22/06 04:06 am ***
@ The patch worked fine on my development environment. Would you please
@ describe what error exactly the ct got?
*** RSERNA 08/28/06 08:12 am *** (CHG: Sta->11)
*** RSERNA 08/28/06 08:12 am ***
@ Customer attempted patch install again and same errors.
@ I've uploaded screenshots from the customer. File:patch_error_screenshot.doc
*** NKOREHIS 08/30/06 09:08 pm *** (CHG: Sta->30)
*** NKOREHIS 08/30/06 09:08 pm ***
@ rebuilt the diagnostic patch changing compiler options.
@ Would you please verify if it works on the customer's env?
@ It is available on bugftp:/upload/bug5173066/5173066_2.zip
*** NKOREHIS 08/31/06 06:16 pm ***
@ basically, why the patch failed to startup is still unknown. the
@ major difference between the debug version and the release version is
@ the compiler options for the patched file. The newer version is to
@ verify if it could lead to the failure. diagnostic code for the
@ original problem is present as well.
*** RSERNA 09/05/06 01:36 pm *** (CHG: Sta->11)
*** RSERNA 09/05/06 01:36 pm ***
@ The customer still had problems applying this Diag patch.
@ See file:screen_shot_of_second_patch_application.doc
@ that was uploaded which shows the customer's output following step-by-step
@ instructions from the readme.
*** NKOREHIS 09/13/06 01:38 am ***
@ built the patch on another development environment. regression
@ test is running.
*** NKOREHIS 09/13/06 07:18 am *** (CHG: Sta->30)
*** NKOREHIS 09/13/06 07:18 am ***
@ put the new diagnostic patch to bugftp:/upload/bug5173066/5173066_3.zip
@ This time, I zipped the patch with orageneric10.dll and
@ oraclient10.dll which exist on the development environment for
@ RDBMS_10.2.0.1.0_NT_RELEASE. I confirmed the patch worked on my local
@ testing env. Please try the new patch.
*** RSERNA 09/14/06 10:49 am *** (CHG: Sta->11)
*** RSERNA 09/14/06 10:49 am ***
@ Customer applied diag patch and started DB.
@ Alert log and mmon trace uploaded: alert_trace_14Sep06.ZIP
*** NKOREHIS 09/14/06 10:20 pm ***
@ alert log does not show any output from the diagnostic code. this
@ means that GetComputerName is not the culprit. we might need
@ another diagnostic.
*** NKOREHIS 09/26/06 04:33 am *** (CHG: Sta->30)
*** NKOREHIS 09/26/06 04:33 am ***
@ I added some more diagnostic code to ksuxsrow to see if xsc->ksuxshst
@ has a valid value after returning from slkmnm. the patch is in
@ bugftp:/upload/bug5173066/5173066_4.zip. Would you please have the
@ ct apply the patch? please provide alert log once the symptom reproduced.
*** RSERNA 09/26/06 05:35 am *** (CHG: Sta->11)
*** RSERNA 09/26/06 05:35 am ***
@ Hi, there is no file in: bugftp:/upload/bug5173066/5173066_4.zip
@ .
@ ftp> pwd
@ 257 "/upload/bug5173066"
@ ftp> get 5173066_4.zip
@ 200 PORT command successful. Consider using PASV.
@ 550 Failed to open file.
@ ftp> ls 5173066*.zip
@ 200 PORT command successful. Consider using PASV.
@ 150 Here comes the directory listing.
@ 5173066.zip
@ 5173066_2.zip
@ 5173066_3.zip
@ 226 Directory send OK.
*** NKOREHIS 09/26/06 06:13 pm *** (CHG: Sta->30)
*** NKOREHIS 09/26/06 06:13 pm ***
@ Sorry I re-uploaded the patch.
@ .
@ ftp> ls 5173066*.zip
@ 200 PORT command successful. Consider using PASV.
@ 150 Here comes the directory listing.
@ 5173066.zip
@ 5173066_2.zip
@ 5173066_3.zip
@ 5173066_4.zip
@ 226 Directory send OK.
*** RSERNA 09/28/06 10:10 am *** (CHG: Sta->11)
*** RSERNA 09/28/06 10:10 am ***
@ Diag Patch has been applied, latest alert log: alert_Sep28.log
@ has been uploaded.
*** NKOREHIS 09/28/06 06:10 pm ***
@ The following diagnostic trace indicates that MMON tid=1540 did not
@ fail to get hostname in ksuxsrow. the hostname was vaild until the
@ function finished. However the internal error singalled in the
@ same thread.
@ .
@ MMON:1540:ksuxsrow: ksuxshst = W8KKWR81
@ MMON:1540:ksuxsrow: ksuxshst = W8KKWR81
@ MMON:1540:ksuxsrow: ksuxshst = W8KKWR81
@ Thu Sep 28 08:05:04 2006
@ Errors in file c:oracleproduct10.2.0adminsfabdumpsfa_mmon_1540.trc:
@ ORA-600: internal error code, arguments: [kjhn_post_ha_alert0-862], [], [],
@ [], [], [], [], []
*** NKOREHIS 09/28/06 06:13 pm *** (CHG: Asg->RDBMSREP)
*** NKOREHIS 09/28/06 06:13 pm ***
@ back to generic RAC to investigate check_ha_resources.
*** BUGPATCH 09/28/06 06:18 pm *** (CHG: Asg->EMERITT)
*** BUGPATCH 09/28/06 06:18 pm ***
@ Assigned to RDBMS/RAC area queue owner
@ by NKOREHIS via the ST Bug Assignment Tool
@ .
@ No problem found in GetComputerName, slknmn, ksuxsrow. However the internal
@ error signalled. We should verify if check_ha_resources works as expected.
@ .
*** MMPANDEY 10/03/06 11:46 pm *** (CHG: Asg->MMPANDEY)
*** MMPANDEY 10/03/06 11:46 pm ***
@ This not exactly related to RAC, but as I have worked on a similar
@ problem, I would review.
*** RSERNA 10/25/06 06:19 am ***
@ Please provide an update. Thanks.
*** MMPANDEY 11/02/06 10:05 am ***
@ I am just back from a 2-week vacation and would be reviewing the bug
@ soon, alongwith others. Thanks.
*** RSERNA 11/28/06 05:56 am ***
@ Please provide an update. Thanks.
*** MMPANDEY 12/03/06 08:17 pm ***
@ I would do so, shortly. Thanks.
*** MMPANDEY 12/13/06 07:04 am ***
@ I am looking at the traces now. A similar problem is seen when the
@ oracle_sid name exceeds 17 characters, which does not seem to be
@ the case here. Looking at the possibilities of internal error being
@ raised in ksuxsrow().
*** RSERNA 01/03/07 08:41 am ***
@ Please provide an update. Thanks.
*** MMPANDEY 01/19/07 02:17 pm ***
@ The code to raise the error appears to have changed a bit in the later
@ labels, i.e. we are no longer checking the condition (*host_name)
@ .
@ KSEORAASSERTNM(instance_name && *instance_name && host_name,
@ OERINM("kjhn_post_ha_alert0-862"));
@ .
@ I am verifying this fact.
*** KNEEL 01/19/07 05:47 pm *** (CHG: Asg->RSERNA)
*** KNEEL 01/19/07 05:47 pm ***
@ The new row being inserted by check_ha_resources should not be the one
@ causing the assert; the assert is caused when attempting to delete an old row
@ with null hostname. Are new rows in recent_resource_incarnations$ still
@ getting null host name; if not, when is/are the bad row(s) from? Is this
@ still the row from Jan 2006?
@ .
@ Also, this is XP, and not an earlier version, right? Was this db ever run on
@ a pre-XP version, or on a system with a long host or network name?
@ .
@ We may never be able to solve where that original bad row came from, and it's
@ not clear that we should bother customer with
野比大雄的涅槃 最新版v0.8-625
野比大雄的涅槃是一款非常好玩的从端游移植而来的精品哆啦A梦同
汽车模拟器2内置菜单全车解锁版2024 最新版v1.54.2
汽车模拟器2内置涂装版是游戏的破解版本,在该版本中为玩家提供
快递到了亲内置菜单 安卓版v0.6.2
快递到了亲内置菜单是一款非常好玩的模拟经营类手游,内部有功能
我的世界为时已晚模组整合包 手机版v皮神木马
我的世界为时已晚模组整合包是一款剧情向的恐怖游戏像素风格沙盒
纸牌农庄内购版 v1.12.77
纸牌农庄无限道具版是一款将纸牌与模拟经营相结合的游戏,为玩家