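As we all know, killing the ocssd.bin process in a RAC environment causes the host to reboot.
But sometimes the system is already in an abnormal state and CRS cannot be shut down cleanly, while the host is an old system that has not been rebooted for years and nobody dares to reboot it. What then?
All we can do is try killing the processes by hand and then repair CRS manually (note that a 10.2 RAC has only three d.bin processes).
Test environment: the operating system is OEL 6.6.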
[root@lunar1 ~]# cat /etc/oracle-release
Oracle Linux Server release 6.6
[root@lunar1 ~]#
[root@lunar1 ~]# uname -a
Linux lunar1 3.8.13-44.1.1.el6uek.x86_64 #2 SMP Wed Sep 10 06:10:25 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@lunar1 ~]#
The CRS version of this RAC is 11.2.0.4:
[root@lunar1 ~]# crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.4.0]
[root@lunar1 ~]# crsctl query crs releaseversion
Oracle High Availability Services release version on the local node is [11.2.0.4.0]
[root@lunar1 ~]# crsctl query crs softwareversion
Oracle Clusterware version on node [lunar1] is [11.2.0.4.0]
[root@lunar1 ~]#
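Note that a regular 12.1 RAC (not Flex Cluster) behaves the same as described in this post, so the approach and procedure are the same.
Check the current CRS status: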
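Because the walkthrough was only run on 11.2.0.4 (and is said to carry over to regular 12.1), it may be worth confirming the version before following it. A minimal sketch, assuming the bracketed output format shown in the transcript above; `gi_version` is a hypothetical helper name, not part of crsctl:

```shell
# gi_version: hypothetical helper (not an Oracle tool).
# Reads `crsctl query crs ...` output on stdin and prints the last
# bracketed dotted version number found on each line.
gi_version() {
  sed -n 's/.*\[\([0-9][0-9.]*\)\].*/\1/p'
}
```

It would be used as `crsctl query crs activeversion | gi_version`. The pattern deliberately takes the last bracket pair on the line, so a node name in brackets (as in the softwareversion output) is skipped.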
[root@lunar1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
ONLINE ONLINE lunar1
ONLINE ONLINE lunar2
ora.DATADG1.dg
ONLINE ONLINE lunar1
ONLINE ONLINE lunar2
ora.DATADG2.dg
ONLINE ONLINE lunar1
ONLINE ONLINE lunar2
ora.LISTENER.lsnr
ONLINE ONLINE lunar1
ONLINE ONLINE lunar2
ora.asm
ONLINE ONLINE lunar1 Started
ONLINE ONLINE lunar2 Started
ora.gsd
OFFLINE OFFLINE lunar1
OFFLINE OFFLINE lunar2
ora.net1.network
ONLINE ONLINE lunar1
ONLINE ONLINE lunar2
ora.ons
ONLINE ONLINE lunar1
ONLINE ONLINE lunar2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE lunar2
ora.cvu
1 ONLINE ONLINE lunar2
ora.lunar.db
1 ONLINE ONLINE lunar1 Open
2 ONLINE OFFLINE STARTING
ora.lunar1.vip
1 ONLINE ONLINE lunar1
ora.lunar2.vip
1 ONLINE ONLINE lunar2
ora.oc4j
1 ONLINE ONLINE lunar1
ora.scan1.vip
1 ONLINE ONLINE lunar2
[root@lunar1 ~]#
List all current CRS processes:
[root@lunar1 ~]# ps -ef|grep d.bin
root 3860 1 0 19:31 ? 00:00:12 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid 3972 1 0 19:31 ? 00:00:04 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 3983 1 0 19:31 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid 3994 1 0 19:31 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
root 4004 1 0 19:31 ? 00:00:15 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 4007 1 0 19:31 ? 00:00:12 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root 4019 1 0 19:31 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root 4032 1 0 19:31 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root 4051 1 0 19:31 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/cssdagent
grid 4063 1 0 19:31 ? 00:00:12 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 4157 1 0 19:31 ? 00:00:06 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid 4180 1 0 19:31 ? 00:00:06 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid 4343 4180 0 19:32 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root 5385 1 1 19:39 ? 00:00:17 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid 5456 1 0 19:39 ? 00:00:04 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 5473 1 0 19:39 ? 00:00:07 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 5475 1 0 19:39 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
grid 6535 1 0 19:50 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
oracle 7132 1 0 20:04 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 7350 7273 0 20:04 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
That is a lot of processes. For how they relate to each other, see the post "The 11.2 RAC Startup Process".
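A side note on the listing above: in `grep d.bin` the dot matches any character, so the pattern also matches the `d/bin` inside `grid/bin`. That is why processes such as cssdmonitor and tnslsnr, and the grep itself, show up. A minimal sketch of a stricter filter follows; `list_gi_procs` is a hypothetical helper name, and it assumes the default `ps -ef` layout in which the command is the eighth column:

```shell
# list_gi_procs: hypothetical helper (not an Oracle tool).
# Reads `ps -ef` output on stdin and prints "PID NAME" for every process
# whose executable path starts with <grid_home>/bin/.
# Assumes the command is the 8th whitespace-separated column.
list_gi_procs() {
  awk -v prefix="$1/bin/" '
    index($8, prefix) == 1 {
      n = split($8, parts, "/")   # last path component is the program name
      print $2, parts[n]
    }'
}
```

On this test system it would be called as `ps -ef | list_gi_procs /u01/app/11.2.0.4/grid`. Matching on the executable path keeps the grep line out of the result without a separate `grep -v grep`.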
All right, let's start simulating the kills. First, kill /u01/app/11.2.0.4/grid/bin/ohasd.bin (it is restarted automatically; see "The 11.2 RAC Startup Process"):
[root@lunar1 ~]# kill -9 3860
[root@lunar1 ~]# ps -ef|grep d.bin
grid 3983 1 0 19:31 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid 3994 1 0 19:31 ? 00:00:03 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid 4007 1 0 19:31 ? 00:00:13 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root 4019 1 0 19:31 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root 4032 1 0 19:31 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid 4063 1 0 19:31 ? 00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 4157 1 0 19:31 ? 00:00:06 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid 4180 1 0 19:31 ? 00:00:07 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid 4343 4180 0 19:32 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root 5385 1 1 19:39 ? 00:00:19 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid 5456 1 0 19:39 ? 00:00:04 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 5473 1 0 19:39 ? 00:00:07 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 5475 1 0 19:39 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
grid 6535 1 0 19:50 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
oracle 7132 1 0 20:04 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 7490 1 0 20:06 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root 7534 2487 14 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid 7571 1 6 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 7575 1 8 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root 7578 1 2 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root 7588 1 3 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root 7676 7273 0 20:07 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
Next, we kill cssdmonitor:
[root@lunar1 ~]# kill -9 4032
-bash: kill: (4032) - No such process
[root@lunar1 ~]#
There is no such process, which means cssdmonitor has already been restarted under a new PID
(see "The 11.2 RAC Startup Process"):
[root@lunar1 ~]# ps -ef|grep d.bin
grid 3983 1 0 19:31 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid 3994 1 0 19:31 ? 00:00:03 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid 4007 1 0 19:31 ? 00:00:13 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root 4019 1 0 19:31 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/osysmond.bin
grid 4063 1 0 19:31 ? 00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 4157 1 0 19:31 ? 00:00:06 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid 4180 1 0 19:31 ? 00:00:07 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid 4343 4180 0 19:32 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root 5385 1 1 19:39 ? 00:00:19 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid 5456 1 0 19:39 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 5473 1 0 19:39 ? 00:00:07 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 5475 1 0 19:39 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
grid 6535 1 0 19:50 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
oracle 7132 1 0 20:04 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 7490 1 0 20:06 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root 7534 2487 3 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid 7571 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 7575 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root 7578 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root 7588 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root 7740 7273 0 20:07 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
The processes above with start times between 20:04 and 20:07 were all restarted automatically in the background after /u01/app/11.2.0.4/grid/bin/ohasd.bin itself was restarted.
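Instead of comparing start times by eye, two saved `ps -ef` snapshots can be diffed by PID. The following is a minimal sketch under that assumption; `new_since` and its three arguments are hypothetical names introduced here for illustration:

```shell
# new_since: hypothetical helper (not an Oracle tool).
# Arguments: <before_file> <after_file> <path_prefix>
# Both files hold `ps -ef` output. Prints "PID COMMAND" for every process
# under <path_prefix> whose PID appears only in the second snapshot,
# i.e. processes that were started (or respawned) in between.
new_since() {
  awk -v prefix="$3" '
    FILENAME == ARGV[1] { seen[$2] = 1; next }          # remember old PIDs
    !($2 in seen) && index($8, prefix) == 1 { print $2, $8 }
  ' "$1" "$2"
}
```

For example, save `ps -ef > before.txt` ahead of a kill and `ps -ef > after.txt` afterwards, then run `new_since before.txt after.txt /u01/app/11.2.0.4/grid/bin/`. A PID reused by an unrelated process would be missed, which is unlikely over the few minutes covered here.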
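Now we kill mdnsd, gpnpd, gipcd and osysmond.
Of these four, the first three are the earliest processes to start during CRS startup, apart from ohasd.
If these processes are killed, ohasd restarts them: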
[root@lunar1 ~]# kill -9 3983 3994 4007 4019
[root@lunar1 ~]# ps -ef|grep d.bin
grid 4063 1 0 19:31 ? 00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin
grid 6535 1 0 19:50 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid 7490 1 0 20:06 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root 7534 2487 2 20:07 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid 7571 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 7575 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root 7578 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root 7588 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid 7756 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid 7758 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
root 7776 7273 0 20:07 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
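Here we see that not all four of the killed processes are back yet: gpnpd.bin and mdnsd.bin already have new PIDs, but gipcd.bin and osysmond.bin are still missing. What happened?
Don't worry, it just isn't time yet: ohasd only restarts them after its next check.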
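Since the restart only happens after ohasd's next check, a small polling loop is more reliable than re-running ps by hand. This is a generic sketch, not something Clusterware provides; `wait_until` is a hypothetical helper and the one-second poll interval is an arbitrary choice:

```shell
# wait_until: hypothetical helper (not an Oracle tool).
# Usage: wait_until <timeout_seconds> <command> [args...]
# Re-runs the command once per second until it succeeds.
# Returns 0 on success, 1 if the timeout expires first.
wait_until() {
  timeout_s=$1
  shift
  elapsed=0
  while ! "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

For instance, `wait_until 120 pgrep -f /u01/app/11.2.0.4/grid/bin/gipcd.bin` would wait up to two minutes for gipcd.bin to reappear; the 120-second limit is an assumption, not a documented ohasd interval.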
Next, we kill the listeners:
[root@lunar1 ~]# kill -9 6535 7490
[root@lunar1 ~]# ps -ef|grep d.bin
grid 4063 1 0 19:31 ? 00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 7534 2487 2 20:07 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid 7571 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 7575 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root 7578 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root 7588 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid 7756 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid 7758 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid 7783 1 2 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root 7785 1 2 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root 7844 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ologgerd -m lunar2 -r -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root 7853 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/octssd.bin
grid 7873 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
root 7874 1 14 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid 7944 7873 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
grid 7979 1 9 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 7982 1 3 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
oracle 7986 1 4 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 8001 1 3 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 8025 7979 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/lsnrctl status LISTENER
grid 8028 7979 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/lsnrctl status LISTENER_SCAN1
root 8083 7273 0 20:08 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
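Well, look at that: every process we killed has been restarted. 11.2 RAC really is tough.
Now we kill the /etc/init.d/init.ohasd process (PID 2487), together with the ohasd.bin it started (PID 7534):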
[root@lunar1 ~]# ps -ef|grep ohasd
root 2487 1 0 19:20 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 7534 2487 1 20:07 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
root 8191 7273 0 20:08 pts/2 00:00:00 grep ohasd
[root@lunar1 ~]# kill -9 2487 7534
[root@lunar1 ~]# ps -ef|grep ohasd
root 8239 1 0 20:08 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 8257 8239 0 20:08 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 8258 8257 0 20:08 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 8267 7273 0 20:08 pts/2 00:00:00 grep ohasd
[root@lunar1 ~]# ps -ef|grep ohasd
root 8239 1 0 20:08 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 8299 7273 0 20:08 pts/2 00:00:00 grep ohasd
[root@lunar1 ~]#
What we see here is /etc/init.d/init.ohasd being restarted automatically by the system. This is recorded in /var/log/messages:
[root@lunar1 ~]# tail -f /var/log/messages
Jan 24 19:45:31 lunar1 kernel: e1000 0000:00:03.0 eth0: Reset adapter
Jan 24 20:03:50 lunar1 kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Jan 24 20:03:52 lunar1 kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Jan 24 20:07:01 lunar1 clsecho: /etc/init.d/init.ohasd: ohasd is restarting 1/10.
Jan 24 20:07:01 lunar1 logger: exec /u01/app/11.2.0.4/grid/perl/bin/perl -I/u01/app/11.2.0.4/grid/perl/lib /u01/app/11.2.0.4/grid/bin/crswrapexece.pl /u01/app/11.2.0.4/grid/crs/install/s_crsconfig_lunar1_env.txt /u01/app/11.2.0.4/grid/bin/ohasd.bin "restart"
Jan 24 20:08:26 lunar1 init: oracle-ohasd main process (2487) killed by KILL signal
Jan 24 20:08:26 lunar1 init: oracle-ohasd main process ended, respawning
Jan 24 20:13:58 lunar1 init: oracle-ohasd main process (8239) killed by KILL signal
Jan 24 20:13:58 lunar1 init: oracle-ohasd main process ended, respawning
Jan 24 20:14:12 lunar1 root: exec /u01/app/11.2.0.4/grid/perl/bin/perl -I/u01/app/11.2.0.4/grid/perl/lib /u01/app/11.2.0.4/grid/bin/crswrapexece.pl /u01/app/11.2.0.4/grid/crs/install/s_crsconfig_lunar1_env.txt /u01/app/11.2.0.4/grid/bin/ohasd.bin "reboot"
^C
[root@lunar1 ~]#
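The respawn events can be pulled out of the log rather than spotted by eye. A minimal sketch, assuming the syslog line format and the `oracle-ohasd` job name exactly as they appear in the excerpt above; `ohasd_kills` is a hypothetical helper name:

```shell
# ohasd_kills: hypothetical helper (not an Oracle tool).
# Reads syslog lines on stdin and, for each line reporting that the
# oracle-ohasd main process was killed by a signal, prints
# "<timestamp> pid=<pid> signal=<name>".
ohasd_kills() {
  sed -n 's/^\(.*\) [^ ]* init: oracle-ohasd main process (\([0-9]*\)) killed by \([A-Z]*\) signal$/\1 pid=\2 signal=\3/p'
}
```

It would be run as `ohasd_kills < /var/log/messages`. Lines that do not match, such as the "respawning" notices, are simply dropped.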
The other processes have also been restarted automatically (note that at this point crsd has not been restarted yet):
[root@lunar1 ~]# ps -ef|grep d.bin
grid 4063 1 0 19:31 ? 00:00:14 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 7578 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root 7588 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid 7756 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid 7758 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid 7783 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root 7785 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root 7844 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ologgerd -m lunar2 -r -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root 7853 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/octssd.bin
grid 7873 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
root 7874 1 3 20:07 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid 7944 7873 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
grid 7979 1 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 7982 1 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
oracle 7986 1 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 8001 1 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 8119 1 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid 8120 1 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
root 8321 8319 1 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin check has
root 8325 7273 0 20:08 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
Now we kill the following, in order: evmlogger.bin, gpnpd.bin, mdnsd.bin, gipcd.bin, evmd.bin, oraagent.bin, scriptagent.bin, oraagent.bin, orarootagent.bin and the two listeners:
[root@lunar1 ~]# kill -9 7944 7756 7758 7783 7873 7979 7982 7986 8001 8119 8120
[root@lunar1 ~]# ps -ef|grep d.bin
grid 4063 1 0 19:31 ? 00:00:14 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 7578 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root 7588 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root 7785 1 1 20:07 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root 7844 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ologgerd -m lunar2 -r -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root 8593 8591 0 20:09 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin check has
root 8597 7273 0 20:09 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
Then kill osysmond.bin, ologgerd, cssdmonitor and cssdagent:
[root@lunar1 ~]# kill -9 7785 7844 7588 7578
[root@lunar1 ~]#
All right, now only ocssd.bin is left:
[root@lunar1 ~]# ps -ef|grep d.bin
grid 4063 1 0 19:31 ? 00:00:14 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 8629 7273 0 20:10 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
Now we kill ocssd.bin, the process that is rumored to reboot the host as soon as it is killed:
[root@lunar1 ~]# kill -9 4063
[root@lunar1 ~]#
Done. The system is still fine, it did not reboot, and the resources have all been released cleanly:
[root@lunar1 ~]# ipcs -ma
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
------ Semaphore Arrays --------
key semid owner perms nsems
0x00000000 0 root 600 1
0x00000000 65537 root 600 1
------ Message Queues --------
key msqid owner perms used-bytes messages
[root@lunar1 ~]#
[root@lunar1 ~]#
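The "released cleanly" check can be turned into a number. A minimal sketch, assuming each IPC row starts with a hexadecimal key such as `0x00000000`, as in the output above; `count_ipc_rows` is a hypothetical helper name:

```shell
# count_ipc_rows: hypothetical helper (not an Oracle tool).
# Reads `ipcs` output on stdin and prints how many rows start with a
# hexadecimal key (0x...), i.e. how many IPC objects are listed.
count_ipc_rows() {
  awk '$1 ~ /^0x[0-9a-fA-F]+$/ { n++ } END { print n + 0 }'
}
```

Fed with shared-memory output only, as in `ipcs -m | count_ipc_rows`, a result of 0 corresponds to the empty "Shared Memory Segments" section seen above. The two root-owned semaphore arrays in the transcript would be counted if the full `ipcs -ma` output were piped in.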
Recovering is simple: just start CRS again directly:
[root@lunar1 ~]# ps -ef | grep -v grep|grep -E 'init|d.bin|ocls|evmlogger|UID'
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 19:20 ? 00:00:01 /sbin/init
root 2486 1 0 19:20 ? 00:00:00 /bin/sh /etc/init.d/init.tfa run
root 8924 1 0 20:13 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
[root@lunar1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@lunar1 ~]# ps -ef|grep ohasd
root 8924 1 0 20:13 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 8968 1 4 20:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root 9187 7273 0 20:14 pts/2 00:00:00 grep ohasd
[root@lunar1 ~]#
[root@lunar1 ~]# ps -ef|grep d.bin
root 8968 1 0 20:14 ? 00:00:08 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid 9090 1 0 20:14 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 9101 1 0 20:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid 9112 1 0 20:14 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
root 9122 1 0 20:14 ? 00:00:09 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 9126 1 0 20:14 ? 00:00:08 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root 9139 1 0 20:14 ? 00:00:12 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root 9150 1 0 20:14 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root 9169 1 0 20:14 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/cssdagent
grid 9180 1 0 20:14 ? 00:00:04 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 9212 1 1 20:14 ? 00:00:28 /u01/app/11.2.0.4/grid/bin/ologgerd -M -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root 9340 1 0 20:18 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid 9363 1 0 20:18 ? 00:00:03 /u01/app/11.2.0.4/grid/bin/evmd.bin
root 9455 1 0 20:18 ? 00:00:09 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid 9532 9363 0 20:18 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
grid 9569 1 0 20:18 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 9572 1 0 20:18 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
root 9591 1 0 20:18 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 9682 1 0 20:18 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid 9684 1 0 20:18 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
oracle 9774 1 0 20:19 ? 00:00:03 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 10642 7273 0 20:38 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
[root@lunar1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
ONLINE ONLINE lunar1
ora.DATADG1.dg
ONLINE ONLINE lunar1
ora.DATADG2.dg
ONLINE ONLINE lunar1
ora.LISTENER.lsnr
ONLINE ONLINE lunar1
ora.asm
ONLINE ONLINE lunar1 Started
ora.gsd
OFFLINE OFFLINE lunar1
ora.net1.network
ONLINE ONLINE lunar1
ora.ons
ONLINE ONLINE lunar1
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE lunar1
ora.cvu
1 ONLINE ONLINE lunar1
ora.lunar.db
1 ONLINE ONLINE lunar1 Open
2 ONLINE OFFLINE
ora.lunar1.vip
1 ONLINE ONLINE lunar1
ora.lunar2.vip
1 ONLINE INTERMEDIATE lunar1 FAILED OVER
ora.oc4j
1 ONLINE ONLINE lunar1
ora.scan1.vip
1 ONLINE ONLINE lunar1
[root@lunar1 ~]#
Only node 1 is shown here because I had shut down node 2.
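After a restart like this, the resources worth a second look are those whose TARGET is ONLINE but whose STATE is not. A minimal sketch that picks them out of the `crsctl status res -t` output, assuming the 11.2 layout shown above (a resource name line starting with `ora.`, followed by one line per instance); `not_online` is a hypothetical helper name:

```shell
# not_online: hypothetical helper (not an Oracle tool).
# Reads `crsctl status res -t` output on stdin and prints "RESOURCE STATE"
# for every instance whose TARGET is ONLINE but whose STATE differs.
not_online() {
  awk '
    /^ora\./ { res = $1; next }        # remember the current resource name
    {
      # cluster resources carry an instance number before TARGET
      i = ($1 ~ /^[0-9]+$/) ? 2 : 1
      if ($i == "ONLINE" && $(i + 1) != "" && $(i + 1) != "ONLINE")
        print res, $(i + 1)
    }'
}
```

Run as `crsctl status res -t | not_online`. With node 2 shut down, as in this test, it would be expected to flag the second database instance and the failed-over VIP of node 2.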
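The test shows that as long as you first kill the cssdmonitor and cssdagent processes (to be precise, cssdagent; the classic big diagram of the CRS startup also shows this relationship) and only then kill ocssd.bin, the system does not reboot.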
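The ordering can be written down as a dry run that only prints the kill commands and executes nothing. This is a sketch of the order observed in this one test on 11.2.0.4, not a supported procedure; `css_kill_order` is a hypothetical helper, and it assumes the default `ps -ef` layout with the command in the eighth column:

```shell
# css_kill_order: hypothetical helper (not an Oracle tool). DRY RUN ONLY:
# it prints commands and never kills anything.
# Reads `ps -ef` output on stdin and prints two lines:
#   1. a kill for cssdmonitor and cssdagent (the watchdogs, first)
#   2. a kill for ocssd.bin (only after the watchdogs are gone)
css_kill_order() {
  awk '
    { n = split($8, p, "/"); name = p[n] }     # program name without path
    name == "cssdmonitor" || name == "cssdagent" { first = first " " $2 }
    name == "ocssd.bin" { last = last " " $2 }
    END {
      if (first != "") print "kill -9" first
      if (last != "")  print "kill -9" last
    }'
}
```

Calling `ps -ef | css_kill_order` prints the plan for review before anything is run by hand. Because ohasd respawns the two watchdog processes, the printed PIDs go stale quickly; in the test above ohasd.bin and init.ohasd had already been killed before this step.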
Also, a regular 12.1 RAC (not Flex Cluster) behaves the same as described in this post, and the approach and procedure are the same.