Restart Exadata storage cell service without affecting ASM

Brief history:

One week ago, the cell service on our DR Exadata hung, which caused all databases located on that Exadata to become inaccessible.

CellCLI> LIST ALERTHISTORY
9 2017-10-13T11:56:05+04:00 critical "RS-7445 [Serv CELLSRV hang detected] [It will be restarted] [] [] [] [] [] [] [] [] [] []"

The cell's alert history said that the service would be restarted automatically, but it was not, so I restarted it myself:

CellCLI> ALTER CELL RESTART SERVICES CELLSRV

The databases started to work correctly.

Today the same problem happened on the HQ side, which of course brought everything to a halt for a while until I restarted the service.

But identifying which cell was the problematic one was a little difficult, because there was no error in the alert history.

But when I ran the following command on the third cell node, it hung; the other cells were fine.

CellCLI> LIST ACTIVEREQUEST

So I restarted the same service on that node and the problem was resolved.

CellCLI> ALTER CELL RESTART SERVICES CELLSRV

Of course, this is not a real solution, and the cell service must not hang in the first place! But it is a simple workaround when your PRODUCTION database is down.

I have created an SR and am waiting for an answer; if there is any useful news I will update this post.

===================================================================================================

Here are the correct steps for restarting cell services without affecting ASM:

1.  Run the following command to check if there are offline disks on other cells that are mirrored with disks on this cell:

CellCLI> LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'

Warning: If any grid disks are listed in the returned output, it is not safe to stop or restart the CELLSRV process, because Oracle ASM disk group redundancy would not be intact; doing so would cause Oracle ASM to dismount the affected disk group and the databases to shut down abruptly.

If no grid disks are listed in the returned output, you can safely restart cellsrv or all services in step #2 below.

2.  Re-start the cell services using either of the following commands:

CellCLI> ALTER CELL RESTART SERVICES CELLSRV

CellCLI> ALTER CELL RESTART SERVICES ALL
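After the restart it is worth checking that the grid disks have come back online before you touch the next cell. A minimal check, using the standard grid disk attributes:

CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus, asmdeactivationoutcome

Wait until asmmodestatus is ONLINE and asmdeactivationoutcome is Yes for all grid disks.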

The good news is that the cell has self-defence against reduced redundancy: if you try to restart the services while the redundancy check is not satisfied, you get:

CellCLI> ALTER CELL RESTART SERVICES ALL;

Stopping the RS, CELLSRV, and MS services…
The SHUTDOWN of ALL services was not successful.
CELL-01548: Unable to shut down CELLSRV because disk group DATA, RECO may be forced to dismount due to reduced redundancy.
Getting the state of CELLSRV services… running
Getting the state of MS services… running
Getting the state of RS services… running

 

Rebuild RAC clusterware without deleting data

As I mentioned in my previous posts, I was applying an interim patch on the database, and the patch had a post-installation script (# <GI_HOME>/crs/install/rootcrs.pl -postpatch).
The post script failed with a permission denied error on the ohasd file and left the clusterware in a messy state.

I opened an SR on My Oracle Support, and after a huge amount of time spent talking and troubleshooting together, one of the support engineers said:

“We do not know what happened or what steps you have taken to reach this situation. You should open an SR with us before you deconfigure the node.
Please, do bare metal restore as it is recommended by previous engineer.
Bare Metal Restore Procedure for Compute Nodes on an Exadata Environment ( Doc ID 1084360.1 )”

A bare metal restore means wiping everything, after which I would have had to configure RAC, Data Guard and everything else from scratch. I don't like such solutions; it is like "if your Windows runs slowly, reinstall it" 🙂 for Windows that might even be true, but on Linux/Oracle you must troubleshoot first.
So I created another SR for another error (there were plenty of errors at this point), and this time I was lucky.
I was working 24/7 with support, with engineers working in shifts; three different engineers worked on this SR at different times.
I want to mention one of them, Oracle support engineer Venkata Pradeep Kumar: he is very sharp, he helped me a lot, and together we rescued the system! :)

I want to share the steps with you; they should be interesting.

Problem:

After the patch post script failed on the first node, the clusterware on that node would not start. At that point the second node was fine.
I deconfigured the clusterware on the first node (the exact command is in the solution section below) and it started, but with some problems with the OC4J service.

2016/09/27 06:56:15 CLSRSC-1003: Failed to start resource OC4J
2016/09/27 06:56:16 CLSRSC-287: FirstNode configuration failed

I also deconfigured the clusterware on the second node and tried to run root.sh, but it said that root.sh could not be run because it had not completed successfully on the first node. 😦

So, until the root.sh script has completed successfully on the first node, do not deconfigure the clusterware on the second one. But if you already did, do not panic, as long as you have an OCR backup.

Solution:

# Deconfigure CRS on the problematic node. Note that deconfiguring and reconfiguring just one node may be enough in your case; in my situation all nodes became problematic.
# Also be careful: the steps below assume that you have a separate disk group for OCR. The datafiles must be on a different disk group, otherwise that disk group will be wiped.

# As root on both nodes (node1, node2)

/u01/app/12.1.0.2/grid/crs/install/rootcrs.sh -deconfig -force

# Run root.sh on node1; it may not complete successfully

/u01/app/12.1.0.2/grid/root.sh

# We need to find a good OCR backup; for me it is week.ocr, which was taken automatically on 2016/09/15 at 09:12:28.
# The patch was applied at 10:00 AM on 2016/09/25, so we need week.ocr, taken before the patching.

[root@lbdm01-dr-adm grid]# ocrconfig -showbackup

lbdm02-dr-adm 2016/09/27 02:35:23 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup00.ocr 3351897854
lbdm02-dr-adm 2016/09/26 15:44:53 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup01.ocr 3351897854
lbdm02-dr-adm 2016/09/26 11:44:52 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup02.ocr 3351897854
lbdm02-dr-adm 2016/09/27 02:35:23 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/day.ocr 3351897854
lbdm01-dr-adm 2016/09/15 09:12:28 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/week.ocr 854493477
lbdm02-dr-adm 2016/09/25 15:29:18 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup_20160925_152918.ocr 3351897854
lbdm02-dr-adm 2016/09/25 10:34:56 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup_20160925_103456.ocr 2725022894
lbdm01-dr-adm 2015/07/29 19:46:28 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup_20150729_194628.ocr 854493477
lbdm01-dr-adm 2015/07/29 19:46:27 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup_20150729_194627.ocr 854493477

# Ensure that no processes are left
# node 1

crsctl stop crs -f
ps -ef|grep "/u01/app"

# If anything is still running, kill it!

# Start the clusterware stack in exclusive mode without the CRS daemon on node 1

crsctl start crs -excl -nocrs
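# Optionally, before restoring the OCR, verify that the stack really is up in exclusive mode (a quick sanity check, not part of the original steps)

crsctl stat res -t -init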

# Restore the OCR on node 1

ocrconfig -restore /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/week.ocr
ocrcheck

# Stop crs on node 1

crsctl stop crs -f
crsctl start crs

# Check the status

crsctl status res -t

# It should be OK

# Do the same steps on node 2 as root, but root.sh may fail

/u01/app/12.1.0.2/grid/root.sh

# Failed

ORA-15160: rolling migration internal fatal error in module SKGXP,valNorm:not-native
. For details refer to “(:CLSN00107:)” in “/u01/app/oracle/diag/crs/lbdm02-dr-adm/crs/trace/ohasd_oraagent_oracle.trc”.
CRS-2883: Resource ‘ora.asm’ failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-4000: Command Start failed, or completed with errors.
2016/09/28 09:11:00 CLSRSC-117: Failed to start Oracle Clusterware stack

# deconfig on both nodes
# node1 , node2

 /u01/app/12.1.0.2/grid/crs/install/rootcrs.sh -deconfig -force

# and run root.sh again
# node 1

/u01/app/12.1.0.2/grid/root.sh

# This time it completed successfully.

# On the second node there was still a problem

# Read the following document ORA-15160: rolling migration internal fatal error in module SKGXP,valNorm:not-native (NOTE 1682591.1)

# Here the problem was with the protocols used by ASM and the RDBMS.
# The RDBMS was using the RDS protocol while ASM was using UDP; see Oracle Clusterware and RAC Support for RDS Over Infiniband (NOTE 751343.1).
# The problem was in the libraries, and we had to relink them with the right protocols.
# As the ORACLE_HOME/GI_HOME owner, stop all resources (database, listener, ASM etc.) running from the home. When stopping the database, use the NORMAL or IMMEDIATE option.

# On the problematic node, where ASM or the database is not starting:

crsctl stop crs
ps -ef|grep d.bin
ps -ef|grep "/u01/app"

# Kill any processes that are left

# If relinking Grid Infrastructure (GI) home, as root, unlock GI home: <GI_HOME>/crs/install/rootcrs.pl -unlock

/u01/app/12.1.0.2/grid/crs/install/rootcrs.sh -unlock

# As the ORACLE_HOME/GI_HOME owner, go to the ORACLE_HOME/GI_HOME and cd to rdbms/lib
# As the ORACLE_HOME/GI_HOME owner, issue "make -f ins_rdbms.mk <protocol> ioracle"
# For the RDBMS home:

[root@lbdm02-dr-adm lib]# su - oracle
[oracle@lbdm02-dr-adm ~]$ cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ipc_rds ioracle

# For ASM (the GI home):

. oraenv
+ASM2
[oracle@lbdm02-dr-adm ~]$ cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ipc_g ioracle
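# To double-check which IPC protocol a home is linked with, the skgxpinfo utility can be used
# (assuming your version ships it under $ORACLE_HOME/bin; it prints udp or rds)

$ORACLE_HOME/bin/skgxpinfo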

# As root

/u01/app/12.1.0.2/grid/crs/install/rootcrs.sh -patch

# The last step also configures the clusterware. After that everything should be fine, and you can sleep now. 🙂

 

RAC: root.sh | CRS-2672: Attempting to start ‘ora.storage’ | ORA-01017: invalid username/password

I was configuring clusterware on node1 and got the following error:

CRS-2672: Attempting to start ‘ora.storage’ on ‘node1’
ORA-01017: invalid username/password; logon denied
CRS-5017: The resource action “ora.storage start” encountered the following error:
Storage agent start action aborted. For details refer to “(:CLSN00107:)” in “/u01/app/oracle/diag/crs/node1/crs/trace/ohasd_orarootagent_root.trc”.
CRS-2883: Resource ‘ora.storage’ failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-4000: Command Start failed, or completed with errors.
2016/09/27 05:41:01 CLSRSC-117: Failed to start Oracle Clusterware stack

Died at /u01/app/12.1.0.2/grid/crs/install/crsinstall.pm line 930.
The command '/u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/rootcrs.pl ' execution failed

 

/u01/app/oracle/diag/crs/node1/crs/trace/ohasd_orarootagent_root.trc file says:

2016-09-27 05:40:56.787330*:kgfn.c@6018: kgfnConnect2Int: sysasm=0 envflags=0x10 srvrflags=0x3 unam=NULL password is NULL pstr=_ocr
2016-09-27 05:40:56.787330*:kgfn.c@6194: kgfnConnect2Int: cstr=(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/12.1.0.2/grid/bin/oracle)(ARGV0=oracle+ASM1_ocr)(ENVS=’ORACLE_HOME=/u01/app/12.1.0.2/grid,ORACLE_SID=+ASM1′)(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))’)(PRIVS=(USER=root)(GROUP=root)))(enable=setuser))
2016-09-27 05:40:57.273302 : AGENT:2583111424: {0:9:3} {0:9:3} Created alert : (:CRSAGF00113:) : Aborting the command: start for resource: ora.storage 1 1

 

So why is it connecting as the root user?

See, when I connect as root I get ORA-01017:

[root@node1 ~]# . oraenv
ORACLE_SID = [+ASM1] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[root@node1 ~]# sqlplus / as sysasm

SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 27 05:59:01 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied

If I connect as the oracle user, it works fine:

su - oracle

[oracle@node1 ~]$ . oraenv
ORACLE_SID = [LBTCI1] ? +ASM1

[oracle@node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 27 05:59:45 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 – 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL>

 

Look at the connection string again: it contains "PROGRAM=/u01/app/12.1.0.2/grid/bin/oracle", so let's check that file's permissions.

[oracle@node1 ~]$ ll /u01/app/12.1.0.2/grid/bin/oracle
-rwsr-s--x 1 root root 295054213 Sep 27 05:26 /u01/app/12.1.0.2/grid/bin/oracle

It must be owned by oracle:oinstall, not root:root.

chown oracle:oinstall /u01/app/12.1.0.2/grid/bin/oracle
chmod 6751 /u01/app/12.1.0.2/grid/bin/oracle
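After the chown and chmod, the listing should look like this (same binary as above, now with the correct owner and the setuid/setgid bits):

[oracle@node1 ~]$ ll /u01/app/12.1.0.2/grid/bin/oracle
-rwsr-s--x 1 oracle oinstall 295054213 Sep 27 05:26 /u01/app/12.1.0.2/grid/bin/oracle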

 

Then deconfigure CRS (rootcrs.pl -deconfig -verbose) and reconfigure it (run root.sh) again.

 

error: package cvuqdisk is not installed

I applied a patch on RAC, and after running the post-install script on the first node, it failed because of a file permission issue, and the problems started…

I could not start the clusterware on the first node.

I deconfigured the clusterware with:

[root@lbdm01-dr-adm ~]# $ORACLE_HOME/crs/install/rootcrs.pl -deconfig -force -verbose

And here I got the cvuqdisk error:

PRCR-1070 : Failed to check if resource ora.net1.network is registered
CRS-0184 : Cannot communicate with the CRS daemon.
PRCR-1070 : Failed to check if resource ora.helper is registered
CRS-0184 : Cannot communicate with the CRS daemon.
PRCR-1070 : Failed to check if resource ora.ons is registered
CRS-0184 : Cannot communicate with the CRS daemon.

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘lbdm01-dr-adm’
CRS-2679: Attempting to clean ‘ora.cssd’ on ‘lbdm01-dr-adm’
CRS-2680: Clean of ‘ora.cssd’ on ‘lbdm01-dr-adm’ failed
CRS-2799: Failed to shut down resource ‘ora.cssd’ on ‘lbdm01-dr-adm’
CRS-2795: Shutdown of Oracle High Availability Services-managed resources on ‘lbdm01-dr-adm’ has failed
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
2016/09/26 19:54:12 CLSRSC-463: The deconfiguration or downgrade script could not stop current Oracle Clusterware stack.

2016/09/26 19:54:12 CLSRSC-4006: Removing Oracle Trace File Analyzer (TFA) Collector.

2016/09/26 19:54:26 CLSRSC-4007: Successfully removed Oracle Trace File Analyzer (TFA) Collector.

error: package cvuqdisk is not installed
2016/09/26 19:54:26 CLSRSC-557: Oracle Clusterware stack on this node has been successfully deconfigured. There were some errors which can be ignored.

 

I searched the documentation for information about this package and found the following:

Link: https://docs.oracle.com/database/121/LADBI/pre_install.htm#LADBI7632

Installing the cvuqdisk RPM for Linux

If you do not use an Oracle Preinstallation RPM, then you must install the cvuqdisk RPM. Without cvuqdisk, the Cluster Verification Utility cannot find shared disks, and you receive a “Package cvuqdisk not installed” error when you run the Cluster Verification Utility. Use the cvuqdisk RPM for your hardware (for example, x86_64, or i386).

To install the cvuqdisk RPM, complete the following procedure:

  1. Locate the cvuqdisk RPM package, which is in the directory rpm on the Oracle Database installation media. If you installed Oracle Grid Infrastructure, then it is in the directory oracle_home1/cv/rpm.
  2. Log in as root.
  3. Use the following command to find if you have an existing version of the cvuqdisk package:

    # rpm -qi cvuqdisk
    

    If you have an existing version, then enter the following command to deinstall the existing version:

    # rpm -e cvuqdisk
    
  4. Set the environment variable CVUQDISK_GRP to point to the group that owns cvuqdisk, typically oinstall, for example:
    # CVUQDISK_GRP=oinstall; export CVUQDISK_GRP
    
  5. In the directory where you have saved the cvuqdisk RPM, use the following command to install the cvuqdisk package:
    rpm -iv package
    

    For example:

    # rpm -iv cvuqdisk-1.0.9-1.rpm

So I found the package in the directory below and installed it:

cd /u01/app/12.1.0.2/grid/cv/rpm

yum install cvuqdisk-1.0.9-1.rpm
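To confirm the package is now in place, a quick check is enough:

rpm -qi cvuqdisk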

Note: The problem is strange, but I am not explaining why it happened in this post, because I don't know yet 🙂
The aim of this post is simply to show where to find the cvuqdisk package and what it is for 🙂

Good Luck!

Oracle Clusterware is restarting database “” | shut down instance “” of database “” | start up instance “” of database “”

When I used Data Guard broker to switch over the primary database to the standby, sometimes the broker would write "Oracle Clusterware is restarting database" …, hang, and time out.

The problem was that the database was not registered with srvctl.

srvctl add database -d db_unique_name -o oracle_home
srvctl add instance -d db_unique_name -i instance_name1 -n node_name1
srvctl add instance -d db_unique_name -i instance_name2 -n node_name2

Instead of db_unique_name, oracle_home, instance_name1, node_name1, instance_name2 and node_name2, enter the values for your database.
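To verify the registration afterwards, you can run something like this (again with your own db_unique_name):

srvctl config database -d db_unique_name
srvctl status database -d db_unique_name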

SESSIONS WAITING ON INACTIVE TRANSACTION BRANCH, GLOBAL HASH COLLISION 12c, 11g Oracle RAC(distributed transaction)

When performing XA transactions against a multi-node Oracle RAC configuration, some branches of the transaction may not commit. This is a known bug, but to tell the truth no bug fix helped me solve the problem until I came across this IBM technote: https://www-304.ibm.com/support/docview.wss?uid=swg21460967

There are several workarounds, but I prefer workaround 1; I have used it and it works perfectly.

1. If using pfile (init.ora) files, add the following line to the file:

_clusterwide_global_transactions=false

2. If using an spfile, issue the following command from SQL*Plus:

alter system set "_clusterwide_global_transactions"=false scope=spfile;

3. Restart the database (you can restart nodes , one by one)

The problem should disappear.
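If you want to confirm the value of the hidden parameter before and after the restart, the usual X$ query works (run as SYS; this is a generic check, not something from the technote):

SQL> select a.ksppinm name, b.ksppstvl value
     from x$ksppi a, x$ksppcv b
     where a.indx = b.indx
     and a.ksppinm = '_clusterwide_global_transactions';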

 

Add/Drop ASM disks to DISKGROUP on RAC(or Standalone)

Note: The steps are described for RAC, but you can easily work out the corresponding steps for a standalone database.

1. First of all, find the disk or partition name that should be added to ASM.

fdisk -l

My disk partition name is /dev/sdi1.

2. Assign the disk to ORACLEASM.

--On node1

/etc/init.d/oracleasm createdisk DISK7 /dev/sdi1

3. Scan the disks on ALL NODES and list them to check that the new disk is present.

--On node1

/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks

--On node2

/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks

4. Change the environment to the Grid Infrastructure by setting ORACLE_SID to the ASM instance:

$ . oraenv
ORACLE_SID = [media1] ? +ASM1
The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle

# Connect as SYSASM

sqlplus / as sysasm

Note: If you don't remember the SYSASM password, see How to reset SYSASM password.

# Find the diskgroup name

SQL> select name from v$asm_diskgroup;

NAME
——————————
DATA01

# Optionally increase the power limit to complete the rebalance operation faster.

SQL> alter system set asm_power_limit=10;

# Point the asm_diskstring parameter to the disk location.

SQL> alter system set asm_diskstring='ORCL:DISK*';

SQL> alter diskgroup DATA01 add disk 'ORCL:DISK7';

ASM will rebalance the disk group automatically.

# To drop a disk, do the following:

SQL> alter diskgroup DATA01 drop disk DISK7;

ASM will rebalance first and then drop the disk automatically.

You can see the current operation in v$asm_operation view.
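For example:

SQL> select group_number, operation, state, power, sofar, est_work, est_minutes
     from v$asm_operation;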

Note: As long as the v$asm_operation view still shows a record for the rebalance, you can undrop the disks as follows:

SQL> alter diskgroup DATA01 undrop disks;

If the operation has already completed, you cannot undrop the disk, but you can re-add it if you want.

That is all.

How to reset SYSASM password

The SYSASM privilege is used to maintain the ASM instance; the main idea is to separate the storage administrator and database administrator responsibilities. To reset the SYSASM password, do the following:

[oracle@r1n1 ~]$ . oraenv
ORACLE_SID = [orcl1] ? +ASM1
The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle

[oracle@r1n1 ~]$ asmcmd
ASMCMD> orapwusr --modify --password sys
Enter password: ********

Transaction recovery: lock conflict caught and ignored

ALERT.LOG:

..... Transaction recovery: lock conflict caught and ignored
.....

Also, incident files are being created in the $ORACLE_BASE/diag/rdbms/dbname/instancename/incident folder.

In my case the errors started after SUPPLEMENTAL LOGGING was enabled in a RAC environment. After disabling it the messages did not disappear, but incident files are no longer being created.

1. Dead Transaction

SQL> select b.name useg, b.inst# instid, b.status$ status,
            a.ktuxeusn xid_usn, a.ktuxeslt xid_slot, a.ktuxesqn xid_seq,
            a.ktuxesiz undoblocks, a.ktuxesta txstatus
     from x$ktuxe a, undo$ b
     where a.ktuxecfl like '%DEAD%'
     and a.ktuxeusn = b.us#;

USEG INSTID STATUS XID_USN XID_SLOT XID_SEQ UNDOBLOCKS TXSTATUS
_SYSSMU7_881277423$ 1 3 7 13 1829999 1 ACTIVE
_SYSSMU8_4204495590$ 1 3 8 32 3045564 1 ACTIVE
_SYSSMU10_1314081219$ 1 3 10 3 11844457 1 ACTIVE

The transaction id is XID_USN.XID_SLOT.XID_SEQ.

So in our case, for the first row, it is 7.13.1829999.
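As an aside, if you only want to watch the progress of dead transaction recovery rather than dump undo, you can also poll v$fast_start_transactions and watch the UNDOBLOCKSDONE/UNDOBLOCKSTOTAL columns (a generic check, not part of the walkthrough below):

SQL> select usn, slt, seq, state, undoblocksdone, undoblockstotal
     from v$fast_start_transactions;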

2. Read the transaction table from the undo segment header.

ALTER SYSTEM DUMP UNDO HEADER '_SYSSMU7_881277423$';

….

  TRN TBL::

 
  index  state cflags  wrap#    uel         scn            dba            parent-xid    nub     stmt_num    cmt

  ————————————————————————————————

   0x00    9    0x03  0x1bf45c  0x000b  0x0000.789de808  0x00c242eb  0x0000.000.00000000  0x00000001   0x00c242eb  1367258143

   0x01    9    0x00  0x1c031b  0x0014  0x0000.789e6018  0x00c242fa  0x0000.000.00000000  0x00000001   0x00000000  1367258225

   0x02    9    0x00  0x1c147a  0x000e  0x0000.789e694b  0x00c242fa  0x0000.000.00000000  0x00000001   0x00000000  1367258230

   0x03    9    0x00  0x1c06f9  0x0016  0x0000.789e601c  0x00c242fa  0x0000.000.00000000  0x00000001   0x00000000  1367258225

   0x04    9    0x00  0x1c06c8  0x0009  0x0000.789e3566  0x00c242f9  0x0000.000.00000000  0x00000001   0x00000000  1367258192

   0x05    9    0x00  0x1c1167  0x0015  0x0000.789e357f  0x00c242ec  0x0000.000.00000000  0x00000001   0x00000000  1367258192

   0x06    9    0x00  0x1c2716  0x0017  0x0000.789e69e1  0x00c242fa  0x0000.000.00000000  0x00000001   0x00000000  1367258230

   0x07    9    0x00  0x1c1045  0x000c  0x0000.789e1bdb  0x00c242eb  0x0000.000.00000000  0x00000001   0x00000000  1367258170

   0x08    9    0x00  0x1c2614  0x0005  0x0000.789e357e  0x00c242ec  0x0000.000.00000000  0x00000001   0x00000000  1367258192

   0x09    9    0x00  0x1bfa03  0x0021  0x0000.789e3574  0x00c242f9  0x0000.000.00000000  0x00000001   0x00000000  1367258192

   0x0a    9    0x00  0x1bf712  0x001e  0x0000.789e3246  0x00c242f1  0x0000.000.00000000  0x00000001   0x00000000  1367258190

   0x0b    9    0x00  0x1c1e01  0x0007  0x0000.789e1bd9  0x00c242eb  0x0000.000.00000000  0x00000001   0x00000000  1367258170

   0x0c    9    0x00  0x1c08e0  0x000a  0x0000.789e3244  0x00c242f1  0x0000.000.00000000  0x00000006   0x00000000  1367258190

   0x0d   10    0x90  0x1bec6f  0x0038  0x0000.789e783e  0x00c242fb  0x0000.000.00000000  0x00000001   0x00c242fb  0

   0x0e    9    0x00  0x1c068e  0x0010  0x0000.789e694d  0x00c242fa  0x0000.000.00000000  0x00000001   0x00000000  1367258230

   0x0f    9    0x00  0x1c151d  0x0012  0x0000.789e3578  0x00c242ec  0x0000.000.00000000  0x00000001   0x00000000  1367258192

   0x10    9    0x00  0x1c26bc  0x0006  0x0000.789e69df  0x00c242fa  0x0000.000.00000000  0x00000001   0x00000000  1367258230

   0x11    9    0x00  0x1c16eb  0x0000  0x0000.789cbd77  0x00c242eb  0x0000.000.00000000  0x00000001   0x00000000  1367257923

   0x12    9    0x00  0x1c082a  0x001d  0x0000.789e357c  0x00c242ec  0x0000.000.00000000  0x00000001   0x00000000  1367258192

   0x13    9    0x00  0x1c1459  0x001f  0x0000.789e7891  0x00c242fc  0x0000.000.00000000  0x00000001   0x00000000  1367258238

   0x14    9    0x00  0x1c14b8  0x0003  0x0000.789e601a  0x00c242fa  0x0000.000.00000000  0x00000001   0x00000000  1367258225

   0x15    9    0x00  0x1c0457  0x0020  0x0000.789e39d3  0x00c242ec  0x0000.000.00000000  0x00000001   0x00000000  1367258195

   0x16    9    0x00  0x1c1326  0x0002  0x0000.789e601d  0x00c242fa  0x0000.000.00000000  0x00000001   0x00000000  1367258225

   0x17    9    0x00  0x1c0db5  0x001c  0x0000.789e788a  0x00c242fc  0x0000.000.00000000  0x00000001   0x00000000  1367258238

   0x18    9    0x00  0x1bffe4  0x001b  0x0000.789e400d  0x00c242fa  0x0000.000.00000000  0x00000001   0x00000000  1367258200

   0x19    9    0x00  0x1c16e3  0x0001  0x0000.789e5fd2  0x00c242fa  0x0000.000.00000000  0x00000001   0x00000000  1367258225

   0x1a    9    0x00  0x1bdbb2  0x0018  0x0000.789e400b  0x00c242fa  0x0000.000.00000000  0x00000001   0x00000000  1367258200

   0x1b    9    0x00  0x1c1141  0x0019  0x0000.789e453a  0x00c242fa  0x0000.000.00000000  0x00000001   0x00000000  1367258204

   0x1c    9    0x00  0x1bc9a0  0x0013  0x0000.789e788e  0x00c242fc  0x0000.000.00000000  0x00000001   0x00000000  1367258238

   0x1d    9    0x00  0x1c02ef  0x0008  0x0000.789e357d  0x00c242ec  0x0000.000.00000000  0x00000001   0x00000000  1367258192

   0x1e    9    0x00  0x1c0b6e  0x0004  0x0000.789e3250  0x00c242f9  0x0000.000.00000000  0x00000009   0x00000000  1367258190

   0x1f    9    0x00  0x1c00ad  0xffff  0x0000.789e78a1  0x00c242fc  0x0000.000.00000000  0x00000001   0x00000000  1367258238

   0x20    9    0x00  0x1c166c  0x001a  0x0000.789e39dd  0x00c242fa  0x0000.000.00000000  0x00000002   0x00000000  1367258195

   0x21    9    0x00  0x1c160b  0x000f  0x0000.789e3576  0x00c242ec  0x0000.000.00000000  0x00000001   0x00000000  1367258192

  EXT TRN CTL::

  usn: 7

 
State# 10 means an active transaction.

dba points to the starting UNDO block address.

usn is the undo segment number.

usn.index.wrap# gives the transaction id.

An active transaction, 0x0007.00d.001bec6f, is in slot 0x0d, which has a dba of 0x00c242fb (12731131 in decimal).

3. Reading UNDO Block:

Identify fileID and blockID:

fileID:

select DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(12731131) from x$dual;

3

blockID:

select DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(12731131) from x$dual;

148219

Dump the block:

alter system dump datafile 3 block 148219;
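The dump is written to the session's trace file; one way to locate it (11g and later) is:

SQL> select value from v$diag_info where name = 'Default Trace File';

The interesting part of the dump looks like this: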


UNDO BLK: 
xid: 0x0007.00d.001bec6f  seq: 0x41f9 cnt: 0x6   irb: 0x5   icl: 0x0   flg: 0x0000

 
Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset

—————————————————————————

0x01 0x1f98     0x02 0x1f2c     0x03 0x1d7c     0x04 0x1d10     0x05 0x1ca0    
0x06 0x1bfc    
 
*—————————–

* Rec #0x1  slt: 0x0d  objn: 0(0x00000000)  objd: 0  tblspc: 0(0x00000000)

*       Layer:   5 (Transaction Undo)   opc: 7  rci 0x00  
Undo type:  Regular undo    Begin trans    Last buffer split:  No

Temp Object:  No

Tablespace Undo:  No

rdba: 0x00000000Ext idx: 0

flg2: 0

*—————————–

uba: 0x00c242fa.41f9.37 ctl max scn: 0x0000.789b7668 prv tx scn: 0x0000.789bb8d7

txn start scn: scn: 0x0000.789e783e logon user: 88

prev brb: 12731116 prev bcl: 0

 
*—————————–

* Rec #0x2  slt: 0x0d  objn: 110769(0x0001b0b1)  objd: 110769  tblspc: 6(0x00000006)

*       Layer:  11 (Row)   opc: 1  rci 0x00  
Undo type:  Regular undo    User Undo Applied  Last buffer split:  No

Temp Object:  No

Tablespace Undo:  No

rdba: 0x00000000

*—————————–

KDO undo record:

KTB Redo

op: 0x04  ver: 0x01 
compat bit: 4 (post-11) padding: 1

op: L  itl: xid:  0x0012.01c.00322281 uba: 0x0102c5f0.3fa9.0a

                      flg: C—    lkc:  0     scn: 0x0000.789ca3f4

KDO Op code: LKR row dependencies Disabled

  xtype: XA flags: 0x00000000  bdba: 0x038180fc  hdba: 0x018d64e2

itli: 1  ispac: 0  maxfr: 4858

tabn: 0 slot: 14 to: 0

 
*—————————–

* Rec #0x3  slt: 0x0d  objn: 110769(0x0001b0b1)  objd: 110769  tblspc: 6(0x00000006)

*       Layer:  11 (Row)   opc: 1   rci 0x02  
Undo type:  Regular undo    User Undo Applied  Last buffer split:  No

Temp Object:  No

Tablespace Undo:  No

rdba: 0x00000000

*—————————–

KDO undo record:

KTB Redo

op: 0x02  ver: 0x01 
compat bit: 4 (post-11) padding: 1

op: C  uba: 0x00c242fb.41f9.02

KDO Op code: URP row dependencies Disabled

  xtype: XA flags: 0x00000000  bdba: 0x038180fc  hdba: 0x018d64e2

itli: 1  ispac: 0  maxfr: 4858

tabn: 0 slot: 14(0xe) flag: 0x2c lock: 1 ckix: 0

ncol: 9 nnew: 6 size: 0

col  1: [ 7]  78 71 04 1d 13 01 01

col  2: [ 2]  c1 13

col  3: [ 1]  80

col  4: [16]  10 e5 00 2e 10 d1 10 d0 10 d7 10 e3 10 db 10 d8

col  5: [174]

10 d0 10 ed 10 d0 10 e0 10 d8 10 e1 00 20 10 d0 00 2e 10 e0 00 2e 00 20 10

de 10 e0 10 dd 10 d9 10 e3 10 e0 10 d0 10 e2 10 e3 10 e0 10 d8 10 e1 00 20

10 e1 10 d0 10 d2 10 d0 10 db 10 dd 10 eb 10 d8 10 d4 10 d1 10 dd 00 20 10

dc 10 d0 10 ec 10 d8 10 da 10 d8 10 e1 00 20 10 e3 10 e4 10 e0 10 dd 10 e1

00 20 10 d2 10 d0 10 db 10 dd 10 db 10 eb 10 d8 10 d4 10 d1 10 d4 10 da 10

e1 00 20 10 d1 10 d0 10 e2 10 dd 10 dc 00 20 10 d2 10 d8 10 dd 10 e0 10 d2

10 d8 00 20 10 de 10 d4 10 e0 10 d0 10 dc 10 d8 10 eb 10 d4 10 e1 00 2e

col  6: [36]

00 54 00 01 04 0c 00 00 00 02 00 00 00 01 00 00 09 07 b0 63 00 10 09 00 00

00 00 00 00 00 00 00 00 00 00 00

 
*—————————–

* Rec #0x4  slt: 0x0d  objn: 89834(0x00015eea)  objd: 93214  tblspc: 6(0x00000006)

*       Layer:  11 (Row)   opc: 1   rci 0x03  
Undo type:  Regular undo    User Undo Applied  Last buffer split:  No

Temp Object:  No

Tablespace Undo:  No

rdba: 0x00000000

*—————————–

KDO undo record:

KTB Redo

op: 0x04  ver: 0x01 
compat bit: 4 (post-11) padding: 1

op: L  itl: xid:  0x000c.017.000d65d6 uba: 0x0103df2c.22a5.20

                      flg: C—    lkc:  0     scn: 0x0000.789c4694

KDO Op code: LKR row dependencies Disabled

  xtype: XA flags: 0x00000000  bdba: 0x03833994  hdba: 0x0181f832

itli: 1  ispac: 0  maxfr: 4858

tabn: 0 slot: 7 to: 0

 
*—————————–

* Rec #0x5  slt: 0x0d  objn: 89834(0x00015eea)  objd: 93214  tblspc: 6(0x00000006)

*       Layer:  11 (Row)   opc: 1  rci 0x04  
Undo type:  Regular undo   Last buffer split:  No

Temp Object:  No

Tablespace Undo:  No

rdba: 0x00000000

*—————————–

KDO undo record:

KTB Redo

op: 0x02  ver: 0x01 
compat bit: 4 (post-11) padding: 1

op: C  uba: 0x00c242fb.41f9.04

KDO Op code: LMN row dependencies Disabled

  xtype: XA flags: 0x00000000  bdba: 0x03833994  hdba: 0x0181f832

itli: 1  ispac: 0  maxfr: 4858

 
*—————————–

* Rec #0x6  slt: 0x0d  objn: 89703(0x00015e67)  objd: 92020  tblspc: 6(0x00000006)

*       Layer:  11 (Row)   opc: 1  rci 0x05  
Undo type:  Regular undo    User Undo Applied  Last buffer split:  No

Temp Object:  No

Tablespace Undo:  No

rdba: 0x00000000

*—————————–

KDO undo record:

irb points to the last UNDO RECORD in the UNDO block.

rci points to the previous UNDO RECORD; if rci = 0, it is the first UNDO RECORD.

The recovery operation starts from irb and the chain is followed via rci until rci is zero.

So this transaction starts recovery from UNDO RECORD 0x5.

4. Reading UNDO Records:


* Rec #0x5  slt: 0x0d  objn: 89834(0x00015eea)  objd: 93214  tblspc: 6(0x00000006)

*       Layer:  11 (Row)   opc: 1   rci 0x04  
….

* Rec #0x4  slt: 0x0d  objn: 89834(0x00015eea)  objd: 93214  tblspc: 6(0x00000006)

*       Layer:  11 (Row)   opc: 1   rci 0x03  
….

* Rec #0x3  slt: 0x0d objn: 110769(0x0001b0b1)  objd: 110769  tblspc: 6(0x00000006)

*       Layer:  11 (Row)   opc: 1   rci 0x02  

* Rec #0x2  slt: 0x0d  objn: 110769(0x0001b0b1)  objd: 110769  tblspc: 6(0x00000006)

*       Layer:  11 (Row)   opc: 1   rci 0x00  

objn is the dictionary object number (OBJECT_ID in dba_objects).

5. Find these objects

The following objects need recovery:

select * from dba_objects

where object_id in (89834,110769);

………………………………………………………..

This problem is Oracle Bug 9857702:

.....
Affects:
Product (Component) Oracle Server (Rdbms)  
Range of versions believed to be affected Versions >= 11.1 but BELOW 12.1  
Versions confirmed as being affected
•11.2.0.1 
•11.1.0.7 
 
Platforms affected Generic (all / most platforms affected)  

Fixed:
This issue is fixed in
•12.1 (Future Release) 
•11.2.0.2 (Server Patch Set) 
•11.1.0.7.8 Patch Set Update 
•11.1.0.7 Patch 40 on Windows Platforms  
.....

6. Workaround:

  • Recreate objects that need recovery.
  • Or drop them 🙂

Recreate Oracle 11g OEM DBConsole manually for RAC

If you have problems with the existing OEM DB Console, the best way is to reconfigure it. Here are the steps to do it correctly:

$ emca -config dbcontrol db -repos recreate -cluster

STARTED EMCA at Jan 22, 2013 6:04:10 PM
EM Configuration Assistant, Version 11.2.0.3.0 Production
Copyright (c) 2003, 2011, Oracle. All rights reserved.

Enter the following information:
Database unique name: orcl
Service name: orcl
Listener ORACLE_HOME [ /u01/app/11.2.0/grid ]:
Password for SYS user:
Database Control is already configured for the database orcl
You have chosen to configure Database Control for managing the database orcl
This will remove the existing configuration and the default settings and perform a fresh configuration
———————————————————————-
WARNING : While repository is dropped the database will be put in quiesce mode.
———————————————————————-
Do you wish to continue? [yes(Y)/no(N)]: y
Password for DBSNMP user:
Password for SYSMAN user:
Cluster name: oracle-db

Stop here for a moment: if you don't know your cluster name, run the following command as the grid user:

$ su - grid
cemutlo -n

…continuing configuration

Email address for notifications (optional):
Outgoing Mail (SMTP) server for notifications (optional):
ASM ORACLE_HOME [ /u01/app/11.2.0/grid ]:
ASM port [ 1521 ]:
ASM username [ ASMSNMP ]:
ASM user password:
Jan 22, 2013 6:05:02 PM oracle.sysman.emcp.util.GeneralUtil initSQLEngineRemotely
WARNING: Error during db connection : ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

—————————————————————–

You have specified the following settings

Database ORACLE_HOME ……………. /u01/app/oracle/product/11.2.0/db_1

Database instance hostname …………….
Listener ORACLE_HOME ……………. /u01/app/11.2.0/grid
Listener port number ……………. 1521
Cluster name ……………. oracle-db
Database unique name ……………. orcl
Email address for notifications …………… mariam.kupa@gmail.com
Outgoing Mail (SMTP) server for notifications …………… mail.tbilisi.gov.ge
ASM ORACLE_HOME ……………. /u01/app/11.2.0/grid
ASM port ……………. 1521
ASM user role ……………. SYSDBA
ASM username ……………. ASMSNMP

—————————————————————–
———————————————————————-
WARNING : While repository is dropped the database will be put in quiesce mode.
———————————————————————-
Do you wish to continue? [yes(Y)/no(N)]: y
Jan 22, 2013 6:05:18 PM oracle.sysman.emcp.EMConfig perform
INFO: This operation is being logged at /u01/app/oracle/cfgtoollogs/emca/orcl/emca_2013_01_22_18_04_10.log.
Jan 22, 2013 6:05:20 PM oracle.sysman.emcp.util.PortManager isPortInUse
WARNING: Specified port 5540 is already in use.
Jan 22, 2013 6:05:20 PM oracle.sysman.emcp.util.PortManager isPortInUse
WARNING: Specified port 5520 is already in use.
Jan 22, 2013 6:05:20 PM oracle.sysman.emcp.util.PortManager isPortInUse
WARNING: Specified port 1158 is already in use.
Jan 22, 2013 6:05:20 PM oracle.sysman.emcp.util.DBControlUtil stopOMS
INFO: Stopping Database Control (this may take a while) …
Jan 22, 2013 6:06:01 PM oracle.sysman.emcp.EMReposConfig invoke
INFO: Dropping the EM repository (this may take a while) …
Jan 22, 2013 6:08:00 PM oracle.sysman.emcp.EMReposConfig invoke
INFO: Repository successfully dropped
Jan 22, 2013 6:08:01 PM oracle.sysman.emcp.EMReposConfig createRepository
INFO: Creating the EM repository (this may take a while) …
Jan 22, 2013 6:11:39 PM oracle.sysman.emcp.EMReposConfig invoke
INFO: Repository successfully created
Jan 22, 2013 6:11:44 PM oracle.sysman.emcp.EMReposConfig uploadConfigDataToRepository
INFO: Uploading configuration data to EM repository (this may take a while) …
Jan 22, 2013 6:12:17 PM oracle.sysman.emcp.EMReposConfig invoke
INFO: Uploaded configuration data successfully
Jan 22, 2013 6:12:18 PM oracle.sysman.emcp.EMDBCConfig instantiateOC4JConfigFiles
INFO: Propagating /u01/app/oracle/product/11.2.0/db_1/oc4j/j2ee/OC4J_DBConsole_oracle-node1_orcl to remote nodes . ..
Jan 22, 2013 6:12:20 PM oracle.sysman.emcp.EMDBCConfig instantiateOC4JConfigFiles
INFO: Propagating /u01/app/oracle/product/11.2.0/db_1/oc4j/j2ee/OC4J_DBConsole_oracle-node2_orcl to remote nodes . ..
Jan 22, 2013 6:12:26 PM oracle.sysman.emcp.EMAgentConfig deployStateDirs
INFO: Propagating /u01/app/oracle/product/11.2.0/db_1/oracle-node1_orcl to remote nodes …
Jan 22, 2013 6:12:28 PM oracle.sysman.emcp.EMAgentConfig deployStateDirs
INFO: Propagating /u01/app/oracle/product/11.2.0/db_1/oracle-node2_orcl to remote nodes …
Jan 22, 2013 6:12:31 PM oracle.sysman.emcp.util.DBControlUtil secureDBConsole
INFO: Securing Database Control (this may take a while) …
Jan 22, 2013 6:13:00 PM oracle.sysman.emcp.util.DBControlUtil startOMS
INFO: Starting Database Control (this may take a while) …
Jan 22, 2013 6:13:20 PM oracle.sysman.emcp.EMDBPostConfig performConfiguration
INFO: Database Control started successfully
Jan 22, 2013 6:13:20 PM oracle.sysman.emcp.EMDBPostConfig performConfiguration
INFO: >>>>>>>>>>> The Database Control URL is https://oracle-node1.mr.gov.ge:1158/em <<<<<<<<<<<
Jan 22, 2013 6:13:22 PM oracle.sysman.emcp.EMDBPostConfig showClusterDBCAgentMessage
INFO:
**************** Current Configuration ****************
INSTANCE NODE DBCONTROL_UPLOAD_HOST
———- ———- ———————

orcl oracle-node1 oracle-node1.mr.gov.ge
orcl oracle-node2 oracle-node1.mr.gov.ge
Jan 22, 2013 6:13:22 PM oracle.sysman.emcp.EMDBPostConfig invoke
WARNING:
************************ WARNING ************************

Management Repository has been placed in secure mode wherein Enterprise Manager data will be encrypted. The encryption key has been placed in the file: /u01/app/oracle/product/11.2.0/db_1/oracle-node1_orcl/sysman/config/emkey.ora. Ensure this file is backed up as the encrypted data will become unusable if this file is lost.

***********************************************************
Enterprise Manager configuration completed successfully
FINISHED EMCA at Jan 22, 2013 6:13:22 PM

For me, the URL to access OEM is https://oracle-node1.mr.gov.ge:1158/em
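If you later need to check or bounce the console, the standard emctl commands from the database home can be used (as the oracle user):

$ emctl status dbconsole
$ emctl stop dbconsole
$ emctl start dbconsole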

Good Luck!