How to find remote session executing over a database link

Select /*+ ORDERED */
substr(s.ksusemnm,1,10)||'-'|| substr(s.ksusepid,1,10) "ORIGIN",
substr(g.K2GTITID_ORA,1,35) "GTXID",
substr(s.indx,1,4)||'.'|| substr(s.ksuseser,1,5) "LSESSION" ,
substr(decode(bitand(ksuseidl,11),1,'ACTIVE',0,
decode(bitand(ksuseflg,4096),0,'INACTIVE','CACHED'),
2,'SNIPED',3,'SNIPED', 'KILLED'),1,1) "S",
substr(event,1,10) "WAITING"
from x$k2gte g, x$ktcxb t, x$ksuse s, v$session_wait w
where g.K2GTDXCB =t.ktcxbxba
and g.K2GTDSES=t.ktcxbses
and s.addr=g.K2GTDSES
and w.sid=s.indx;

GTXID is the same on both databases.

################################### Sample output ###################################

##Destination

3   LBREPDB01-51715  LBREP.aa2c0b4f.94.11.4694801  5447.62951   I   SQL*Net me

##Source

2   LB\MARIAMI-41196:4058  LBREP.aa2c0b4f.94.11.4694801 87.36231  I  SQL*Net me

More Details:

SID – 87
SERIAL – 36231

Proxy User Authentication: Create DBLINK in another schema

Problem

We need to create private database link in another schema. But we don’t know the password for this user.

Solution (Proxy user)

For example, my username is marik and database link should be created in dbcleanup’s schema.

SQL> alter user DBCLEANUP grant connect through marik;
SQL> connect marik[DBCLEANUP]/123@orcl
SQL> show user
USER is "marik[DBCLEANUP]"
SQL> create database link MYLINK 
connect to LINKUSER identified by "password" using 'ORCL2';

Proxy users can be identified using the PROXY_USERS view.

SQL> select * from proxy_users;

The proxy authentication can be revoked using the following command.

SQL> alter user DBCLEANUP revoke connect through marik ;

RMAN: Displaying current backup progress

To check the progress of your current RMAN backup use this script:

#This is my favorite script

select recid
 , output_device_type
 , dbsize_mbytes
 , input_bytes/1024/1024 input_mbytes
 , output_bytes/1024/1024 output_mbytes
 , (output_bytes/input_bytes*100) compression
 , (mbytes_processed/dbsize_mbytes*100) complete
 , to_char(start_time + (sysdate-start_time)/(mbytes_processed/dbsize_mbytes),'DD-MON-YYYY HH24:MI:SS') est_complete
 from v$rman_status rs
 , (select sum(bytes)/1024/1024 dbsize_mbytes from v$datafile) 
 where status='RUNNING'
 and output_device_type is not null

Alert log table x$dbgalertext

Prior to 11g you had to create external table to query alert log information.

In 11g Oracle introduced X$DBGALERTEXT, which is mapped to the alert log file located at  $ORACLE_BASE/diag/rdbms/{DB_UNIQUE_NAME}/{SID}/alert/log_XX.xml.

Note that table is located in SYS schema and you need a permission to see it’s content.

So, please keep in mind this table name, it is really useful to search errors in alert log.

Send Oracle Audit to rsyslog

In our database there is turned on auditing on some operations and audit records go to OS.

SYS> show parameter audit_file_dest

NAME                TYPE        VALUE
------------------ ----------- ------------------------------
audit_file_dest  string       /u01_log/audit/orcl

SYS > show parameter audit_trail

NAME        TYPE         VALUE
------------- ----------- -----------
audit_trail string        OS

Our security administrators are using SIEM to monitor suspicious activities and they want database to send audit records to this third party tool.

I thought that I could somehow indicate directory “/u01_log/audit/orcl” from where *.aud files would be uploaded to SIEM, but I was wrong. Some tools may be able to use these *.aud files but not SIEM and let’s configure our database to be able to send audit records to it.

1. Connect to a database instance as sysdba user

SQL> connect / as sysdba

2. Set audit trail to OS

SQL> alter system set audit_trail=OS;

3. Enable auditing for system users if you need to audit activities of sys user(optional)

SQL> alter system set audit_sys_operations=TRUE;

4. Set rsyslog facility and severity(needs database restart)

SQL> alter system set audit_syslog_level=local5.info scope=spfile sid='*';

5.  Restart database

SQL> shutdown immediate;
SQL> startup;

6. Edit rsyslog.conf file

#Saving oracle database audit records
local5.info          /u01_log/audit/RSYSLOG/dbaudit.log
#Send oracle database audit trail to remote rsyslog server
local5.info          @192.168.0.15

7. Restart rsyslog service

# service rsyslog restart
Shutting down system logger: [ OK ]
Starting system logger: [ OK ]

8. It is better to limit the size for audit log, or it may fill the space:

# vi /etc/logrotate.d/oracle.audit

#Created by MariK

/u01_log/audit/RSYSLOG/dbaudit.log {
 rotate 3
 compress
 missingok
 notifempty
 size 40G
 postrotate
 service rsyslog restart
 endscript
}

To check the syntax run :

# logrotate /etc/logrotate.d/oracle.audit

It will say if you have an error. If syntax is ok then output is nothing.

Configure resource manager to kill sessions automatically after maximum idle time is passed

Problem:

Our applications are opening too many connections and moreover are not closing them at all 🙂 .  Because of this to many sessions stay idle and after IDLE_TIME is passed they become SNIPED.
As you know SNIPED session still holds session counter and it is completely cleaned out just after SNIPED session tries to execute something(it of course errors out). But if SNIPED session never tries to execute anything then the session stays forever in database.  And after a while database throws ORA-00018 maximum number of sessions exceeded.

My old solution: 

Created script file /u01/app/oracle/dba_scripts/kill_sniped.sh, with content:

#!/bin/ksh

#Written by MK

cd /u01/app/oracle/dba_scripts
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1
export ORACLE_SID=orcl1
export ORACLE_USER=oracle
$ORACLE_HOME/bin/sqlplus -s / as sysdba <<EOF

SET SERVEROUTPUT ON SIZE 1000000;
CALL DBMS_JAVA.SET_OUTPUT(1000000);

DECLARE
snum NUMBER;
BEGIN
FOR i IN (SELECT ‘alter system kill session ”’||a.SID||’,’||a.serial#||’,@’||inst_id||”’ immediate’ killSniped FROM gv\$session a
WHERE (a.status=’SNIPED’ or a.status=’KILLED’)
and a.username is not null
)
LOOP
begin
execute immediate i.killSniped;
exception when others then null;
end;
END LOOP;
END;
/
EOF

You will easily guess what does it do. It finds sessions with status SNIPED and KILLED and executes alter system kill session script for them.

Created crontab entry:

$ crontab -l
*/10 * * * * /u01/app/oracle/dba_scripts/kill_sniped.sh > /u01/app/oracle/dba_scripts/logs/kill_sniped.log 2>&1

Script was working fine about one year, without any problem 🙂 but yesterday my script was not able to handle all of these sessions and it was killing slower than SNIPED sessions were appearing in our database so database raised ORA-00018 error.

New and better solution:

Created consumer group , set plan directive with MAX_IDLE_TIME 900sec for this group and moved problematic app user in this group.

After MAX_IDLE_TIME is passed user session is automatically killed by resource manager and it is the quickest.

BEGIN
DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

DBMS_RESOURCE_MANAGER.CREATE_PLAN(PLAN => ‘RESTRICTIVE_PLAN’, COMMENT => ”);

DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(CONSUMER_GROUP => ‘RSGROUP’, COMMENT =>”);

DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(PLAN => ‘RESTRICTIVE_PLAN’
, GROUP_OR_SUBPLAN => ‘RSGROUP’
, COMMENT => ”
, MAX_IDLE_TIME => 900);

DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP(GRANTEE_NAME => ‘RSAPP’
, CONSUMER_GROUP => ‘RSGROUP’
, GRANT_OPTION => FALSE);

DBMS_RESOURCE_MANAGER.SET_INITIAL_CONSUMER_GROUP( ‘RSAPP’, ‘RSGROUP’);

DBMS_RESOURCE_MANAGER.SWITCH_CONSUMER_GROUP_FOR_USER( ‘RSAPP’, ‘RSGROUP’);

DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
end;

ALTER SYSTEM SET RESOURCE_MANAGER_PLAN=’RESTRICTIVE_PLAN’;

Note: RSAPP user had IDLE_TIME 15min in its profile, that is why I have set MAX_IDLE_TIME to 900sec(15min). Be careful for this decision , you should set this value appropriate to profile IDLE_TIME value. Or first discuss it with developers, they may not want you to kill their app session after 15min.. but after 20min.

To check how many sessions were killed by resource manager check:

SELECT IDLE_SESSIONS_KILLED
FROM V$RSRC_CONSUMER_GROUP
WHERE NAME=’RSGROUP’;

Hope post was useful. 🙂


Best Practices for Configuring Redo Transport for Data Guard and Active Data Guard 12c

I have three standby databases db01 located in HQ, db02 located in DR, db03 located in DR and it should be late standby with delay 15 days.

My task is to configure the following standby architecture:

When db01 is primary it should send logs in SYNC mode to db02 and at the same time db02 should send logs in ASYNC mode to db03.

When db02 is in primary role it should send logs in SYNC mode to db01 and at the same time db01 should send logs in ASYNC mode to db03.

So db01 and db02 database should be in sync mode with real-time apply and db03 should be late standby with delay 15 days and it should receive logs from standby database in ASYNC mode.

I have underlined the above sentence, because for now this cannot be achieved with cascading standby. Read bellow…

I have found very useful documentation so here is the link: http://www.oracle.com/technetwork/database/availability/broker-12c-transport-config-2082184.pdf

It introduces data broker new feature that is available in 12c. Property RedoRoutes.

So in my broker configuration I will set RedoRoutes property by the following way:

DGMGRL> show configuration

Configuration – DB_HQ_DR

Protection Mode: MaxPerformance
Members:
db01 – Primary database
db02 – Physical standby database
db03- Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> DGMGRL> edit database ‘db01′ set property RedoRoutes='(LOCAL:db02)(db02:db03)’;
Property “redoroutes” updated
DGMGRL> edit database ‘db02′ set property RedoRoutes='(LOCAL:db01)(db01:db03)’;
Property “redoroutes” updated

Normally, delayed apply can be configured by property DelayMins:

DGMGRL> edit database ‘db03′ set property DelayMins=’21600’;
Property “delaymins” updated

21600 is 15 days.

BUT, I must tell you a bad news:  according to this article https://docs.oracle.com/database/121/SBYDB/log_arch_dest_param.htm#SBYDB01105

“The DELAY value that a cascaded standby uses is the value that was set for the LOG_ARCHIVE_DEST_n parameter on the primary that shipped the redo to the cascading standby.”

So I cannot have the following architecture:

db01 —-real_time_apply—-db02—-delayed_apply—–db03

because db03 will take delay parameter from db02 that is no delay.

Async/Sync mode can be configured by property LogXptMode:

DGMGRL> edit database ‘db01′ set property LogXptMode=’SYNC’;
DGMGRL> edit database ‘db02′ set property LogXptMode=’SYNC’;
DGMGRL> edit database ‘db03′ set property LogXptMode=’ASYNC’;

If I want to achieve my goal I should not use cascading standby but primary must be the sender for db02(with DelayMins=0) and db03(with DelayMins=21600)

I hope it helps.

Rebuild RAC clusterware without deleting data

As I have mentioned in my previous posts, I was applying interim patch on database which had post installation script (# <GI_HOME>/crs/install/rootcrs.pl -postpatch) .
The post script failed with permission denied error on ohasd file and left clusterware in a messy situation.

I have opened SR on metalink and one of their support after a huge amount of time of talking and troubleshooting together says:

“We do not know what happened or what steps you have taken to reach this situation. You should open an SR with us before you deconfigure the node.
Please, do bare metal restore as it is recommended by previous engineer.
Bare Metal Restore Procedure for Compute Nodes on an Exadata Environment ( Doc ID 1084360.1 )”

This Bare Metal Restore is like wiping everything and after that I have had to configure RAC, DATAGUARD and everything from scratch. <<–Don’t like such solutions, this is like “if your windows works slowly then reinstall it”.. for windows this might be really true 🙂 nothing than reinstall helps 😀 but on Linux/Oracle you must troubleshoot first.
So I created another SR with another error(Errors at this time were lot) and for the second time I was lucky.
I was working 24/7 with support, the engineers were shifting. Three different engineers worked at different times on this SR.
I want to mention one “Venkata Pradeep Kumar” Oracle support engineer , he is so clever he helped me a lot and we rescued the system !:)

I want to share the steps with you , it should interesting.

Problem:

After applying patch post script on first node (which failed), clusterware on first node was not starting. At this time second node was fine.
I have deconfigured clusterware (write this step in solution section) on first node and it started but with some problems about oc4j service.

2016/09/27 06:56:15 CLSRSC-1003: Failed to start resource OC4J
2016/09/27 06:56:16 CLSRSC-287: FirstNode configuration failed

I have deconfigured clusterware on second node also and tried to run root.sh, but it said that root.sh could not be run because it was not successful on first node. 😦

So, until root.sh script is not completely successful on first node you should not deconfigure it on second. But if you did it do not panic if you have OCR backup.

Solution:

# Deconfigure crs on problematic node , note you may help the different solution , by just configuring one node. In my situation all nodes became problematic.
# Also please be careful, below steps assumes that you have separate group for OCR. Datafiles must be on different group. Or diskgroup will be wiped.

# From root on both nodes node1 , node2

/u01/app/12.1.0.2/grid/crs/install/rootcrs.sh -deconfig -force

# run root.sh on node1 , it may not be completely successful

/u01/app/12.1.0.2/grid/root.sh

# We need to find a good OCR backup , for me it is week.ocr which was taken automatically in 2016/09/15 09:12:28.
# Patch was applied at 10:00AM in 2016/09/25. So we need week.ocr it is before patching.

[root@lbdm01-dr-adm grid]# ocrconfig -showbackup

lbdm02-dr-adm 2016/09/27 02:35:23 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup00.ocr 3351897854
lbdm02-dr-adm 2016/09/26 15:44:53 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup01.ocr 3351897854
lbdm02-dr-adm 2016/09/26 11:44:52 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup02.ocr 3351897854
lbdm02-dr-adm 2016/09/27 02:35:23 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/day.ocr 3351897854
lbdm01-dr-adm 2016/09/15 09:12:28 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/week.ocr 854493477
lbdm02-dr-adm 2016/09/25 15:29:18 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup_20160925_152918.ocr 3351897854
lbdm02-dr-adm 2016/09/25 10:34:56 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup_20160925_103456.ocr 2725022894
lbdm01-dr-adm 2015/07/29 19:46:28 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup_20150729_194628.ocr 854493477
lbdm01-dr-adm 2015/07/29 19:46:27 /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup_20150729_194627.ocr 854493477

# Ensure that no process left
# node 1

crsctl stop crs -f
ps -ef|grep “/u01/app”

# if here is anything kill them!

#Start clusterware in exclusive mode with no ocr on node 1

crsctl start crs -excl -nocrs

#Restore OCR on node 1

ocrconfig -restore /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/week.ocr
ocrcheck

# Stop crs on node 1

crsctl stop crs -f
crsctl start crs

# Check the status

crsctl status res -t

# It should be OK

# Do the same steps on node 2 from root, but it may fail

/u01/app/12.1.0.2/grid/root.sh

# Failed

ORA-15160: rolling migration internal fatal error in module SKGXP,valNorm:not-native
. For details refer to “(:CLSN00107:)” in “/u01/app/oracle/diag/crs/lbdm02-dr-adm/crs/trace/ohasd_oraagent_oracle.trc”.
CRS-2883: Resource ‘ora.asm’ failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-4000: Command Start failed, or completed with errors.
2016/09/28 09:11:00 CLSRSC-117: Failed to start Oracle Clusterware stack

# deconfig on both nodes
# node1 , node2

 /u01/app/12.1.0.2/grid/crs/install/rootcrs.sh -deconfig -force

#and run agin root.sh
# node 1

/u01/app/12.1.0.2/grid/root.sh

# It was completelly successful.

# On second there is still problem

# Read the following document ORA-15160: rolling migration internal fatal error in module SKGXP,valNorm:not-native (NOTE 1682591.1)

# Here problem was on protocols that was used by asm and rdbms.
# rdbms is using rds protocol and asm is using udp, see Oracle Clusterware and RAC Support for RDS Over Infiniband (NOTE 751343.1)
# problem was in libraries and we should relink them with right protocols
# As the ORACLE_HOME/GI_HOME owner, stop all resources (database, listener, ASM etc) that’s running from the home. When stopping database, use NORMAL or IMMEDIATE option.

# From problemtic node , where asm or database is not starting.

crsctl stop crs
ps -ef|grep d.bin
ps -ef|grep “/u01/app”

# Kill if any process left

# If relinking Grid Infrastructure (GI) home, as root, unlock GI home: <GI_HOME>/crs/install/rootcrs.pl -unlock

/u01/app/12.1.0.2/grid/crs/install/rootcrs.sh -unlock

# As the ORACLE_HOME/GI_HOME owner, go to ORACLE_HOME/GI_HOME and cd to rdbms/lib
# As the ORACLE_HOME/GI_HOME owner, issue “make -f ins_rdbms.mk <protocol write here> ioracle”
#For rdbms

[root@lbdm02-dr-adm lib]# su – oracle
[oracle@lbdm02-dr-adm ~]$ cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ipc_rds ioracle

#For asm

. oraenv
+ASM2
[oracle@lbdm02-dr-adm ~]$ cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ipc_g ioracle

# From root

/u01/app/12.1.0.2/grid/crs/install/rootcrs.sh -patch

# The last step should have configure clusterware also. And everything should be fine. And you can sleep now. 🙂

 

ORA-01184: logfile group 5 already exists

Command:  

ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 11 (‘+DATA’) size 52428800;

Error:

ORA-01184: logfile group 5 already exists

Troubleshoot:

SYS> SELECT thread#, group#
FROM gv$log;

THREAD# GROUP#
———- ———-
1                 1
1                 2
1                 3
1                 4
1                 5
1                 6
1                 7
1                 8
1                 9
1                 10

As we see there are just 10 groups.

Check standby redo logs:

SYS> SELECT group#
FROM v$standby_log;

GROUP#
———-
11

So we have standby redo log with group id 11.

Solution: 

Add logfile group with number more than 11.

I prefer to save numbers for standby logs and add new log with number 22(10 redo log groups+(10+1) recommended number of standby logs+ 1)

SYS> ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 22 (‘+DATA’) SIZE 52428800;

Database altered.

Install Oracle 12.1.0.1 Error in invoking target ‘links proc gen_pcscfg procob’ of makefile …ins_precomp.mk

If you are installing 12.1.0.1 there are some problems. Oracle 12c (12.1.0.2) installs cleanly.

 

During the linking phase, you will see the following error.

Error in invoking target 'links proc gen_pcscfg procob' of makefile
'/u01/app/oracle/product/12.1.0.2/db_1/precomp/lib/ins_precomp.mk'.
See
'/u01/app/oraInventory/logs/installActions2014-04-26_08-07-04PM.log'
for details.

To fix it, do the following:

rm -rf $ORACLE_HOME/lib/stubs/*
cp $ORACLE_HOME/rdbms/lib/env_rdbms.mk $ORACLE_HOME/rdbms/lib/env_rdbms.mk.orig

Perform the following modifications to the “$ORACLE_HOME/rdbms/lib/env_rdbms.mk” file.

# Line 176
# FROM:
LINKTTLIBS=$(LLIBCLNTSH) $(ORACLETTLIBS) $(LINKLDLIBS)
# TO  :
LINKTTLIBS=$(LLIBCLNTSH) $(ORACLETTLIBS) $(LINKLDLIBS) -lons

# Line 279-280
# FROM:
LINK=$(FORT_CMD) $(PURECMDS) $(ORALD) $(LDFLAGS) $(COMPSOBJS)
LINK32=$(FORT_CMD) $(PURECMDS) $(ORALD) $(LDFLAGS32) $(COMPSOBJS)
# TO  :
LINK=$(FORT_CMD) $(PURECMDS) $(ORALD) $(LDFLAGS) $(COMPSOBJS) -Wl,--no-as-needed
LINK32=$(FORT_CMD) $(PURECMDS) $(ORALD) $(LDFLAGS32) $(COMPSOBJS) -Wl,--no-as-needed

# Line 3041-3042
# FROM:
TG4PWD_LINKLINE= $(LINK) $(OPT) $(TG4PWDMAI) \
        $(LLIBTHREAD) $(LLIBCLNTSH) $(LINKLDLIBS)
# TO  :
TG4PWD_LINKLINE= $(LINK) $(OPT) $(TG4PWDMAI) \
        $(LLIBTHREAD) $(LLIBCLNTSH) $(LINKLDLIBS) -lnnz12

Click the “Retry” button.