What’s in a Name (renaming ASM Diskgroup in 11.2)

Introduction:

Once again  it showed that the old  quote .. what’s  in a name .. is still valid even in an Oracle 11.2.0.3.0 Oracle restart environment  so credits to William S. for that. Well  you might wonder where is he heading with the article with title like this again. Frankly i had set up an Oracle restart environment where during the Installation  you are asked about one ( or more )  Asm Disks to be added to a Diskgroup that will hold the Spfile of the ASM instance.  During that install i made a typo in the name of the Diskgroup and only noticed of course after finishing install.

I searched the Web and came across a real cool scenario which i have explored and guess what  it works !  Below you find the steps i have performed.   Environment was: 11.2.0.3 Enterprise Edition on Red  Hat Linux.

Details:

On the web  I found bloggers sharing the idea that you can indeed rename a diskgroup in 11gr2. The tool for that is on the OS level and is part of you GridInfrastructure software. So it could very well look pretty much like this:

 which renamedg
/opt/crs/product/11203_ee_64/crs/bin/renamedg

Anyhow i had a bit of a struggle with the syntax  but in the end this is the command that does the magic. Note first few attempts failed since  I did not add the asm_diskstring in the command line . Be wise and please use it in a modified if necessary way from the beginning:

 

Step 1 Preparations

As root i stopped the Oracle restart environment :

 ./crsctl stop has

Step 2 Renaming the Diskgroup

Syntax

renamedg phase=both dgname=DUMM_YDATA newdgname=DUMMY_DATA verbose=true asm_diskstring=’/dev/mapper/asm*p1′

As you can see i am renaming my diskgroup DUMM_YDATA to DUMMY_DATA  this way.

 

Output coming from that was :
NOTE: No asm libraries found in the system
Parsing parameters..
Parameters in effect:
Old DG name       : DUMM_YDATA
New DG name          : DUMMY_DATA
Phases               :
Phase 1
Phase 2
Discovery str        : /dev/mapper/asm*p1
Clean              : TRUE
Raw only           : TRUE
renamedg operation: phase=both dgname=DUMM_YDATA newdgname=DUMMY_DATA verbose=true asm_diskstring=/dev/mapper/asm*p1
Executing phase 1
Discovering the group
Performing discovery with string:/dev/mapper/asm*p1
Identified disk UFS:/dev/mapper/asm-spfile01p1 with disk number:0 and timestamp (33001295 453578752)
Checking for hearbeat...
Re-discovering the group
Performing discovery with string:/dev/mapper/asm*p1
Identified disk UFS:/dev/mapper/asm-spfile01p1 with disk number:0 and timestamp (33001295 453578752)
Checking if the diskgroup is mounted or used by CSS
Checking disk number:0
Generating configuration file..
Completed phase 1
Executing phase 2
Looking for /dev/mapper/asm-spfile01p1
Modifying the header
Completed phase 2

Step 3 Where to find the Spfile for ASM in Grid Infra

 

With this first success it was time to move on to step 2. I modified the information in the cluster layer with this once i had looked up the exact name of the spfile with ASMCMD:

Syntax
srvctl modify asm  -p +DUMMY_DATA/asm/asmparameterfile/registry.253.843232019

 

Step 4 Restarted the Has deamon:

As Root:

 ./crsctl start has

 

Step 5 Final activities: Check and cleanup:

 

SQL> show parameter spfile

NAME               TYPE VALUE
------------ ----------- ------------------------------
spfile       string      +DUMMY_DATA/asm/asmparameterfile/registry.253.843232019

 

Checked also the information in the Cluster:

oracle@mysrvr10hr:/opt/oracle [+ASM]# crsctl status resource -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DUMMY_DATA.dg
               ONLINE  ONLINE       mysrvr10hr
ora.DUMM_YDATA.dg
               ONLINE  OFFLINE      mysrvr10hr
ora.LISTENER.lsnr
               ONLINE  ONLINE       mysrvr10hr
ora.asm
               ONLINE  ONLINE       mysrvr10hr                 Started
ora.ons
               OFFLINE OFFLINE      mysrvr10hr
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       mysrvr10hr
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  ONLINE       mysrvr10hr

## checked in asmcmd

asmcmd
[Option  -p will be used ]
ASMCMD [+] > spget
+DUMMY_DATA/asm/asmparameterfile/registry.253.843232019

Ok last step, clean up since the cluster still knows about the Diskgroup with the wrong name:

srvctl remove diskgroup -g DUMM_YDATA

 

Mission completed and again a smiling dba here. Please Test before do !

 

Happy reading,

 

Mathijs

Moving from the Asmlib to multi-pathing / creating a shared spfile in ASM – Grid Infra

Introduction

No this is not a note in defense or offense for the ASMLIB. It is merely a registration of the fact that in the past Real application clusters  have been setup with ASMLIB  and these Clusters have turned into trouble(d) environments recently due to wrong allocation of ASMLIB Disks. Since the company standards have changed over the year in favor of the Multi-pathing  it has been decided that the troubled cluster would  have to start using the Multi-pathing . For a general overview of ASMLIB and Multi-pathing :  http://oracle.su/docs/11g/server.112/e10500/asmprepare.htm.

Also it has been decided to alter the standards a that ruled couple of years  ago ( implementing Rac clusters with Asm with ASMLIB) to now a days company standards which embraced the Multi-pathing. So far no one ever did this kind of convert on live systems so it would need a proper scenario a dress-rehearsal test before we would implement this on production environment .

In general terms it was explained to me that ASMLIB is like a layer on TOP of the Multi-Pathing  presenting Disks in ASM with a specific labeling. The asm_diskstring  in an ASMLIB environment would point to those disks with  ‘ORCL:1′ . Where the asm_diskstring would be like  the key which disks should be discovered by the ASM instance to work with. The scenario turned out to be remarkable simple to the Oracle Dbas. Bottom line was that even when the disks have been labeled for ASMLIB  if we would alter the asm_diskstring to point to the disks via ‘/dev/mapper/asm*p1′ that would already do the job since the ASMLIB labels would simply be ignored by multi-path. But as always do not  try this at home unless  you alter it in a controlled way as described below.

Details:

As a preparation i ran this scenario on a preproduction box  and with  that being a good dress-rehearsal implemented this on the production as well.  More details will be described below:

Plan of Approach
1 alter system set asm_diskstring = ‘/dev/mapper/asm*p1′ scope = spfile; Oracle
1 Stopping Crs on all 4 Nodes Oracle
1 Started Crs on mysrvr33r Oracle
1 Started Crs on mysrvr34r Oracle
1 Started Crs on mysrvr35r Oracle
1 Started Crs on mysrvr36r Oracle
2 Stopping Crs on all 4 Nodes Oracle
2 Removing ASMLIB RPMS 33r 34r 35r 36r Linux
2 Server reboot of mysrvr33r Linux
2 Check Cluster on mysrvr33r , all up Oracle
2 Restart of mysrvr34r 35r 36r Linux
2 Check  33r 34r 35r 36r Oracle
3 Stopping Crs on 34r 35r 36r Nodes Oracle
3 Create spfile in asm diskgroup on 33r Oracle
3 Restart of  CRS  on 33r Oracle
3 Check gpnp profile / spfile on 33r Oracle
3 Restart of crs on mysrvr34r 35r 36r Oracle
3 Check gpnp profile / spfile on 33r – 36r Oracle
  1. I have used three waves of activities . In the first wave As as preparation i altered the asm_diskgstring in the spfile of the asm instances already ( of course not yet active till after the next restart ) .  After that i stopped the full cluster on all 4 nodes and started with my first node to see the effects ( and to see the asm instance and database instances to be started ) all nodes had been restarted. And it showed all was running well.
  2. In this action i worked together with the Linux admins. The boxes in scope running on RedHat  they wanted to get rid of the ASMLIB in the kernel as well so in the second wave i shutdown the full cluster one more time, they removed the rpms from the Linux and rebooted  the first box and all was well after that. So after my checks the other three boxes were started in parallel and the end result was a happy  and running cluster again !
  3. In wave three i had to fix some old hurt that these 4 boxes were still working with local spfiles instead of a shared spfile in the asm instance. When this Cluster was build by me some three years ago it was born as a 11.1 cluster environment and it had been set up with local copies of  the spfile:

Setting up a (lost) Spfile in ASM in a Grid infrastructure environment:

##  First step  i thought would be to stop the cluster since i did that on another scenario. Much to my surprise i was recommended to do this in the running environment so i  did  this after i prepared a valid init.ora:

 SQL> create spfile ='+CLUSTERDATA'  from pfile = '/opt/oracle/+ASM1/admin/pfile/init+ASM1.NEW' ;
create spfile ='+CLUSTERDATA'  from pfile = '/opt/oracle/+ASM1/admin/pfile/init+ASM1.NEW'
*
ERROR at line 1:
ORA-29780: unable to connect to GPnP daemon [CLSGPNP_ERR]

## As you can see that did not work.  Investigation brought  following note (Environment Variable ORA_CRS_HOME MUST be UNSET in 11gR2/12c GI (Doc ID 1502996.1) . I checked and indeed that environment Variable was present in my  .profile:

#### bad practice to have ORA_CRS_HOME  set in your .profile if you are using GI so we unset it !!!!!!!!!!!!!!!!!!!

oracle@mysrvr33r:/opt/oracle/+ASM1/admin/pfile [+ASM1]# unset  ORA_CRS_HOME
oracle@mysrvr33r:/opt/oracle/+ASM1/admin/pfile [+ASM1]# echo $ORA_CRS_HOME
## after that  the create of the spfile worked.
SQL> create spfile ='+CLUSTERDATA'  from pfile = '/opt/oracle/+ASM1/admin/pfile/init+ASM1.NEW' ;
File created.

##  Lets do some checks  ( in asmcmd)  it showed:

ASMCMD [+] > spget
+CLUSTERDATA/mysrvr3_cluster/asmparameterfile/registry.253.841967299

## Second check using the gpnptool:

gpnptool get

Warning: some command line parameters were defaulted. Resulting command line:          /opt/crs/product/112_ee_64/crs/bin/gpnptool.bin get -o- http://www.grid-pnp.org/2005/11/gpnp-profile” xmlns:gpnp=”http://www.grid-pnp.org/2005/11/gpnp-profile” xmlns:orcl=”http://www.oracle.com/gpnp/2005/11/gpnp-profile” xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xsi:schemaLocation=”http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd” ProfileSequence=”58″ ClusterUId=”2b7266b0d5797f65ff0fcf4c8e7931d6″ ClusterName=”mysrvr3_cluster” PALocation=””><gpnp:Network-Profile>SPFile=”+CLUSTERDATA/mysrvr3_cluster/asmparameterfile/registry.253.841967299″/><orcl:OCR-Profile id=”ocr” OCRId=”372102285″/>http://www.w3.org/2000/09/xmldsig#“><ds:CanonicalizationMethod Algorithm=”http://www.w3.org/2001/10/xml-exc-c14n#“/><ds:SignatureMethod Algorithm=”http://www.w3.org/2000/09/xmldsig#rsa-sha1“/><ds:Reference URI=””><ds:Transforms><ds:Transform Algorithm=”http://www.w3.org/2000/09/xmldsig#enveloped-signature“/><ds:Transform Algorithm=”http://www.w3.org/2001/10/xml-exc-c14n#“> <InclusiveNamespaces xmlns=”http://www.w3.org/2001/10/xml-exc-c14n#” PrefixList=”gpnp orcl xsi”/>http://www.w3.org/2000/09/xmldsig#sha1“/>3Tjuts50Gi92r42OMa4Pb17PiYc=B941IphE6D1FqVhc1u/+NwhAM3QXbBRiMT0plxhXyptUnj4mu1T1UFP/5yG+yBIzblquOy4aqxNBthMy7aQW0lyS4QfMZbjWYhYH2nvbrnnyqY/ZoYXOY0QaAYciboALXxJxCzup6ZGxCnsgtT8G/b08z679j8NlMvykdE2pmWY=

## in asmcmd:

ls -l +CLUSTERDATA/mysrvr3_cluster/asmparameterfile/registry.253.841967299

[Option  -p will be used ]

Type              Redund  Striped  Time             Sys  Name

ASMPARAMETERFILE  UNPROT  COARSE   MAR 11 23:00:00  Y    registry.253.841967299

## alert log shows:

Tue Mar 11 23:48:18 2014

NOTE: updated gpnp profile ASM diskstring: /dev/mapper/asm*p1

NOTE: updated gpnp profile ASM diskstring: /dev/mapper/asm*p1

## looks good  on other servers as well ( checked them all and  they showed similar like below:

gpnptool get

## As a next step i moved the existing spfile in the $ORACLE_HOME/dbs

mv spfile+ASM1.ora spfile+ASM1.ora.20140311.old

## then edited a  the init.ora to make it point to the ASM-Diskgroup as well.

vi init+ASM1.ora

## i restarted the cluster and after that i checked in the asm Instance.

SQL> show parameter spfile
NAME                                                      TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile                                                       string               +CLUSTERDATA/mysrvr3_cluster/as
mparameterfile/registry.253.84
1967299

## checks are asmcmd spget  and gpnptool get

##  copied  the init.ora to otherboxes

cat
oracle@mysrvr33r:/opt/crs/product/112_ee_64/crs/dbs [+ASM1]# cat init+ASM1.ora
spfile='+CLUSTERDATA/mysrvr3_cluster/asmparameterfile/registry.253.841967299'
1062      scp init+ASM1.ora oracle@mysrvr34r:/opt/crs/product/112_ee_64/crs/dbs/init+ASM2.ora
1063      scp init+ASM1.ora oracle@mysrvr35r:/opt/crs/product/112_ee_64/crs/dbs/init+ASM3.ora
1064      scp init+ASM1.ora oracle@mysrvr36r:/opt/crs/product/112_ee_64/crs/dbs/init+ASM4.ora

Mission completed. As always  happy reading And DO test before you  implement!

Mathijs .

ORA-01031: insufficient privileges with logon as sydba

Introduction:

I guess we all sometimes have that famous face palm moment where we do not have a direct success story to offer when a customer asks something . In my case I was asked to offer the sys password to a third-party. I did grumble about it but hey that is the way it is sometimes.  And yes during the time where they have sys-password  not responsible for that database AND I will  have to audit the system after  I take control of it again but that is another story.

Details:

In this case I had altered the password of sys , I tested it locally on the machine and I had given it to the customer and well he came back to me telling me it did not work …

The error they got and the one I researched for:

sqlplus "sys/password@MYB.prod.nl as sysdba"

ERROR:
ORA-01031: insufficient privileges

Well That one made me frown because : 
sqlplus system@MYDB.prod.nl

Connected to:

And then do this locally without connect string:

SQL> connect sys as sysdba 
Enter password: 
Connected.

So with the Web on my side  I was looking for clues-on this one. And again love the on-line community of Oracle because soon I found out  that I should check if password files existed and if the password on that file was set according to the sys password.

I checked and frowned because there was no password file present at all in the $ORACLE_HOME/dbs ! So I created a password file. Tested it and  it worked !

Surely that was a face-palm moment to me like hey I have been there – done that :)  but still nice got it working asap after all .

Syntax  IN $ORACLE_HOME/dbs: 
orapwd FILE=orapwMYDB1 password=
After that I copied it to the second node with scp and of course renamed  orapwMYDB1 to  orapwMYDB2.

See. Sometimes it is not that complex a matter to keep customer and Dba happy :)

Happy reading,

Mathijs

Are we or Are we not going to relink Oracle software after A Linux Upgrade

Introduction

Recently i have talked to  my managers with this one big topic on my mind . The Linux department is rapidly patching their servers to higher levels of Linux and they ask for support during and after them patching the OS. From management side I needed clear directive  about Oracle team following those activities closely. With Oracle support  I have started to gather information about the should or should not  relink the software . Experience will tell which way was the right one.  Cause of course there is also risks involved when relinking the Oracle  software . Examples from past seem to tell that relink brought trouble with it too.

As a central note i was given  Mos Note Is It Necessary To Relink Oracle Following OS Upgrade? (Doc ID 444595.1)

In there you see two great references :

Note 883299.1 Oracle 11gR2 Relink New Feature

and

Note 220970.1 RAC: Frequently Asked Questions
Note 284785.1 How to Check Whether Oracle Binary/Instance is RAC Enabled and Relink Oracle Binary in RAC

From the rac note ) 220970.1  this is a interesting quote as with regard to  the Grid infra structure environment:

Do I need to relink the Oracle Clusterware / Grid Infrastructure home after an OS upgrade?

Using Oracle Clusterware 10g and 11.1, Oracle Clusterware binaries cannot be relinked. However, the client shared libraries, which are part of the home can be relinked, in most cases there should not be a need to relink them. See Note:743649.1 for more information. 
 
 Using Oracle Grid Infrastructure 11.2 and higher, there are some executables in the Grid home that can and should be relinked after an OS upgrade. The following steps describe how to relink an Oracle Grid Infrastructure for Clusters home: 
 
 As root:
 
 # cd Grid_home/crs/install
 # perl rootcrs.pl -unlock
 
 As the grid infrastructure for a cluster owner:
 
 $ export ORACLE_HOME=Grid_home
 $ Grid_home/bin/relink
 
 As root again:
 
 # cd Grid_home/crs/install
 # perl rootcrs.pl -patch 
 
 Note: If using Oracle Grid Infrastructure for Standalone Environments (Oracle Restart), see the Oracle Documentation for more information: http://docs.oracle.com/cd/E11882_01/install.112/e24321/oraclerestart.htm#BABJCHIJ

So should you relink ? Well frankly I would follow recommends  of Oracle in this matter  with the idea that it needs to be done on low priority boxes like test or preprod first before touching production . And should you decide not to relink due to experiences in the past , well i would recommend to be standby once the servers are patched Os wise and the grid infra structure  and databases are starting .

 

happy reading,

 

Mathijs

Cloning your 11.2 Oracle home from an existing server to a new server

Introduction:

At the moment I am reading a great book on 12c (Pro Oracle Database 12c administration) by Darl Kuhn (http://www.apress.com/9781430257288) . And I really like his style and elaborations on topics . While i was  reading it is like being held a mirror. Indeed most times I would also use the runInstaller interactively to run the installation of the Oracle binaries and yes would even frown if I would not be allowed to use a xterm to do installations. Well Darl you gave me an inspiration for set up of environment i need shortly.  So this is a big applause for him already!

Details:

For a data migration environment i am asked to set up Oracle on a Linux server identical to the production servers that are part of the scope of the project.  After installation I will be working with BCVS to bring copies of a number of databases to this box. The production servers are 11.2.2.0 with  one-off patches and a psu patch ..  And as an important add-on information i have installed the Grid Infrastructure software already on the new box so i have Oracle Restart with an ASM instance running. ( that also means that  i have an OraInst location and an existing Oracle Inventory in place too). So which way would i head.

  1. Install Oracle binaries from scratch , and apply all the patches
  2. Clone  the Rdbms software from a production box.

Since i got time i investigated both options. But in the end i loved the scenario 2 below because I always had done the normal interactive way installation:

First $ORACLE_HOME was checked:

$ echo $ORACLE_HOME
/opt/oracle/product/112_ee_64/db

Next step was to move to the ORACLE_HOME

$ cd $ORACLE_HOME

Almost there. Moved back  ( up ) one level ( so I was in /opt/oracle/product/112_ee_64 directory).

$ cd ..

Was aiming  for the tar utility to copy everything in the subdirectory db (and below)  in the  /opt/oracle/product/112_ee_64/.

Note if you get error messages of files not being included ( happened to me too make sure you do a nohup  tar -cvf myorahome.tar>Mylogfile.lst) Most likely you either ran out of space or did not have enough privileges to run the tar for all files ( so i ran  tar as root in the end).

$ tar -cvf myorahome.tar db
  • tar -cvf is the command  to run with the options -cvf
  • myorahome.tar will be holding all the files in all the subdirectories below the /opt/oracle/product/112_ee_64/
  • db is the subdirectory ( part of my ORACLE_HOME) which I need to be added to the tar file.

That created for a full-sized Rdbms Installation with all patches included a Tarred file  of about  5G. After that  myorahome.tar file was copied to the new server with scp.

The tar file was copied to the already existing /opt/oracle/product/112_ee_64 on the new machine. There the tar file was extracted and the extract created the subdirectory db again ( and all others below 2) as part of the extract:

$ cd /opt/oracle/product/112_ee_64
Note: make sure you have plenty space on the new server ( better check that before extraction.
Next, extract the files:
$ tar -xvf myorahome.tar

Next step was needed to include, to register this cloned installation in the Oracle Inventory. In my case I used the runInstaller inside the new Oracle home. First to make sure that the new Oracle Home was not known in the Inventory by detaching it:

./runInstaller -detachHome ORACLE_HOME="/opt/oracle/product/112_ee_64/db"

Then I added the cloned software to the Inventory:

./runInstaller -silent -ignoreSysPrereqs -attachHome ORACLE_HOME="/opt/oracle/product/112_ee_64/db" ORACLE_HOME_NAME="OraDb11g_home1" LOCAL_NODE="" CLUSTER_NODES="

But of course the proof of the pudding is in tasting it once this had finished:

So  it was needed to check with Opatch if the Orainventory now was aware about all installed patches.

export PATH=/opt/oracle/product/112_ee_64/db/OPatch:$PATH

I checked  opatch was the correct one:

which opatch

Then I checked the patches with the command below:

opach lsinventory -oh /opt/oracle/product/112_ee_64/db

And it showed all the correct patches . So indeed I had cloned the Oracle software in a correct way to this machine . Yah ! And a  big thank you to Darl  for a great book which pointed me in the right direction.

Happy reading,

Mathijs

The dreaded ORA-12154: TNS:could not resolve the connect identifier specified.

Introduction

Actually  I wanted to start this one with  … and a funny thing happened on the way to the circus , but lets save that one for another occasion okay ? Last week  I got a mail from one of the users  who tried to connect  via ezconnect to an existing database from new client that i had set up for him.  AS always a challenge to see what is going on and of course it takes time to find out the real deal.

Details:

## This case occurred in a 11.2 environment on one of the test boxes . They were trying to connect via ezconnect to one of the existing databases which failed:

 [10:30:23] [ INFO] SQL Runner: Starting runing script on database SCOTT/SCOTT@MYSRVR1:1521/MYDB1
[10:30:26] [ INFO] INPUT> ERROR:
[10:30:26] [ INFO] INPUT> ORA-12154: TNS:could not resolve the connect identifier specified
[10:30:26] [ INFO] INPUT>
[10:30:26] [ INFO] INPUT>
[10:30:26] [ INFO] INPUT> SP2-0306: Invalid option.
[10:30:26] [ INFO] INPUT> Usage: CONN[ECT] [{logon|/|proxy} [AS {SYSDBA|SYSOPER|SYSASM}] [edition=value]]
[10:30:26] [ INFO] INPUT> where  ::= [/][@<connect_identifier>]
[10:30:26] [ INFO] INPUT>        ::= [][/][@<connect_identifier>]
[10:30:26] [ INFO] INPUT> ERROR:
[10:30:26] [ INFO] INPUT> ORA-12162: TNS:net service name is incorrectly specified
[10:30:26] [ INFO] INPUT>
[10:30:26] [ INFO] INPUT>

##However a  Tnsping  was working correctly

tnsping MYDB1
TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 10-FEB-2014 10:31:14
 Copyright (c) 1997, 2011, Oracle.  All rights reserved.
 Used parameter files:
/opt/SP/STORAGE/TNS_ADMIN/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ENABLE=BROKEN)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=MYSRVR1.vfnl.dc-ratingen.de)(PORT = 1521)))(CONNECT_DATA=(SERVICE_NAME=MYDB1.test.nl)))
OK (130 msec)

## Since this was a 11.2 test environment I was able to play a bit with the environment. What puzzled me most  was that  a different database showed both the short service name and the fully qualified name of the service ( with the domain name in it ) in the listener. And when added the short  service name  MYDB1  next to the qualified service name  MYDB1.test.nl  listener would still not pick up both  services , even when I read  that pmon was supposed to register the services automatically  in frequent intervals (60 seconds) .

Added some entries to my tnsnames.ora and started testing . And indeed the full service name worked  and the short service refused to !  Performed the test that I bounced the database ( again this was test so not that much harm done) with no better effect . Even the restart of listener  did not bring the solution.

So it was clear that  I needed to see what was different between the two environments  since I had one other database automatically registered in the listener  with both the services I was looking for.

Bottom-line  after investigation is it works now after the restart of the database and setting some parts different.  :)   Lets check .

Oh and I performed three  actions in the database  to make it work. And yes the database and not the listener cause  the 11.2 environment lets the database register the services automatically with a listener  (well as long as one plays by the rules):

##First I  added the short service name to the database ( this is not a Rac environment so I did not set up a service in the clusterware using srvctl  ( and ok  I admit it i tried and clusterware replied that you cannot add a service with the same name as the database).

T his is how my services look now:

SQL> show parameter  service

NAME                                                      TYPE VALUE
------------------------------------ ----------- ------------------------------
service_names                                      string               MYDB1.test.nl, MYDB1

##Made sure that if the local listener is in place it is pointing to the correct listener

## Just wanted to make sure that  the correct listener would be used so in this test i added both my listeners:

SQL>show parameter  listener
NAME                                                      TYPE VALUE
------------------------------------ ----------- ------------------------------
listener_networks                                string
local_listener                                         string               (DESCRIPTION=(ADDRESS_LIST=(AD
                                                                                          DRESS=(PROTOCOL=TCP)(HOST=195.
                                                                                          233.124.139)(PORT=1522))(ADDRE
                                                                                          SS=(PROTOCOL=TCP)(HOST=195.233
                                                                                          .124.139)(PORT=1521))))
remote_listener                                    string

## And yes the  Domain was set:

SQL> show parameter domain
 NAME                                                      TYPE VALUE
------------------------------------ ----------- ------------------------------
db_domain                                             string               test.nl

## The environment that was working did not have set the db_domain parameter so in this test I removed it too:  It is now

SQL> show parameter domain
 NAME                                                      TYPE VALUE
------------------------------------ ----------- ------------------------------
db_domain                                             string

Now the listener ,  it is showing both services :

 Service "MYDB1" has 1 instance(s).
  Instance "MYDB1", status READY, has 6 handler(s) for this service...
Service "MYDB1.test.nl" has 1 instance(s).
  Instance "MYDB1", status READY, has 6 handler(s) for this service...

I tested it myself with two entries in tnsnames ( short name and long one and both work

MYDB1MBK1 =
   (DESCRIPTION =
     (ADDRESS_LIST =
       (ADDRESS = (PROTOCOL = TCP)(HOST = MYSRVR1.vfnl.dc-ratingen.de)(PORT = 1522))
     )
     (CONNECT_DATA =
       (SERVICE_NAME = MYDB1.test.nl)
     )
   )

 MYDB1MBK2 =
   (DESCRIPTION =
     (ADDRESS_LIST =
       (ADDRESS = (PROTOCOL = TCP)(HOST = MYSRVR1.vfnl.dc-ratingen.de)(PORT = 1522))
     )
     (CONNECT_DATA =
       (SERVICE_NAME = MYDB1)
     )

This workaround worked so I informed customer to test it.  Will have to test on another environment the effect if i keep db_domain ( and of course db_unique_name) and leave the services to blank.

Happy reading ,

Mathijs

When Rman shows errors like RMAN-20220 and RMAN-06004

Introduction:

On one of the product ion databases we are using BCVS  (Business Continuity Volume) ) in the Emc boxes to make  Rman backups.  Basically done by putting  the source (production) Database into begin backup mode then split the mirror. After the split the source environment is put to end backup mode and the Bcv is being mounted on a different server to make the Rman level backups.  It is also important to mention the we are using an Rman catalog  database where every source database present has its own Schema with the catalog for that database only.

On the Source database  the archives are put to tape.

For a great explain of the concept I recommend following blog  by one of my Heroes:  Martin Bach:

http://martincarstenbach.wordpress.com/2011/05/24/offloading-production-backups-to-a-different-storage-array/

In my environment  i got alarmed that the backup of the control file on the backup server was no longer  running. So it was time again to gear up and go out there and investigate .

Details:

On the backup server we use a tailored script to do the level backup , and in that script aso a backup of the control file  is included .  When I was looking at the log files I see that this Rman (bcv)  level backup is  doing two things :

  • Produce a level backup which is successful ,
  • After that  a copy of the control file is registered in the catalog and is put to  tape . That part is failing all the time now ( even though  the log files show the  backup as success) .

In the log files on the backup server I see after a successful level backup  following error:

 RMAN> run {
2> debug off;
3> allocate channel d1 type disk;
4> catalog controlfilecopy '/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck';
5> release channel d1;
6> allocate channel t1  type  'SBT_TAPE';
7> send 'NSR_ENV=(NSR_SERVER=MYNSR_SERVER,NSR_CLIENT=MYDB_SERVER, NSR_DATA_VOLUME_POOL=REP#0032#G02#LTO3, NSR_GROUP=DAT_0032_G02_
LTO3_TSR_local_01, NSR_SAVESET_BROWSE="0032 Days", NSR_SAVESET_RETENTION="0032 Days")';
8> backup
9> format 'ctrl_level1_%d_201402012208_3605724822_%p_%U'
10> controlfilecopy '/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck';
11> release channel t1;
12> }
Debugging turned off

allocated channel: d1
channel d1: sid=593 devtype=DISK

cataloged control file copy
control file copy filename=/opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck recid=1280 stamp=83
8423230

released channel: d1

allocated channel: t1
channel t1: sid=593 devtype=SBT_TAPE
channel t1: NMO v5.0.0.0

sent command to channel: t1

Starting backup at 01-FEB-14
released channel: t1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 02/01/2014 23:20:35
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20220: control file copy not found in the recovery catalog
RMAN-06090: error while looking up control file copy: /opt/oracle/admin/backup/MYDB1/backup_controlfile_MYDB1_2014_02_01_23:20.bck

At  first moment I don’t see it why this is failing so it is under investigation but it already feels like a nice puzzle.

When I checked the production side  to get a complete  overview  that was a shock to me. Because in the  log files of the Archive backups i found a clue what was going on but i also noticed that the backups where failing . In the log files I saw:

...
released channel: d1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of crosscheck command at 02/02/2014 23:23:30
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog

Hmm wait a minute did we just see a valuable clue to  solve the case ?

So After that i started investigating after surfing the web. After connecting to the catalog ( schema) in the Rman database I started following query:

SQL> SELECT name, DBID, RESETLOGS_TIME FROM rc_database;
NAME           DBID RESETLOGS_TIME
-------- ---------- -------------------
MYDB1   3605724822 31.01.2014 07:05:51

That was making me frown  because in the catalog it was known that there had been a resetlog ???

Then I checked the incarnation:

SQL> SELECT dbid, name, dbinc_key, resetlogs_change#, resetlogs_time FROM rc_database_incarnation;
DBID NAME      DBINC_KEY RESETLOGS_CHANGE# RESETLOGS_TIME
---------- -------- ---------- ----------------- -------------------
3605724822 MYDB1            2        7.9895E+12 21.07.2009 17:18:42
3605724822 MYDB1           73                 1 04.04.2008 15:11:18
3605724822 MYDB1           74           2461875 19.05.2008 16:28:20
3605724822 MYDB1      2553841        1.0587E+13 31.01.2014 07:05:51

Not good at  all  Cause apparently the Catalog  had other information (  showing  a reset logs )  then the production database  ( which had its last reset logs  somewhere back in 2008 ) . And that was  also  the information I saw when I queried the production database ( v$database view is showing  information about a reset logs in

RESETLOGS_CHANGE# NUMBER System change number (SCN) at open resetlogs
RESETLOGS_TIME DATE Timestamp of open resetlogs

In Rman environment   the command list incarnation only returned an empty  line and a prompt which was unexpected too.

After giving it some thought and consulting a colleague I decided to stick to the easy scenario where:

  • I performed an export of the Rman schema for that Database on the Rman Catalog database server and  I will keep that export the upcoming  4 weeks for when an old restore would be needed.
  • I dropped the Rman  user  for that specific database in the Rman catalog database ( drop user  rman_Mydb1 cascade).
  • I registered the database again as a new Rman_schema in the Rman catalog database .

## After that it was time to check things>

rman TARGET  / RCVCAT rman_MYDB1/*****@RMAN
Recovery Manager: Release 10.2.0.3.0 - Production on Mon Feb 3 17:15:36 2014
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
connected to target database: MYDB1 (DBID=3605724822)
connected to recovery catalog database
RMAN> list incarnation;
List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       73      MYDB1   3605724822       PARENT  1          04.04.2008 15:11:18
1       74      MYDB1   3605724822       PARENT  2461875    19.05.2008 16:28:20
1       2       MYDB1   3605724822       CURRENT 7989501317585 21.07.2009 17:18:42

Looked  OK to me. After that  I ran an Archive backup and i checked the result of the scheduled Rman level backup the very next day.   Once again the concept worked as a charme  so happy dba again.

Happy reading,

Mathijs