The good, the bad and the corrupted (ORA-01578, ORA-26040, ORA-01110)

Introduction:

 

At the moment I am part of a team that is upgrading a billing environment from 10.2.0.5 to 11.2.0.4 on Solaris. After a restore and point-in-time recovery on a preproduction environment, the team upgraded all databases using the manual upgrade method (still not that confident that the upgrades would run as smoothly with the DBUA). As always, after the upgrade there is a period of hypercare in which the alert files should be monitored even more closely for funky issues. This blog gives some more details on the funky stuff we came across and how we analyzed it. The one thing that did make me frown during this action, however, was the fact that the various tools did not necessarily all show the same results.

Conclusions:

When upgrading important environments a DBA should really have eyes and ears everywhere before performing the upgrade, because once upgraded it will be a tough call to prove whether something was already broken in the old environment or was caused by the upgrade. So I would really recommend scrutinizing your environment like someone studying a creature under a microscope before upgrading: question broken jobs, invalid objects and, as I learned the hard way during this upgrade, objects with the nologging option (DBA_TABLES, DBA_INDEXES, DBA_LOBS, DBA_TAB_PARTITIONS, DBA_LOB_PARTITIONS, DBA_TAB_SUBPARTITIONS, etc.; LOGGING='NO' indicates NOLOGGING). In my case a number of objects had been created with the nologging option, and indeed only after upgrading did I have to fight my way back to find out what was going on and which objects were involved (in my case it was a number of staging tables for a specific user, where a drop and recreate plus exp/imp dealt with it). I did recommend enabling force logging for the complete database, but was told that the customer wants to be able to create objects with the nologging option for performance reasons.
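Getting that overview before the upgrade can be done by querying the LOGGING column in the relevant DBA views. A minimal sketch (the owner filter is just an example; extend the UNION with the partition views as needed):

```sql
-- Sketch: list segments created with NOLOGGING (LOGGING = 'NO').
-- Adjust the owner filter to your own environment.
select owner, table_name as object_name, 'TABLE' as object_type
  from dba_tables
 where logging = 'NO'
   and owner not in ('SYS', 'SYSTEM')
union all
select owner, index_name, 'INDEX'
  from dba_indexes
 where logging = 'NO'
   and owner not in ('SYS', 'SYSTEM')
union all
select owner, table_name, 'LOB'
  from dba_lobs
 where logging = 'NO'
   and owner not in ('SYS', 'SYSTEM');
```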

As mentioned in the blog, in the end I was plain lucky that this was caused by staging tables, which made the scenario easier since we could simply drop and recreate them. Once again I learned something (and found proof again that success just loves preparation). Of course this is a slightly different scenario than the one we will run in production, since there we will run the upgrade directly, but we still need to bear in mind the objects created with the nologging option in case one of the databases in scope would need a restore if the upgrade broke the database. Well, in this case all databases were upgraded successfully, so all DBAs are happy and confident for the scheduled production migration.

Details:

SERVER: mysrvr

Database: OPTMYDB

Timeline:

  • All databases of the billing environment have been restored to the new OPT server called mysrvr.
  • We have upgraded all databases from 10.2.0.5 to 11.2.0.4 on Solaris.
  • After opening the database OPTMYDB and running statistics jobs, it started throwing error messages in the alert file about block corruptions, as seen in Example 1 below. Yesterday, together with a colleague from the Adba team, the gather-statistics issue was resolved and three objects were identified as having corrupt blocks (see Example 2). These three objects have been dropped and recreated, and there were no more corruption messages in the alert file.
  • After that I was asked by my manager to run a dbv for the full database, checking all datafiles.
  • I also used RMAN in parallel to find possible corruption; it is still running, but the first output is in Example 3:

Example 1

## In the alert file this was the information we came across, setting off the first alarm bells:

Mon Aug 04 22:00:43 2014

DBMS_STATS: GATHER_STATS_JOB encountered errors.  Check the trace file.

Errors in file /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/trace/OPTMYDB_j002_3640.trc:

ORA-20011: Approximate NDV failed: ORA-29913: error in executing ODCIEXTTABLEOPEN callout

ORA-29400: data cartridge error

error opening file /home/tmcon/oracle/UNMATCHED_SUBS_3640.log

Errors in file /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/trace/OPTMYDB_j002_3640.trc  (incident=8665):

ORA-01578: ORACLE data block corrupted (file # 620, block # 391690)

ORA-01110: data file 620: ‘/db/MYDB/data/tbsdata_e/tbsdip01_4m_14.dbf’

ORA-26040: Data block was loaded using the NOLOGGING option

Incident details in: /opt/oracle/OPTMYDB/diag/rdbms/MYDB/OPTMYDB/incident/incdir_8665/OPTMYDB_j002_3640_i8665.trc

 

Example2

I have grepped the ORA-01578 errors from the alert file and added them to a table to get an overview of the datafiles and blocks having the issues. Since the alert file is a growing object, this is what I got over the time interval since starting up the database as an 11.2.0.4 environment.

grep ORA-01578 alert_OPTMYDB.log AFN BL
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690) 620 391690
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202) 620 392202
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946) 661 423946
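The dedup and counting above can be scripted instead of done by hand; a minimal sketch (the sample lines stand in for the real alert_OPTMYDB.log):

```shell
# Summarize distinct (file#, block#) pairs from ORA-01578 lines in an alert log.
# A few sample lines stand in for the real alert file.
cat > /tmp/alert_sample.log <<'EOF'
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690)
ORA-01578: ORACLE data block corrupted (file # 620, block # 391690)
ORA-01578: ORACLE data block corrupted (file # 620, block # 392202)
ORA-01578: ORACLE data block corrupted (file # 661, block # 423946)
EOF

# Extract "AFN BL" pairs and count how often each one occurs.
grep 'ORA-01578' /tmp/alert_sample.log \
  | sed 's/.*file # \([0-9]*\), block # \([0-9]*\).*/\1 \2/' \
  | sort | uniq -c
```

On the real alert file, replace the sample with the actual path and the output gives one row per distinct corrupt block, with the occurrence count in front.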

 

## On MOS I retrieved a note that offered more information on how to find the object located around the corrupted block:

select file_id AFN, relative_fno, tablespace_name
from dba_data_files
where relative_fno=&RFN;
Enter value for rfn: 620
old   3: where relative_fno=&RFN
new   3: where relative_fno=620
       AFN RELATIVE_FNO TABLESPACE_NAME
———- ———— ——————————
      3689          620 UNDO_TBS_01
       620          620 TBSDIP01_4M
##  Used this statement to find the object that was on that spot in the tablespace:
select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
SQL> select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks – 1;
  2    3    4  Enter value for afn: 620
old   3: where file_id = &AFN
new   3: where file_id = 620
Enter value for bl: 391690
old   4: and &BL between block_id AND block_id + blocks – 1
new   4: and 391690 between block_id AND block_id + blocks – 1
OWNER
——————————
SEGMENT_NAME
——————————————————————————–
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
—————————— —————— ——————————
EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
———- ———- ———- ———- ———- ————
SCOTT
SCOTT_BPS_05062014000_NOBP
                               TABLE              TBSDIP01_4M
         0        620     391689    4194304        512          620
SQL> select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks – 1;
  2    3    4  Enter value for afn: 620
old   3: where file_id = &AFN
new   3: where file_id = 620
Enter value for bl: 392202
old   4: and &BL between block_id AND block_id + blocks – 1
new   4: and 392202 between block_id AND block_id + blocks – 1
OWNER
——————————
SEGMENT_NAME
——————————————————————————–
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
—————————— —————— ——————————
EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
———- ———- ———- ———- ———- ————
SCOTT
SCOTT_BPS_05062014000_NORUEB
                               TABLE              TBSDIP01_4M
         0        620     392201    4194304        512          620
SQL> select file_id AFN, relative_fno, tablespace_name
from dba_data_files
where relative_fno=&RFN;
  2    3  Enter value for rfn: 661
old   3: where relative_fno=&RFN
new   3: where relative_fno=661
       AFN RELATIVE_FNO TABLESPACE_NAME
———- ———— ——————————
       661          661 TBSDIP01_4M
## And for AFN = 661, BL = 423946:
select *
from dba_extents
where file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;
OWNER
——————————
SEGMENT_NAME
——————————————————————————–
PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME
—————————— —————— ——————————
EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
———- ———- ———- ———- ———- ————
SCOTT
SCOTT_SII_05062014000R_BAN
                               TABLE              TBSDIP01_4M
         0        661     423945    4194304        512          661

 

Example3

On MOS I also retrieved a note that offered RMAN as a tool to find logical corruption. Since I am curious by nature, of course I had to use that goodie too!

 

RMAN> run {
allocate channel d1 type disk;
allocate channel d2 type disk;
allocate channel d3 type disk;
allocate channel d4 type disk;
backup validate check logical database;
}

 

##  Running this in RMAN will put rows, if any, in v$database_block_corruption. When I asked for more details on this view I was told that it should return 0 rows when rerunning after the corruption has been dealt with. I restarted the RMAN process after some time (so maybe, since the first run did not complete, it did not wrap up properly, but on the second run I noticed rows were still present).

SQL> select * from V$DATABASE_BLOCK_CORRUPTION

 

FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO

———- ———- ———- —————— ———

235      34825         92         1.3760E+13 NOLOGGING

235     323083        510         1.3760E+13 NOLOGGING

548     315914        833         1.3760E+13 NOLOGGING

648     148489        512         1.3760E+13 NOLOGGING

 

## Well, the good thing was that this showed in one view what most likely was the cause of the issue: objects being created with the nologging option, thus invalidating a 100% recovery after a restore (plus recovery of a number of archives) and the subsequent upgrade to 11g.

From Oracle Support I got these notes, which turned out to be good reads since the corruption was caused by nologging operations:

The Gains and Pains of Nologging Operations in a Data Guard Environment (Doc ID 290161.1)

ORA-1578 / ORA-26040 Corrupt blocks by NOLOGGING – Error explanation and solution (Doc ID 794505.1)

A quote from this Note really made me both frown and giggle:

…… Begin Quote

Note that the data inside the affected blocks is not salvageable. Methods like “Media Recovery” or “RMAN blockrecover” will not fix the problem unless the data file was backed up after the NOLOGGING operation was registered in the Redo Log.

…… END  Quote

I also learned that the way Oracle identifies that a block was previously invalidated due to NOLOGGING is by updating most of the bytes in that block with 0xff, but only if that “invalidate” redo is applied in a recovery. If the associated redo/archived log file is used to RECOVER the data files (which was the case in this preproduction environment), Oracle invalidates such blocks, and the errors ORA-26040 and ORA-1578 are reported by SQL statements on the next block reads. So in this case the recovery was done, but the blocks touched by NOLOGGING operations were marked soft corrupt during that recovery, as the redo carrying the “this block was nologging” flag was applied to them.

http://docs.oracle.com/cd/E11882_01/backup.112/e10642/osadvsce.htm#BRADV90047

The recommendation is: always perform a backup after NOLOGGING operations, do not touch those blocks again, and use that backup for recovery.
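To see which datafiles would need that fresh backup, v$datafile tracks the last change that is not covered by redo; a sketch:

```sql
-- Datafiles touched by NOLOGGING operations show a non-zero
-- unrecoverable change number and a matching timestamp.
select file#,
       unrecoverable_change#,
       to_char(unrecoverable_time, 'DD.MM.YYYY HH24:MI:SS') as unrec_time
  from v$datafile
 where unrecoverable_change# > 0;
```

RMAN offers the same overview with `report unrecoverable;`, listing the datafiles that need a new backup before they are fully protected again.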

 

As always  happy reading and till we meet again,

 

Mathijs

 

Dropping redo groups in Another Thread

Introduction:

Quite some time ago I had to turn a RAC environment into a single instance/database. However, I noticed there were still redo groups of the second instance present, which I wanted to drop. This is just a quick blog to show the steps I followed:

 

## Checking about the redologs and the thread information:

SQL> select * from v$log;

GROUP#   THREAD# SEQUENCE#     BYTES BLOCKSIZE      MEMBERS ARC STATUS          FIRST_CHANGE# FIRST_TIME      NEXT_CHANGE# NEXT_TIME

———- ———- ———- ———- ———- ———- — —————- ————- ——————- ———— ——————-

1         1      1931 2147483648       512         2 YES INACTIVE         1.3789E+13 14.08.2014 18:46:19   1.3789E+13 18.08.2014 07:08:18

2         1      1933 2147483648       512         2 YES INACTIVE         1.3789E+13 18.08.2014 07:08:20   1.3789E+13 18.08.2014 07:08:22

3         1      1934 2147483648       512         2 NO CURRENT          1.3789E+13 18.08.2014 07:08:22   2.8147E+14

4         2        128 2147483648       512         2 YES INACTIVE            146319194 19.04.2012 23:21:10   146397537 20.04.2012 09:19:04

5         2        127 2147483648       512         2 YES INACTIVE            146311735 19.04.2012 22:35:29   146319194 19.04.2012 23:21:10

6         2        126 2147483648       512         2 YES INACTIVE            146169044 18.04.2012 23:20:54   146311735 19.04.2012 22:35:29

 

## Now let's try to drop a group that is part of the 2nd thread:

SQL> alter database drop logfile group 4;

alter database drop logfile group 4

*

ERROR at line 1:

ORA-01623: log 4 is current log for instance MYDBP12 (thread 2) - cannot drop

ORA-00312: online log 4 thread 2: '+MYDBP1_REDO/MYDBP1/onlinelog/group_4.260.772812657'

ORA-00312: online log 4 thread 2: '+MYDBP1_DATA/MYDBP1/onlinelog/group_4.267.772812685'

 

 ## Checking Threads

SQL> select THREAD#, STATUS, ENABLED from v$thread;

THREAD# STATUS ENABLED

———- —— ——–

1 OPEN   PUBLIC

2 CLOSED PUBLIC

 

## And as we saw in the first query, these are the groups that belong to the thread we want to drop:

SQL> select group# from v$log where THREAD#=2;

GROUP#

———-

4

5

6

 

## First we need to make sure that the thread will not interfere anymore, so we disable it:

SQL>

SQL> alter database disable thread 2;

Database altered.

 

## Now we can drop the logfile group:

SQL> alter database drop logfile group 4;

Database altered.

SQL> alter database drop logfile group 5;

Database altered.

SQL> alter database drop logfile group 6;

Database altered.

##  Believing is one thing, checking is better: verify the thread is gone:

SQL> select THREAD#, STATUS, ENABLED from v$thread;

THREAD# STATUS ENABLED

———- —— ——–

1 OPEN   PUBLIC

## Let's check the redo log groups:

SQL> select * from v$log;

GROUP#   THREAD# SEQUENCE#     BYTES BLOCKSIZE      MEMBERS ARC STATUS          FIRST_CHANGE# FIRST_TIME      NEXT_CHANGE# NEXT_TIME

———- ———- ———- ———- ———- ———- — —————- ————- ——————- ———— ——————-

1         1      1931 2147483648       512         2 YES INACTIVE         1.3789E+13 14.08.2014 18:46:19   1.3789E+13 18.08.2014 07:08:18

2         1      1933 2147483648       512         2 YES INACTIVE         1.3789E+13 18.08.2014 07:08:20   1.3789E+13 18.08.2014 07:08:22

3         1      1934 2147483648       512         2 NO CURRENT          1.3789E+13 18.08.2014 07:08:22   2.8147E+14

 

Happy DBA again!

 

Enjoy reading and till we meet again,

 

Mathijs

 

Is your backup strategy safe? (a story of BCV / snap restores with EMC)

Introduction:

As a DBA there should always be some room to improve your backup and recovery scenarios. This blog is about a judgment-day scenario: losing a 3 TB database in full, while your bosses ask what you will do and when the database will be back on its feet. In such scenarios I tend to debate when you would really lose the full database with a need to restore it. Proper responses might be: during an upgrade to a newer Oracle release, or when you lose the server the database is running on (although, given the fact that a lot of sites use SAN, that would offer other options than a restore of such a big database, so it is not that likely). In my experience it is much more common to come across scenarios where you restore single datafiles or tablespaces. But OK, for argument's sake, and to prove we have a scenario covered where we lose the full database (meaning all datafiles and control files), let us go into more detail.

Details:

On my most important databases this has been the setup in the past when implementing the database(s) in ASM, to give us a safe feeling.

  • Every database has three disk groups, called SID_DATA, SID_FRA and SID_REDO.
  • Of course the database files are in SID_DATA, the control files and redo log group members are in SID_DATA and SID_REDO, and the archives are put in SID_FRA.
  • Fortunately we are using EMC technology, both snap and BCV. We use them to set up clone/reporting environments and pre-production, and to do our backups.

With regard to the backups, an interesting debate arose this week which triggered this blog. In the setup of the snaps and BCVs it has always been considered that there have to be two of them for each database: one holding the SID_DATA disk group (i.e. the database files) and one snap or BCV holding SID_FRA and SID_REDO. When a backup is triggered, the following actions are performed on the production side:

  • The database is put into begin backup mode,
  • In the database a copy of the controlfile is pushed to the SID_FRA disk group,
  • In the EMC tools the mirror is broken,
  • In the database an end backup is issued,
  • On the backup server the database is mounted with the copied version of the controlfile and a level backup is run with RMAN (both level 0 and level 1, indeed also using block change tracking),
  • After a backup the snap/BCV remains mounted on the backup server, to be used to restore the full database from disk in case of a true disaster,
  • On the production server the archives are saved to tape directly, in intervals.
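Sketched as commands, the database side of that split looks roughly like this (the controlfile path is illustrative; the mirror split itself happens in the EMC tooling, outside the database):

```sql
-- 1. Put the database in hot backup mode.
alter database begin backup;

-- 2. Push a copy of the controlfile to the SID_FRA diskgroup
--    (example path; adjust to your naming convention).
alter database backup controlfile to '+MYDBP1_FRA/mydbp1/controlfile/backup.ctl';

-- 3. (Outside the database: the EMC tools break the mirror here.)

-- 4. End hot backup mode.
alter database end backup;

-- 5. Archive the current log so the split copy can be made consistent.
alter system archive log current;
```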

 

So far so good, right? We have it all covered (or not?).

If you take a closer look at a judgment-day disaster recovery request, where the customer says we need a restore of the FULL database (of 3 TB), would we be happy when seeing this?

If we check the redos  and control files we will see this:

1* select a.GROUP#,MEMBER,BYTES/1024/1024 from v$logfile a, v$log b where a.group#= b.group#  order by group#;

 

GROUP# MEMBER                                                                                                                                   BYTES/1024/1024

———- ——————————————————————————– —————

1 +MYDBP1_REDO/mydbp1/onlinelog/group_1.258.772722547                                                                       2048

1 +MYDBP1_DATA/mydbp1/onlinelog/group_1.262.772722575                                                                       2048

2 +MYDBP1_REDO/mydbp1/onlinelog/group_2.259.772723069                                                                       2048

2 +MYDBP1_DATA/mydbp1/onlinelog/group_2.263.772723099                                                                       2048

3 +MYDBP1_REDO/mydbp1/onlinelog/group_3.260.772723879                                                                       2048

3 +MYDBP1_DATA/mydbp1/onlinelog/group_3.266.772723909                                                                       2048

4 +MYDBP1_REDO/mydbp1/onlinelog/group_4.261.772723247                                                                       2048

4 +MYDBP1_DATA/mydbp1/onlinelog/group_4.267.772723277                                                                       2048

5 +MYDBP1_REDO/mydbp1/onlinelog/group_5.262.772723627                                                                       2048

5 +MYDBP1_DATA/mydbp1/onlinelog/group_5.534.772723657                                                                       2048

6 +MYDBP1_REDO/mydbp1/onlinelog/group_6.263.772722935                                                                       2048

6 +MYDBP1_DATA/mydbp1/onlinelog/group_6.535.772722963                                                                       2048

 

12 rows selected.

 

## Control files are in:

SQL> show parameter control_files

 

NAME                                                      TYPE VALUE

———————————— ———– ——————————

control_files                                          string               +MYDBP1_REDO/mydbp1/controlfile/current.257.771667625, +MYDBP1_DATA/mydbp1/controlfile/current.261.771667625

 

So if all hell breaks loose and you are asked to restore the backup from the snap or BCV, what will happen?

The first scenario was: well, it is all taken care of, since we have two BCVs/snaps, remember? The DBA would contact the storage colleague to restore the SID_DATA snap (sync it back to production), and after that we would do a recovery using the archives and the online redo logs, which are still present on the production box. And we feel very good about this, because we thought the _FRA and _REDO should not be synced back to production... right? (Actually, the proper answer is: wrong.)

As we were discussing this scenario, and after looking at the locations in v$logfile, we noticed a weak spot in that idea. If we restore SID_DATA from the BCV, that would also mean that the first member of every online redo log would be wiped out, since we have a redo member in the SID_DATA diskgroup (Oracle does not like online redo logs NOT being in sync). We would also lose a current control file, since that too is in the same diskgroup (out-of-sync control files are not liked by Oracle either). Of course such a scenario will only be needed between dusk and dawn, after four nights of being on call, with everyone shouting that the issue needs to be fixed as soon as possible, so there would be little time to investigate and set up a scenario.

Looking at the scenario at hand, when being asked to restore ASAP from the BCV: losing one of the control files could easily be fixed by starting the database with one control file. Since we did not touch the _FRA and _REDO diskgroups (we did not restore the second BCV, lucky us), that still offers us a current control file, and after doing our restore and recovery activities we could use RMAN to copy the controlfile again to a second and third place and amend the spfile.

The online redo member, however, would hurt badly if one online redo log member really were overwritten (since we restore from the SID_DATA snap/BCV as the fastest possible way back on our feet from disk once the full database is destroyed). In a lab test I could perhaps consider a scenario of mounting the database and dropping the first member from each group, but not on a highly important production database like the one in this scenario.

So that means we need to make sure that when a SID_DATA snap/BCV is used (resynced to production) to restore from disk, it should only hold datafiles and, OK, a control file (and not also an online redo log member, as happens at the moment).

In this case I will set up new online redo log groups, with members in the SID_FRA and SID_REDO diskgroups. Then I will do log switches and in the end drop the groups that had members in both _DATA and _REDO. After that I would be much more relaxed about my judgment-day restore request to recover the FULL 3 TB database from the snap/BCV at maximum speed. It is not that much work to carry out, and I am sure I would feel a lot more confident that even a scenario where a 3 TB database needs to be restored from a BCV can be done quite quickly and with fewer risks.
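That redo relocation could be sketched like this (group numbers and sizes are examples matching the 2 GB groups shown earlier):

```sql
-- Add new groups with members only in the _REDO and _FRA diskgroups.
alter database add logfile group 7 ('+MYDBP1_REDO', '+MYDBP1_FRA') size 2048M;
alter database add logfile group 8 ('+MYDBP1_REDO', '+MYDBP1_FRA') size 2048M;
alter database add logfile group 9 ('+MYDBP1_REDO', '+MYDBP1_FRA') size 2048M;

-- Switch until none of the old groups is CURRENT or ACTIVE
-- (check v$log between switches), then checkpoint.
alter system switch logfile;
alter system checkpoint;

-- Drop the old groups that had a member in the _DATA diskgroup.
alter database drop logfile group 1;
```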

 

As Always  happy reading and till we meet again,

 

Mathijs

 

The return of the relink: Grid Infrastructure and RDBMS relink

Introduction:

This week I have been part of the debate again: do we or don't we relink when major activities like an upgrade of the Linux kernel are performed? I was asked to do the relink after the RAC cluster was upgraded on Linux. So, as always, I thought it would be wise to make notes during the day, as a plan to be performed during the night. In this blog you will find the steps I performed on a two-node RAC cluster with 11.2.0.4 Grid Infrastructure and two Oracle software trees holding the 11.2.0.4 and 11.1 RDBMS.

With regard to relinking, the discussion in the team had been along the lines of: 1) we might break things by relinking, and 2) we don't have the resources to do that for every server. My recommendation is to follow Oracle in this and deal with the relink of the Grid Infrastructure right after the OS has been patched. Because if something breaks during the upgrade and you relink right after, at least you know where it came from and can deal with it from there. Whereas if you do not relink your software right after such a major change on the OS, you might still be hit in the dark in the upcoming weeks and would then need to figure out what caused it.

You can even debate whether it is needed to stop resources like listeners and databases gracefully before shutting down the cluster, or to just perform a checkpoint in your database and shut down the CRS. I have used both approaches and never had issues so far. But I can imagine that heavily used, busy systems might prefer the graceful shutdown before shutting down the GI.

 

Below you will find my steps. As always, happy reading and till we meet again,

Mathijs.

 

Detailed Plan:

 

mysrvrar / mysrvrbr Steps 1 – 8 will be performed on both nodes in my cluster, in sequential order with some delay, to make sure no cluster panic will occur.
1 crsctl status resource -t>/tmp/BeforeWork.lst Check your cluster in order to be able to compare it to what it looks like after the relinking; it is a good idea to put the output into a file. Often I end up on clusters which I am not that familiar with on a daily basis, so I tend to make this overview before I start working on the cluster.
2 cSpfile.ksh This is a home-made script in which several activities are performed. It will create an spfile, do a checkpoint and do a switch logfile right before shutting down the cluster node.
3 emctl stop agent
4 srvctl stop home -o $ORACLE_HOME -s /tmp/statusRDBMS -n mysrvrar This will stop all resources started from the 11.2.0.4 home and keep a record of them in the file /tmp/statusRDBMS. This will be convenient when starting again.
6 srvctl stop instance -d MYDBCM -i MYDBCM1 This is a shared cluster, so some customers require the 11.2.0.4 software and some the 11.1 software. The 11.1 databases have to be stopped individually.
srvctl stop instance -d MYDBCMAC -i MYDBCMAC1
7 srvctl stop listener -n mysrvrar -l listener_MYDBCM1 It is common to have a listener per database, so I will stop the 11.1 listeners in a proper way as well.
srvctl stop listener -n mysrvrar -l listener_MYDBCMAC1
8 As root: Dealing with the cluster means you have to log on, or perform sudo su - as the ORACLE user to become root, to perform the tasks needed to stop the clusterware on the cluster node.
9 cd /opt/crs/product/11204/crs/bin
10 ./crsctl disable crs During this maintenance Linux will be patching and rebooting various times, so I was asked to make sure that the Grid Infrastructure does not start at each reboot till we are ready.
11 ./crsctl stop crs Last step of the preparation for the Linux guys to patch the machines: shutting down the Grid Infrastructure. Time to take a 2 hr sleep.
Time to relink the software on the two nodes Starting the relink on the first node, performing steps 9 and following. I will complete all steps needed on the first node and see to it that the Grid Infrastructure is started before moving on to the second node.
12 CHECK IF CRS IS DOWN, otherwise REPEAT step 4 After returning to the cluster, still check whether CRS is down, because it is better to be safe than sorry.
13 As root: In order to relink the Grid Infrastructure you have to become the root user again.
14 cd /opt/crs/product/11204/crs/bin as root
15 cd /opt/crs/product/11204/crs/crs/install
16 perl rootcrs.pl -unlock Earlier this night the GI was shut down for Linux patching. When you perform this perl rootcrs.pl -unlock it will try to shut down the GI, so in my case I got a message that the system was not able to stop the CRS.
17 As the Grid Infrastructure for a cluster owner: This was a bit tricky, because the owner of the Grid Infrastructure in my case is oracle, so don't try this as root. Better to open a second window as oracle for the steps below.
18 export ORACLE_HOME=/opt/crs/product/11204/crs As the Oracle user.
cd /opt/crs/product/11204/crs/bin As the Oracle user.
19 relink Relink will also write a relink log which you can tail.
20 [Step 1] Log into the UNIX system as the Oracle software owner: Once the GI software has been relinked it is time to relink the Oracle homes (in my case an 11.1 and an 11.2 software tree). In my case I logged on as the oracle user.
21 [Step 2] Verify that your $ORACLE_HOME is set correctly:
22 For all Oracle versions and platforms, perform this basic environment check first:
export ORACLE_HOME=/opt/oracle/product/11204_ee_64/db Oracle 11.2.0.4
export ORACLE_HOME=/opt/oracle/product/111_ee_64/db Oracle 11.1
cd $ORACLE_HOME
pwd Check the environment.
23 [Step 3] Verify and/or Configure the UNIX Environment for proper relinking:
Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib LD_LIBRARY_PATH needs to be in place so when relinking both ORACLE versions make sure you set the environment in a correct way.
export LD_LIBRARY_PATH=/opt/oracle/product/11204_ee_64/db/lib
echo $LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/oracle/product/111_ee_64/db/lib
echo $LD_LIBRARY_PATH
24 [Step 4] For all Oracle Versions and UNIX Platforms:
Verify that you performed Step 2 correctly: Check, check and check again.
env | grep -i LD_ ….make sure that you see the correct absolute path for $ORACLE_HOME in the variable definitions.
25 [Step 5] For all Oracle Versions and UNIX Platforms:
Verify umask is set correctly:
umask This must return 022. If it does not, set umask to 022.
umask 022
umask
26 [Step 6] Run the OS Commands to Relink Oracle:
Important Notes:
* Before relinking Oracle, shut down both the database and the listener.
* The following commands will output a lot of text to your session window. To capture this output for upload to support, redirect the output to a file.
* If relinking a client installation, it’s expected that some aspects of the following commands will fail if the components were not originally installed.
27 For all UNIX platforms:
Oracle 8.1.X, 9.X.X, 10.X.X or 11.X.X
————————————-
$ORACLE_HOME/bin/relink all Oracle 11.1
$ORACLE_HOME/bin/relink oracle 11.2
writing relink log to: /opt/oracle/product/11204_ee_64/db/install/relink.log
28 How to tell if relinking was successful: If relinking was successful, the make command will eventually return to the OS prompt without an error. There will NOT be a ‘Relinking Successful’ type message. I performed a tail on the logfiles as relink was running in a second window and did not see any issues. And as the note says, wait for the prompt to return (with no comments or messages) and you are good to go.
29 As root again: since I am relinking both the GI and the RDBMS, I moved this step (starting the GI again) until after the RDBMS relinking had finished, because of course during the relink of the RDBMS the environment (databases, listeners) has to be down!
30 cd /opt/crs/product/11204/crs/crs/install/
31 perl rootcrs.pl -patch This perl rootcrs.pl -patch will also start the cluster on this node again. NOTE: we had an issue where this was hanging on the first node. It turned out the second node was up and running after all (my Linux colleague had issued a crsctl disable crs from old, no longer active clusterware software that was still present on the box). So in this specific scenario I stopped crs on the second node again, after which the script continued on the first node.
32 crsctl enable crs If you have used crsctl disable crs, enable it again so the GI will start after a node reboot.
33 As Oracle
emctl start agent Agent was already running so no manual action needed.
34 srvctl start home -o $ORACLE_HOME -s /tmp/statusRDBMS -n mysrvrar This will start all resources that were running from the 11.2.0.4 home. The resources had been saved previously in the /tmp/statusRDBMS file.
35 srvctl start instance -d MYDBCM -i MYDBCM1 Starting the 11.1 Resources.
srvctl start instance -d MYDBCMAC -i MYDBCMAC1
36 srvctl start listener -n mysrvrar -l listener_MYDBCM1 Starting the 11.1 Resources.
srvctl start listener -n mysrvrar -l listener_MYDBCMAC1
37 As Oracle User on the second node once it is relinked:
38 srvctl start instance -d MYDBCM -i MYDBCM2 Starting the 11.1 Resources.
srvctl start instance -d MYDBCMAC -i MYDBCMAC2
39 srvctl start listener -n mysrvrbr -l listener_REQMOD2 Starting the 11.1 Resources.
srvctl start listener -n mysrvrbr -l listener_MYDBCM2
srvctl start home -o $ORACLE_HOME -s /tmp/statusRDBMS -n mysrvrbr
crsctl status resource -t Check your cluster again and compare the result with the status from before. Hopefully all resources will show online online, or at least the situation as it was before. There may be one extra activity if you are using services that were relocated during the action; in that case you will have to relocate them back to their original location.
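Comparing the before and after output by eye is error-prone, so a small helper can diff the two captures; a sketch, assuming you saved the crsctl status resource -t output to files before and after the patching:

```shell
# Sketch: diff two saved "crsctl status resource -t" captures.
# File names are assumptions; an empty diff means the cluster came
# back exactly as it was before the patching.
compare_cluster_state() {
  if diff "$1" "$2"; then
    echo "cluster state unchanged"
  else
    echo "cluster state DIFFERS - check relocated services/resources"
  fi
}
# usage: compare_cluster_state /tmp/res_before.txt /tmp/res_after.txt
```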


The Patch and the Furious (Patching Grid Infra 11.2.0.3 and RDBMS 11.2.0.3 with PSU April 2014)

Introduction:

In June 2014 I upgraded quite a few environments in our billing landscape, which were still on 11.2.0.2 for both Grid Infrastructure and RDBMS, on RedHat Linux. My first scenario, after reading up and setting up a first environment, was to install a mandatory one-off patch 12539000 (well, mandatory for that patch level) and then upgrade to 11.2.0.3 with PSU April 2014. But of course a DBA should be flexible in his approaches, so as a form of adapting while doing I decided it would be best to apply PSU Oct 2012 in full first and then migrate to 11.2.0.3 on both my Grid Infrastructure and RDBMS environments. Since I had an agreement with the customer that the latest PSU would be applied, I aimed for implementing PSU April 2014.

On request I was given the following documents on MOS as a must-read before going into the upgrade:

———————————————————————————————————————————
Things to Consider Before Upgrading to 11.2.0.3/11.2.0.4 Grid Infrastructure/ASM (Doc ID 1363369.1)
ORAchk Upgrade Readiness Assessment (Doc ID 1457357.1)
Before upgrade database check this note: Complete Checklist for Manual Upgrades to 11gR2 (Doc ID 837570.1)
Upgrade / Downgrade Assistant: Oracle Database/Client (Doc ID 1561791.2)
———————————————————————————————————————————-

Detailed Plan and comments:

Installation and Patching  the software
Current setup:
Grid Infrastructure Patch Set Update 11.2.0.2.5 (includes DB PSU 11.2.0.2.5)
Database Patch Set Update: 11.2.0.2.5 (13343424)
Before beginning, use the Oracle-provided “orachk” utility to check your environment for upgrade readiness. Reference: ORAchk 11.2.0.3 Upgrade Readiness Assessment (Doc ID 1457357.1). Command: orachk -u -o pre. See also Things to Consider Before Upgrading to 11.2.0.3/11.2.0.4 Grid Infrastructure/ASM (Doc ID 1363369.1).
orachk.zip
Of course cluvfy is your friend, so use it as part of the preparation:
runcluvfy.sh stage -pre crsinst -upgrade -n mysrvr01hr,mysrvr02hr -rolling -src_crshome /opt/crs/product/112_ee_64/crs -dest_crshome /opt/crs/product/11203_ee_64/crs -dest_version 11.2.0.3.0 -fixup -fixupdir /tmp -verbose > /tmp/11203Upgrade.lst
Reference: Things to Consider Before Upgrading to 11.2.0.3/11.2.0.4 Grid Infrastructure/ASM (Doc ID 1363369.1)
New Oracle Home :   /opt/crs/product/11203_ee_64/crs
The orachk script pointed out I would need this one-off patch before the migration: PATCH 12539000
The patch must be applied to GI home and is recommended for 11.2.0.2 database homes. It’s available on top of all DB PSUs which are part of corresponding GI PSU.
It is recommended to apply it with “opatch auto” using the latest 11.2 opatch (patch 6880880), as the root user:
export PATH=/opt/crs/product/112_ee_64/crs/OPatch/:$PATH
which opatch
$GRID_HOME/OPatch/opatch auto /opt/oracle/stage -oh /opt/crs/product/112_ee_64/crs
$GRID_HOME/OPatch/opatch auto /opt/oracle/stage -oh /opt/oracle/product/112_ee_64/db

When opatch asks the following question, enter ‘yes’ without quote:
Enter ‘yes’ if you have unzipped this patch to an empty directory to proceed (yes/no):yes

If opatch auto fails, apply it manually:
1. Stop all databases manually as database users: /opt/oracle/product/112_ee_64/db/bin/srvctl stop database -d <dbname>
2. Unlock GI home as root user.
For GI Cluster environment:
# $GRID_HOME/crs/install/rootcrs.pl -unlock

For GI Standalone (Oracle Restart):
# $GRID_HOME/crs/install/roothas.pl -unlock

3. Apply the patch to GI and database homes:

As oracle user: /opt/crs/product/112_ee_64/crs/OPatch/opatch napply -oh /opt/crs/product/112_ee_64/crs -local /opt/oracle/stage/12539000

As database user:   /opt/crs/product/112_ee_64/crs/OPatch/opatch napply -oh /opt/oracle/product/112_ee_64/db   -local /opt/oracle/stage/12539000

4. Lock GI home as root user:

For GI Cluster environment:

# $GRID_HOME/rdbms/install/rootadd_rdbms.sh
# $GRID_HOME/crs/install/rootcrs.pl -patch

For GI Standalone (Oracle Restart):

# $GRID_HOME/rdbms/install/rootadd_rdbms.sh
# $GRID_HOME/crs/install/roothas.pl -patch

5. Start all databases manually as database users: <DB_HOME>/bin/srvctl start database -d <dbname>

Since my ORACLE_BASE (/opt/oracle) is the same as the home directory of the oracle user (/opt/oracle), I had the issue that after the patching the owner of /opt/oracle had become root!!:
drwxr-x--- 31 root dba   4096 May 23 09:21 oracle
This meant trouble if I ended the session and opened a new ssh session from my admin box (I would be unable to log in), so I added a check before signing off, because this has to be oracle:dba.
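To catch such a silent ownership change before signing off, the check can be scripted; a minimal sketch (the path and expected owner in the usage line are the values from this environment, and stat -c assumes GNU/Linux):

```shell
# Sketch: verify a directory is still owned by the expected user:group
# before ending the session (opatch auto ran as root and may have
# chowned ORACLE_BASE when it doubles as the oracle user's home).
check_owner() {
  local dir=$1 expected=$2
  actual=$(stat -c '%U:%G' "$dir") || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: $dir is $actual"
  else
    echo "WARNING: $dir is $actual, expected $expected (chown before logout)"
    return 1
  fi
}
# usage: check_owner /opt/oracle oracle:dba
```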
Patching the Grid Infrastructure:
As they say, always better to be safe than sorry, so make a copy of the CRS software before you start.
## as root:
tar -c --recursion -pP --file=/opt/oracle/export/CRSbackup.tar /opt/crs/product/
If you have had an existing installation on your system, and you are using the same user account for this installation, then unset the following environment variables: ORA_CRS_HOME, ORACLE_HOME, ORA_NLS10, TNS_ADMIN, and any other environment variable set for the Oracle installation user that refers to Oracle software homes.
unset ORA_CRS_HOME
unset ORACLE_HOME
unset ORA_NLS10
unset TNS_ADMIN
In RAC environments, upgrade the Grid Infrastructure node by node.
Checks you can perform before and after upgrade:
crsctl query has softwareversion
crsctl query crs softwareversion
crsctl query has releaseversion
crsctl query crs releaseversion
/opt/oracle/admin/tools/cSpfile.ksh This shell script creates an init.ora copy of the spfile.
After that I created migration pfiles with larger settings for the shared pool. Recommended minimum value: shared_pool_size > 600M (preferably 1024M).
/opt/oracle/admin/tools/cSrvctlActNew.ksh status / stop / config This is a shell script to check the current status in the clusterware.
/opt/oracle/admin/tools/cSrvctlAct.ksh config Check the current setup in the clusterware
Dry run: start ./runInstaller to check prerequisites before the install and correct issues if needed. Of course cluvfy is part of the runInstaller activities. Check for warnings and errors and correct them.
In ./runInstaller (the GUI), use the option: upgrade an existing Oracle Clusterware and Oracle ASM installation. Note: 11.2.0.3 is an OUT of place installation, which means you will have to install it in a separate Oracle home.
Install 11.2.0.3 in /opt/oracle/product/11203_ee_64/db: /opt/oracle/stage/database/runInstaller
unzip /opt/oracle/stage/p6880880_112000_Linux-x86-64.zip in your new GRID_HOME directory. Copy and install the latest OPatch to both the GI and RDBMS homes, since after this installation we will have to apply the patch.
Create a response file, needed during the opatch run, in the OPatch/ocm/bin subdirectory of each home:
/opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin/ocm.rsp
/opt/crs/product/11203_ee_64/crs/OPatch/ocm/bin/ocm.rsp
export PATH=/opt/crs/product/11203_ee_64/crs/OPatch:$PATH opatch auto runs as root; set the PATH.
which opatch Check version of Opatch
opatch auto /opt/oracle/stage -ocmrf /opt/crs/product/11203_ee_64/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/crs/product/11203_ee_64/crs Run Opatch
Note: on the dry run the stack would not come down, since instances were still open. I used my tools to stop them, after which the patching continued.
opatch had an ISSUE with the fact that the 11.2.0.3 RDBMS home had no resources yet (it is software only, so the patching crashed). The recommendation is to run this part of the PSU after the migration. DO NOT run it during the GI upgrade!!!!
You may check the results with opatch lsinventory when pointing to the GRID_HOME, or with the “orachk” utility. Reference: ORAchk 11.2.0.3 Upgrade Readiness Assessment (Doc ID 1457357.1): /opt/oracle/product/orachk -u -o post
Upgrade the Databases:
Perform a MANUAL upgrade for the databases in scope. During an earlier upgrade the DBUA messed up by adding a local listener to the init.ora, and the Grid agent then kept altering the oratab. That is why I recommend against the DBUA for bulk upgrades. I would script the upgrade using a fixed ORACLE_HOME (the new one) and a dedicated init.ora/spfile for the migration.
Steps for manual upgrade (the preferred way!):
Run utlu112i.sql AND dbupgdiag.sql (both located in /opt/oracle/admin/tools; dbupgdiag.sql can be obtained from MOS).
Create a new pfile from the spfile and check whether parameters need an increase (shared_pool_size is a good candidate), and set job_queue_processes = 0 for the duration of the migration. If everything else is fine, simply setting job_queue_processes = 0 in your spfile is enough. In my case the spfile already had production sizing, so no alter was needed for shared_pool_size; job_queue_processes = 20 is the production value to restore afterwards.
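Deriving the migration pfile can be scripted so the temporary parameter change is repeatable across databases; a sketch, where the only edit shown is job_queue_processes and the file names in the usage line are examples:

```shell
# Sketch: derive an init<sid>.mig from a pfile created out of the spfile.
# Setting job_queue_processes=0 for the upgrade window is the only edit
# shown; restore the production value (20 in this scenario) afterwards.
make_mig_pfile() {
  # $1 = source init.ora, $2 = target init<sid>.mig
  sed -e 's/^\(.*job_queue_processes *=\).*/\10/' "$1" > "$2"
}
# usage: make_mig_pfile $ORACLE_HOME/dbs/initMYDB.ora /tmp/initMYDB.mig
```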
1) Start sqlplus and run the catupgrd.sql script from the NEW $ORACLE_HOME/rdbms/admin. As preparation, make sure you have the listener.ora, all init.oras (spfiles) and the password file present in the proper subdirectories of your NEW ORACLE_HOME.
sqlplus "/ as sysdba"
spool /tmp/upgrade<DB>.log
startup upgrade;  If you needed to prepare an init<sid>.mig as a migration pfile, use: startup upgrade pfile='/ / /init<sid>.mig'. When you perform a startup upgrade, these parameters are altered by Oracle:

ALTER SYSTEM enable restricted session;

ALTER SYSTEM SET _system_trig_enabled=FALSE SCOPE=MEMORY;

Autotune of undo retention is turned off.

ALTER SYSTEM SET _undo_autotune=FALSE SCOPE=MEMORY;

ALTER SYSTEM SET undo_retention=900 SCOPE=MEMORY;

ALTER SYSTEM SET aq_tm_processes=0 SCOPE=MEMORY;

ALTER SYSTEM SET enable_ddl_logging=FALSE SCOPE=MEMORY;

Resource Manager disabled during database migration: plan '' not set

ALTER SYSTEM SET resource_manager_plan='' SCOPE=MEMORY;

ALTER SYSTEM SET recyclebin='OFF' DEFERRED SCOPE=MEMORY;

Resource Manager disabled during database migration

replication_dependency_tracking turned off (no async multimaster replication found)

set echo on
@?/rdbms/admin/catupgrd.sql; After catupgrd.sql finishes it will shut down the database.
2) Check catupgrd.sql spool file for errors.
3) Restart the database in normal mode.
4)   @$ORACLE_HOME/rdbms/admin/catuppst.sql; Post steps for the migration.
5)   @$ORACLE_HOME/rdbms/admin/utlrp.sql;
alter system set "_use_adaptive_log_file_sync"=FALSE scope=both; Requested by the customer.
set lines 2000
select instance_name from v$instance; Check sanity of upgrade.
select * from v$version; Check sanity of upgrade.
select COMP_NAME,VERSION,STATUS,MODIFIED from dba_registry order by 1; Check sanity of upgrade all the installed components should be valid !
select * from DBA_REGISTRY_HISTORY order by action_time desc; Check if the catbundle ran (shows the most recent entry first).
Check $ORACLE_HOME/dbs for the presence of a correct init.ora; it should point to the spfile in the ASM diskgroup.
srvctl upgrade database -d <Db> -o /opt/oracle/product/11203_ee_64/db Inform clusterware about altered Oracle home.
srvctl modify database -d <Db> -p '+DATA/<Db>/spfile<Db>.ora' Make sure the clusterware knows about the spfile; alter if needed.
srvctl modify database -d <Db> -o '/opt/oracle/product/11203_ee_64/db' Make sure the clusterware knows about the new Oracle home.
If you have a listener per Database make sure it is started from the NEW oracle Home with the correct listener.ora
/opt/oracle/admin/tools/cSrvctlAct.ksh status Check status of db in cluster with shell script.
/opt/oracle/admin/tools/cSrvctlAct.ksh config Check configuration of db in cluster.
As a test, stop and start the database via srvctl (srvctl stop database -d <Db>; srvctl start database -d <Db>), or via the shell scripts /opt/oracle/admin/tools/cSrvctlAct.ksh stop and /opt/oracle/admin/tools/cSrvctlAct.ksh start.
ln -s /opt/networker/lib/libnwora.so libobk.so in the new Oracle home. Check that the Networker library is present in the new Oracle home.
Run an archive or control file backup as a test.
rman catalog username/password@alias
RMAN> UPGRADE CATALOG;
Apply the PSU patch to the RDBMS:
/opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin/ocm.rsp
export PATH=/opt/oracle/product/11203_ee_64/db/OPatch:$PATH opatch auto runs as root; set the PATH.
which opatch Check version of Opatch
opatch auto /opt/oracle/stage -ocmrf /opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin/ocm.rsp -oh /opt/oracle/product/11203_ee_64/db
Apply the catbundle, for each database using the patched home:
1. cd $ORACLE_HOME/rdbms/admin
2. sqlplus /nolog
3. SQL> CONNECT / AS SYSDBA
4. SQL> STARTUP
5. SQL> @catbundle.sql psu apply
6. SQL> QUIT
After applying a PSU patch this is needed for each database using the patched ORACLE_HOME. BUT it was not needed in my scenario, where I had installed and patched the empty 11.2.0.3 ORACLE_HOME first and migrated the databases to that new ORACLE_HOME afterwards.
Fallback scenarios:
Grid Infra fallback:
Downgrading to a release 11.2.0.1 or later release:
Use the command syntax Grid_home/perl/bin/perl rootcrs.pl -downgrade -oldcrshome oldGridHomePath -version oldGridversion, where oldGridhomepath is the path to the previous release Oracle Grid Infrastructure home, and oldGridversion is the release to which you want to downgrade. For example:
# /u01/app/11.2.0/grid/perl/bin/perl rootcrs.pl -downgrade -oldcrshome /u01/app/11.2.0/grid -version 11.2.0.1.0
If you want to stop a partial or failed 11g release 2 (11.2) installation and restore the previous release Oracle Clusterware, then use the -force flag with this command.
After the rootcrs.pl -downgrade script has completed on all remote nodes, on the local node use the command syntax Grid_home/crs/install/rootcrs.pl -downgrade -lastnode -oldcrshome pre11.2_crs_home -version pre11.2_crs_version [-force], where pre11.2_crs_home is the home of the earlier Oracle Clusterware installation, and pre11.2_crs_version is the release number of the earlier Oracle Clusterware installation.
For example:
# /u01/app/11.2.0/grid/perl/bin/perl rootcrs.pl -downgrade -lastnode -oldcrshome
/u01/app/crs -version 11.1.0.6.0
This script downgrades the OCR. If you want to stop a partial or failed 11g Release 2 (11.2) installation and restore the previous release Oracle Clusterware, then use the -force flag with this command.
Log in as the Grid infrastructure installation owner, and run the following commands, where /u01/app/grid is the location of the new (upgraded) Grid home (11.2):
Grid_home/oui/bin/runInstaller -nowait -waitforcompletion -ignoreSysPrereqs -updateNodeList -silent CRS=false ORACLE_HOME=/u01/app/grid
As the Grid infrastructure installation owner, run the command ./runInstaller -nowait -waitforcompletion -ignoreSysPrereqs -updateNodeList -silent CRS=true ORACLE_HOME=pre11.2_crs_home, where pre11.2_crs_home represents the home directory of the earlier Oracle Clusterware installation.
For example:
Grid_home/oui/bin/runInstaller -nowait -waitforcompletion -ignoreSysPrereqs -updateNodeList -silent CRS=true ORACLE_HOME=/u01/app/crs
DownGrade Rdbms – Databases
The database software 11.2.0.2 should still be in place so no need for extra action.
If you are downgrading to Oracle Database 11g Release 1 (11.1), then the COMPATIBLE initialization parameter must be set to 11.0.0 or lower.
Remove Unsupported Parameters from Server Parameter File (SPFILE)
Perform a Full Backup of Oracle Database Before Downgrading
Downgrading Oracle Database to an Earlier Release
1. If you have enabled Oracle Database Vault on your database, then you must:
Grant the Database Vault DV_PATCH_ADMIN role for the SYS account.
Disable Database Vault before downgrading the database.
2. If you previously had upgraded the database and then used the DBMS_DST PL/SQL package to update the database time zone version, then you must apply the patch for the same time zone file version into the earlier release’s Oracle home before downgrading.
As an example scenario, assume that a release 10.2.0.4 database on Linux x64 using DSTv4 had been upgraded to release 11.2.0.2, and DBMS_DST was then run to update this database to DSTv14. Then, before downgrading from release 11.2.0.3 to 10.2.0.4, you need to apply on the release 10.2.0.4 side the DSTv14 patch for 10.2.0.4 for Linux x64. This ensures that your TIMESTAMP WITH TIME ZONE data is not logically corrupted during retrieval.
To find which time zone file version your database is using, run:
SELECT value$ FROM sys.props$ WHERE NAME = 'DST_PRIMARY_TT_VERSION';
3. If you had set the ORA_TZFILE environment variable to the full path name of the timezone.dat file when you upgraded to Oracle Database 11g Release 2 (11.2), then you must unset it if you subsequently downgrade your database.
Two time zone files are included in the Oracle home directory:
The default time zone file at
$ORACLE_HOME/oracore/zoneinfo/timezonelrg.dat
A smaller time zone file at
$ORACLE_HOME/oracore/zoneinfo/timezone.dat
If you do not unset the ORA_TZFILE variable, then connecting to the database using the smaller time zone file might produce the following errors:
SP2-1503: Unable to initialize Oracle call interface
SP2-0152: ORACLE may not be functioning properly
4. If you have Oracle Application Express on your database, then you must copy the apxrelod.sql file from the Oracle Database 11g Release 2 (11.2) ORACLE_HOME/apex/ directory to a directory outside of the Oracle home, such as the temporary directory on your system.
Make a note of the new location of this file.
5. If you have created objects based on fixed objects, then drop these objects to avoid possible ORA-00600 errors. You can re-create these objects after the downgrade.
6. Log in to the system as the owner of the Oracle Database 11g Release 2 (11.2) Oracle home directory.
7. This step is required only if Enterprise Manager Database Control is already configured for the database.
Stop Database Control, as follows:
a. Set the ORACLE_UNQNAME environment variable to the database unique name.
b. Run the following command:
ORACLE_HOME/bin/emctl stop dbconsole
If the database being downgraded is an Oracle RAC database, then perform this step on all instances.
8. If you are downgrading an Oracle RAC database to 10g Release 1 (10.1), then you must remove extra voting disks before shutting down the Oracle Clusterware stack.
1. To see the number of voting disks used and to list voting disk paths, run the following command:
Oracle_Clusterware_Home/bin/crsctl query css votedisk
2. Remove each additional voting disk you find in the previous step by running the following command, where path is the voting disk path listed in the File Name column displayed in the previous step:
Oracle_Clusterware_Home/bin/crsctl delete css votedisk path
For example:
Oracle_Clusterware_Home/bin/crsctl delete css votedisk /share/cluster2/vote_cluster2.dbf
Do not delete the last voting disk.
9. At a system prompt, change to the ORACLE_HOME/rdbms/admin directory.
Note:
If you are downgrading a cluster database, then shut down the instance completely and change the CLUSTER_DATABASE initialization parameter to FALSE. After the downgrade, you must set this parameter back to TRUE.
10. Using SQL*Plus, connect to the database instance as a user with SYSDBA privileges.
11. Start up the instance in DOWNGRADE mode:
SQL> STARTUP DOWNGRADE
You might be required to use the PFILE option to specify the location of your initialization parameter file.
12. If you have Enterprise Manager configured in your database, then drop the Enterprise Manager user:
DROP USER sysman CASCADE;
13. Set the system to spool results to a log file so you can track the changes and issues:
SQL> SPOOL downgrade.log
14. Run catdwgrd.sql:
SQL> @catdwgrd.sql
The following are notes about running the script:
You must use the version of the script included with Oracle Database 11g Release 2 (11.2).
You must run the script in the Oracle Database 11g Release 2 (11.2) environment.
The script downgrades all Oracle Database components in the database to the major release or Oracle Database 11g patch release from which you originally upgraded.
If you encounter any problems when you run the script, or any of the scripts in the remaining steps, then correct the causes of the problems and rerun the script. You can rerun any of the scripts described in this chapter as many times as necessary.
If the downgrade for a component fails, then an ORA-39709 error is displayed and the SQL*Plus session terminates without downgrading the Oracle Database data dictionary. All components must be successfully downgraded before the Oracle Database data dictionary is downgraded. You must identify and fix the problem before rerunning the catdwgrd.sql script.
15. Turn off the spooling of script results to the log file:
SQL> SPOOL OFF
Then, check the spool file and verify that there were no errors generated during the downgrade. You named the spool file in Step 13 and the suggested name was downgrade.log. Correct any problems you find in this file and rerun the downgrade script if necessary.
Note:
If you want to save the results of the first time you ran the downgrade script, then before you rerun it be sure to rename downgrade.log to something else.
16. Shut down the instance:
SQL> SHUTDOWN IMMEDIATE
17. Exit SQL*Plus.
18. If your operating system is Linux or UNIX, then change the following environment variables to point to the directories of the release to which you are downgrading:
ORACLE_HOME
PATH
You should also check that your oratab file and any client scripts that set the value of ORACLE_HOME point to the downgraded Oracle home.
See Also:
Oracle Database Installation Guide for your operating system for information about setting other important environment variables on your operating system
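The oratab check mentioned above can be scripted as well; a minimal sketch (the oratab location and Oracle home in the usage line are examples):

```shell
# Sketch: list oratab entries still pointing at a given Oracle home,
# so autostart entries and client scripts can be corrected after the
# switch to the downgraded (or upgraded) home.
oratab_entries_for_home() {
  # $1 = oratab file, $2 = Oracle home to look for
  grep -v '^ *#' "$1" | awk -F: -v h="$2" '$2 == h {print $1}'
}
# usage: oratab_entries_for_home /etc/oratab /opt/oracle/product/11203_ee_64/db
```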
19. If your operating system is Windows, then complete the following steps:
a. Stop all Oracle services, including the OracleServiceSID Oracle service of the Oracle Database 11g Release 2 (11.2) database, where SID is the instance name.
For example, if your SID is ORCL, then enter the following at a command prompt:
C:\> NET STOP OracleServiceORCL
See Also:
Oracle Database Net Services Administrator’s Guide
b. Delete the Oracle service at a command prompt by issuing the ORADIM command. For example, if your SID is ORCL, then enter the following command:
C:\> ORADIM -DELETE -SID ORCL
c. Create the Oracle service of the database that you are downgrading at a command prompt using the ORADIM command.
C:\> ORADIM -NEW -SID SID -INTPWD PASSWORD -MAXUSERS USERS
     -STARTMODE AUTO -PFILE ORACLE_HOME\DATABASE\INITSID.ORA
This syntax includes the following variables (see the example below): SID is the SID of the database you are downgrading; PASSWORD is the password for the internal account (INTPWD); USERS is the maximum number of users; ORACLE_HOME is the Oracle home directory of the release to which you are downgrading.
For example, if you are downgrading to Oracle Database 10g Release 2 (10.2), if your SID is ORCL, your PASSWORD is TWxy5791, the maximum number of USERS is 10, and the ORACLE_HOME directory is C:\ORANT, then enter the following command:
C:\> ORADIM -NEW -SID ORCL -INTPWD TWxy5791 -MAXUSERS 10
     -STARTMODE AUTO -PFILE C:\ORANT\DATABASE\INITORCL.ORA
Note:
You are not required to change any Windows Registry settings when downgrading a database. The ORADIM utility makes all necessary changes automatically.
20. Restore the configuration files (for example, parameter files, password files, and so on) of the release to which you are downgrading.
If this is an Oracle RAC database, execute the following command to return the database to single instance mode:
set CLUSTER_DATABASE=FALSE
Note:
If you are downgrading a cluster database, then perform this step on all nodes in which this cluster database has instances configured. Set the CLUSTER_DATABASE initialization parameter to FALSE. After the downgrade, you must set this initialization parameter back to TRUE.
21. At a system prompt, change to the ORACLE_HOME/rdbms/admin directory of the previous release.
22. Start SQL*Plus.
23. Connect to the database instance as a user with SYSDBA privileges.
24. Start up the instance:
SQL> STARTUP UPGRADE
25. Set the system to spool results to a log file to track changes and issues:
SQL> SPOOL reload.log
26. Run catrelod.sql:
SQL> @catrelod.sql
The catrelod.sql script reloads the appropriate version of all of the database components in the downgraded database.
27. If you are downgrading to Oracle Database 11g Release 1 (11.1.0.6), run the xsrelod.sql script:
SQL> @xsrelod.sql
Running the xsrelod.sql script avoids the following error:
PLS-00306: wrong number or types of arguments in call
to 'INVALIDATE_DSD_CACHE' DBMS_XS_DATA_SECURITY_EVENTS
PL/SQL: Statement ignored
Note:
The PLS-00306 error is not an issue for Oracle Database release 11.1.0.7 or later.
28. If you are downgrading to Oracle Database 10g Release 1 (10.1.0.5) and you have XDB in your database, then run the dbmsxdbt.sql script:
@dbmsxdbt.sql
Running the dbmsxdbt.sql script avoids the following errors when reloading XDB.DBMS_XDBT package:
PLS-00201: identifier 'CTXSYS.CTX_OUTPUT' must be declared
PLS-00201: identifier 'CTX_DDL' must be declared
29. If you have Oracle Application Express on your database, then run the apxrelod.sql script from the location to which you copied it in an earlier step:
SQL> @apxrelod.sql
Running the apxrelod.sql script avoids package APEX_030200.WWV_FLOW_HELP being INVALID due to the following error:
PLS-00201: identifier ‘CTX_DDL’ must be declared
30. Turn off the spooling of script results to the log file:
SQL> SPOOL OFF
31. Shut down and restart the instance for normal operation:
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP
You might be required to use the PFILE option to specify the location of your initialization parameter file.
32. Perform this step if the database is configured for Oracle Label Security and you are downgrading to Oracle Database 10g Release 1 (10.1).
a. Copy the olstrig.sql script from the Oracle Database 11g Release 2 (11.2) Oracle home to the Oracle home of the version to which the database is to be downgraded.
b. Run olstrig.sql to re-create DML triggers on tables with Oracle Label Security policies.
SQL> @olstrig.sql
See the Oracle Label Security Administrator’s Guide for more information.
33. Run the utlrp.sql script:
SQL> @utlrp.sql
The utlrp.sql script recompiles all existing PL/SQL modules that were previously in an INVALID state, such as packages, procedures, types, and so on.
34. Exit SQL*Plus.
35. If you are downgrading a cluster database, then you must run the following command to downgrade the Oracle Clusterware database configuration:
$ srvctl downgrade database -d db-unique-name -o old_ORACLE_HOME -t to_relnum
[where db-unique-name is the database name (not the instance name), old_ORACLE_HOME is the location of the old Oracle home in which the downgraded database will be running. In this example, to_relnum is the database release number to which the database is being downgraded. (For example: 11.2.0.1.0.)]
Caution:
By default, any named user may create a server pool. To restrict the operating system users that have this privilege, Oracle strongly recommends that you add specific users to the CRS Administrators list.
See Also: Oracle Clusterware Administration and Deployment Guide for more information about adding users to the CRS Administrators list.

Looking back at these environments now that they are patched, my recommendation can only be that they should be patched more often, but as we all know that is sometimes easier said than done.
As always: don’t believe it, test it first!


PS: the fallback scenario was not used or implemented; it has been added here for reference.

Happy reading and till next time.

Mathijs


Rman Archive backup failing with RMAN-03002, RMAN-06059, ORA-19625 in Rac

Introduction:

As a member of the team, every so many days I am part of the hotline, taking care of tickets and indeed being the (hopefully friendly) voice in your ear when you call the hotline number of the Oracle team at the company. One of the tickets drew my attention (hmm, maybe because it was about RMAN, which I have loved working with for years). An archive backup had failed various times, so I was asked to investigate and fix it. This blog shows the analysis and the steps I performed to fix the issue.

Details:

At work we run a shell script for both level and archive backups, and in the logfile produced for the archive backup this was my first clue:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 07/11/2014 07:27:45
RMAN-06059: expected archived log not found, loss of archived log compromises recoverability
ORA-19625: error identifying file /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47588_2.arc
ORA-27037: unable to get file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

### So my interpretation was that the specific archive log was not present in the ASM diskgroup, but was present locally on the other node of the RAC, in the $ORACLE_HOME/dbs directory. My investigation on the three nodes showed me:

## On first node :

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled <<-- wrong, because the archives are supposed to end up in the +MYDB2_FRA1 diskgroup as a ground rule.
Oldest online log sequence     51210
Next log sequence to archive   51213
Current log sequence           51213

On the second node :

SQL> archive log list
Database log mode                 Archive Mode
Automatic archival                  Enabled
Archive destination                 MYDB2_DATA1  <<-- wrong, because the + is missing.
Oldest online log sequence     47585
Next log sequence to archive   47588
Current log sequence             47588

## on the 3rd node:

SQL> archive log list
Database log mode                 Archive Mode
Automatic archival                  Enabled
Archive destination                 MYDB2_DATA1 <<-- wrong, because the + is missing.
Oldest online log sequence     52542
Next log sequence to archive   52545
Current log sequence             52545

So my conclusion was that, due to the wrong archive destination on two out of three nodes in my RAC, the archives were being written to a local filesystem on those two boxes, and on top of that into the $ORACLE_HOME/dbs directory. Well, that was not good!!

## So the first step to be performed was to correct the archive destination:

ALTER SYSTEM SET log_archive_dest_1='LOCATION=+MYDB2_FRA1' SCOPE=BOTH SID='MYDB21';
ALTER SYSTEM SET log_archive_dest_1='LOCATION=+MYDB2_FRA1' SCOPE=BOTH SID='MYDB2';
ALTER SYSTEM SET log_archive_dest_1='LOCATION=+MYDB2_FRA1' SCOPE=BOTH SID='MYDB23';

## Once that was corrected I performed a couple of log switches to see new archives being born in the +MYDB2_FRA1 diskgroup. That was indeed the case, so I was already a bit happier.
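The switches and the verification can be done from SQL*Plus; a sketch of the commands I mean (v$archive_dest shows the destination each instance is actually using):

```
SQL> alter system archive log current;
SQL> select dest_id, destination, status from v$archive_dest where dest_id = 1;
```

On a RAC, `archive log current` archives the current log of every open thread, which makes it a convenient single command for this kind of check.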

## Since the missing + in the archive destination had caused Oracle to create the archives in $ORACLE_HOME/dbs, I had to copy the archives present there into the correct sub-folder of the ASM diskgroup using asmcmd. (I could tell from their creation dates in Linux that the archives were from the last 2 days, so I knew where to put them in ASM.)

cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47523_2.arc .
cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47525_2.arc .
cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_47526_2.arc .

## and I did the same on the other server.
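Typing each cp by hand gets old quickly; a small loop (bash sketch; `build_cp_cmds` and the target path are illustrative, and in the real case the copies were executed from inside asmcmd into the dated sub-folder matching each file's creation date) builds the same command list from a directory listing:

```shell
#!/bin/bash
# build_cp_cmds -- hypothetical helper: emit one cp command for every stray
# archive (*.arc) found in the given $ORACLE_HOME/dbs directory.
build_cp_cmds() {
  local dbs_dir="$1" target="$2"
  local f
  for f in "$dbs_dir"/*.arc; do
    [ -e "$f" ] || continue   # no matches: the glob stayed literal, skip it
    echo "cp $f $target"
  done
}

# Example with the paths from this post (target sub-folder is an assumption):
build_cp_cmds /opt/oracle/product/11202_ee_64/db/dbs "+MYDB2_FRA1/MYDB2/ARCHIVELOG/2014_07_11/"
```

The generated lines can then be reviewed and pasted into an asmcmd session, which is safer than free-typing file names that embed long resetlogs identifiers.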

oracle@mysrvr:/opt/oracle/product/11202_ee_64/db/dbs/ [MYDB23]# ls -ltr *.arc

cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_52480_3.arc .
cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_52481_3.arc .
cp /opt/oracle/product/11202_ee_64/db/dbs/MYDB2_DATA1MYDB2_776281254_52482_3.arc .

## The next step was to find out how to register these archives in RMAN, now that they had been moved to the ASM diskgroup. Once again fellow bloggers did not let me down, so after spending some time surfing Google I came up with this:

I connected with RMAN to both the target and the catalog and ran the following, which was convenient since I only had to point to the directories in ASM where all the archives were now located after my copy action:

catalog start with '+MYDB2_FRA1/MYDB2/ARCHIVELOG/2014_07_10/';
catalog start with '+MYDB2_FRA1/MYDB2/ARCHIVELOG/2014_07_11/';
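After cataloging it is worth confirming that RMAN really sees the pieces before kicking off the backup; a quick check could look like this (RMAN command sketch, with the sequence and thread taken from the error stack above):

```
RMAN> crosscheck archivelog all;
RMAN> list archivelog sequence 47588 thread 2;
```

The crosscheck marks anything that is no longer where the repository thinks it is as EXPIRED, so a clean output here means the backup should no longer hit RMAN-06059.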

## Once that had been done I ran an archive log backup, which succeeded. Happy to have solved another puzzle.

 

As always Happy reading  and till next time,

 

Mathijs

 

 

The good, the bad and the OPatch (applying PSU April 2014 on RAC and Oracle Restart)

Introduction:

 

Over the last couple of weeks I have been busy patching and upgrading production, preproduction and test environments, and during those activities OPatch was my friend and tool for the job. Looking back, and after talking to colleagues, I decided to create a post about it. In my patching activities I had to apply a recent PSU patch to both the Grid Infrastructure and the RDBMS, do an upgrade of the software, and then add the latest PSU patch again. In my preparations for OPatch I ran into issues with the storage present on the mount point of the Grid Infrastructure. So as part of your activities you should take a look at your filesystem sizes, since the PSU patches will need at least 5 GB of free space in that mount.

Preparations:

 

  • As mentioned in the introduction, make sure you have at least 5 GB (more is better in this case) in the mount point where the Grid Infrastructure is located. In my case I had /opt/crs/product/11202/crs as a mount with 15 GB of space. In this mount the grid software had been installed, and one PSU patch had been applied back in the old days (we are talking the October 2012 PSU). And while applying a required PSU (October 2013, required for the upgrade to Oracle 11.2.0.3) there was not enough space to install the software.
  • Since my current platform is Linux (this is all about patching RAC environments and Oracle Restart environments) I looked on Metalink and downloaded: p6880880_112000_Linux-x86-64.zip. With every PSU patch you install you should ask yourself: is my OPatch recent enough, or should I download a fresh copy from Metalink? I tend to check and download a fresh copy every time I wear my T-shirt "I'm a patch DBA today and I like it".
  • In my environment my software installs look pretty much like this:
    • Grid Infrastructure is installed in /opt/crs/product//crs
    • Rdbms is installed in /opt/oracle/product/11202_ee_64/db
    • Oh, and perhaps a bit confusing: my ORACLE_BASE is the same as the home of the oracle user (which is /opt/oracle)
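The free-space requirement from the first bullet can be checked up front. A minimal bash sketch (`has_free_space` is a hypothetical helper; the 5 GB threshold and the mount point come from the text above):

```shell
#!/bin/bash
# has_free_space MOUNT NEEDED_MB -> exit 0 if the filesystem holding MOUNT
# has at least NEEDED_MB megabytes available, 1 otherwise.
has_free_space() {
  local mount="$1" needed_mb="$2"
  # df -P prints POSIX (non-wrapped) output; column 4 is available 1K blocks.
  local avail_kb
  avail_kb=$(df -P "$mount" 2>/dev/null | awk 'NR==2 {print $4}')
  [ "${avail_kb:-0}" -ge $(( needed_mb * 1024 )) ]
}

# 5120 MB ~ the 5 GB the PSU apply needed on my GI mount:
if has_free_space /opt/crs/product/11202/crs 5120; then
  echo "enough space for the PSU"
else
  echo "free up space before running opatch"
fi
```

Running this before unzipping anything would have saved me the mid-patch surprise described above.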

## Tips

•    Make a subdirectory for each PSU patch you apply; if you unzip N PSU patches in the same directory, opatch will apply all of them every time again.
•    Is auto really auto? I tend to run it with -oh, which still works fine for me.
•    Keep your OPatch tool up to date.

## Setting up your patching :

oracle@mysrvr:/opt/oracle/stage []# dir
drwxr-xr-x  5 oracle dba     4096 Jun 23 13:29 .
drwxr-xr-x 32 oracle dba     4096 Jun 23 15:22 ..
drwxr-xr-x  2 oracle dba     4096 Jun 11 13:32 OPatchlogs
drwxr-xr-x  2 oracle dba     4096 Jun 23 13:28 psuApr2014
drwxr-xr-x  2 oracle dba     4096 Jun 23 13:29 psuOct2013

## inside psuOct2013

oracle@mysrvr:/opt/oracle/stage/psuOct2013 []# ls -ltr
total 288260
-rw-r--r-- 1 oracle dba        21 Apr  4  2013 README.txt
drwxr-xr-x 5 oracle dba      4096 Apr  4  2013 16459322
-rw-r--r-- 1 oracle dba       450 Oct  9  2013 bundle.xml
drwxrwxr-x 9 oracle dba      4096 Oct 10  2013 17082367
-rw-rw-r-- 1 oracle dba    141496 Jan 20 05:18 README.html
-rw-rw-r-- 1 oracle dba    136871 Jan 20 05:18 PatchSearch.xml
-rwxr-xr-x 1 oracle dba 294574955 Jun  4 07:28 p17272753_112020_Linux-x86-64.zip

## Inside psuApr2014

oracle@mysrvr:/opt/oracle/stage/psuApr2014 []# ls -ltr
total 586820
drwxr-xr-x  5 oracle dba      4096 Jan  9 16:27 17592127
drwxrwxr-x 12 oracle dba      4096 Feb  5 07:04 18031683
-rw-r--r--  1 oracle dba       450 Feb 10 10:16 bundle.xml
-rw-r--r--  1 oracle dba         0 Feb 10 10:17 README.txt
-rw-rw-r--  1 oracle dba     59977 Apr 15 12:18 README.html
-rw-rw-r--  1 oracle dba    125015 Apr 15 14:17 PatchSearch.xml
-rwxr-xr-x  1 oracle dba 600096863 May 16 15:33 p18139678_112030_Linux-x86-64.zip

 

## Applying PSU April 2014

unzip /opt/oracle/stage/p6880880_112000_Linux-x86-64.zip into both your GRID_HOME and your ORACLE_HOME directory to refresh OPatch.
Set up an OCM response file and make a note of its absolute path, because you will need it during opatch apply. In my case the response files were:
/opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin/ocm.rsp
/opt/crs/product/11203/crs/OPatch/ocm/bin/ocm.rsp
unzip p18139678_112030_Linux-x86-64.zip (this was PSU April 2014)
AS ROOT:
export PATH=/opt/crs/product/11203/crs/OPatch:$PATH
export PATH=/opt/oracle/product/11203_ee_64/db/OPatch:$PATH
which opatch ( check that root can run opatch now )
PER NODE in your cluster, AS ROOT:
##Crs
opatch auto /opt/oracle/stage/psuApr2014 -ocmrf /opt/crs/product/11203/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/crs/product/11203/crs

##Rdbms

opatch auto /opt/oracle/stage/psuApr2014 -ocmrf /opt/oracle/product/11203_ee_64/db/OPatch/ocm/bin/ocm.rsp -oh /opt/oracle/product/11203_ee_64/db

## Oracle Restart

opatch auto /opt/oracle/stage/psuApr2014 -ocmrf /opt/crs/product/11203/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/crs/product/11203/crs

opatch auto /opt/oracle/stage/psuApr2014 -ocmrf /opt/crs/product/11203/crs/OPatch/ocm/bin/ocm.rsp -oh /opt/oracle/product/11203_ee_64/db


 

And as a last recommendation: check the logfiles produced during OPatch in detail, because I have seen a situation where OPatch reported "succeeded" but a detailed look in the logs showed that one of the patches had not been applied due to lack of space!
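That log check can easily be scripted. A bash sketch (`scan_opatch_logs` is a hypothetical helper; the log directory and the failure patterns are assumptions, so adjust them to your own cfgtoollogs location and the messages you have seen):

```shell
#!/bin/bash
# scan_opatch_logs LOGDIR -- flag OPatch logs that contain tell-tale failure
# strings even when the on-screen summary said "succeeded".
# Returns 0 when the logs look clean, 1 when suspicious entries were found.
scan_opatch_logs() {
  local logdir="$1"
  local bad
  # grep -l lists the files that match any of the patterns (case-insensitive).
  bad=$(grep -l -i -E 'not applied|failed|insufficient.*space' "$logdir"/*.log 2>/dev/null)
  if [ -n "$bad" ]; then
    echo "suspicious entries in:"
    echo "$bad"
    return 1
  fi
  echo "logs look clean"
}
```

Run it against the OPatch log directory right after every `opatch auto`, before declaring the node done.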

 

As always happy reading and have a great day,

 

Mathijs