Steps to Recreate my Central Inventory in a Real Applications Cluster environment on Linux

Introduction:

In the last quarter of 2012 i have set up  a 4 Node Real Application Cluster on Linux which was great to do as always. In day-to-day business I had lost track of this environment  which in itself is not that bad of course. However this week I needed the environment as a preparation for patching so it would be mandatory to check with opatch last patch installed in the grid infra structure and in the rdbms homes that have been set up on those boxes. Well it was to be expected that other dbas would work on those boxes, maybe / likely might have installed new patches so it was really like getting back together with old friends to see what was under the hood.

Well it was a bit of disappointment that the inventory destination  (/opt/oracle/orInventory) was simply empty . Opatch did not like it either cause it  exited with a returncode 73. But then again on the bright side this meant i could take note of the necessary steps  performed  and turn it into a blog.

Continue reading

Rman Catalog Crosscheck and Delete obsolete an Approach

Introduction

When You are or will be working with Rman  as your backup solution you will have to ask yourself several questions . One of the elementary ones will be that you have to consider whether or not you will be utilizing an Rman Catalog or that  you will have faith in the existing  multiplexed controlfiles.  On the web I see debates going on about the use of a catalog and indeed given the idea that you will not need a catalog , since you are not be using stored scripts, you might get timeouts simply cause working with catalog shows long time no communication ( and you have those hmm lets say aggressive  firewalls in place that will snap at that so-it-seems idle sessions) etc. Well this post is about what you should do if you decide to have a catalog  for rman after all or if a catalog is in place due to company hmm  history – habits – or standards.

What  you should consider to have in place ( of course always  depends on your settings and configuration) Should be  a maintenance job for your Rman Catalog for the specific Database ( target)  you have added there.  With some explain below I would like to show you what was done for that maintenance part  by showing scripts we used for that.

Details:

#!/bin/ksh
#set -vx
PROG=$(basename $0)

if [ -z "$1" ]; then
echo “ERROR – wrong start commando”
echo “EXAMPLE – $PROG <SID>”
exit 1
fi

## Script will take the Instance that has been registered as an input parameter
export ORACLE_SID=$1

## Finding the ORATAB since it will help to set Oracle Home
if [ "${OS}" = "Linux" ];then
export ORATAB=/var/opt/oracle/oratab
else
export ORATAB=/etc/oratab
fi
ORATABLINE=`grep $ORACLE_SID $ORATAB`
export ORACLE_HOME=`echo $ORATABLINE | cut -f2 -d:`

## exporting several Parameters among which logging path and logfilename with a timestamp in it
export LOGPATH=/opt/oracle/scripts/log
export CURRENT_TIMESTAMP=`date +%Y%m%d_%H%M%S`   # Format: YYYYMMDD_HHMISS   e.g.: 20110907_150455
export LOGFILE=${ORACLE_SID}_crosscheck_${CURRENT_TIMESTAMP}.log
export NLS_DATE_FORMAT=”DD-MON-YYYY HH24:MI:SS”
export ORA_NLS32=${ORACLE_HOME}/ocommon/nls/admin/data
export ORA_NLS33=${ORACLE_HOME}/ocommon/nls/admin/data
export NLS_LANG=American_America.WE8ISO8859P15

## Connect to rman and your catalog  and start logging (msglog)  to the specified logdir and logfile.
## then allocate a channel. Note if you make backups to TAPE as we do you HAVE to allocate a channel for maintenance type SBT_TAPE and you HAVE to mention in our case specific Networker Server and Networker Client.
## then you perform the crosscheck backup which will check in the Networker database if the Backups is still available. If it is not the backup will be tagged expired in the Rman catalog.
## Once the  crosscheck is finished we wanted to see a result so did a list backup.
## If you are sure that what you see is correct  delete noprompt expired will erase the no longer existing backups in rman.. MAYBE best to run script first without this. Always best to be safe then sorry.
## After delete another Listing is performed as a record of the post  delete information in the catalog.

${ORACLE_HOME}/bin/rman TARGET  / RCVCAT rman_${ORACLE_SID}/${ORACLE_SID}@RMAN MSGLOG ${LOGPATH}\/${LOGFILE} << EOF
allocate channel for maintenance device type ‘SBT_TAPE’;
send ‘NSR_ENV=(NSR_SERVER=adebckus-nl,NSR_CLIENT=adesnouh)’;
crosscheck backup;
list backup of database summary;
delete noprompt expired backup;
list backup of database summary;
release channel;
exit
EOF

Happy Reading and best  of Luck,

Mathijs

When Starting a Rac database brings you: ORA-29760: instance_number parameter not specified.

Introduction

Recently was supporting a colleague who was unable to start a Real Application Database after he had  created the environment by manual scripts. As i already mentioned in another post nothing wrong with exploring both methods ( either using DBCA (with care but hey any tool will come with a manual and  some shortcomings) or manual. Well if you decide to work manually  of course it will take maybe even  more discipline in setting up init.ora files , registering the database and its instances into the Grid Infra structure layer etc.  But generally speaking there is a challenge in either approach you choose. Well on to our case.  The Database would not start via the clusterware (with srvctl ) but manually  it was possible to start either of the two instances in this Rac.

Investigation:

Well first glimpse to see what might be going on:

SQL> select * from gv$instance;

INST_ID INSTANCE_NUMBER INSTANCE_NAME HOST_NAME VERSION STARTUP_TIME STATUS PAR THREAD# ARCHIVE LOG_SWITCH_WAIT
———- ————— —————- —————————————————————- —————– ——————- ———— — ———- ——- —————
LOGINS SHU DATABASE_STATUS INSTANCE_ROLE ACTIVE_ST BLO
———- — —————– —————— ——— —
1 1 MYDB1 server1hr 11.2.0.3.0 11.04.2013 16:14:31 OPEN YES 1 STARTED
ALLOWED NO ACTIVE PRIMARY_INSTANCE NORMAL NO

2 2 MYDB2 server2hr 11.2.0.3.0 11.04.2013 16:25:54 OPEN YES 2 STARTED
ALLOWED NO ACTIVE PRIMARY_INSTANCE NORMAL NO

That query  showed both instances running. 

Then the colleague tried:

srvctl start database -d  … 

PRCR-1079 : Failed to start resource ora.mydb.db

CRS-5017: The resource action “ora.mydb.db start” encountered the following error:
ORA-29760: instance_number parameter not specified
. For details refer to “(:CLSN00107:)” in “/opt/crs/product/11203_ee_64/crs/log/server1hr/agent/crsd/oraagent_oracle/oraagent_oracle.log”.

CRS-2674: Start of ‘ora.mydb.db’ on ‘server1hr’ failed
CRS-2632: There are no more servers to try to place resource ‘ora.mydb.db’ on that would satisfy its placement policy
CRS-5017: The resource action “ora.mydb.db start” encountered the following error:
ORA-29760: instance_number parameter not specified
. For details refer to “(:CLSN00107:)” in “/opt/crs/product/11203_ee_64/crs/log/server2hr/agent/crsd/oraagent_oracle/oraagent_oracle.log”.

CRS-2674: Start of ‘ora.mydb.db’ on ‘server2hr’ failed

Well then it was time to Google  for ORA-29760: instance_number parameter not specified

A Fellow blogger  mentioned the following Mos Note to solve this issue. And it once again proved that if you really settle for doing manual setup of you Rac environment you have to be well prepared and consistent in the naming of your parameters in you configuration file ( init.ora as a base for an spfile) and the way you enter services in the cluster layer. It once again proved that  names are registered in the way you enter them.  In plain English it is Case sensitive so  the old saying : what you type is what you get is very applicable.

MOS note ‘ORA-29760: instance_number parameter not specified’ When Starting the Database with Srvctl [ID 749515.1]

Happy reading and as always best of luck.

Mathijs.

11.2 Rac – Instance unwilling to start with Error ORA-00132

Introduction:

This week  had one of those days where after working together with customer and Oracle Consultant  one  of  the Rac Databases needed a restart due to a variable change.  Together with customer it was decided to restart second  instance first so first took care of the  environment by enabling a Maintenance mode so our monitoring tool would not strike me with tickets.

The second instance was   friendly:

  • the shutdown: srvctl stop instance -d -i went smooth .
  • the startup: srvctl start instance -d -i went smooth .

Then i  performed same steps on first Instance:

  • the shutdown: srvctl stop instance -d -i went smooth .
  • the startup: srvctl start instance -d -i   failed  with an error message that puzzled me a lot .

ORA-00132: syntax error or unresolved network name ‘myserver-scan:33000′

I checked my scan listener settings:

oracle@myserver-s1:/opt/oracle [CRS]#
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node myserver-s4
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node myserver-s2
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node myserver-s3

oracle@myserver-s1:/opt/oracle [CRS]# srvctl config scan_listener
SCAN Listener LISTENER_SCAN1 exists. Port: TCP:33000
SCAN Listener LISTENER_SCAN2 exists. Port: TCP:33000
SCAN Listener LISTENER_SCAN3 exists. Port: TCP:33000

Crs agent log showed same error and i could not start the instance via the  srvctl tool. The error suggested that my remote listener ( i am using scan ) was the issue.. But the  remote listener was not even part of  my tnsnames.ora, since my spfile value for the remote listener = myserver-scan:33000.

That was soo confusing. Well  I attempted to start the Instance via sqlplus  and that worked. I checked the cluster and it reported  ONLINE ONLINE on the resource so  the first pressure was off.

To solve it , make sure that  the sqlnet.ora is similar on all environments and it seems to need the ezconnect information in there  ( in names.directory_path).([ID 1318026.1])

Happy reading

Mathijs

Is it really the good – the – bad – and the Gui Dba building Racs?

Introduction

From my experience for some Database Administrators / sites it  still seem to be more tough /  hardcore to use the command line instead of using tools that  could do the same for you only with a lot less effort. And maybe that is also regarded to be a risk cause it is so easy. Well my thoughts on this is that a healthy mix on the use Gui – tools and command lines should  be an asset  to every administrator. And indeed before even starting debate when using the Gui of course you should always know what you are doing .. but then again that is always the case is it not ?

Creating an Rac manually – using DBCA.

Personally i think that the DBCA tool is  your best way forward if you are creating a new Rac environment. Of course  this tool also has also its limits ( such as on a 4 Node environment it will  by default set your cluster_database_instances default to 4 even when you build only a two node Rac)  but in general it is best way to work cause not only will create the Database , and the instances on the various Nodes  but it also adds the information to the 11G Grid infrastructure ( the Cluster ware ). And  for documentation purposes you can / should also always generate the create scripts in a default ( for me always /opt/oracle//admin/create) directory.

Yesterday i had to support a colleague who decided to create a two node RAC by hand . Indeed  even  when i suggested to him that using the DBCA would be helpful and fast in his actions it was his call to do it all by hand. But then he got stuck in the process with various actions.

If you decide to do  a create manually ( as they say maybe you should try to not use the Gui just cause it is written in this blog).  I think following steps are important:

  1. Work accurately ( setting up init oras , one for single Instance – Db) add the Instance specific information (in my case undo tablespace and local_listener ) were needed. Create spfile from that pfile.
  2. Start – create the single instance-db in full with the spfile.
  3. Once the single instance is alive start alter your spfile:
    1. alter system set cluster_database=true scope = spfile:
    2. alter system set instance_number= 1 scope = spfile sid=”instance1″ ;
    3. alter system set instance_number= 2 scope = spfile sid=”instance2″ ;
    4. alter system set thread = 1 scope = spfile sid=”instance1″ ;
    5. alter system set thread = 2 scope = spfile sid=”instance2″ ;
  4. Add at least the 3 redo groups to the second thread( alter database add logfile thread 2 group x:)
  5. set up the remote_listener with the scan listener
  6. alter system enable thread 2 ;
  7. Add a Taf service.
  8. As a check stop and start and stop both instances manually to see if the database is shared as it should be between the two instances.
  9. Add the information to the cluster:

srvctl  add database -d DB -o /opt/oracle/product/oracleVersion

srvctl  add instance -d DB -i DB1 -n Node1 srvctl  add instance -d DB -i DB2 -n Node2

srvctl modify instance -d DB -i DB1 -s +ASM1 srvctl modify instance -d DB -i DB2 -s +ASM2

srvctl  add service -d DB -s DB_TAF.domain -r DB1,DB2 srvctl start database -d DB srvctl start  service -d DB -s DB_TAF.domain

Ok, are  you still with me , do i rest my case here that it is a lot of work to do this the manual way ? And that these steps can be very  lets say error prone if you do not work 100%  accurate..

Yesterdays event where indeed not successful at first . Amongst other issues He got :

PRCR-1079 : Failed to start resource ora.db.db
CRS-5017: The resource action “ora.db.db start” met the following error:
ORA-29760: instance_number parameter not specified

Once again Google was our friend it showed post of a fellow blogger with regard to a Mos note:

MOS note ‘ORA-29760: instance_number parameter not specified’ When Starting the Database with Srvctl [ID 749515.1].

After that . Things worked .

Bottom-line of this story ? Plain and simple use the right and easiest tool for the Job you need to do. If that is a Gui  there is no dishonor in that . If you decide to go manual create , make sure you work accurately  hmm and have Google available for when things come up during the process.

Happy Reading ,

Mathijs

Once upon a time there was a prince2 and a production frog from an Oracle Dba perspective

Introduction.

A lot of  us have to do both projects and  day – to – day business.  What really sometimes is a sad thing happening is the mere fact that project seem to have no wider horizon then the end  of project .. It seems that the project deliverable  project -end-date is the only sacred time line..   Also not all too uncommon is the Database might be already offered to the customer before ALL  Q&A steps have finished by project managers.  So sometimes a prince might turn into a production frog  if you look at it from a production administrator perspective in the end.

Personally as an Oracle Database Administrator  I think  improving working together with Project manages should have two needs for improvement:

  1. Project managers need to be  made more aware of the Post go live situation if the implementation is only targeting a  go-live-date.
  2. Project manages should never ever ( did i mention ever) be allowed to  have customers entering the systems before go-live ( simply cause they might claim live status once they are hooked up to the database)

Hm and I have a dream I know it is wishful thinking but sometimes I  really think  it would be great to have Project people with a more technical background , maybe even with a  base training about  standards,  production  rules of other departments like Oracle ..

:) and for the Admins ..  Make sure you get involved in projects as early as possible and make sure , standards are followed.. or if you are in production side and a  Databases administrator is part  the project team make sure that he – she  is aware of your production world and   your standards.

Taking a first look at Prince2 if you care..:

http://www.prince2.com/prince2-process-model.asp

Happy reading

Mathijs

Alert log shows Private strand flush not complete

Introduction

Recently came across the following issue that a database is throwing  this message : Private strand flush not complete  during the switch of the redo logfile group. It is interesting to see the various opinions on this matter on the web, and in Mos. That is why i think taking inventory   and a plan might still be useful.

Details:

Database at hand is a 10.2.3.0 EE. on a Solaris box.

Looking under the hood i see:

SQL> select GROUP#,THREAD#,bytes/1024/1024 MB,members, STATUS from v$log order by 2,1 ;
GROUP#    THREAD#         MB    MEMBERS STATUS
 ---------- ---------- ---------- ---------- ----------------
 1          1       2000          2 INACTIVE
 2          1       2000          2 INACTIVE
 3          1       2000          2 CURRENT
 4          1       2000          2 INACTIVE
 5          1       2000          2 INACTIVE

Database is 1.24 TB

and looking at:

show parameter db_writer_processes
NAME                                 TYPE        VALUE
 ------------------------------------ ----------- ------------------------------
 db_writer_processes                  integer     2
Switches per hour:
Date            Switches
 ------------- ----------
 2013-03-13 02          8
 2013-03-13 03          7
 2013-03-13 04          3
 2013-03-13 05         13
 2013-03-13 06          9
 2013-03-13 07          2
 2013-03-13 08          2
 2013-03-13 09          9
 2013-03-13 10          1

According to MOS note Alert Log Messages: Private Strand Flush Not Complete [ID 372557.1] i should/could consider increasing the  db_writer_processes  at least from the current 2 to  4..  Which  sounds like a plan 2 Me.

Happy reading.

Mathijs