Adding a VIP Address and Goldengate to Grid Infrastructure 11.2.0.3

Introduction:

Earlier this week preparations were started to add the Goldengate software to the 11.2.0.3 Grid Infrastructure of the Billing environment in production. As part of that scenario I also had to add a VIP address that is to be used by the Goldengate software as part of high(er) availability. In my concept the Goldengate daemons run on one node only by default. During a node crash (of course not wanted nor desired), or as a way to load-balance work on the cluster, the VIP address and the Goldengate software need to stop and restart on the other node. Below you will find a working example as part of the preparations I have performed. Some comments have been added to the specific steps.

Commands will be typed in italic in this blog.

Details

## The first step will be to add the VIP address 10.97.242.40 oradb-gg-localp1.dc-lasvegas.us to the Grid Infrastructure (GI). Note that the IP address and its description have been defined in DNS. Once I got feedback that the address had been added I was able to perform an nslookup. Of course it was not yet possible to ping the IP, because we first have to add it to the cluster, as is done here.
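A quick sanity check at this point could look like this (a minimal sketch; hostname and IP as used in this post):

## as any OS user: verify the DNS entry before touching the cluster
nslookup oradb-gg-localp1.dc-lasvegas.us
## a ping is still expected to fail here, since the VIP is not started on any node yet
ping -c 2 10.97.242.40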

## As root:
/opt/crs/product/11203/crs/bin/appvipcfg create -network=1 -ip=10.97.242.40 -vipname=oradb-gg-localp1.dc-lasvegas.us -user=root

## Once that is in place, grant the oracle user permission to work with the VIP address:
(As root, allow the Oracle Grid Infrastructure software owner (e.g. oracle) to run the script to start the VIP.)

/opt/crs/product/11203/crs/bin/crsctl setperm resource oradb-gg-localp1.dc-lasvegas.us -u user:oracle:r-x

## Now it is time to start the VIP:
## As Oracle, start the VIP:
/opt/crs/product/11203/crs/bin/crsctl start resource oradb-gg-localp1.dc-lasvegas.us

## Check our activities:
## As Oracle:

/opt/crs/product/11203/crs/bin/crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p

## In my setup Goldengate is defined to be able to run on either node 1 (usapb1hr) or node 2 (usapb2hr) of my four-node cluster. And since I want to make sure it only runs on those two servers, I set the hosting members and change the placement to restricted.
## As root:

/opt/crs/product/11203/crs/bin/crsctl modify resource oradb-gg-localp1.dc-lasvegas.us -attr "HOSTING_MEMBERS=usapb1hr usapb2hr"
/opt/crs/product/11203/crs/bin/crsctl modify resource oradb-gg-localp1.dc-lasvegas.us -attr "PLACEMENT=restricted"

## As always the taste of the creme brulee is in the details, so let's check:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p

## Great, that worked. Now let's relocate the VIP to the other node as a test:
## As Oracle:

/opt/crs/product/11203/crs/bin/crsctl relocate resource oradb-gg-localp1.dc-lasvegas.us

## Completed the action with a smile, because it worked as planned.

## As always the taste of the creme brulee is in the details, so let's check:
## As Oracle:
/opt/crs/product/11203/crs/bin/crsctl status resource oradb-gg-localp1.dc-lasvegas.us -p

## As part of making sure that the setup from scratch was the same on all machines (I had the same solution in the pre-production environment), let us first remove the existing resource for Goldengate and then add it to the GI again.

/opt/crs/product/11203/crs/bin/crsctl delete resource myGoldengate

## As Oracle (the white paper was very specific about that; I performed it as root the first time and ended up with the wrong primary group in the ACL, which I only noticed at the end). So stick to the plan and do this as ORACLE. Add the resource to the GI, put it in a relationship to the VIP address that was created in the GI earlier, AND inform the cluster about the action script that is to be used during a relocate, a server boot or a node crash. (This script is in my case a shell script holding conditions like stop, start, status etc. and the corresponding Goldengate commands to be used by the GI; a sketch of such a script follows the command below.)

/opt/crs/product/11203/crs/bin/crsctl add resource myGoldengate -type cluster_resource -attr "ACTION_SCRIPT=/opt/crs/product/11203/crs/crs/public/gg.local.active, CHECK_INTERVAL=30, START_DEPENDENCIES='hard(oradb-gg-localp1.dc-lasvegas.us) pullup(oradb-gg-localp1.dc-lasvegas.us)', STOP_DEPENDENCIES='hard(oradb-gg-localp1.dc-lasvegas.us)'"
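The action script itself is not listed in this post; below is a minimal sketch of what such a gg.local.active could look like. The Goldengate home, the RDBMS home and the manager-only start/stop logic are assumptions, not the script actually used here.

#!/bin/ksh
# gg.local.active -- sketch of a GI action script for Goldengate
GGS_HOME=/opt/goldengate                                  # assumed Goldengate home
export ORACLE_HOME=/opt/oracle/product/11203_ee_64/db     # assumed RDBMS home
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$GGS_HOME

case "$1" in
start)
   # start the manager; extracts/replicats with AUTOSTART in mgr.prm follow it
   cd $GGS_HOME && ./ggsci <<EOF
START MANAGER
EOF
   exit 0
   ;;
stop|clean)
   # stop all extract/replicat groups and then the manager
   cd $GGS_HOME && ./ggsci <<EOF
STOP ER *
STOP MANAGER!
EOF
   exit 0
   ;;
check)
   # called by the GI every CHECK_INTERVAL seconds: 0 = online, 1 = offline
   if ps -ef | grep -v grep | grep "$GGS_HOME" | grep mgr > /dev/null 2>&1
   then
      exit 0
   else
      exit 1
   fi
   ;;
*)
   exit 0
   ;;
esac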

## Altering hosting members and placement again (by default only one node is part of HOSTING_MEMBERS and PLACEMENT=balanced).

## As root:
/opt/crs/product/11203/crs/bin/crsctl modify resource myGoldengate -attr "HOSTING_MEMBERS=usapb1hr usapb2hr"

/opt/crs/product/11203/crs/bin/crsctl modify resource myGoldengate -attr "PLACEMENT=restricted"

## So in the end you should check it with this:

/opt/crs/product/11203/crs/crs/public [CRS]# crsctl status resource myGoldengate -p

## Time to set permissions on myGoldengate (altering the ownership to the myGoldengate user, which is my OS user for this).
### As root:
/opt/crs/product/11203/crs/bin/crsctl setperm resource myGoldengate -o myGoldengate

## Needed and sometimes forgotten: make sure that the oracle user (who is also the owner of the Grid Infrastructure software on these boxes) is allowed to work with the resource.
### As root, allow oracle to run the script to start the goldengate_app application.
/opt/crs/product/11203/crs/bin/crsctl setperm resource myGoldengate -u user:oracle:r-x

 

Wrap-up:

All preparations are now in place. During an already scheduled maintenance window the following steps will be performed to bring this scenario live as an HA solution for Goldengate (a command sketch follows the list):

  • Stop the Goldengate software daemons (at the moment stopped and started by hand).
  • Start the Goldengate resource via the Grid Infrastructure (giving it control of status and activities).
  • Perform checks that Goldengate is starting its activities.
  • Perform a relocate of the Goldengate resource via the Grid Infrastructure to the other node.
  • Perform checks that Goldengate is starting its activities.
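A minimal sketch of those maintenance-window steps in commands (resource names as used above; the GGSCI stop and the Goldengate home are assumptions about how the daemons are currently managed by hand):

## as the Goldengate OS user: stop the daemons that are still managed by hand
cd /opt/goldengate && ./ggsci <<EOF
STOP ER *
STOP MANAGER!
EOF

## as oracle: hand control to the Grid Infrastructure and check
/opt/crs/product/11203/crs/bin/crsctl start resource myGoldengate
/opt/crs/product/11203/crs/bin/crsctl status resource myGoldengate

## test the failover: relocate to the other hosting member and check again
## (-f so the hard-dependent VIP moves along; an assumption about the dependency handling)
/opt/crs/product/11203/crs/bin/crsctl relocate resource myGoldengate -f
/opt/crs/product/11203/crs/bin/crsctl status resource myGoldengate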

As an old quote states: success just loves preparation. With these preparations in place I feel confident about the maintenance window in which we will put this solution live.

 

As always, happy reading and till next time,

 

Mathijs

When PRCD-1027 and PRCD-1229 are spoiling your rainy day

Introduction:

More than one year ago I had set up an Oracle Restart environment with Grid Infrastructure, ASM and databases, all at 11.2.0.2.0, since that was the vendor's requirement at first. Once the server had been handed over to production I got the request that it should also host EMC-based clones, and those clones were 11.2.0.3.0. That meant I had to upgrade both the Grid Infrastructure and the database software, and of course the databases as well.

So I geared up, did an upgrade of the GI and the RDBMS software and of course of the local databases in place. After that the EMC clones were added and everything looked fine.

Until ……….

Error Messages after Server reboot:

Well, until the server got rebooted. After that server reboot, the first sign that things were wrong was that the databases did not start via the Grid Infrastructure, which was not expected!

So there I was again, ready to solve another puzzle, and of course with people waiting for the DBs to come online so they could work.

## First clue:

I checked the resource (the database) in the cluster with: crsctl status resource …. -p

Much to my surprise that showed the wrong Oracle Home (it was 11.2.0.2.0, the initial Oracle Home before the upgrade). But I was so sure that I had upgraded the database. What did I miss? Even stranger, the cluster agent kept altering my oratab entry for that specific database back to the old Oracle Home (and it would almost stick out its tongue at me with the "# line has been added by agent" comment).

## Second clue

When I altered the oratab to show the correct Oracle Home I could start the database via SQL*Plus, which was indeed my second clue.

After a big face-palm it became clear to me that the clusterware did not hold the correct information about that Oracle Home.

## Will srvctl modify do the job?

srvctl modify database -d mydb -o /opt/oracle/product/11203_ee_64/db

##output:

PRCD-1027 : Failed to retrieve database mydb

PRCD-1229 : An attempt to access configuration of database migoms was rejected because its version 11.2.0.2.0 differs from the program version 11.2.0.3.0. Instead run the program from /opt/oracle/product/11202_ee_64/db

Well, that was not expected. Especially since the other clue was that the db can be started as an 11.2.0.3 database when the Oracle environment is set properly in the oratab.

##Solution:

First I tried :

srvctl modify database -d mydb -o /opt/oracle/product/11203_ee_64/db

but that is wrong as we already saw in this post.

Hmm, then I thought of a German expression: once you start doing it the right way, things will start to work for you.

Plain and simple, this is what I had to do to make things right again:

srvctl upgrade database -d  mydb -o /opt/oracle/product/11203_ee_64/db

After that I started mydb via the cluster and she is happy now.
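A quick way to verify the registration afterwards could be (a sketch, using mydb as above):

## as oracle, with the 11.2.0.3 environment set
srvctl config database -d mydb | grep -i "oracle home"
## should now report /opt/oracle/product/11203_ee_64/db
srvctl status database -d mydb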

## Bottom line (aka lesson learned).

If you upgrade your databases in an Oracle Restart / RAC cluster environment, make it part of your upgrade plan to also upgrade the information about that specific database in the cluster layer.

As always,

Happy Reading and till we meet again.

Mathijs

Playing with Cluster Commands for Single Instances in the Grid Infrastructure

Introduction.

This weekend (20 - 22 February 2015) I am involved in a big data migration of approximately 900K customers and the load of that data into environments that I have set up as single instances under control of the Grid Infrastructure 11.2.0.3 on Red Hat Linux. As always during such big operations there is a need for a fall-back plan for when everything breaks. Since I have the luxury that we can use EMC clone technology, a fall-back scenario has been set up in which EMC storage clones were created for the databases in scope during the week. These clones are permanently syncing with the source databases on the machines at the moment.

This Friday the application will be stopped. After feedback from the application team I will have to stop the databases via the cluster (GI). As always, as preparation I started to make notes, which I will share and elaborate here, covering the stop, start and checks of those databases.

Setup:

All my databases have been registered in the GI (Grid Infrastructure) as application resources, since I was not allowed to use RAC or RAC One Node during the setup of these environments. Yet I had to offer higher availability, which is why I implemented a poor man's RAC where a database becomes a resource in the cluster that is capable of failing over to another (specific and specified) node in the cluster.

In the end, when I had my setup in place, the information in the cluster looked pretty much like this:

### status in detail

/opt/crs/product/11203/crs/bin/crsctl status resource app.mydb1.db -p

NAME=app.mydb1.db
TYPE=cluster_resource
ACL=owner:oracle:rwx,pgrp:dba:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=/opt/crs/product/11203/crs/crs/public/ora.mydb1.active
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/scriptagent
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=10
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION=Resource mydb1 DB
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=mysrvr05hr mysrvr04hr
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PLACEMENT=restricted
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=1
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_DEPENDENCIES=hard(ora.MYDB1_DATA.dg,ora.MYDB1_FRA.dg,ora.MYDB1_REDO.dg) weak(type:ora.listener.type,global:type:ora.scan_listener.type,uniform:ora.ons,global:ora.gns) pullup(ora.MYDB1_DATA.dg,ora.MYDB1_FRA.dg,ora.MYDB1_REDO.dg)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.MYDB1_DATA.dg,shutdown:ora.MYDB1_FRA.dg,shutdown:ora.MYDB1_REDO.dg)
STOP_TIMEOUT=600
UPTIME_THRESHOLD=1h

As you can see, I have set up the dependencies on the disk groups (START_ and STOP_DEPENDENCIES) and I have set PLACEMENT to restricted, so the database can only start on the restricted set of nodes that I defined in HOSTING_MEMBERS.

This evening's action plan will involve:

### Checking my resources for status and where they are running at the moment, so I know where they are when I start my actions. PS: the -C 3 is a nice grep option to show some extra context lines around each resource.

/opt/crs/product/11203/crs/bin/crsctl status resource -t|grep app -C 3

app.mydb1.db
      1        ONLINE  ONLINE       mysrvr05hr
app.mydb2.db
      1        ONLINE  ONLINE       mysrvr04hr
app.mydb3.db
      1        ONLINE  ONLINE       mysrvr02hr

### Checking status at a high level.

/opt/crs/product/11203/crs/bin/crsctl status resource app.mydb1.db

/opt/crs/product/11203/crs/bin/crsctl status resource app.mydb2.db

/opt/crs/product/11203/crs/bin/crsctl status resource app.mydb3.db

In order to enable my colleagues to do the EMC split properly, the application will be stopped. Once I have my go after that, I will stop the databases using GI commands:

### stopping resources:

/opt/crs/product/11203/crs/bin/crsctl stop resource app.mydb1.db

/opt/crs/product/11203/crs/bin/crsctl stop resource app.mydb2.db

/opt/crs/product/11203/crs/bin/crsctl stop resource app.mydb3.db

Once my storage colleague has finished the EMC split (this should take only minutes because the databases have been in sync mode with production all week), I will put some databases in noarchivelog mode manually to be faster when doing the Data Pump loads; a sketch of that manual switch follows after the start commands below. After shutting the databases down again I will start them using the GI commands:

### starting resources:

/opt/crs/product/11203/crs/bin/crsctl start resource app.mydb1.db

/opt/crs/product/11203/crs/bin/crsctl start resource app.mydb2.db

/opt/crs/product/11203/crs/bin/crsctl start resource app.mydb3.db
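For the manual noarchivelog switch mentioned above, a minimal sketch (run per database concerned, before handing it back to the GI; which databases get this treatment is decided on the spot):

## as oracle, with the environment set for the database concerned
sqlplus -s / as sysdba <<EOF
startup mount
alter database noarchivelog;
shutdown immediate
EOF
## then hand the database back to the GI
/opt/crs/product/11203/crs/bin/crsctl start resource app.mydb1.db
## on Saturday the same steps with "alter database archivelog;" put it back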

## Relocate if needed
### server mysrvr05hr:
crsctl relocate resource app.mydb1.db

### server mysrvr04hr:
crsctl relocate resource app.mydb2.db

## Alternatively, specifying the target node:
### server mysrvr05hr:
crsctl relocate resource app.mydb1.db -n mysrvr04hr

### server mysrvr04hr:
crsctl relocate resource app.mydb2.db -n mysrvr05hr

On Saturday I will stop the databases that are in noarchivelog mode again via the cluster and put them back in archivelog mode. After that I have scheduled a level 0 backup with RMAN.

Happy reading,

Mathijs.

11gR2 Database Services and Instance Shutdown

Mathijs Bruggink:

This needs a check with the app supplier. Great note.

Originally posted on jarneil:

I’m a big fan of accessing the database via services and there are some nice new features with database services in 11gR2. However I got a nasty shock when performing some patch maintenance with an 11.2.0.1 RAC system that had applications using services. Essentially I did not realise what happens to a service when you shutdown an instance for maintenance. Let me demonstrate:

This has the following configuration:

So the service is now online on the node where DBA1 (preferred node in definition) runs:

any examples I’ve seen, show what happens to a service when you perform shutdown abort. First lets see what our tns connection looks like:

Which gives the following in V$SESSION when you connect using this definition:

Lets abort the node:

Oh that’s not good. Look whats happened to my application:

Let’s bring everything back and try a different kind of shutdown This time using the following:

View original 133 more words
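My own takeaway from this note, as a hypothetical sketch (database, service and instance names are made up, not taken from the original post): check and, if needed, manually relocate the service around instance maintenance, for example:

## where does the service currently run?
srvctl status service -d DBA -s myapp_srv

## move it by hand before stopping instance DBA1 for maintenance
srvctl relocate service -d DBA -s myapp_srv -i DBA1 -t DBA2

## and move it back afterwards
srvctl relocate service -d DBA -s myapp_srv -i DBA2 -t DBA1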

The Classic case of OFFLINE Drop in an EMC BCV with Transportable Tablespace

Introduction:

Recently in the team we came across the fact that we had to work a lot with a technology inside the EMC storage environment called BCV (Business Continuity Volume). This technology makes it possible to clone a (production) Oracle database to another server. Another method would be to use snaps in the EMC, but that is out of scope for this blog (though I do work with them as well, and to an Oracle DBA the end result is the same since both technologies offer replication/cloning).

In this scenario there are two ways to do a replication of the data in the EMC environment:

  1. Either you do an alter database begin backup, then do a split in the EMC, then an alter database end backup on the source side. After that, on the Unix side, the disks need to be made visible and mounted (when using file systems). If you are using ASM, the Linux administrators need to make sure that all the disks needed for the database are present in the multipath configuration. Once that Unix/Linux part is done you can mount the database (when using a file system structure) or mount the disk groups in ASM, and then mount the database for backups or open the database.
  2. Or you do a split in the EMC (without the begin and end backup) on the source environment. The rest of the scenario is similar to 1, which means make the disks visible and mount the database, or mount the disk groups and then mount the database. In this scenario you will open the database on the target side, thus performing a crash recovery of the instance.

In this blog we will follow scenario 2. Once the file systems are mounted on the target Unix machine the database is opened. The purpose of this all is to create a migration database where the project members can consolidate data in an extra, locally created tablespace as part of a migration strategy.

Details

After the file systems had been mounted by the Unix admins, at the DBA's request some extra local mount points were set up on top of the mount points that had come across with the BCV. Once the database was opened the first time, an extra tablespace was created on those file systems that are only locally present on the target machine. This tablespace was created 1.1 TB in size. Together with the project it was agreed that this tablespace would become a transportable tablespace, since recreating it after a BCV refresh would take some 6 to 7 hours each time a refresh occurred. Making it a transportable tablespace offered the option of unplugging it right before a BCV refresh and plugging it back in afterwards, thus saving time.
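A minimal sketch of that unplug/plug cycle with Data Pump (the tablespace name is taken from the datafile names shown later in this post; directory, dump file name and the datafile list are assumptions):

## before the BCV refresh: set the tablespace read only and export its metadata
sqlplus -s / as sysdba <<EOF
alter tablespace CVN_TBS read only;
EOF
expdp system directory=DATA_PUMP_DIR dumpfile=cvn_tbs.dmp transport_tablespaces=CVN_TBS

## after the refresh: plug the tablespace back in (all 55 datafiles must be listed)
impdp system directory=DATA_PUMP_DIR dumpfile=cvn_tbs.dmp \
  transport_datafiles='/db/MYDB1/cvn_tbs/CVN_TBS_001.dbf','/db/MYDB1/cvn_tbs/CVN_TBS_002.dbf'
sqlplus -s / as sysdba <<EOF
alter tablespace CVN_TBS read write;
EOF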

This scenario worked well many times, but as always, one time things did not work out well. After a refresh and after the plug-in of the tablespace there was a technical issue on the local file systems. The Unix admins had run repair jobs on the file system where the datafiles of the transportable tablespace (TTS) were located and in the end had even cleaned out that file system. So there I was with a refreshed database that had the 1.1 TB transportable tablespace plugged in, and the database had crashed and would not open (since the datafiles for the TTS were missing). A refresh from production would take 3 to 4 hours of extra downtime and I would still have to recreate the tablespace by hand anyhow after that.

Then I recalled a scenario from the old days: a database that had crashed without a backup of a newly created tablespace. Once again Google was my friend, and after doing some reading I set up the scenario below, after confirming that this could be my way out:

## First of all I needed to find all the datafiles that were supposed to be part of this TTS:

select FILE#,substr(NAME,1,80)
from v$datafile
where name like '%CVN_TBS%'
order by 1;

SQL> /

FILE# SUBSTR(NAME,1,80)
---------- --------------------------------------------------------------------------------
1063 /db/MYDB1/cvn_tbs/CVN_TBS_050.dbf
.

.

.
1113 /db/MYDB1/cvn_tbs/CVN_TBS_005.dbf
1114 /db/MYDB1/cvn_tbs/CVN_TBS_004.dbf
1115 /db/MYDB1/cvn_tbs/CVN_TBS_003.dbf
1116 /db/MYDB1/cvn_tbs/CVN_TBS_002.dbf
1117 /db/MYDB1/cvn_tbs/CVN_TBS_001.dbf

55 rows selected.

##  Next step was to offline drop them since there was no Backup present.

select 'alter database datafile '||file#||' offline drop;'
from v$datafile
where name like '%CVN_TBS%'
order by 1;

## Spooled it to a file and ran the script, which dropped the datafiles.
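That spool-and-run step as a small sketch (the script name is arbitrary):

sqlplus -s / as sysdba <<'EOF'
set pagesize 0 linesize 200 feedback off heading off
spool offline_drop.sql
select 'alter database datafile '||file#||' offline drop;'
from v$datafile
where name like '%CVN_TBS%'
order by 1;
spool off
@offline_drop.sql
EOF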

## In the next step I opened the db:
alter database open;

## Recreating the 1.1 TB tablespace would take long hours, so I created the tablespace first with a small datafile and then added datafiles in ranges via scripts that I ran in the background (a sketch of one of those scripts follows after the wrapper below):

## recreate: contents of step6_all.ksh (cat step6_all.ksh)
#!/bin/ksh
#
# start all steps
#
nohup /opt/oracle/SRCENS1/admin/create/step6_1.ksh &
sleep 60
nohup /opt/oracle/SRCENS1/admin/create/step6_11.ksh &    # adding datafile #2 - 10
nohup /opt/oracle/SRCENS1/admin/create/step6_21.ksh &    # adding datafile #11 - 20
nohup /opt/oracle/SRCENS1/admin/create/step6_31.ksh &
nohup /opt/oracle/SRCENS1/admin/create/step6_41.ksh &
nohup /opt/oracle/SRCENS1/admin/create/step6_51.ksh &
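As an illustration, one of those step6_N1.ksh scripts could look roughly like this (tablespace name, datafile sizes and paths are assumptions based on the names above):

#!/bin/ksh
# step6_11.ksh -- sketch: add one range of datafiles to the recreated tablespace
sqlplus -s / as sysdba <<'EOF'
alter tablespace CVN_TBS add datafile '/db/MYDB1/cvn_tbs/CVN_TBS_002.dbf' size 20g;
alter tablespace CVN_TBS add datafile '/db/MYDB1/cvn_tbs/CVN_TBS_003.dbf' size 20g;
-- ... and so on up to CVN_TBS_010.dbf for this range
exit
EOF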

Since I knew that the project would not need all 1.1 TB from the first minute, that scenario minimized my downtime, thus saving time. And indeed, at the next normal refresh I first unplugged the TTS again, got a new refreshed BCV and plugged the TTS back in, thus making everybody happy again.

As always, happy reading and till we meet again,

Mathijs

When your Restart / RAC Node screams: [ohasd(1749)]CRS-0004: logging terminated

Introduction:

During one of those days, a Monday on which a hotline (ticket) service was provided, the team received a ticket that all instances of an Oracle Restart (11.2.0.3) environment were down. I assisted in the troubleshooting, which is always fun because it gives you a bit of a Sherlock Holmes feeling while investigating the issue at hand. After logging on to the box I saw that the ASM instance was up, but a "crsctl check has" showed that no communication with the has daemon could be established. So the puzzle became a bit more interesting at that moment.

Details:

  • The environment is an Oracle Restart environment on Linux.
  • 11.2.0.3 Grid Infrastructure with 11.2.0.3 databases running on it.
  • The Grid Infrastructure has been installed in /opt/crs/product/11203/crs/.
  • The environment went down during the weekend (fortunately it was a test environment with a standard office-hours SLA).

It was time to take a peek at the alert log of the cluster on that node. Since my installation was done in /opt/crs/product/11203/crs/, that meant I had to look for the log file in the subdirectory /opt/crs/product/11203/crs/log/mysrvr01/ (where mysrvr01 is the name of the cluster node I was working on). In there I found this, showing what was going on:

2014-10-18 10:47:37.060:
[ohasd(1749)]CRS-0004:logging terminated for the process. log file: "/opt/crs/product/11203/crs/log/mysrvr01/ohasd/ohasd.log"
2014-10-18 10:47:37.128:
[ohasd(1749)]CRS-10000:CLSU-00100: Operating System function: mkdir failed with error data: 28
CLSU-00101: Operating System error message: No space left on device
CLSU-00103: error location: authprep6
CLSU-00104: additional error information: failed to make dir /opt/crs/product/11203/crs/auth/ohasd/mysrvr01/A4107671
2014-10-18 10:47:37.146:
[ohasd(1749)]CRS-10000:CLSU-00100: Operating System function: mkdir failed with error data: 28
CLSU-00101: Operating System error message: No space left on device
CLSU-00103: error location: authprep6
CLSU-00104: additional error information: failed to make dir /opt/crs/product/11203/crs/auth/ohasd/mysrvr01/A5139099

With this as a clue, dear Watson, it was a lot easier to proceed. I started hunting for large files in the /opt/crs/product/11203/crs/ file system and found a flood of XML files that had not been erased for ages. Which made me realize that a clean-up job in cron would come in handy for this as well.
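Such a clean-up job could look roughly like this (the directory where the XML files pile up and the 14-day retention are assumptions; verify the location on your own system first):

## crontab entry for the grid owner: purge old XML files under the GI home, daily at 23:30
30 23 * * * /usr/bin/find /opt/crs/product/11203/crs -type f -name "*.xml" -mtime +14 -exec rm -f {} \; > /dev/null 2>&1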

After removing the XML files and checking the mount point again, I logged in as root and first issued a crsctl start has.

After some time I noticed in the alert log of the node:
2014-10-20 08:26:50.847:
[/opt/crs/product/11203/crs/bin/oraagent.bin(32461)]CRS-5818:Aborted command 'check' for resource 'ora.DATA1.dg'. Details at (:CRSAGF00113:) {0:0:2} in /opt/crs/product/11203/crs/log/mysrvr01/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
2014-10-20 08:26:50.847:
[/opt/crs/product/11203/crs/bin/oraagent.bin(32461)]CRS-5818:Aborted command 'check' for resource 'ora.DUMMY_DATA.dg'. Details at (:CRSAGF00113:) {0:0:2} in /opt/crs/product/11203/crs/log/mysrvr01/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
2014-10-20 08:26:50.847:
[/opt/crs/product/11203/crs/bin/oraagent.bin(32461)]CRS-5818:Aborted command 'check' for resource 'ora.FRA1.dg'. Details at (:CRSAGF00113:) {0:0:2} in /opt/crs/product/11203/crs/log/mysrvr01/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
2014-10-20 08:27:52.189:
[evmd(4105)]CRS-1401:EVMD started on node mysrvr01.
2014-10-20 08:27:53.685:
[cssd(4193)]CRS-1713:CSSD daemon is started in local-only mode
2014-10-20 08:27:55.581:
[ohasd(32421)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2014-10-20 08:28:02.233:
[cssd(4193)]CRS-1601:CSSD Reconfiguration complete. Active nodes are mysrvr01 .

When I checked again: :) ah, good news, not only was the ASM instance running but also all the database instances present on the box. Another puzzle solved.

Happy reading and till next Time ,

Mathijs

PRVF-4557 During installation of Oracle Binaries ( 11.2.0.3)

Introduction:

During an Oracle software install for a new RAC environment I was once again surprised by Oracle. I had installed the Grid Infrastructure (GI) 11.2.0.3 on Red Hat Linux with success and I was about to install the Oracle binaries, which should be a piece of cake with the installed GI up and running properly. According to the old quote "never stop learning (or being surprised)", the runInstaller managed to give me a surprise during the preparations and I was unable to continue with the installation. So obviously this surprise needed fixing first.

 

Details:

When running the runInstaller, after choosing the option to install the Oracle binaries in a Real Application Clusters environment and after a couple of screens, I received a pop-up telling me:

PRVF-4557 : Node application "ora.svrb1hr.vip" is offline on node

I was unable to go ahead and had to investigate. As always, Google and the Oracle Community were my brothers in Oracle arms for this, so I came up with the following scenario:

First I checked my hosts file to make sure the information was present on both nodes. That showed the following details, which looked OK:

oracle@svrb1hr:/opt/oracle [CRM]# grep  vip /etc/hosts 
10.97.242.32 svrb1hr-vip.myenv.dc-world.de svrb1hr-vip
10.97.242.33 svrb2hr-vip.myenv.dc-world.de svrb2hr-vip

The next step was checking my cluster resources. As mentioned before, the GI install had finished properly the day before, so that really made me wonder:

oracle@svrb1hr:/opt/oracle [CRM]# crsctl status resource -t 
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CLUSTERDATA.dg
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.LISTENER.lsnr
               ONLINE  OFFLINE      svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.asm
               ONLINE  ONLINE       svrb1hr                 Started             
               ONLINE  ONLINE       svrb2hr                 Started             
ora.gsd
               OFFLINE OFFLINE      svrb1hr                                     
               OFFLINE OFFLINE      svrb2hr                                     
ora.net1.network
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.ons
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.registry.acfs
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       svrb1hr                                     
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       svrb2hr                                     
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       svrb1hr                                     
ora.cvu
      1        ONLINE  ONLINE       svrb2hr                                     
ora.svrb1hr.vip
      1        ONLINE  INTERMEDIATE svrb2hr                 FAILED OVER         
ora.svrb2hr.vip
      1        ONLINE  ONLINE       svrb2hr                                     
ora.oc4j
      1        ONLINE  ONLINE       svrb1hr                                     
ora.scan1.vip
      1        ONLINE  ONLINE       svrb1hr                                     
ora.scan2.vip
      1        ONLINE  ONLINE       svrb2hr                                     
ora.scan3.vip
      1        ONLINE  ONLINE       svrb1hr                                     

Well, that was unexpected, because the expectation was that the svrb1hr.vip should be running on the first node. I still don't understand what happened to cause this (and frankly, if you have suggestions about what happened, please let me know). But I did know that the VIP address needed to be brought back to the first server.

The first attempt was to issue the command needed (crs_relocate) on the node where I was already working (node 1).

oracle@svrb1hr:/opt/oracle [CRM]# which crs_relocate
/opt/crs/product/11203/crs/bin/crs_relocate

oracle@svrb1hr:/opt/oracle [CRM]# crs_relocate svrb1hr.vip
CRS-0210: Could not find resource 'svrb1hr.vip'.
### the activity needed to be done from the second node

Grumbling about this, but well, at least it was clear what to do next ….

So I opened a session against the second node and made sure my Oracle Home was pointing to the GI.

oracle@svrb1hr:/opt/oracle [CRM]# ssh svrb2hr

On the second box  the command crs_relocate was entered:

                     
oracle@svrb2hr:/opt/oracle [CRS]# crs_relocate ora.svrb1hr.vip
Attempting to stop `ora.svrb1hr.vip` on member `svrb2hr`
Stop of `ora.svrb1hr.vip` on member `svrb2hr` succeeded.
Attempting to start `ora.svrb1hr.vip` on member `svrb1hr`
Start of `ora.svrb1hr.vip` on member `svrb1hr` succeeded.
Attempting to start `ora.LISTENER.lsnr` on member `svrb1hr`
Start of `ora.LISTENER.lsnr` on member `svrb1hr` succeeded.

Well, that looked promising, so let's check one more time then:

 

oracle@svrb2hr:/opt/oracle [CRS]#  crsctl status resource -t 
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CLUSTERDATA.dg
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.LISTENER.lsnr
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.asm
               ONLINE  ONLINE       svrb1hr                 Started             
               ONLINE  ONLINE       svrb2hr                 Started             
ora.gsd
               OFFLINE OFFLINE      svrb1hr                                     
               OFFLINE OFFLINE      svrb2hr                                     
ora.net1.network
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.ons
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
ora.registry.acfs
               ONLINE  ONLINE       svrb1hr                                     
               ONLINE  ONLINE       svrb2hr                                     
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       svrb1hr                                     
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       svrb2hr                                     
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       svrb1hr                                     
ora.cvu
      1        ONLINE  ONLINE       svrb2hr                                     
ora.svrb1hr.vip
      1        ONLINE  ONLINE       svrb1hr                                     
ora.svrb2hr.vip
      1        ONLINE  ONLINE       svrb2hr                                     
ora.oc4j
      1        ONLINE  ONLINE       svrb1hr                                     
ora.scan1.vip
      1        ONLINE  ONLINE       svrb1hr                                     
ora.scan2.vip
      1        ONLINE  ONLINE       svrb2hr                                     
ora.scan3.vip
      1        ONLINE  ONLINE       svrb1hr

 

Much better!! After this I restarted the runInstaller from the first node and indeed the installation of the Oracle binaries was flawless.
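As a closing remark: presumably the same relocation could also have been done with the crsctl syntax used elsewhere in this blog, something like the line below (not tested in this particular case):

/opt/crs/product/11203/crs/bin/crsctl relocate resource ora.svrb1hr.vip -n svrb1hr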

 

As always  happy  reading ,

 

Mathijs