Frederick Tang Weblog

Stories about Oracle concepts I have learnt from work, and the occasional brain-dump…

Failover for 9i RAC

with 3 comments

– Edit: I have updated the title of this post to include 9i. New failover features are available in 10g, so this article is a little out-of-date.

Over the week, my colleagues and I discussed application failover on a Oracle9i RAC database. I don’t know much about how RAC works, so I decided to do some research and post some notes here. Hopefully, this information can add to someone else’s researching into understanding the technology.

Disclaimer: What I am posting here isn’t something new, the information is freely available in Oracle documentation. If you find the information I post here is wrong, please kindly let me know. By all means, test the technology for yourselves. If you would like to ask a question, please feel free and I will try research a bit more to give an educated answer when I have time.

To start this off, I would imagine I have a 2-node Oracle9i RAC database and the simplest client tnsnames entry would look like this:

BLOGRAC =
  (DESCRIPTION =
     (LOAD_BALANCE = ON)
     (ADDRESS_LIST =
        (ADDRESS=(PROTOCOL=TCP)(HOST=blograc1)(PORT=1521))
        (ADDRESS=(PROTOCOL=TCP)(HOST=blograc2)(PORT=1521))
     )
     (CONNECT_DATA = (SERVICE_NAME=BLOGRAC))
   )    

The first failover feature is connect-time failover, where “a client connect request is forwarded to another listener if a listener is not responding”[source][source]. To enable this, the client tnsnames entry might look like this:

BLOGRAC =
  (DESCRIPTION =
     (LOAD_BALANCE = ON)
     (FAILOVER = ON)
     (ADDRESS_LIST =
        (ADDRESS=(PROTOCOL=TCP)(HOST=blograc1)(PORT=1521))
        (ADDRESS=(PROTOCOL=TCP)(HOST=blograc2)(PORT=1521))
     )
     (CONNECT_DATA = (SERVICE_NAME=BLOGRAC))
   )

The next failover feature is Transparent Application Failover (TAF), where “the failover and reestablishment of application-to-service connections. It enables client applications to automatically reconnect to the database if the connection fails, and optionally resume a SELECT statement that was in progress. This reconnect happens automatically from within the Oracle Call Interface (OCI) library.” [source].

TAF has several restrictions [source]:

  • All PL/SQL package states on the server are lost at failover
  • ALTER SESSION statements are lost
  • If failover occurs when a transaction is in progress, then each subsequent call causes an error message until the user issues an OCITransRollback call. Then Oracle issues an Oracle Call Interface (OCI) success message. Be sure to check this message to see if you must perform additional operations.
  • Oracle fails over the database connection and if TYPE=SELECT in the FAILOVER_MODE section of the service name description, Oracle also attempts to fail over the query
  • Continuing work on failed-over cursors can result in an error message

To enable this feature, the client tnsnames entry might look like this:

BLOGRAC =
  (DESCRIPTION =
     (LOAD_BALANCE = ON)
     (FAILOVER = ON)
     (ADDRESS_LIST =
        (ADDRESS=(PROTOCOL=TCP)(HOST=blograc1)(PORT=1521))
        (ADDRESS=(PROTOCOL=TCP)(HOST=blograc2)(PORT=1521))
     )
     (CONNECT_DATA =
        (SERVICE_NAME=BLOGRAC)
        (FAILOVER_MODE=
           (TYPE=select)
           (METHOD=basic))
     )
   )

The TYPE are session| select | none, and METHOD are basic | preconnect [source].

As stated above, TAF is only supported on JDBC OCI drivers and not on JDBC Thin drivers. I found a couple of Oracle Metalink Notes that re-iterates this, although they are updated for Oracle10g, they still apply to Oracle9i. I also found a note which explains how to write Java code to enable OCI TAF callbacks:

Note:297490.1 Is TAF (Transparent Application Failover) Supported On The Latest JDBC Thin Driver (10g)?

Note:465423.1 Can the JDBC Thin Driver Do Failover by Specifying FAILOVER_MODE?

Note:276225.1 JDBC Sample on How to Implement OCI TAF callbacks with DML and PL/SQL

This is an aside… I wanted to know what error will occur to a Java application [source] when there’s instance shutdown or failure. I have written a very simple Java program to test this, and it will query results from a big table I created on a Oracle9i RAC database, using the Oracle9i JDBC Thin Driver:

1. Oracle instance shutdown

Vendor Error Code: 1089
java.sql.SQLException: ORA-01089: immediate shutdown in progress – no operations are permitted
    at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)
    at oracle.jdbc.ttc7.TTIoer.processError(TTIoer.java:289)
    at oracle.jdbc.ttc7.Oclose.receive(Oclose.java:122)
    at oracle.jdbc.ttc7.TTC7Protocol.close(TTC7Protocol.java:683)
    at oracle.jdbc.driver.OracleStatement.close(OracleStatement.java:644)
    at TestDBOracle.main(TestDBOracle.java:49)

2. Oracle instance failure (by killing the pmon and smon processes) – yes, it came out with funny characters…

Vendor Error Code: 17410
java.sql.SQLException: �S���q��y����
    at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)
    at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179)
    at oracle.jdbc.dbaccess.DBError.check_error(DBError.java:1160)
    at oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java:963)
    at oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java:893)
    at oracle.jdbc.ttc7.Oclose.receive(Oclose.java:101)
    at oracle.jdbc.ttc7.TTC7Protocol.close(TTC7Protocol.java:683)
    at oracle.jdbc.driver.OracleStatement.close(OracleStatement.java:644)
    at TestDBOracle.main(TestDBOracle.java:49)

[Please note: If your application is getting a 17410 error, it doesn't always mean the database / instance died.]

Written by fredericktang

May 30, 2008 at 9:24 am

Posted in 9i, Oracle

3 Responses

Subscribe to comments with RSS.

  1. Thanks Frederick,

    I was actually looking for information on whether TAF is supported on jdbc thin driver. Now I have my answers.

    -Naveenkanth

    Naveenkanth

    November 3, 2008 at 1:22 pm

  2. Hi, I am having problem on connect-time failover configuration. Actually I have three nodes as node_a, node_b, node_c. I need to configure as:
    If instance fails on node_a, then new coming client must be redirect to node_b. Mean time if node_b also fails then the new connection need to be redirected to node_c.

    Following is my client’s tnsnames.ora
    RACDB =
    (DESCRIPTION =
    (FAILOVER = true)
    (LOAD_BALANCE = no)
    (ADDRESS = (PROTOCOL = TCP)(HOST = node_a-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = node_b-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = node_c-vip)(PORT = 1521))
    (CONNECT_DATA =
    (SERVER = DEDICATED)
    (SERVICE_NAME = racdb)
    (FAILOVER_MODE =
    (TYPE = SELECT)
    (METHOD = BASIC)
    (RETRIES = 30)
    (DELAY = 5)
    )
    )
    )

    I am not able to connect to node_b in case of failure of instance on node_a.

    Dilli R. Maharjan

    March 10, 2009 at 3:05 pm

  3. Hi Dilli,

    I appreciate your question, but I would like to not engage in troubleshooting on this blog, which is more for forums.

    Nevertheless, thinking this through aloud…

    1. The tnsnames configuration suggests you have a 10g RAC as opposed to 9i RAC (because you are using VIPs).

    2. I think your question is why hasn’t client-side connect time failover worked when there is an instance failure (on node_a).

    3. Your configuration has “failover=true”, which means client-side connect-time failover is enabled. Connect-time failover by definition, enables clients to connect to another listener if the initial connection to the first listener fails. The number of listener protocol addresses determines how many listeners are tried. Without connect-time failover, Oracle Net attempts a connection with only one listener. (http://download.oracle.com/docs/cd/B19306_01/network.102/b14212/concepts.htm#sthref170)

    * Note: it’s to do with connection to the listener, not to the database or instance.

    4. Your configuration has “load_balance=no”, which means client-side load balancing is disabled. This means addresses on the list will be accessed sequentially, i.e. always connect node_a-vip first. Note, if load_balance=yes, I think addresses on the list will be used in a random fashion?

    5. My understanding is that if there was an instance failure (e.g. instance on node_a), the listener (e.g. listener on node_a) should still be alive. This can be simulated by shutting down the node_a instance. Therefore, in this scenario, the node_a listener should actually have responded, and redirect your client to either node_b or node_c listener (depending on server-side load balancing). This is better explained here http://download.oracle.com/docs/cd/B19306_01/network.102/b14212/advcfg.htm#sthref1267 and more importantly http://download.oracle.com/docs/cd/B19306_01/network.102/b14212/advcfg.htm#i480740.

    6. If my assumptions above is true, then there might be some RAC configuration issues or you may have to clarify how you were not able to connect to node_b or node_c when there was an instance failure on node_a. Perhaps there might be an ORA- error message, or timeout or some other evidence that might provide us with some clues?

    7. To test connect-time failover, try having all instances up, and shutdown node_a listener, while keeping node_b and node_c listeners up. I think you will find the node_b listener will respond to the connection request (given node_b-vip listener is the next on the address list, and load_balance=off).

    8. If the above test works, and you are still wondering why your original scenario didn’t work, then I would suggest taking a look at your RAC configurations. For example: http://forums.oracle.com/forums/thread.jspa?messageID=2740669&#2740669 and http://fredericktang.wordpress.com/2008/04/11/ora-12535-tnsoperation-timed-out/.

    It would be important to cross-check with documentation, as I am not the subject-matter expert.
    I hope I got all the facts right, if anyone spot an error, feel free to contribute…

    Hope this helps…
    Fred

    fredericktang

    March 11, 2009 at 6:03 am


Leave a Reply