Small introduction to SQL*Net debugging [client side]

Say it’s that day again. You try to connect to a (new) database and once again you are facing SQL*Net problems.

C:/oracle/product/10.1.0/client/BIN>sqlplus /nolog
 
SQL*Plus: Release 10.1.0.4.0 - Production ON Thu Sep 29 10:37:15 2005
Copyright (c) 1982, 2005, Oracle.  ALL rights reserved.
 
 
SQL> conn scott/tiger@lab.amis.nl
 
 
ERROR: ORA-12154: 
TNS:could NOT resolve the CONNECT identifier specified

Blast!. What can you do? Once in a while i get a lot of these SQL*Net problems (network changes, etc, the wrong weather type). So to help you (and me), here i will you give you my grey-matter checklist :-) (it’s not complete but should help you to resolve 80% of the problems)

TNS/SQL*Net problem checklist

1. First of all convince yourself that your client Oracle software is correctly installed.
2. Check that your environment settings are correct for the client software
3. Check the listener settings on the database side (listener.ora)
4. Check your SQL*Net settings on your client machine (tnsnames.ora / sqlnet.ora)
5. Check if the TNS_ADMIN path has been set / the search order
6. Check your network
7. Check that you can you ping the database machine
8. Check your network route via the traceroute(unix) or tracert (windows) commands
9. Check what and how your database machine is resolved
10. Check if you can tnsping the database
11. Check if you can use the trcroute utility to gather more information
12. Check if you can use SQL*Plus from the command prompt
13. Check if you can find a collegae who is more experienced in handling SQL*Net problems
14. Check the Oracle SQL*Net documentation, metalink or google regarding you specific problem
15. Start tracing SQL*Net network data exchange on the client and server side
16. If its serious, contact an Oracle support specialist via a TAR on http://metalink.oracle.com

First of all convince yourself that your client Oracle software is correctly installed

I know, that’s easier said than done. If your sure that it worked – correctly was installed – you used the installation manual by the letter – nothing was changed regarding system/user privileges – etc – you probably can skip this check and pass on to the next step. If however you get strange errors like “message file not found” / “*.msb file not found”, you will have to check your software installation first. If the software was installed correctly than, the biggest chance is that system settings were changed or not correctly set. Get your Oracle client manual, read it and check for the correct settings of PATH, NLS_LANG, NLS_LIBRARY_PATH, ORACLE_HOME, ORACLE_BASE, etc. settings. These are system settings that determine where, what and how Oracle programs have to handle your OS environment.

Check that your environment settings are correct for your client software

How can you check these settings?. For instance on Windows, these parameters can be set in:

  • your registry [HKYE_LOCAL_MACHINE -> SOFTWARE -> ORACLE], look with the help of eg. “regedt32″ in the correct home registry key
  • your system environment parameter settings [start -> settings -> control panel -> system -> advanced tab -> environment button -> system and/or user settings]
  • locally in a shell or DOS command (run “command”, 16bits shell or “cmd”, 32bits shell) prompt window and check your environment parameters via the SET statement [SET {parameter}={value}]

Check the listener settings on the database side (listener.ora)

Your client software, in SQL*Net terms, has to match with the SQL*Net server settings. This means, in simple words that, the server SQL*Net deamon, the listener, must have the same settings / values as your SQL*Net client. This listener process is listening for incoming SQL*Net network requests. If your parameters match the ones defined on the server, than it will pass your network requests to the correct database environment. In simple, it wants to know:

  • What database are you refering too? [SID or SERVICE_NAME]
  • On which port is this database defined? [PORT]
  • On which host (machine) is this database defined? [HOST]
  • Which network protocol is this database looking for / listening on? [PROTOCOL]

These parameters are defined on the database side in the listener.ora file. At least HAT TO BE, nowadays a dba can determine that he want to use the auto-registering functionality of the database. This means that a database registers itself on the listener (the listener settings are determined on database level and passed on to the listener, so NO listerner.ora file is needed). To work properly, the listener needs nethertheless these values. You can ask your dba and/or start looking for this listener.ora file on the database server (default it resides in the directory $ORACLE_HOME/network/admin). An example listener.ora file looks like:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = LAB)
    )
  )
LISTENER =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)
                    (HOST = vamisnt02.amis.nl)(PORT = 1521))
  )

In real life you should not forget to set an encrypted password via the change password command and enable the ADMIN_RESTRICTIONS_listener parameter!

Check your SQL*Net settings on your client machine (tnsnames.ora / sqlnet.ora)

Your client SQL*Net counter part, has to supply these same settings (SID, PROTOCOL, PORT, HOST) to be able to connect to the database. This is (most of the time, as said its a “small introduction to SQL*Net”) defined on the client side in a configuration file called the tnsnames.ora file. So based on the listener.ora settings, the tnsnames.ora file should be (check for the complete syntax the Oracle Database Net Services Administrators Guide manual on OTN):

LAB.AMIS.NL=(DESCRIPTION=
                         (ADDRESS_LIST=(ADDRESS=
                         (PROTOCOL=TCP)
                         (HOST=vamisnt02.amis.nl)(PORT=1521)))
                         (CONNECT_DATA=(SERVICE_NAME=LAB)))

The LAB.AMIS.NL you see here is an ALIAS – a reference you use in your client software (eg. SQL*Plus, Toad, ODBC, etc). By referencing to this alias, the Oracle SQL*Net software is aware of the WHERE/WHAT/HOW to make a SQL*Net connection. There is an extra configuration file which is important in this context. It’s called sqlnet.ora. In this file a dba, or someone else, can define extra functionality. Among others, for example, you can enable SQL*Net tracing via this file. In this file parameters can be defined which set the default DOMAIN name (in my example above this is amis.nl or search order or default network naming. Some of these parameters are shown below:

names.default_domain = world
name.default_zone = world
names.directory_path = (TNSNAMES, LDAP)

To give you an example of what these entries will do. If you would use the tns alias “LAB” instead of “LAB.AMIS.NL”, this “LAB” alias will be translated to “LAB.WORLD” and because this is not defined in your tnsnames.ora file, your client can not create a connection to the database. Also you see in the example above the reference “NAMES.DIRECTORY_PATH”. This parameter defines the search order HOW to find the configuration file or method. In the example above, the tns alias will be resolved via the “tnsnames.ora” file and if it not succeeds the software will try to find a directory server which can.

Check if the TNS_ADMIN path has been set / search order

So you checked all this, it looks fine and still you can’t connect. Blast again! Where does SQL*Net search for these tnsnames.ora and sqlnet.ora files you may ask? SQL*Net looks for these files via a parameter which is called TNS_ADMIN and if it is not set it will search in the following order / in the following places: UNIX

1. .tnsnames.ora (mind the “dot”) in the users home directory
2. $ORACLE_HOME/network/admin
3. explicit path setting via TNS_ADMIN parameter

Windows

1. registry setting via TNS_ADMIN (in the correct HOMEn key of your Oracle client software)
2. system/user setting via system in your control panel
3. %ORACLE_HOME%/network/admin

Check your network

So let’s say you checked all these settings and you determined that the TNS_ADMIN parameter is set via the windows registry to a network network share called “O”. Then you should also check, for instance via explorer, that there is a actual network share defined called “O” AND that it is pointing to the correct place (the directory contains the correct tnsnames.ora, sqlnet.ora, etc files). If you, after all these checks still encounter problems, you will probably have network related problems. So how to check these?

Check that you can you ping the database machine

Unix and windows have a ping utility. With this utility you can test if a machine or TCP/IP nummer or machine name can be reached via the network. Be aware that the fact that you can’t reach this machine or tcp/ip address, not always means that the machine isn’t listening. The ping utility is also seen as a security risk, so (network) administrators sometimes decide that machines should not give a response on a ping request (and block the protocol the ping utility is based upon).

C:>ping vamisnt01.amis.nl
 
Pinging vamisnt01.amis.nl [10.10.10.2] with 32 bytes of data:
 
Reply from 10.10.10.2: bytes=32 time=1ms TTL=128
Reply from 10.10.10.2: bytes=32 time<1ms TTL=128
Reply from 10.10.10.2: bytes=32 time<1ms TTL=128
Reply from 10.10.10.2: bytes=32 time<1ms TTL=128
 
Ping statistics for 10.10.10.2:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 1ms, Average = 0ms

If a machine doesn’t give you a response, check again via it’s tcp/ip number This will sometimes work because the nameserver doesn’t know or can’t resolve the machine name and/or domainname to a tcp/ip number.
Check your network route via the traceroute(unix) or tracert (windows) commands
If your network reponse is slow or you know you should be able to reach the machine, but it just doesn’t anymore, use the traceroute (unix) or tracert utilities. This will show you the way ;-) or which part of the network (for example a firewall) makes transport impossible.

C:>tracert www.oracle.com
 
Tracing route to www.oracle.com [141.146.8.66]
over a maximum of 30 hops:
 
  1    21 ms     1 ms     1 ms  amis1.customer.bit.nl [x.x.x.x]
  2     2 ms     2 ms     2 ms  amis-gw.network.bit.nl [x.x.x.x]
  3     3 ms     3 ms     3 ms  jun1.telecity.network.bit.nl [213.136.31.5]
  4     4 ms     4 ms     3 ms  212.72.45.29
  5     4 ms     4 ms     4 ms  ae-0-56.mp2.amsterdam1.level3.net [213.244.165.114]
  6   122 ms   122 ms   122 ms  as-0-0.bbr2.dallas1.level3.net [64.159.0.137]
  7   122 ms   122 ms   122 ms  ge-8-0.hsa1.dallas1.level3.net [4.68.122.10]
  8   128 ms   127 ms   127 ms  unknown.level3.net [64.158.168.6]
  9   127 ms   128 ms   127 ms  austin1q7-swi-7-rtr-1-v10.oracle.com [141.146.1.1]
 10   133 ms   127 ms   129 ms  141.146.3.31
 11   128 ms   127 ms   128 ms  bigip-otn-portal.oracle.com [141.146.8.66]
 
Trace complete.

By the way if it is not a firewall problem, then you are allowed to make a connection via port 1521. You should be able to check this via the almighty telnet utility:

C:> telnet vamisnt02.amis.nl 1521

Check what and how your database machine is resolved

You should also find out how your HOST (the machine on which the database resides), defined in your tnsnames.ora file, is resolved (who or what is making the translation) on the network. A host name can, for instance, be translated by a nameserver (eg. DNS machine) or by a host file (/etc/hosts on unix or c:/windows/system32/drivers/host file on windows) into it’s corresponding tcp/ip nummer. An indication will be given by the use of the nslookup utility.

C:>nslookup www.oracle.com
Server:  amisdns.amis.local
Address:  10.10.10.1
 
Non-authoritative answer:
Name:    www.oracle.com
Address:  141.146.8.66
 
C:>nslookup
Server:  amisdns.amis.local
Address:  10.10.10.1
> help
Commands:   (identifiers are shown in uppercase, [] means optional)
NAME            - print info about the host/domain NAME using default server
NAME1 NAME2     - as above, but use NAME2 as server
help or ?       - print info on common commands
set OPTION      - set an option
    all                 - print options, current server and host
    [no]debug           - print debugging information
    [no]d2              - print exhaustive debugging information
    [no]defname         - append domain name to each query
    [no]recurse         - ask for recursive answer to query
    [no]search          - use domain search list
    [no]vc              - always use a virtual circuit
    domain=NAME         - set default domain name to NAME
    srchlist=N1[/N2/.../N6] - set domain to N1 and search list to N1,N2, etc.
    root=NAME           - set root server to NAME
    retry=X             - set number of retries to X
    timeout=X           - set initial time-out interval to X seconds
    type=X              - set query type (ex. A,ANY,CNAME,MX,NS,PTR,SOA,SRV)
    querytype=X         - same as type
    class=X             - set query class (ex. IN (Internet), ANY)
    [no]msxfr           - use MS fast zone transfer
    ixfrver=X           - current version to use in IXFR transfer request
server NAME     - set default server to NAME, using current default server
lserver NAME    - set default server to NAME, using initial server
finger [USER]   - finger the optional NAME at the current default host
root            - set current default server to the root
ls [opt] DOMAIN [> FILE] - list addresses in DOMAIN (optional: output to FILE)
    -a          -  list canonical names and aliases
    -d          -  list all records
    -t TYPE     -  list records of the given type (e.g. A,CNAME,MX,NS,PTR etc.)
view FILE           - sort an 'ls' output file and view it with pg
exit            - exit the program

In this example, the translation of “www.oracle.com” is made by the AMIS DNS nameserver “amisdns.amis.nl” machine. On unix/linux there is a /etc/resolv.conf file which will define the nameservers and the search order of how a network name is translated. It’s always a good idea to check, who or what is resolving your machine name.

Check if you can tnsping the database

You are still in no luck and it doesn’t work. Look and see if the Oracle tnsping utility can help you:

C:/oracle/product/10.1.0/client/BIN> tnsping lab.amis.nl
 
TNS Ping Utility for 32-bit Windows: Version 10.1.0.4.0 - Production on 29-SEP-2005 09:35:47
Copyright (c) 1997, 2003, Oracle.  All rights reserved.
Used parameter files: 
TNS-03505: Failed to resolve name
 
C:/oracle/product/10.1.0/client/BIN>set TNS_ADMIN=o:
 
C:/oracle/product/10.1.0/client/BIN>tnsping lab.amis.nl
 
TNS Ping Utility for 32-bit Windows: Version 10.1.0.4.0 - Production on 29-SEP-2
005 14:21:41
 
Copyright (c) 1997, 2003, Oracle.  All rights reserved.
 
Used parameter files:
o:sqlnet.ora
 
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=va
misnt02.amis.nl)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=LAB)))
OK (60 msec)

For the specific error messages and their meaning look it up via OTN or in the network trouble shooting guide chapter in the Administrators Guide mentioned earlier.

Check if you can use the trcroute utility

One of the “new” utilities, sometimes installed on your client, is the Oracle counterpart of the traceroute utility. It’s called trcroute and it gives you much more information than the tnsping command. I don’t have it by hand so here is the example from the manual.

trcroute sales
Trace Route Utility for Solaris: Version 10.1.0.2.0 on 15-NOV-2003 14:43:05
Copyright (c) 1999 Oracle Corporation. All rights reserved.
Route of TrcRoute:
------------------
Node: Client Time and address of entry into node:
-------------------------------------------------------------
25-FEB-2002 14:43:05 ADDRESS= PROTOCOL=TCP HOST=sales-server PORT=1521
TNS-12543: TNS:unable to connect to destination
TNS-12541: TNS:no listener
TNS-12560: TNS:protocol adapter error
TNS-03601: Failed in route information collection

Check if you can use SQL*Plus from the command prompt

If you have still no luck, check what kind of error message (extra information) SQL*Plus can provide you. Open a shell or DOS command window and check what error message the sqlplus (command line) or sqlplusw (GUI) software you will present.

C:/oracle/product/10.1.0/client/BIN>sqlplus scott/
 tiger@lab.amis.nl
 
SQL*Plus: Release 10.1.0.4.0 - Production on Thu Sep 29 18:33:42 2005
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
 
ERROR:
ORA-12154: TNS:could not resolve the connect identifier specified

Start gathering information, search for information

OK. It didn’t help! Now you probably hit an SQL*Net obscurity or you just don’t see it. The best way is now to get info via the internet, manuals OR involve a colleague in your problem. Sometimes looking at your problem with more then one pair of eyes, solves the problem. The internet is a huge playing field for gathering information. The best starting points are Oracle OTN network (http://otn.oracle.com), Oracle forums (also on OTN) or Google. If in luck a DBA or developer can give you a hand. If it is a production problem and you are in luck to have an account on metalink (http://metalink.oracle.com), you could open a iTAR / Service Request.
At the end.

The problems are – most of the time – or Oracle or network related. You can start with an easy “ping”, before really checking all the software and other mention checkpoints. SQL*Net problems can be very difficult to solve (the Admin manual, nowadays, has more then 300 pages). This shortlist will hopefully deal with most of your SQL*Net problems. So…

I hope, i could be off service ;-)