Database users may complain that something isn’t right with the application and request that the DBAs have a look in the database for problems.  Oracle doesn’t record every error reported to database clients in the alert log, just server related messages, (internal errors, snapshot too old, segment cannot extend, etc).  It is the application that should capture and/or report the full error stack returned to the database client, but unfortunately not all of them do.

The following is an example of this situation, in which I had to track down the cause of a problem where the application should have made that clear from its error log.

Application is Stuck – Example Case

An application owner asked me to look at the database, because an job on the application server was stuck and appeared to  be looping.  I asked for the error messages and while waiting for them, I fired up SQL*Plus.

The predominant and most unusual wait event for the recent and active sessions was “undo segment recovery”.  This happens when the session is waiting for the PMON to roll back a dead transaction.

V$SESSION showed that the SQL being executed just before the wait was:

select STATE from DUMMY where ID = :1  FOR UPDATE

By this time I had received the application error logs.  They referred to a different select statement on the same table, but the only clue was “This may be due to a transient race condition. Rolling back and retrying the transaction.”

The “rolling back” error text fitted with the “undo segment recovery” wait event I was seeing, even if the SQL didn’t match.  The application was trying to do the same thing with the same result repeatedly.

I noticed that the number of “SQL*Net break/reset to client” wait events were steadily increasing.  This indicates error messages were being generated for some database sessions.

At this point I could either try to trace from the client side, or from the server side.  I only had access to the database server, so I enabled SQL tracing for a couple of likely database sessions and soon had a trace file of a session that had been waiting on “undo segment recovery”:

=====================
PARSING IN CURSOR #4 len=52 dep=0 uid=30 oct=3 lid=30 tim=52668306764175 hv=3327540542 ad='914b1fb8'
select STATE from DUMMY where ID = :1  FOR UPDATE
END OF STMT
PARSE #4:c=0,e=941,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=52668306764164
BINDS #4:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1000000 frm=01 csi=871 siz=24 off=0
  kxsbbbfp=ffffffff7ae74f60  bln=22  avl=04  flg=05
  value=376059
WAIT #4: nam='undo segment recovery' ela= 2929424 segment#=5 tx flags=183 p3=0 obj#=27086 tim=52668309703725
EXEC #4:c=10000,e=2939555,p=0,cr=7,cu=2,mis=1,r=0,dep=0,og=1,tim=52668309703911
ERROR #4:err=1591 tim=1098267617
WAIT #4: nam='SQL*Net break/reset to client' ela= 9 driver id=1952673792 break?=1 p3=0 obj#=27086 tim=52668309704412
WAIT #4: nam='SQL*Net break/reset to client' ela= 973 driver id=1952673792 break?=0 p3=0 obj#=27086 tim=52668309705441
WAIT #4: nam='SQL*Net message to client' ela= 5 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309705502
WAIT #4: nam='SQL*Net message from client' ela= 2441 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309708092
WAIT #28: nam='SQL*Net message to client' ela= 3 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309708548
WAIT #28: nam='SQL*Net message from client' ela= 2608 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309711219
XCTEND rlbk=1, rd_only=1
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309711650

And there is the error message!  ORA-1591 means “lock held by in-doubt distributed transaction”.

Querying DBA_2PC_PENDING showed a couple of pending two-phase-commit transactions since the previous day.  I informed the application owner, who then asked me to ‘commit force’ those transactions.  Problem solved.

Wrap Up

If the users can’t provide the DBA with an ORA-NNNN error message, then the DBA can try to capture the error message by enabling SQL tracing and searching through the trace files for “err=“.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s