Tuesday, 28 August 2018

ABORTING STATE : ReCoVery Manager or RCV Manager/RMAN


The ABORTING state indicates that the session is aborting its latest request.

Do you want to get rid of the aborting session in the system from long time, to clean up such sessions quickly?”
1.       Use RCVManager to list/abort the Rollback process   OR
2.       Use Gateway Global to abort user session   OR
3.       Use Function to abort the session
4.       Hopeless ?  Then do TPA reset.(Teradata Database Bounce)


If the aborting session is not fixed in the above mentioned steps then check with GSC. TPA reset with dump will clear those and GSC can investigate why they were in aborting state for so long .

When there is an in-flight transaction failure, Teradata tries to rollback all the transactions that are uncommitted.
Rollback information is present in Transient Journal. If the transaction is huge, then there is good chance that there is a very high AMP I/O utilized for rollback operation.
As you may now, Transient Journal is AMP local. (Meaning each row is maintained on the same AMP for rollback purpose)
There are multiple issues that we have to mitigate
·         Impacts other queries
·         Tables are locked
·         If it is INSERT/SELECT operation then both tables are locked if you are not using ACCESS LOCKS
There are multiple ways You can get around this problem
1 Allow the rollback operation to complete. The priority of the rollback operation depends on the settings that you have in Teradata Manager. Default recovery level is LOW , which I believe is recommended by Teradata. You may change this using Recovery manager to a higher priority for Rollback purpose.
2 Use Recovery Manager
2.1 Use Recovery Manager and identify the table ID for which recovery is on. This can be found by issuing the following command on the Recovery Manager Command Line LIST CANCEL ROLLBACK TABLES
2.2 Above command lists all the tables that are being rolled back. It also identifies the session ID, user ID and other important information
2.3 Execute the following CANCEL ROLLBACK ON TABLE <Table Id> to cancel the rollback operation. You will need DBC password to accomplish this. This will discard entries in Transient Journal. By doing this, the underlying table goes in an inconsistent state. We need to issue a DELETE ALL or DROP/CREATE the table.
2.4 Restore the tables data from last available backup or if the table was empty, we can resume the LOAD job

The ABORTING state display is shown below.
State Details : ABORTING

Statements Code Time     CPU Usage Accesses
---------- ---- -------- --------- --------
         1 1043 14:22:09         2       22
The columns on the ABORTING state display provide the following information.
The column named...
Contains the...
Statements
statements, up to the number displayed, that are aborting.
Code
error that caused the abort.
Time
time that the abort step was sent to the AMP.
CPU Usage
accumulated time in thousandths of a second that all AMPs spent processing the current request.
Accesses
total number of segment access calls executed on all AMPs for the session request.

1.1 Use RCVManager to list/abort the Rollback process   


RCVManager is CNS subsystem utility. You cannot invoke them via queries. Rather you can use Viewpoint Remote Console Utility OR use Cnsterm 6.

> Cnsterm 6

start rcvmanager;

rcmanager started in cnsterm 1

Go to cnsterm 1, then just type the command "LIST ROLLBACK TABLES;"

It will return the table name as well as the time estimated to complete the rollback.

In other case, the session appears to be in 'Aborting state', but it wont be listed by LIST ROLLBACK TABLES command, that does not mean that the table is NOT in rollback state. Remember that RCVMANGER return the table name if you execute that command 'provided' the transcation in rollback state has  more than 10000 rows on at least one AMP. So 2 things we should remember is:

1) The transaction should be in 'aborting state' in Viewpoint

2) Table that is rolling back does not have more than 10000 rows in TJ (Transient Jrnl) on at least one amp, it will not be listed by the command.

Coming to LIST STATUS, correct syntax for this command as per 13.10 version is: LIST STATUS VprocID;

For ex: LIST STATUS 0;

The requesting AMP is not executing Dow AMP recovery.

That means this vproc0 is not participating in the recovery of this rollback. From PUMA -c command, you can see the whether the MSGWORKABORT AWTs are high inuse or not. As MSGWORKABORT is meant for internal aborts, you can find out which particular vproc returning high number and check LIST STATUS for that vproc. This will give you clear picture on finding out whether its a real rollback OR not and how costly it is.

Utility rcvmanager started in processor 0, partition 20.

 _______
    |                                |              |
    |    ___     __     ____         |    ____    __|__    ____
    |   /      |/  \    ____|    ____|    ____|     |      ____|
    |   ---    |       /    |   /    |   /    |     |     /    |
    |   \___   |       \____|   \____|   \____|     |__   \____|

    Release 15.00.04.02 Version 15.00.04.03a
    RCVMANAGER Utility (Jan 96)


Enter command, "QUIT;" or "HELP;" :
start rcvmanager;
$The word "QUIT/LIST/DEFAULT/REBUILD/RECOVERY/CANCEL/ROLLBACK" is expected.

Enter command, "QUIT;" or "HELP;" :

The LIST ROLLBACK TABLES command displays the names and details of the tables undergoing rollback in the Teradata Database system.

LIST ROLLBACK TABLES;

TABLES BEING ROLLED BACK AT 11:14:47 18/08/08


ONLINE USER ROLLBACK TABLE LIST

Host  Session   User ID    Performance Group               AMP W/Count
----  --------  ---------  ------------------------------  -----------
   1  17910001  0000:03FD                          SYSTEM           64

TJ Rows Left   TJ Rows Done   Time Est.
-------------  -------------  ---------
     24648328          13870  435:23:19

Table ID   Name
---------  ------------------------------------------------------------------
0000:E1CD  "DP_GL_TAB"."IN_DUMP"


SYSTEM RECOVERY ROLLBACK TABLE LIST

Host  Session   TJ Row Count
----  --------  -------------

Table ID   Name
---------  ------------------------------------------------------------------

Enter command, "QUIT;" or "HELP;" :
The LIST STATUS command generates two reports:
·         Online Transaction Recovery Journal Counts
·         Down/Catchup AMP Recovery Status
The first reports transaction recovery information, and the second reports AMP recovery information for unavailable AMPs.
You can use commands like LIST STATUS,LIST ROLLBACK TABLES.If that session is listed in output of above command then rollback is still in progress. If CPU is not an issue then increase the rollback priority.

LIST STATUS 0;   The requested AMP is not executing Down AMP recovery.

The REBUILD/RECOVERY PRIORITY command sets or displays a priority level for use by the Table Rebuild utility and a Teradata Database system recovery operation.

The LIST STATUS command with the proc-id option specified as mmmm provides additional detailed information about the recovery process of a specific AMP.


Enter command, "QUIT;" or "HELP;" :
The QUIT command exits rcvmanager.



Enter command, "QUIT;" or "HELP;" :
LIST ROLLBACK TABLES;
TABLES BEING ROLLED BACK AT 11:40:57 18/08/08


ONLINE USER ROLLBACK TABLE LIST

Host  Session   User ID     Performance Group               AMP W/Count
----  --------  ---------   ------------------------------  -----------

TJ Rows Left   TJ Rows Done   Time Est.
-------------  -------------  ---------

Table ID   Name
---------  ------------------------------------------------------------------


SYSTEM RECOVERY ROLLBACK TABLE LIST

Host  Session   TJ Row Count
----  --------  -------------

Table ID   Name
---------  ------------------------------------------------------------------

Enter command, "QUIT;" or "HELP;" :

1.2 Use RCVManager to abort the Rollback process   

If for some reason you have ran a huge delete and the process was terminated then that huge table rollback will take lots of time. Generally 3-4 times the time it ran before being killed.
Below are the steps as how you can kill a rollback process but do note that if you do this step then the table will become unusable and you have to drop and recreate.
Steps to follow:
1) Connect to your viewpoint server
2) Go to remote Console port let
3) Select your Teradata database server and then select Recovery Manager Utility. You will find this option on the left top just under port let name)
4) Now, run below queries to find the tables which are being rolled back
Enter command, “QUIT;” or “HELP;” :
LIST ROLLBACK TABLES;
TABLES BEING ROLLED BACK AT 14:35:29 15/10/06
ONLINE USER ROLLBACK TABLE LIST
Host Session User ID Workload Definition AMP W/Count
—- ——– ——— —————————— ———–
1 12252 0000:03B2 NoWD-PG1054 125
TJ Rows Left TJ Rows Done Time Est.
————- ————- ———
190950034 122512 1904:52:18
Table ID Name
——— ——————————————————————
0000:1AC1 “DATA_TEST”.”MY_TABLE”
SYSTEM RECOVERY ROLLBACK TABLE LIST
Host Session TJ Row Count
—- ——– ————-
Table ID Name
——— ——————————————————————
5) Run below queries to actually kill rollback. Note we will use Table ID from the output of above command
Enter command, “QUIT;” or “HELP;” :
CANCEL ROLLBACK ON TABLE 00001AC1;
Type the password for user DBC or press the Enter key to return:
Rollback will be cancelled for:
0000:1AC1 “DATA_TEST”.”MY_TABLE”
Confirm y/n ?
Y
6) Check again to make sure that rollback is killed
Enter command, “QUIT;” or “HELP;” :
list rollback tables; 
TABLES BEING ROLLED BACK AT 14:39:25 15/10/06
ONLINE USER ROLLBACK TABLE LIST
Host Session User ID Performance Group AMP W/Count
—- ——– ——— —————————— ———–
TJ Rows Left TJ Rows Done Time Est.
————- ————- ———
Table ID Name
——— ——————————————————————
SYSTEM RECOVERY ROLLBACK TABLE LIST
Host Session TJ Row Count
—- ——– ————-
Table ID Name
——— ——————————————————————
Enter command, “QUIT;” or “HELP;” :

2.Use Gateway Global to abort user session   

The Gateway Global utility, gtwglobal, allows you to monitor and control the sessions of Teradata Database workstation-connected users. The gateway software runs as a separate operating system task and is the interface between the network and the Teradata Database.
The KILL SESSION command terminates the specified session on the Teradata Database by aborting any active request on that session and logging the session off.
Aborting an outstanding request could take a significant amount of time. Therefore, killing a session or a user’s sessions does not necessarily free up the resources of those sessions immediately.
The KILL USER command terminates all logged on sessions for the specified user name restricted to the host group of the gateway. KILL USER aborts any active requests on those sessions and logs the sessions off.

3.Use Function to abort the session To abort session without PEMON or TDManager in Teradata

The following example helps to abort the session of a user ABCD. 
1.       REPLACE FUNCTION AbortSessions
2.       (HostIdIn SMALLINT,
3.       UserNameIn VARCHAR(30),
4.       SessionNoIn INTEGER,
5.       LogoffSessions VARCHAR(1),
6.       UserOverride VARCHAR(1)
7.       )
8.       SELECT AbortSessions (1, 'ABCD', 0, 'Y', 'Y');
9.        
10.   -- Logoff/Kill all sessions where the user ABCD is logged on.
SELECT AbortSessions (1, 'USERNAME', 0, 'N', 'N');
The above would abort all sessions for user “USERNAME”.  The third value is the “SessionNoIn” field, where you can either specify a specific session number, or specify 0 for all sessions that match the other conditions.
The details for each field are as follows:
§  HostIdIn – The host ID (normally 1, as it’s only relevant to Z/OS clients)
§  UserNameIn – The username of the user that executed the session.
§  SessionNoIn – The session number of the session (or 0 if you want to kill all sessions for that user)
§  LogoffSessions – (This needs to be confirmed) Logoff session after abort (rather than leaving it idle)
§  UserOverride – …as below…
Indicator of whether to override an ABORT SESSION failure:
• Y = Override the ABORT SESSION request to fail in any of the following cases:
• An identified session is being session-switched.
• An identified session is executing its own ABORT SESSION request.
• An identified session has a PEState of IDLE: IN-DOUBT as a result of a 2PC.
Note: Sessions are marked IN-DOUBT by the 2PC protocol, which governs how transactions are committed by multiple systems that do not share memory. The protocol guarantees that either all systems commit or all roll back.
• N or NULL = Do not override.
The AbortSessions function returns the number of sessions aborted as the resultset.

4.Hopeless ?  Then do TPA reset.

The TPA Reset utility, tpareset, resets Teradata Database. All PDE tasks except for some kernel daemons are killed. All database processes, including those for AMPs and PEs, are stopped and restarted. TPA Reset options control whether dump information is collected, whether the node on which tpareset is run is rebooted, and whether the system remains down or restarts.
TPA Reset can be used:
•after Teradata Database has been reconfigured
•to activate new versions of Teradata Database software
•to recover from a database hang
•in other situations that warrant a full database shutdown and restart
Use TPA Reset only when absolutely necessary. TPA Reset has a system-wide effect. Shutting down the system and restarting not only causes system down time, but can also have a performance impact when the system starts up and runs standard recovery and reconcile operations.
Checking Teradata Status
Let's start with the basic status tool, pdestate:
1
# pdestate -a

If all is well and Teradata is running, you will see this output:
1
2
PDE state is RUN/STARTED.
DBS state is 5: Logons are enabled - The system is quiescent

Stopping the database
Should you need to stop the database in a controlled manner, the command is:
1
# tpareset -x bringing down the database

The syntax for this command is tpareset -x {comment}, where the comments can be used for logging/auditing purposes to capture the reason that the database was stopped.
Here is output from this command.  Notice the prompt as a last check to make sure that this is what you really want to do.
1
2
3
4
You are about to shutdown the database
on the system
's10-1300'
Do you wish to continue (default: n) [y,n] y

Once the database has been stopped, you can check the status of the database like we did earlier:
1
2
# pdestate -a
PDE state is STOP/KILLTASKS.

But if there was a problem that halted Teradata instead of a controlled shutdown, you will see this message from pdestate:
1
2
#pdestate -a
"down/hardstop" is down

Restarting Teradata
To restart Teradata after manually stopping the database or to restart after a 'hardstop' event, run this command:
1
# /etc/init.d/tpa start
Which should give this output:
1
2
Teradata Database Initiator service is starting...
Teradata Database Initiator service started successfully.

To restart Teradata if it is any state other than "down/hardstop" use the command:
1
# tpareset -f restarting
This asks for a confirmation before beginning the restart process:
1
2
3
4
You are about to restart the database
on the system
's10-1300'
Do you wish to continue (default: n) [y,n] y

You can then issue the watch the various start up levels by issuing the pdestate command every few seconds..
1
2
# pdestate -a
PDE state is START/RECONCILE.

1
2
# pdestate -a
PDE state is START/STARTTPA.

1
2
3
# pdestate -a
PDE state is RUN/STARTED.
DBS state is 1/5: DBS Startup - Voting for Transaction Recovery

1
2
3
# pdestate -a
PDE state is RUN/STARTED.
DBS state is 1/4: DBS Startup - Starting PE Partitions

1
2
3
# pdestate -a
PDE state is RUN/STARTED.
DBS state is 5: Logons are enabled - The system is quiescent



Technical team:
Pankaj Chahar         
References:
https://www.info.teradata.com/HTMLPubs/DB_TTU_16_00/index.html#page/Utilities/B035-1102-160K/B035-1102-160K.1.506.html

No comments:

Post a Comment