The ABORTING state indicates that the session is aborting its
latest request.
“Do you want to get rid of the aborting session in the system from long
time, to clean up such sessions quickly?”
1.
Use RCVManager to
list/abort the Rollback process OR
2.
Use Gateway Global to abort
user session OR
3.
Use Function to abort the
session
4.
Hopeless ? Then do TPA reset.(Teradata Database Bounce)
If the aborting session is not fixed in the above mentioned steps
then check with GSC. TPA reset with dump will clear those and GSC can
investigate why they were in aborting state for so long .
When there is an in-flight transaction failure, Teradata tries to
rollback all the transactions that are uncommitted.
Rollback information is present in Transient
Journal. If the transaction is huge, then there is good chance that there is a
very high AMP I/O utilized for rollback operation.
As you may now, Transient Journal is AMP local.
(Meaning each row is maintained on the same AMP for rollback purpose)
There are multiple issues that we have to
mitigate
·
Impacts other queries
·
Tables are locked
·
If it is INSERT/SELECT
operation then both tables are locked if you are not using ACCESS LOCKS
There are multiple ways You can get around this
problem
1 Allow the rollback
operation to complete. The priority of the rollback operation depends on the
settings that you have in Teradata Manager. Default recovery level is LOW , which
I believe is recommended by Teradata. You may change this using Recovery
manager to a higher priority for Rollback purpose.
2 Use Recovery
Manager
2.1 Use Recovery
Manager and identify the table ID for which recovery is on. This can be found
by issuing the following command on the Recovery Manager Command Line LIST
CANCEL ROLLBACK TABLES
2.2 Above command
lists all the tables that are being rolled back. It also identifies the session
ID, user ID and other important information
2.3 Execute the
following CANCEL ROLLBACK ON TABLE <Table Id> to cancel the rollback
operation. You will need DBC password to accomplish this. This will discard
entries in Transient Journal. By doing this, the underlying table goes in an
inconsistent state. We need to issue a DELETE ALL or DROP/CREATE the table.
2.4 Restore the
tables data from last available backup or if the table was empty, we can resume
the LOAD job
The ABORTING state display
is shown below.
State Details : ABORTING
Statements Code Time CPU Usage Accesses
---------- ---- -------- --------- --------
1 1043 14:22:09 2 22
Statements Code Time CPU Usage Accesses
---------- ---- -------- --------- --------
1 1043 14:22:09 2 22
The columns on the ABORTING state display provide the following
information.
The column named...
|
Contains the...
|
Statements
|
statements, up to the number displayed, that are
aborting.
|
Code
|
error that caused the abort.
|
Time
|
time that the abort step was sent to the AMP.
|
CPU Usage
|
accumulated time in thousandths of a second that all
AMPs spent processing the current request.
|
Accesses
|
total number of segment access calls executed on all
AMPs for the session request.
|
1.1 Use RCVManager to list/abort the Rollback process
RCVManager is CNS subsystem utility. You cannot invoke them via
queries. Rather you can use Viewpoint Remote Console Utility OR use Cnsterm 6.
> Cnsterm 6
start rcvmanager;
rcmanager started in cnsterm 1
Go to cnsterm 1, then just type the command "LIST
ROLLBACK TABLES;"
It will return the table name as well as the time estimated to
complete the rollback.
In other case, the session appears to be in 'Aborting state', but
it wont be listed by LIST ROLLBACK TABLES command, that does not mean that the
table is NOT in rollback state. Remember that RCVMANGER return the table name
if you execute that command 'provided' the transcation in rollback state
has more than 10000 rows on at least one
AMP. So 2 things we should remember is:
1) The transaction should be in 'aborting state' in Viewpoint
2) Table that is rolling back does not have more than 10000 rows
in TJ (Transient Jrnl) on at least one amp, it will not be listed by the
command.
Coming to LIST STATUS, correct syntax for this command as per
13.10 version is: LIST STATUS VprocID;
For ex: LIST STATUS 0;
The requesting AMP is not executing Dow AMP recovery.
That means this vproc0 is not participating in the recovery of
this rollback. From PUMA -c command, you can see the whether the MSGWORKABORT
AWTs are high inuse or not. As MSGWORKABORT is meant for internal aborts, you
can find out which particular vproc returning high number and check LIST STATUS
for that vproc. This will give you clear picture on finding out whether its a
real rollback OR not and how costly it is.
Utility rcvmanager started in processor 0, partition 20.
_______
| | |
| ___
__ ____ |
____ __|__ ____
| /
|/ \ ____|
____| ____| |
____|
| ---
| / |
/ | /
| | /
|
| \___
| \____| \____|
\____| |__ \____|
Release
15.00.04.02 Version 15.00.04.03a
RCVMANAGER
Utility (Jan 96)
Enter command, "QUIT;" or "HELP;" :
start rcvmanager;
$The word
"QUIT/LIST/DEFAULT/REBUILD/RECOVERY/CANCEL/ROLLBACK" is expected.
Enter command, "QUIT;" or "HELP;" :
The LIST ROLLBACK TABLES command displays the names and details of the
tables undergoing rollback in the Teradata Database system.
LIST ROLLBACK TABLES;
TABLES BEING ROLLED BACK AT 11:14:47 18/08/08
ONLINE USER ROLLBACK TABLE LIST
Host Session User ID
Performance Group
AMP W/Count
---- -------- ---------
------------------------------
-----------
1 17910001
0000:03FD
SYSTEM 64
TJ Rows Left TJ
Rows Done Time Est.
-------------
------------- ---------
24648328 13870
435:23:19
Table ID Name
---------
------------------------------------------------------------------
0000:E1CD
"DP_GL_TAB"."IN_DUMP"
SYSTEM RECOVERY ROLLBACK TABLE LIST
Host Session TJ Row Count
---- -------- -------------
Table ID Name
---------
------------------------------------------------------------------
Enter command, "QUIT;" or "HELP;" :
The LIST STATUS command generates two reports:
·
Online Transaction Recovery Journal Counts
·
Down/Catchup AMP Recovery Status
The first reports
transaction recovery information, and the second reports AMP recovery
information for unavailable AMPs.
You can use commands like
LIST STATUS,LIST ROLLBACK TABLES.If that session is listed in output of above
command then rollback is still in progress. If CPU is not an issue then
increase the rollback priority.
LIST STATUS 0;
The requested AMP is not executing Down AMP recovery.
The REBUILD/RECOVERY PRIORITY command sets or displays a priority level
for use by the Table Rebuild utility and a Teradata Database system recovery
operation.
The LIST STATUS command with the proc-id option
specified as mmmm provides additional detailed information about the recovery
process of a specific AMP.
Enter command, "QUIT;" or "HELP;" :
The QUIT command exits rcvmanager.
Enter command, "QUIT;" or "HELP;" :
LIST ROLLBACK TABLES;
TABLES BEING ROLLED BACK AT 11:40:57 18/08/08
ONLINE USER ROLLBACK TABLE LIST
Host Session User ID
Performance Group
AMP W/Count
---- -------- ---------
------------------------------
-----------
TJ Rows Left TJ
Rows Done Time Est.
-------------
------------- ---------
Table ID Name
--------- ------------------------------------------------------------------
SYSTEM RECOVERY ROLLBACK TABLE LIST
Host Session TJ Row Count
---- -------- -------------
Table ID Name
---------
------------------------------------------------------------------
Enter command, "QUIT;" or "HELP;" :
1.2 Use RCVManager to abort the Rollback process
If for some reason
you have ran a huge delete and the process was terminated then that huge table
rollback will take lots of time. Generally 3-4 times the time it ran before
being killed.
Below are the steps as how you can kill a rollback process but do note that if you do this step then the table will become unusable and you have to drop and recreate.
Below are the steps as how you can kill a rollback process but do note that if you do this step then the table will become unusable and you have to drop and recreate.
Steps to follow:
1) Connect to your viewpoint server
2) Go to remote Console port let
3) Select your Teradata database server and then select Recovery Manager Utility. You will find this option on the left top just under port let name)
4) Now, run below queries to find the tables which are being rolled back
1) Connect to your viewpoint server
2) Go to remote Console port let
3) Select your Teradata database server and then select Recovery Manager Utility. You will find this option on the left top just under port let name)
4) Now, run below queries to find the tables which are being rolled back
Enter command, “QUIT;” or “HELP;” :
LIST ROLLBACK TABLES;
TABLES BEING ROLLED BACK AT 14:35:29 15/10/06
LIST ROLLBACK TABLES;
TABLES BEING ROLLED BACK AT 14:35:29 15/10/06
ONLINE USER ROLLBACK TABLE LIST
Host Session User ID Workload Definition AMP W/Count
—- ——– ——— —————————— ———–
1 12252 0000:03B2 NoWD-PG1054 125
—- ——– ——— —————————— ———–
1 12252 0000:03B2 NoWD-PG1054 125
TJ Rows Left TJ Rows Done Time Est.
————- ————- ———
190950034 122512 1904:52:18
————- ————- ———
190950034 122512 1904:52:18
Table ID Name
——— ——————————————————————
0000:1AC1 “DATA_TEST”.”MY_TABLE”
——— ——————————————————————
0000:1AC1 “DATA_TEST”.”MY_TABLE”
SYSTEM RECOVERY ROLLBACK TABLE LIST
Host Session TJ Row Count
—- ——– ————-
—- ——– ————-
Table ID Name
——— ——————————————————————
——— ——————————————————————
5) Run below
queries to actually kill rollback. Note we will use Table ID from the output of
above command
Enter command, “QUIT;” or “HELP;” :
CANCEL ROLLBACK ON TABLE 00001AC1;
Type the password for user DBC or press the Enter key to return:
CANCEL ROLLBACK ON TABLE 00001AC1;
Type the password for user DBC or press the Enter key to return:
Rollback will be cancelled for:
0000:1AC1 “DATA_TEST”.”MY_TABLE”
0000:1AC1 “DATA_TEST”.”MY_TABLE”
Confirm y/n ?
Y
6) Check again
to make sure that rollback is killed
Enter command, “QUIT;” or “HELP;” :
list rollback tables;
TABLES BEING ROLLED BACK AT 14:39:25 15/10/06
list rollback tables;
TABLES BEING ROLLED BACK AT 14:39:25 15/10/06
ONLINE USER ROLLBACK TABLE LIST
Host Session User ID Performance Group AMP W/Count
—- ——– ——— —————————— ———–
—- ——– ——— —————————— ———–
TJ Rows Left TJ Rows Done Time Est.
————- ————- ———
————- ————- ———
Table ID Name
——— ——————————————————————
——— ——————————————————————
SYSTEM RECOVERY ROLLBACK TABLE LIST
Host Session TJ Row Count
—- ——– ————-
—- ——– ————-
Table ID Name
——— ——————————————————————
——— ——————————————————————
Enter command, “QUIT;” or “HELP;” :
2.Use Gateway Global to abort user session
The Gateway Global
utility, gtwglobal, allows you to monitor and control the sessions of Teradata
Database workstation-connected users. The gateway software runs as a separate
operating system task and is the interface between the network and the Teradata
Database.
The KILL SESSION command terminates the specified session on the
Teradata Database by aborting any active request on that session and logging
the session off.
Aborting an outstanding request could take a significant amount of
time. Therefore, killing a session or a user’s sessions does not necessarily
free up the resources of those sessions immediately.
The KILL USER command
terminates all logged on sessions for the specified user name restricted to the
host group of the gateway. KILL USER aborts any active requests on those
sessions and logs the sessions off.
3.Use Function to abort the session To abort session without PEMON or
TDManager in Teradata
The following example helps to abort
the session of a user ABCD.
1.
REPLACE FUNCTION AbortSessions
2.
(HostIdIn SMALLINT,
3.
UserNameIn VARCHAR(30),
4.
SessionNoIn INTEGER,
5.
LogoffSessions VARCHAR(1),
6.
UserOverride VARCHAR(1)
7.
)
8.
SELECT AbortSessions (1, 'ABCD', 0, 'Y', 'Y');
9.
10.
-- Logoff/Kill all sessions where the user ABCD is logged on.
SELECT AbortSessions (1,
'USERNAME', 0, 'N', 'N');
The above would abort all
sessions for user “USERNAME”. The third value is the “SessionNoIn” field,
where you can either specify a specific session number, or specify 0 for all
sessions that match the other conditions.
The details for each
field are as follows:
§ HostIdIn – The host ID
(normally 1, as it’s only relevant to Z/OS clients)
§ UserNameIn – The username of
the user that executed the session.
§ SessionNoIn – The session
number of the session (or 0 if you want to kill all sessions for that user)
§ LogoffSessions –
(This needs to be confirmed) Logoff session after abort (rather than leaving it
idle)
§ UserOverride – …as below…
Indicator of whether to override an ABORT
SESSION failure:
• Y = Override the ABORT SESSION request to fail in any of the following cases:
• An identified session is being session-switched.
• An identified session is executing its own ABORT SESSION request.
• An identified session has a PEState of IDLE: IN-DOUBT as a result of a 2PC.
Note: Sessions are marked IN-DOUBT by the 2PC protocol, which governs how transactions are committed by multiple systems that do not share memory. The protocol guarantees that either all systems commit or all roll back.
• N or NULL = Do not override.
• Y = Override the ABORT SESSION request to fail in any of the following cases:
• An identified session is being session-switched.
• An identified session is executing its own ABORT SESSION request.
• An identified session has a PEState of IDLE: IN-DOUBT as a result of a 2PC.
Note: Sessions are marked IN-DOUBT by the 2PC protocol, which governs how transactions are committed by multiple systems that do not share memory. The protocol guarantees that either all systems commit or all roll back.
• N or NULL = Do not override.
The AbortSessions
function returns the number of sessions aborted as the resultset.
4.Hopeless ? Then do TPA reset.
The TPA Reset utility, tpareset, resets Teradata
Database. All PDE tasks except for some kernel daemons are killed. All database
processes, including those for AMPs and PEs, are stopped and restarted. TPA
Reset options control whether dump information is collected, whether the node
on which tpareset is run is rebooted, and whether the system remains down or
restarts.
TPA Reset can be used:
•after Teradata Database has been reconfigured
•to activate new versions of Teradata Database software
•to recover from a database hang
•in other situations that warrant a full database
shutdown and restart
Use TPA Reset only when absolutely necessary. TPA Reset
has a system-wide effect. Shutting down the system and restarting not only
causes system down time, but can also have a performance impact when the system
starts up and runs standard recovery and reconcile operations.
Checking Teradata Status
Let's start with the basic status tool, pdestate:
1
|
# pdestate -a
|
If all is well and Teradata is running, you will see this output:
1
2
|
PDE state is RUN/STARTED.
DBS state is 5: Logons are enabled - The system is
quiescent
|
Stopping the database
Should you need to stop the database in a controlled manner, the command
is:
1
|
# tpareset -x bringing down the database
|
The syntax for this command is tpareset -x {comment}, where
the comments can be used for logging/auditing purposes to capture the reason
that the database was stopped.
Here is output from this command. Notice the prompt as a last check
to make sure that this is what you really want to do.
1
2
3
4
|
You are about to shutdown the database
on the system
's10-1300'
Do you wish to continue (default: n) [y,n] y
|
Once the database has been stopped, you can check the status of the database like we did earlier:
1
2
|
# pdestate -a
PDE state is STOP/KILLTASKS.
|
But if there was a problem that halted Teradata instead of a controlled shutdown, you will see this message from pdestate:
1
2
|
#pdestate -a
"down/hardstop" is down
|
Restarting Teradata
To restart Teradata after manually stopping the database or to restart
after a 'hardstop' event, run this command:
1
|
# /etc/init.d/tpa start
|
Which should give this output:
1
2
|
Teradata Database Initiator service is starting...
Teradata Database Initiator service started
successfully.
|
To restart Teradata if it is any state other than
"down/hardstop" use the command:
1
|
# tpareset -f restarting
|
This asks for a confirmation before beginning the restart process:
1
2
3
4
|
You are about to restart the database
on the system
's10-1300'
Do you wish to continue (default: n) [y,n] y
|
You can then issue the watch the various start up levels by issuing the
pdestate command every few seconds..
1
2
|
# pdestate -a
PDE state is START/RECONCILE.
|
1
2
|
# pdestate -a
PDE state is START/STARTTPA.
|
1
2
3
|
# pdestate -a
PDE state is RUN/STARTED.
DBS state is 1/5: DBS Startup - Voting for Transaction
Recovery
|
1
2
3
|
# pdestate -a
PDE state is RUN/STARTED.
DBS state is 1/4: DBS Startup - Starting PE Partitions
|
1
2
3
|
# pdestate -a
PDE state is RUN/STARTED.
DBS state is 5: Logons are enabled - The system is
quiescent
|
Technical team:
References:
https://www.info.teradata.com/HTMLPubs/DB_TTU_16_00/index.html#page/Utilities/B035-1102-160K/B035-1102-160K.1.506.html
No comments:
Post a Comment