Monday, 10 September 2018

Hash Functions to identify skewed data

On day to day activity we often get situation when the database has a lot of space left but query fails with “2644: No more room in database“. If any table has severe skewing it can cause the database to run out of space.
Here we will learn how to identify data causing skewness using hash functions.
Following query identifies the database space:
locking table dbc.diskspace for access
select databasename as “Database_Name”,
sum(maxperm)/(1024*1024*1024) as “Max_Perm (GB)”,
sum(currentperm)/(1024*1024*1024) as “Current_Perm (GB)”,
(sum(maxperm) – sum(currentperm))/(1024*1024*1024) as “Free_Actual (GB)”,
(((sum(maxperm) – sum(currentperm)) * (100)) /
CASE
WHEN sum(maxperm) = 0 THEN 1
ELSE sum(maxperm)
END
) AS “Free_Space %”
from dbc.diskspace
where databasename = ‘databasename’
group by databasename
order by 1;
Example:
locking table dbc.diskspace for access
select databasename as “Database_Name”,
sum(maxperm)/(1024*1024*1024) as “Max_Perm (GB)”,
sum(currentperm)/(1024*1024*1024) as “Current_Perm (GB)”,
(sum(maxperm) – sum(currentperm))/(1024*1024*1024) as “Free_Actual (GB)”,
(((sum(maxperm) – sum(currentperm)) * (100)) /
CASE
WHEN sum(maxperm) = 0 THEN 1
ELSE sum(maxperm)
END
) AS “Free_Space %”
from dbc.diskspace
where databasename = ‘testdatabase’
group by databasename
order by 1;
Database_Name Max_Perm (GB) Current_Perm (GB) Free_Actual (GB) Free_Space %
————  ————  ————      ————     ————
testdatabase  10,837.80     124.23            10,713.58        98.85
Following query identifies which Vproc has more perm space than the other Vprocs within a database:
locking table dbc.diskspace for access
select vproc,
currentperm as “Current_Perm”
from dbc.diskspace
where databasename=’databasename’
order by 2 desc 1;
Example:
locking table dbc.diskspace for access
select vproc,
currentperm as “Current_Perm”
from dbc.diskspace
where databasename=’testdatabase’
order by 2 desc 1;
Vproc           Current_Perm
——          ——————-
31               3,218,931,212.00
2                234,423,654.00
45               234,128,231.00
5                234,213,213.00
7                234,754,232.00
>
From the above query output we can determine which Vproc contains skewed data. Now we will isolate which table has highest skewing on Vproc 31.
Following query identifies which table contains skewing on a specific vproc within a database:
locking table dbc.tablesize for access
select vproc,
currentperm as “Current_Perm”,
tablename as “Table_Name”
from dbc.tablesize
where databasename=’databasename’
and Vproc in (baseline vproc,skewed vproc)
order by 2 desc,3,1;
NOTE: Above query can yield huge data ,you can import it into Excel for easy review of output.
Example:
locking table dbc.tablesize for access
select vproc,
currentperm as “Current_Perm”,
tablename as “Table_Name”
from dbc.tablesize
where databasename=’testdatabase’
and Vproc in (1,31)
order by 2 desc,3,1
Vproc     Current_Perm    Table_Name
——   —————  ————
1         39,792.00        testtable2
31        23,212.00        testtable2
1         2,793.00         testtable1
31        2,911.00         testtable1
1         793,791.00       testtable
31        1,654,210,212.00 testtable
1         3,702.00         testtable3
31        1,791.00         testtable3
>
Now we will find out the hashbucket which contains the skewed data for table.
The following query identifies which hashbucket on a specifc vproc contains the skewed data for a specific table:
select distinct(hashrow(primaryindexname))
,count(hashrow(primaryindexname))
from databasename.tablename
where HASHAMP(HASHBUCKET(HASHROW(fld1)))=vproc number
group by 1
order by 2 desc;
NOTE: Use SHOW TABLE to get the Primary Index name for the skewed table.
Example:
select distinct(hashrow(testpi))
,count(hashrow(testpi))
from testdatabase.testtable
where HASHAMP(HASHBUCKET(HASHROW(fld1)))=31
group by 1
order by 2 desc;
HASHROW(testpi)           Count(HASHROW(testpi))
——————–      ————————
23D54C43                  43124378
E3213D6C                  3482
The following query identifies the actual data that hashes to a specific hashbucket for a specific table.
select distinct(indexname)
from databasename.tablename
where hashrow(indexname)=’hashbucket id’xb;
Example:
select distinct(testpi)
from testdatabase.testtable
where hashrow(testpi)=’23D54C43’xb;
testpi
——————-
0
NOTE: Above output shows skewing is caused by NULLs in the Primary Index value, however it could be any data value.
Following query identifies to which AMP “NULLS” hash:
select HASHAMP(HASHBUCKET(HASHROW(0)));

No comments:

Post a Comment