Saturday, 18 November 2017

Remove all or some special characters from a string using regular expressions

Source:

CREATE TABLE FILENAM(FILENAME VARCHAR2(200));
INSERT INTO FILENAM VALUES('FILE1/\:');
INSERT INTO FILENAM VALUES('FILE1/?<');
INSERT INTO FILENAM VALUES('FILE1/?>');
INSERT INTO FILENAM VALUES('FILE1/?|');
INSERT INTO FILENAM VALUES('FILE1/?=<');
INSERT INTO FILENAM VALUES('FILE1/?<,');
INSERT INTO FILENAM VALUES('FILE1/?<*');
INSERT INTO FILENAM VALUES('FILE1/\:*?<>|=,');
INSERT INTO FILENAM VALUES('FILE1/\:*?<>|$=,');
COMMIT;


Query:



Output:


Tuesday, 25 July 2017

Identifying all the domains in the data

Input:




Output:




Identify the domain in the email id



Usually we know how to use SUBSTRING and INSTRING functions to extract a portion of a string passing the positions of a character.

Here we have a scenario , step ahead of normal scenario.

The username itself contains @symbols.It's just an assumption.




Tuesday, 31 January 2017

Split the comma separated values of a single column into multiple rows


I have a column and value is 1,2,3,4,5,6. Need to split this value into rows.

Note: 1,2,3,4,5,6 is a single value under column.





Friday, 20 January 2017

Find all the positions of an occurrence of a character in a string



I have an SSN# with format 123-45-6789. Here the character '-' is repeated in 4th and 7th position of a string. I need to write an SQL query to find out the positions of that characters.





Friday, 12 February 2016

DISPLAY MISSING NUMBER IN THE SEQUENCE

I have a sequence with fixed boundaries and there can be any number of misses in the sequence in a random fashion. I need to get those missed numbers.


INPUT:



OUTPUT:


DISPLAY THE MOST REPEATED NUMBER


I have a sequence of numbers repeated.I need to get the number which repeated maximum number of times.








Saturday, 6 February 2016

Display Bad records in a file

We have a delimited file and need to capture bad records other than the specific columns

In the below file,we have 4 columns separated by pipe '|' delimiter. 2nd row and 3 rd row are invalid here.

Zero byte file in Unix

Find zero byte file 



Find non-zero byte file

Tuesday, 22 December 2015

When does Optimizer chooses FTS

When 

1)there is no INDEX exists
2)the table is small
3)the table has  high degree of parallelism
4)query uses full table scan hint
5)stats are stale
6)query requires most of the blocks
7)When implicit conversions takes place
8)When there are NULL's
9)query predicate does not use the leading edge of an index.


FTS uses larger IO calls.This means it read multiple blocks in an instance.
Making a fewer large I/O calls is cheaper than making many smaller calls.


High-water Mark 

This is a term used with table segments stored in the database. If you envision a table, for example, as a 'flat' structure or as a series of blocks laid one after the other in a line from left to right, the high-water mark (HWM) would be the rightmost block that ever contained data, as illustrated 
+---- high water mark of newly created table
|
V
+--------------------------------------------------------+
|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

      high water mark after inserting 10,000 rows
                                    |
                                    v
+--------------------------------------------------------+
|x |x |x |x |x |x |x |x |x |x |x |x |  |  |  |  |  |  |  |
|x |x |x |x |x |x |x |x |x |x |x |x |  |  |  |  |  |  |  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+


      high water mark after inserting 10,000 rows
                                    |
                                    v
+--------------------------------------------------------+
|x |x |x |x |x |x |x |  |  |  |  |  |  |  |  |  |  |  |  |
|x |x |x |x |x |x |x |  |  |  |  |  |  |  |  |  |  |  |  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+



HWM starts at the first block of a newly created table. As data is placed into the table over time and more blocks get used, the HWM rises. If we delete some (or even all) of the rows in the table, we might have many blocks that no longer contain data, but they are still under the HWM, and they will remain under the HWM until the object is rebuilt, truncated, or shrunk (shrinking of a segment is a new Oracle 10g feature that is supported only if the segment is in an ASSM tablespace). 

The HWM is relevant since Oracle will scan all blocks under the HWM, even when they contain no data, during a full scan. This will impact the performance of a full scan,especially if most of the blocks under the HWM are empty. To see this, just create a table with 1,000,000 rows (or create any table with a large number of rows), and then execute a SELECT COUNT(*) from this table. Now, DELETE every row in it and you will find that the SELECT COUNT(*) takes just as long to count 0 rows as it did to count 1,000,000. This is because Oracle is busy reading all of the blocks below the HWM to see if they contain data. You should compare this to what happens if you used TRUNCATE on the table instead of deleting each individual row. TRUNCATE will reset the HWM of a table back to 'zero' and will truncate the associated indexes on the table as well. If you plan on deleting every row in a table, TRUNCATE;if it can be used it would be the method of choice for this reason. 

In an MSSM tablespace, segments have a definite HWM. In an ASSM tablespace, however, there is an HWM and a low HWM. In MSSM, when the HWM is advanced (e.g., as rows are inserted), all of the blocks are formatted and valid, and Oracle can read them safely. With ASSM, however, when the HWM is advanced Oracle doesn't format all of the blocks immediately they are only formatted and made safe to read upon their first use. So, when full scanning a segment, we have to know if the blocks to be read are 'safe' or unformatted (meaning they contain nothing of interest and we do not process them). To make it so that not every block in the table need go through this safe/not safe check, Oracle maintains a low HWM and a HWM. Oracle will full scan the table up to the HWM and for all of the blocks below the low HWM, it will just read and process them. For blocks between the 'low HWM' and the HWM, it must be more careful and refer to the ASSM bitmap information used to manage these blocks in order to see which of them it should read and which it should just ignore. 



MSSM-Manual Segment Space Management --Free space is maintained by Freelist
ASSM-Automatic Segment Space Management--Free Space is maintained by Bitmaps.




Saturday, 5 September 2015

Dimensional Modeling Nomenclature


Normalization vs Denormalization

Normalization:

PROS:
  • When writes are more than reads
  • Vertically divided among columns i.e. split-ted into multiple tables.
  • Updates and Inserts will be effective since no duplicates.
  • Less use of heavy DISTINCT or GROUP BY queries since no duplicates.

CONS:
  • JOINS due to multiple tables.
  • Index strategies are not effective due to joins.

De-Normalization:

PROS:
  • When reads are more than writes.
  • SELECTS are very fast since we avoid JOINS and effective because of index strategies works very well with selective columns.

CONS:
  • Updates and INSERTS becomes costly.

In real time , you should be good enough to choose the most read or most write tables to apply this concepts.






Wednesday, 2 September 2015

Unix Script to generate files based on unique values of a field

Below file has different mobile companies. I need to generate the files dynamically based on the unique values of the field.I can have blank values also.But I should not generate files for those.

Input:














Generate the Unique values :





Script to generate files based on the unique values. Here -z in the if condition represents the check for empty variable.










Output:



Saturday, 15 August 2015

RESET a RANK




SQL> select * from t1;

      DEPT DATE1
---------- ---------
       100 01-JAN-13
       100 02-JAN-13
       200 03-JAN-13
       100 04-JAN-13

select dept,  
       date1,
       rank () Over (partition by dept order by date1) rnk
from t1
order by date1;

      DEPT DATE1            RNK
---------- --------- ----------
       100 01-JAN-13          1
       100 02-JAN-13          2
       200 03-JAN-13          1
       100 04-JAN-13          3
The desired output is as follows. The last rnk=1 is becuase the Jan-04 record is the first record after the change.
      DEPT DATE1            RNK
---------- --------- ----------
       100 01-JAN-13          1
       100 02-JAN-13          2
       200 03-JAN-13          1
       100 04-JAN-13          1  <<<----------
Solution:
select dept, date1,
       CASE WHEN StartFlag = 0 THEN 1
            ELSE 1+StartFlag+NVL(lag(StartFlag) over (order by date1),0)
       END as rnk
from (select t1.*,
             (case when dept = lag(dept) over (order by date1)
                   then 1
                   else 0
              end) as StartFlag
      from t1
     ) t1
order by date1;

Friday, 14 August 2015

CUBE, ROLLUP , GROUPING SETS & GROUPING_ID


The GROUPING_ID function provides an alternate and more compact way to identify subtotal rows. Passing the dimension columns as arguments, it returns a number indicating the GROUP BY level.





Generate RANDOM values

You can generate random values from DBMS_RANDOM supplied package.


Get SAMPLE rows from a table


CUBE, ROLLUP and GROUPING SETS

GROUPING SETS simplifies the management of the subtotals


CUBE, ROLLUP and GROUPING SETS

CUBE:
Super aggregation is performed for each dimension.


CUBE, ROLLUP and GROUPING SETS

The CUBE, ROLLUP and GROUPING SETS functions are used in the GROUP BY clause to generate totals and subtotals.

ROLLUP can have more than one dimension, generating grand totals and subtotals:






















Calculate maximum and minimum salary without using GROUP functions



Saturday, 8 August 2015

Complex SQL Scenario

Get the deptno from employee table that are not in dept table and deptno from dept table that are not in employee table without using SET operators.

SELECT
COALESCE(DEPT.DEPTNO,EMP.DEPTNO)
FROM
EMP
FULL OUTER JOIN
DEPT
ON EMP.DEPTNO=DEPT.DEPTNO
WHERE EMPNO IS NULL OR DNAME IS NULL

/

Friday, 7 August 2015

Complex SQL Scenario

Get the minimum date if the consecutive code is same

CT KBB(DT DATE,COD VARCHAR(1))
INS INTO KBB VALUES('2014-01-01','A');
INS INTO KBB VALUES('2014-01-02','A');
INS INTO KBB VALUES('2014-01-03','D');
INS INTO KBB VALUES('2014-01-04','A');
INS INTO KBB VALUES('2014-01-05','D');
INS INTO KBB VALUES('2014-01-06','A');

Solution:
SEL EFF_DT,COD,
RANK() OVER(ORDER BY EFF_DT RESET WHEN COD<>MAX(COD) 
OVER(ORDER BY EFF_DT ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING))

FROM KBB

Thursday, 6 August 2015

Residual Condition in Explain Plan

Residual condition means a filter is applied on particular table to limit the number of rows fetched into Spool.
Say for example
Sel * from EMP where sal > 1000;

Pseudo Table in Explain Plan

It is a false lock which is applied on the table to prevent  two users from getting conflicting locks with all-AMP requests.
PE will determine an particular AMP to manage all AMP LOCK requests for given table and Put Pseudo lock on the table.

To put in Simple terms , its like an University with 10 gates and at a given time you can enter the university through one gate ( or main gate) for security check.

Reasons for product joins


1. Stale or no stats causing optimizer to use product join
2. Improper usage of aliases in the query.
3. missing where clause ( or Cartesian product join  1=1 )
4. non equality conditions like > ,< , between   example ( date)
5. few join conditions
6.  when or conditions are used.

last but not the least   product joins are not bad always!! sometimes PJ are better compared to other types of joins

Different types of skews

If you utilized unequally TD resources (CPU,AMP,IO,Disk and etc) this is called skew exists. Major are 3 types of skews (CPU skew, AMP/Data skew, IO Skew).

-Data skew?
When data is not distributed equally on all the AMPs.
-Cpu skew?
Who is taking/consuming more CPU called cpu skew.
-IO skew?
Who perform more IO Operation? Resulting in IO Skew

TENACITY and it's default value


  • TENACITY specifies the amount of time in hours, to retry to obtain a loader slot or to establish all requested sessions to logon. 
  • The default for Fast Load is “no tenacity”, meaning that it will not retry at all. 
  • If several FastLoad jobs are executed at the same time, we recommend setting the TENACITY to 4, meaning that the system will continue trying to logon for the number of sessions requested for up to four hours.

Query to find the skew factor in Teradata

SELECT
TABLENAME,
SUM(CURRENTPERM) /(1024*1024) AS CURRENTPERM,
(100 - (AVG(CURRENTPERM)/MAX(CURRENTPERM)*100)) AS SKEWFACTOR
FROM
DBC.TABLESIZE
WHERE DATABASENAME= <DATABASENAME>
AND
TABLENAME =<TABLENAME>
GROUP BY 1;

Acceptable range for skew factor in a table


There is no particular range for skew factor.  In case of production systems, it is suggested to keep skew factor between 5-10.
There are various considerations for skew factor
- Number of AMPS
- Size of row in a table
- number of records in a table
- PI of a table
- Frequent access of table (Performance consideration)
- whether table getting loaded daily /monthly or how frequently data is being refreshed

Tuesday, 4 August 2015

How to skip the header row in the fastload script



RECORD 2; /* this skips first record in the source file */

Why Fload doesn’t support multiset table in Teradata?



Fload does not support Multiset table because of restart capability.

Say, the fastload job fails. Till the fastload failed, some number of rows was sent to the AMP's.
Now if you restart FLOAD,  it would start loading record from the last checkpoint and some of the consecutive rows are sent for the second time. These will be caught as duplicate rows are found after sorting of data.

This restart logic is the reason that Fastload will not load duplicate rows into a MULTISET table. It assumes they are duplicates because of this logic. Fastload support Multiset table but does not support the duplicate rows. Multiset tables are tables that allow duplicate rows. When Fastload finds the duplicate rows it discards it. Fast Load can load data into multiset table but will not load the duplicate rows

What are the types of HASH functions used in Teradata?


These are the types of HASH, HASHROW, HASHAMP and HASHBAKAMP. Their SQL functions are-
HASHROW (column(s))
HASHBUCKET (hashrow)
HASHAMP (hashbucket)
HASHBAKAMP (hashbucket)

To find the data distribution of a table based on PI, below query will be helpful. This query will give the number of records in each AMP for that particular table.

SELECT HASHAMP(HASHBUCKET(HASHROW(PI_COLUMN))),COUNT(*) FROM TABLENBAME GROUP BY 1.

How to create a table with an existing structure of another table with or without data and also with stats defined in Teradata?



CREATE TABLE new_TABLE AS old_TABLE WITH DATA;
CREATE TABLE new_TABLE AS old_TABLE WITH NO DATA;
CREATE TABLE new_TABLE AS old_TABLE WITH DATA AND STATS;

How to calculate the tablesize , database size and free space left in a database in teradata?



DBC.TABLESIZE and DBC.DISKSPACE are the systems tables used to find the space occupied.

Below Query gives the table size of each tables in the database and it will be useful to find the big tables in case of any space recovery.
SELECT         DATABASENAME,
TABLENAME,
SUM(CURRENTPERM/(1024*1024*1024)) AS "TABLE SIZE"
FROM
DBC.TABLESIZE
WHERE
DATABASENAME = 'DATABASE_NAME' AND TABLENAME = 'TABLE_NAME'
GROUP BY 1,2;

Below query gives the total space and free space available in a database.

SELECT
        DATABASENAME DATABASE_NAME,
        SUM(MAXPERM)/(1024*1024*1024) TOTAL_PERM_SPACE,
        SUM(CURRENTPERM)/(1024*1024*1024) CURRENT_PERM_SPACE,
        TOTAL_PERM_SPACE-CURRENT_PERM_SPACE as FREE_SPACE
 FROM
        DBC.DISKSPACE
 WHERE
        DATABASENAME = 'DATABASE_NAME'
group by 1;

Teradata Locks