Tuesday, March 8, 2011

Finding duplicate records

I do not find any reason to store duplicate records in a database table. Duplicate records will always create issues. On this blog I am not going after duplicate record related issues but I will write SQL code to find duplicate records that exist in a database.  

First let’s create a small table in oracle database and add few duplicate records to it.

create table Dups
(First_Name varchar2(20)
,last_name varchar2(20)
,Salary number ) ;

Above SQL creates table named Dups in Oracle database. Now let’s insert few rows into the table. This time, to save our time, we will use data from another table and insert them into our new table. Let’s use data from employees table.

insert into dups
(select FIRST_NAME, LAST_NAME, SALARY from employees);

Since we want duplicates in Dups table let’s use above SQL one more time. This time let’s add where clause to the SQL.

insert into dups
(select FIRST_NAME, LAST_NAME, SALARY from employees
where Employee_id > 150);

Since we added where clause to the above SQL (where employee_id >150), we will have duplicate records of those employees whose id is greater than 150. Now let’s write SQL to find duplicate records. We will use SQL analytical function to find duplicates.

select First_name, Last_name, Salary, Count(First_name) OVer (Partition by First_name, Last_name, Salary) as Dup_count from dups;

FIRST_NAME           LAST_NAME            SALARY                 DUP_COUNT             
-------------------- -------------------- ---------------------- ----------------------
Adam                                        Fripp                8200                   1                     
Alana                                        Walsh                3100                   2                      
Alana                                        Walsh                3100                   2     

In the above example, if dup_count > 1 then that record has duplicate value. Now let’s add few codes to the existing SQL to show only duplicate records.

Select distinct First_name, Last_name, Salary from
(select First_name, Last_name, Salary, Count(First_name)OVer (Partition by First_name, Last_name, Salary) as Dup_count from dups)
a where Dup_count > 1;
Above SQL will show only duplicate records that exist in Dups table. In my next blog i will show how to get rid of duplicate rocords.



No comments:

Post a Comment