Thursday, November 1, 2012

Removing Duplicate Records in a SQL Server Table


Removing Duplicate Records in a SQL Server Table

Step 1: Creating a Table
/* Create Table with 7 entries - 3 are duplicate entries */
CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)
Step 2: Inserting data into the table
INSERT INTO DuplicateRcordTable
SELECT 1, 1
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 2 --duplicate
UNION ALL
SELECT 1, 3
UNION ALL
SELECT 1, 4
GO   
Step 3: Selecting Data
/* It should give you 7 rows */
SELECT *
FROM DuplicateRcordTable
GO   
Step 4: Deleting the Duplicate Records
/* Delete Duplicate records */
WITH CTE (COl1,Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
GO   

a CTE acts as a virtual table, I am able to process data modification statements against it, and the underlying table will be affected. In this case, I am removing any record from the SalesCTE that is ranked higher than 1. This will remove all of my duplicate records.
Step 5: Selecting the records after deleting the duplicate records
/* It should give you Distinct 4 records */
SELECT *
FROM DuplicateRcordTable
GO
----------------------------------------------------------------------------

2 way


Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.
DELETE
FROM 
MyTable WHERE ID NOT IN

(SELECT MAX(ID)FROM MyTable GROUP BY DuplicateColumn1DuplicateColumn2DuplicateColumn3)


--------------------------------------------------------------------------------------------

No comments:

Post a Comment