Eliminating Duplicate Data in a Specific Column in SQL Server

2 min readJan 26, 2023

Duplicate data can be a common issue when working with databases, and it can have a negative impact on the performance and accuracy of your queries. In this article, we will discuss how to eliminate duplicate data in a specific column in a table in SQL Server.

The first method we will cover is using the DELETE statement with a subquery that employs the GROUP BY clause. This query will identify and select the duplicate rows, and then delete them from the table. Here is an example of how to use this method:

DELETE FROM tablename
WHERE columnname IN (
    SELECT columnname
    FROM tablename
    GROUP BY columnname
    HAVING COUNT(columnname) > 1
)

This query will delete all rows that have duplicate values in the specified column. The subquery selects only the duplicated values, and the outer query deletes all rows that have those values in the specified column.

Another method to achieve this is using CTE (Common Table Expression) with ROW_NUMBER(). This method will delete all rows that have duplicate values in the specified column, but will keep the original record.

WITH cte AS (
   SELECT columnname, ROW_NUMBER() OVER (PARTITION BY columnname ORDER BY columnname) as rn
   FROM tablename
)
DELETE FROM cte WHERE rn > 1;

In conclusion, duplicate data can cause issues in your database and it is important to eliminate it to maintain the accuracy and performance of your queries. By using the DELETE statement with a subquery or CTE with ROW_NUMBER() in SQL Server, you can effectively remove duplicate data from a specific column in a table. You can choose the best method depends on your need, if you want to keep the original record or delete all the duplicate data.

Eliminating Duplicate Data in a Specific Column in SQL Server

Written by Ambiyansyah Risyal