Login or Sign Up to become a member!
LessThanDot Site Logo

LessThanDot

Community Wiki

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Navigation

Google Ads

Cursors and How to Avoid Them

From Wiki

Jump to: navigation, search

Row by row processing, the impact on performance

In order to avoid using cursors, the developer must first understand why it is important to do so. Cursors operate on one row at a time whereas SQL Server is optimized to perform set-based operations (those affecting all the rows in one operation). Therefore a set-based solution is usually significantly faster than a cursor solution. How much faster of course depends on many factors, the number of records, the complexity of the operation, the load on the server, etc. But in every case where I have ever been able to replace a cursor with a set-based solution, performance has improved significantly. One real life example, deleting 40,000 records from a table is shown below:

Reads	   Writes   Time to process   Type of Action
321704224  46355    47 Min 8 seconds  With Cursor
1480203	   31109    32 Seconds	      Without Cursor

I want to emphasize that the above example is typical of the performance gain by replacing a cursor. I’ve even seen times when a process was reduced from several hours to seconds or minutes. Once I eliminated a cursor from a process and the time was reduced from more than 24 hours to less than ten minutes. As you can see, cursors are performance hogs that should only be used if the data set affected is very tiny (and the cursor will not be used frequently by the application) or if there is no other realistic choice to accomplish the task. I also believe that even when the data set it small, it is usually worthwhile to learn to perform the task without a cursor, so that the methods to avoid the cursor are well-learned and available when you need to affect a large data set. This is particularly true for the developer who has not yet learned to think in set-based terms as his or her first choice.

Typical Uses of a Cursor and Set-Based Replacements for Them

To perform an insert using the values clause

Sometimes a cursor is used because the developer only knows how to insert using the VALUES clause. Since this syntax supports the insert of only one record at a time, a cursor is used if you have multiple records. However, it is better in this case to use the SELECT clause syntax instead of the VALUES clause.

For instance, the following code:

  1. DECLARE @Employee_Name VARCHAR (100), @Employee_Start_Date DATETIME
  2.  
  3. DECLARE Employees_Cursor CURSOR
  4.    FOR
  5. SELECT Employee_name, Employee_Start_Date FROM employees
  6.     where Employee_Start_Date > Dateadd(d,-7, getdate())
  7. OPEN Employees_Cursor
  8.  
  9. FETCH NEXT FROM Employees_Cursor INTO @Employee_Name, @Employee_Start_Date
  10. WHILE @@FETCH_STATUS = 0
  11. BEGIN
  12. INSERT New_Employees (New_Employee_name, New_Employee_Start_Date)
  13. VALUES (@Employee_Name, @Employee_Start_Date)
  14.  
  15. FETCH NEXT FROM Employees_Cursor INTO @Employee_Name, @Employee_Start_Date
  16. END
  17. CLOSE Employees_Cursor
  18. DEALLOCATE Employees_Cursor

can be replaced with

  1. INSERT New_Employees (New_Employee_name, New_Employee_Start_Date)
  2. SELECT Employee_name, Employee_Start_Date FROM employees
  3.     where Employee_Start_Date > Dateadd(d,-7, getdate())


Using joins in updates, and deletes

Sometimes developers use cursors because they don’t know the syntax for using joins in updates or deletes. So when they want to update or delete based on the values in a differ table they reach for a cursor. Using a join is not only faster, the syntax is actually simpler once you become familiar with it.

In the case of an update where you need to reference another table, the cursor solution might look like this:

  1. DECLARE @person_id INT
  2. DECLARE @field2 VARCHAR (10)
  3.  
  4. DECLARE TestCursor3 CURSOR FOR
  5. SELECT t2.person_id, t2.field2 FROM table2 t2
  6.  
  7. OPEN TestCursor3
  8.  
  9. FETCH NEXT FROM TestCursor3
  10. INTO @person_id, @field2
  11.  
  12. WHILE @@FETCH_STATUS = 0
  13. BEGIN
  14.  
  15. UPDATE table1
  16. SET field2 = @field2
  17. WHERE person_id =  @person_id
  18.  
  19. FETCH NEXT FROM TestCursor3 INTO  @person_id, @field2
  20.  
  21. END
  22.  
  23. CLOSE TestCursor3
  24. DEALLOCATE TestCursor3

It can be replaced by the following set-based solution:

  1. UPDATE t1
  2. SET field2 = t2.field2
  3. --SELECT t1.field2, t2, field2, t1.person_id
  4. FROM table1 t1
  5. JOIN table2 t2 ON t2.person_id = t1.person_id

Note: I have placed a commented out line that includes a select in this statement. Normally when I am creating this type of code, I will do this so that I can verify what will be updated before running the code while I am in the development process. This can save many hours of fixing a bad update because you didn’t realize it would update 1,000,000 rows when you thought that only about 1,000 should have been updated. I never run an update or delete statement for the first time that contains a join or complex where clause without first checking the select for the same code. If I am going to do something that impacts the data in the database, I need to be sure it will impact the correct data before I run it against production data.

Now suppose I wanted to delete any records from one table where there is no corresponding record in a different table. If I use a cursor, the code might look something like this:

  1. DECLARE @person_id INT
  2.  
  3. DECLARE TestCursor4 CURSOR FOR
  4. SELECT t1.person_id FROM table1 t1
  5. JOIN table2 t2 ON t2.person_id = t1.person_id
  6. WHERE t2.field1 IS NULL
  7. OPEN TestCursor4
  8.  
  9. FETCH NEXT FROM TestCursor4
  10. INTO @person_id
  11.  
  12. WHILE @@FETCH_STATUS = 0
  13. BEGIN
  14.  
  15. DELETE table1
  16. WHERE person_id =  @person_id
  17.  
  18. FETCH NEXT FROM TestCursor4
  19.    INTO   @person_id
  20.  
  21. END
  22.  
  23. CLOSE TestCursor4
  24. DEALLOCATE TestCursor4

This can be replaced by the following set-based solution:

  1. DELETE T1
  2. --SELECT1.*
  3. FROM table1 T1
  4. JOIN table2 T2 ONT2.person_id =T1.person_id
  5. WHERE T2.field1 IS NULL


The need to process in different tables depending on a value in the table

  1. DECLARE @person_id INT
  2. DECLARE @field2 VARCHAR (10)
  3.  
  4. DECLARE TestCursor6 CURSOR FOR
  5. SELECT T2.person_id, T2.field2 FROM table2 T2
  6.  
  7. OPEN TestCursor6
  8.  
  9. FETCH NEXT FROM TestCursor6
  10. INTO @person_id, @field2
  11.  
  12. WHILE @@FETCH_STATUS = 0
  13. BEGIN
  14.  
  15. IF @field2 = 'test'
  16. BEGIN
  17. UPDATE table1
  18. SET field2 = @field2
  19. WHERE person_id =  @person_id
  20. END
  21. ELSE
  22. UPDATE table3
  23. SET field2 = @field2
  24. WHERE person_id =  @person_id
  25.  
  26. FETCH NEXT FROM TestCursor6
  27.    INTO   @person_id, @field2
  28.  
  29.  
  30. END
  31.  
  32. CLOSE TestCursor6
  33. DEALLOCATE TestCursor6

You can replace the cursor with

  1. UPDATE T1
  2. SET field2 =T2.field2
  3. --SELECT T1.field2, T2.field2, T1.person_id
  4. FROM table1 T1
  5. JOIN table2 T2 ON T2.person_id = T1.person_id
  6. WHERE field2 = 'test'
  7.  
  8. UPDATE T3
  9. SET field2 =  t2.field2
  10. --SELECT T3.field2,T2, field2, T3.person_id
  11. FROM table3 T3
  12. JOIN table2 Tt2 ON T2.person_id = T3.person_id
  13. WHERE field2 <> 'test'


Using case for special treatment based on a the data in a field

Sometimes developers don’t realize that they can use a CASE statement to perform processing based on the data in the field. So they write a cursor to look at each value and go to a particular branch of the code depending on the value.

  1. DECLARE @Employee_ID  INT, @Employee_Status varchar (100)
  2.  
  3. DECLARE Employees_Cursor CURSOR
  4.    FOR
  5. SELECT Employee_ID, Employee_Status FROM Employees
  6. OPEN Employees_Cursor
  7.  
  8. FETCH NEXT FROM Employees_Cursor INTO @Employee_ID, @Employee_Status
  9. WHILE @@FETCH_STATUS = 0
  10. BEGIN
  11. IF @Employee_Status = 'new hire probation'
  12.  
  13. UPDATE Employee_groups
  14. SET Group_id = 2
  15. where Employee_ID = @Employee_ID
  16.  
  17. ELSE IF @Employee_Status = 'performance probation'
  18.  
  19. UPDATE Employee_groups
  20. SET Group_id = 2
  21. where Employee_ID = @Employee_ID
  22.  
  23. ELSE
  24.  
  25. UPDATE Employee_groups
  26. SET Group_ID= 1
  27. where Employee_ID = @Employee_ID
  28.  
  29. FETCH NEXT FROM Employees_Cursor INTO @Employee_ID, @Employee_Status
  30. END
  31. CLOSE Employees_Cursor
  32. DEALLOCATE Employees_Cursor

Replace the cursor with a set-based solution using the CASE statement

  1. UPDATE EG
  2. SET Group_ID = CASE    
  3.             WHEN E.Employee_Status = ‘new hire probation’ THEN 2
  4.             WHEN E.Employee_Status = ‘performance probation’ THEN 3
  5.             ELSE 1 END
  6. FROM Employee_groups EG
  7. JOIN Employees E on E.Employee_ID = EG.EmployeeID


The need to do either an update or an insert depending on whether the record exists in the table

Another time people use cursors is when they don’t know if a record is already in a table. So they look at each record individually to decide to insert or update.

  1. DECLARE @Employee_ID  INT
  2.  
  3. DECLARE Employees_Cursor CURSOR
  4.    FOR
  5. SELECT Employee_ID FROM Employees
  6. OPEN Employees_Cursor
  7.  
  8. FETCH NEXT FROM Employees_Cursor INTO @Employee_ID
  9. WHILE @@FETCH_STATUS = 0
  10. BEGIN
  11. IF EXISTS(SELECT 1 FROM EmployeeAddress WHERE Employee_ID = @Employee_ID)
  12.  
  13. UPDATE EA
  14. SET Street = EI.Street,
  15. City = EI.city,
  16. [State] = EI.[State]
  17. from EmployeeAddress EA
  18. JOIN EmployeeImport EI on EA.Employee_id = EI.Employee_id
  19. where EA.Employee_ID = @Employee_ID
  20.  
  21. ELSE
  22.  
  23. INSERT EmployeeAddress (Employee_id, Street, City, [State])
  24. select Employee_id, Street, City, [State] from EmployeeImport EI
  25. where EI.Employee_ID = @Employee_ID
  26.  
  27. FETCH NEXT FROM Employees_Cursor INTO @Employee_ID
  28. END
  29. CLOSE Employees_Cursor
  30. DEALLOCATE Employees_Cursor

This can be replaced with a MERGE query

  1. MERGE dbo.EmployeeAddress AS Target
  2. USING (select Employee_id, Street, City, [State] from EmployeeImport EI)
  3.     AS Source
  4. ON (Target.Employee_id = Source.Employee_id )
  5. WHEN MATCHED THEN
  6.     UPDATE
  7.     SET Target.Street = Source.Street,
  8.     Target.City = Source.city,
  9.     Target.[State] =Source.[State]
  10. WHEN NOT MATCHED BY TARGET THEN
  11.     INSERT(Employee_id, Street, City, [State])
  12.     VALUES( Source.Employee_id, Source.Street, Source.City, Source.[State])

See also:

The Truth About Cursors: Part I

The Truth About Cursors: Part II

The Truth About Cursors: Part III

781 Rating: 3.8/5 (93 votes cast)