2009-08-26

Reserving a range of identity values

Recently I had need of a way of knowing the identity values assigned to each record that was bulk inserted into a table. If you are inserting the rows one by one, good old @@IDENTITY or the saner SCOPE_IDENTITY() are your friends. For bulk inserts there's no way of knowing what identities were assigned to each record unless you take careful note of the starting identity value and the order in which the records are inserted.

This is doable, but too brittle for my taste. I'd much rather like to turn it around and first give the records identity values that I know to be unique, then perform the insert with identity_insert on. As it turns out, this isn't very hard. Below is a stored procedure that will reserve a range of identity values that you can use for subsequent inserts:

  CREATE PROCEDURE reserve_identity_range(@table SYSNAME, @size INT) AS BEGIN
      SET NOCOUNT ON;
      DECLARE @SQL NVARCHAR(MAX);
      SET @SQL =
          'BEGIN TRANSACTION;' +
          '    DELETE TOP(0) FROM ' + @table + ' WITH (TABLOCKX);' +
          '    DECLARE @before_identity INT, @after_identity INT;' +
          '    SET @before_identity = IDENT_CURRENT(''' + @table + ''');' +
          '    SET @after_identity = @before_identity + ' + CAST(@size AS NVARCHAR) +
          '    DBCC CHECKIDENT(''' + @table + ''', RESEED, @after_identity);' +
          '    SELECT @before_identity AS before_identity, @after_identity AS after_identity;' +
          'COMMIT;'
      ;
      PRINT @SQL;
      EXEC(@SQL);
  END;

The use of IDENT_CURRENT() and DBCC CHECKIDENT is pretty straightforward, as is the always yucky but necessary dynamic SQL (the usual caveats apply).

The only "mysterious" statement is the transaction with DELETE TOP(0) .. WITH (TABLOCKX). This is a do-nothing construct to ensure atomicity with regards to any other statement, in particular INSERT statements and other reserve_identity_range invocations. It would have been SELECT TOP(0), but that would interfere with the result set.

Be careful: even though the reserve is atomic, this still doesn't mean the result is always safe to use. Someone could come and do a TRUNCATE TABLE or DBCC CHECKIDENT(..., RESEED) afterwards (both of which set the increased identity value back) and mess up your subsequent INSERT. If you really want to be safe you can maintain the table lock, but this has obvious consequences for concurrency.

If you are using (or want to use) the ORDER hint of BULK INSERT to supply records in clustered index order (saving SQL Server a sorting step), you have to be careful not to make the explicitly assigned identity column interfere with this. If your clustered index is the identity column, you should definitely supply ORDER because you are assigning the identities yourself and are therefore in a prime position to order them without even needing to sort them at all.

Addendum: Apologies for the lack of syntax highlighting; I currently have no good way of doing this. I did it in previous posts, but all I remember is that it was an incredibly laborious procedure. The closest I get is pasting the syntax-highlighted code from Management Studio into Word, and from there you can copy and paste HTML—but Word makes an absolute mess of HTML that even Tidy can hardly fix, and the resulting output is too hard to integrate into a post. I think the way to go is a very simple application that integrates RTF to HTML conversion, but I haven't whipped it up yet.

No comments: