T-SQL CRUD Generation
Following on from my last post complaining about MSSQLMS UI code generation, I thought I would back it up with a few T-SQL scripts that generate code that I actually want to see.
I'm currently developing a large scale, real life application using my new .Net 4.0 nTier framework, so I've decided to make my life a little bit easier by generating the boring old CRUD stored procedures with some funky T-SQL scripts. I'm also currently looking in to Visual Studio's T4 code generation tools to help me create my business objects, as most of those will also be fairly standard - though I still need the flexibility to change them (and guarantee that a code generation tool isn't going to overwrite my changes) once the templates have all been created.
Let me start by saying I'm usually against the use of the 'sp_executesql' stored procedure due to the additional overhead created by executing dynamic SQL. In this instance however, I'm happy to use it because these scripts are only intended to be fired on the rare occasion you need your CRUD procedures generated. Normally I avoid dynamic SQL like the plague, sometimes creating complex JOIN or WHERE clauses to circumvent the need. However, I saw very little alternative this time around, and as these scripts do not form part of a software system (only the set up of said system), I'm allowing my personal standard to slip slightly.
The Code
This code is intended for use with the nTier framework I'm currently writing, but can be used individually or modified to fit any CRUD requirements you may have. It isn't particularly efficient, but it still executes very fast in my environment. It conditionally drops and creates 5 stored procedures for each table that exists in the database its executed against, which conform to the standards set in my framework's SQL Server Data Provider. As a side note, I did try to achieve this with the undocumented 'sp_MSforeachtable' stored procedure, but unfortunately the replacement character (? by default) used within the procedure replaces it with a fully qualified name, which meant that I couldn't use it to create my procedure names.
The first thing to point out here is the use of a T-SQL cursor. Many developers tend to steer away from cursors, though I've found if you manage your tempdb and database log files and code your cursors well, they can make your life considerably easier. A lot of the functionality I used cursors for has been superseded by CTEs as from SQL Server 2005, though they still have their uses.
Secondly, there's an extensive use of the INFORMATION_SCHEMA schema views. They provide meta data on most of the objects within your database, though not always everything you need. I still need to delve in to the sys schema views every now and again for bits of information, and it's sometimes helpful to know that all of these "system tables" are actually views, derived from hidden tables named "sys.objects$", etc.
Thirdly, a number of user-defined functions are executed at the beginning of each cursor loop to generate strings for the CRUD statements. I created these simply to save an extra 50 or so lines of code within the script. I'm not particularly proud of them (they were thrown together quite quickly), but they do use the FOR XML PATH clause for some funky concatenation. I'm sure the string manipulation could be done better too, but it works perfectly well for my requirements:
What these functions do is generate a list of column names (and data types in the instance of 'fn_GetTableColumnsSPParams') in different formats, to be used to create the various CRUD stored procedures. They are assigned to variables so the functions aren't executed multiple times during the cursor loop.
Any suggestions for a better way to remove / replace the column names / data types would be appreciated - I haven't thought that much about it, but there must be a T-SQL XML guru that knows a better way to manipulate them than I do.
The last point to mention is the @PKField variable. This is only set if a table has a single column PRIMARY KEY - and if it doesn't, the CRUD procedures are not created (and a simple message is output instead). If a table does not have a primary key defined, the script assumes it has no way of uniquely identifying a row. This of course doesn't take in to account UNIQUE constraints, but it should be punishable by law if a PRIMARY KEY hasn't been declared on a table - I have yet to come across a situation where it is acceptable to ignore this fundamental rule of RDBMS design.
On a related matter, many developers state that all tables must always have an identity column that generates a unique key, used to identify an individual record. This is simply wrong; in 4th Normal Form you must be able to uniquely identify a record, but that doesn't mean you have to create additional data by adding a superfluous IDENTITY column. I would say that in storage and look-up tables (e.g. "user", "news" and "news category") an IDENTITY is 110% required - each row needs to be uniquely identifiable and easy to reference via a FOREIGN KEY and in SQL Server, IDENTITY is the most efficient way to do it. Obviously the scale of your application determines the data type; INT being the norm, BIGINT for larger scale databases / tables, and GUID for insanely large systems. However, when it comes to link tables (tables created for the purpose of "many to many" relationships), a multiple-value PRIMARY KEY will usually suffice. This reduces storage by 4 bytes per row (for INT identities) and completely eradicates the possibility of duplicate link table rows.
Again, a lack of RDBMS understanding usually leads to sloppy database design... if a news post can be tagged with multiple categories, a "NewsCategories" table should be created with a minimum of two fields - the NewsID and the CategoryID. These fields represent the PRIMARY KEY for the table, and one should be created in order to force developers to adhere to these data storage rules. I would even forgive the creation of a multi-column UNIQUE constraint instead of a PRIMARY KEY, the only difference being that UNIQUE constraints can contain NULL values, whereas any columns used in a PRIMARY KEY constraint cannot.
Anyway - I hope this comes in handy. I've used a similar approach in setting up my initial tables (the tables are coded individually, but instead of having multiple DROP statements I remove constraints and drop tables in a loop) and also creating default audit triggers on all of my tables.
I'm currently developing a large scale, real life application using my new .Net 4.0 nTier framework, so I've decided to make my life a little bit easier by generating the boring old CRUD stored procedures with some funky T-SQL scripts. I'm also currently looking in to Visual Studio's T4 code generation tools to help me create my business objects, as most of those will also be fairly standard - though I still need the flexibility to change them (and guarantee that a code generation tool isn't going to overwrite my changes) once the templates have all been created.
Let me start by saying I'm usually against the use of the 'sp_executesql' stored procedure due to the additional overhead created by executing dynamic SQL. In this instance however, I'm happy to use it because these scripts are only intended to be fired on the rare occasion you need your CRUD procedures generated. Normally I avoid dynamic SQL like the plague, sometimes creating complex JOIN or WHERE clauses to circumvent the need. However, I saw very little alternative this time around, and as these scripts do not form part of a software system (only the set up of said system), I'm allowing my personal standard to slip slightly.
The Code
This code is intended for use with the nTier framework I'm currently writing, but can be used individually or modified to fit any CRUD requirements you may have. It isn't particularly efficient, but it still executes very fast in my environment. It conditionally drops and creates 5 stored procedures for each table that exists in the database its executed against, which conform to the standards set in my framework's SQL Server Data Provider. As a side note, I did try to achieve this with the undocumented 'sp_MSforeachtable' stored procedure, but unfortunately the replacement character (? by default) used within the procedure replaces it with a fully qualified name, which meant that I couldn't use it to create my procedure names.
--------------------------------------------------------------------------- -- DEVELOPMENT BOILER PLATE ---------------------------- -- AUTHOR: Chris Sebok -- COMPANY: Shadow Moses Developments Ltd -- DATE: 4th February 2011 -- -- FILE DESCRIPTION: Creates CRUD stored procedures for each table. ----------------------------------------------------------------------------- DECLARE @table_name VARCHAR(500), @instruction VARCHAR(MAX), @PKField VARCHAR(MAX) DECLARE @columnsSPDeclare VARCHAR(MAX), @columnsCSV VARCHAR(MAX), @columnsUPDATE VARCHAR(MAX) DECLARE @columnsINSERTCols VARCHAR(MAX), @columnsINSERTParams VARCHAR(MAX) DECLARE curTables CURSOR READ_ONLY FAST_FORWARD FOR SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES OPEN curTables FETCH NEXT FROM curTables INTO @table_name WHILE @@FETCH_STATUS = 0 BEGIN -- Create the following SPs for each table -- sp_TABLE_SelectByUniqueIdentifier -- sp_TABLE_SelectAll -- sp_TABLE_Save -- sp_TABLE_Delete -- sp_TABLE_Exists SET @columnsCSV = dbo.fn_GetTableColumnsCSV(@table_name) SET @columnsSPDeclare = dbo.fn_GetTableColumnsSPParams(@table_name) SET @columnsUPDATE = dbo.fn_GetTableColumnsForUPDATE(@table_name) SET @columnsINSERTCols = dbo.fn_GetTableColumnsForINSERTColumns(@table_name) SET @columnsINSERTParams = dbo.fn_GetTableColumnsForINSERTParams(@table_name) SET @PKField = NULL -- Get the table's primary key field name SELECT @PKField = MAX(COLUMN_NAME) FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME WHERE tc.TABLE_NAME = @table_name AND CONSTRAINT_TYPE = 'PRIMARY KEY' HAVING COUNT(*) = 1 IF @PKField IS NOT NULL BEGIN -- Drop all CRUD SPs SET @instruction = 'IF OBJECT_ID (''sp_' + @table_name + '_SelectByUniqueIdentifier'',''P'') IS NOT NULL DROP PROC sp_' + @table_name + '_SelectByUniqueIdentifier;' exec sp_sqlexec @instruction SET @instruction = 'IF OBJECT_ID (''sp_' + @table_name + '_SelectAll'',''P'') IS NOT NULL DROP PROC sp_' + @table_name + '_SelectAll;' exec sp_sqlexec @instruction SET @instruction = 'IF OBJECT_ID (''sp_' + @table_name + '_Save'',''P'') IS NOT NULL DROP PROC sp_' + @table_name + '_Save;' exec sp_sqlexec @instruction SET @instruction = 'IF OBJECT_ID (''sp_' + @table_name + '_Delete'',''P'') IS NOT NULL DROP PROC sp_' + @table_name + '_Delete;' exec sp_sqlexec @instruction SET @instruction = 'IF OBJECT_ID (''sp_' + @table_name + '_SelectByUniqueIdentifier'',''P'') IS NOT NULL DROP PROC sp_' + @table_name + '_SelectByUniqueIdentifier;' exec sp_sqlexec @instruction SET @instruction = 'IF OBJECT_ID (''sp_' + @table_name + '_Exists'',''P'') IS NOT NULL DROP PROC sp_' + @table_name + '_Exists;' exec sp_sqlexec @instruction -- Create all SPs ------------------------------- -- _SelectByUniqueIdentifier ------------------------------- SET @instruction = ' CREATE PROC sp_' + @table_name + '_SelectByUniqueIdentifier @UniqueIdentifier BIGINT AS SET NOCOUNT ON SELECT ' + @columnsCSV + ' FROM ' + @table_name + ' (nolock) WHERE ' + @PKField + ' = @UniqueIdentifier' EXEC sp_sqlexec @instruction ------------------------------- -- _SelectAll ------------------------------- SET @instruction = ' CREATE PROC sp_' + @table_name + '_SelectAll AS SET NOCOUNT ON SELECT ' + @columnsCSV + ' FROM ' + @table_name + ' (nolock)' EXEC sp_sqlexec @instruction ------------------------------- -- _Save ------------------------------- SET @instruction = ' CREATE PROC sp_' + @table_name + '_Save @UniqueIdentifier INT OUTPUT, ' + @columnsSPDeclare + ' AS IF ISNULL(@UniqueIdentifier, 0) > 0 BEGIN IF EXISTS(SELECT * FROM ' + @table_name + ' (nolock) WHERE ' + @PKField + ' = @UniqueIdentifier) BEGIN UPDATE ' + @table_name + ' SET ' + @columnsUPDATE + ' WHERE ' + @PKField + ' = @UniqueIdentifier END ELSE BEGIN SET @UniqueIdentifier = 0 END END ELSE BEGIN INSERT INTO ' + @table_name + '(' + @columnsINSERTCols + ') VALUES(' + @columnsINSERTParams + ') SET @UniqueIdentifier = SCOPE_IDENTITY() END' EXEC sp_sqlexec @instruction ------------------------------- -- _Exists ------------------------------- SET @instruction = ' CREATE PROC sp_' + @table_name + '_Exists @UniqueIdentifier INT OUTPUT AS IF NOT EXISTS(SELECT * FROM ' + @table_name + ' (nolock) WHERE ' + @PKField + ' = @UniqueIdentifier) SET @UniqueIdentifier = 0' EXEC sp_sqlexec @instruction ------------------------------- -- _Delete ------------------------------- SET @instruction = ' CREATE PROC sp_' + @table_name + '_Delete @UniqueIdentifier INT AS DELETE FROM ' + @table_name + ' WHERE ' + @PKField + ' = @UniqueIdentifier' EXEC sp_sqlexec @instruction END ELSE PRINT 'No CRUD procs created for ' + @table_name + ' (Single column Primary Key not found)' FETCH NEXT FROM curTables INTO @table_name END CLOSE curTables DEALLOCATE curTables
The first thing to point out here is the use of a T-SQL cursor. Many developers tend to steer away from cursors, though I've found if you manage your tempdb and database log files and code your cursors well, they can make your life considerably easier. A lot of the functionality I used cursors for has been superseded by CTEs as from SQL Server 2005, though they still have their uses.
Secondly, there's an extensive use of the INFORMATION_SCHEMA schema views. They provide meta data on most of the objects within your database, though not always everything you need. I still need to delve in to the sys schema views every now and again for bits of information, and it's sometimes helpful to know that all of these "system tables" are actually views, derived from hidden tables named "sys.objects$", etc.
Thirdly, a number of user-defined functions are executed at the beginning of each cursor loop to generate strings for the CRUD statements. I created these simply to save an extra 50 or so lines of code within the script. I'm not particularly proud of them (they were thrown together quite quickly), but they do use the FOR XML PATH clause for some funky concatenation. I'm sure the string manipulation could be done better too, but it works perfectly well for my requirements:
----------------------------------------------------------------------------- -- SELECT dbo.fn_GetTableColumnsCSV('Users') ----------------------------------------------------------------------------- IF OBJECT_ID('fn_GetTableColumnsCSV', 'FN') > 0 DROP FUNCTION dbo.fn_GetTableColumnsCSV GO CREATE FUNCTION dbo.fn_GetTableColumnsCSV( @TableName VARCHAR(MAX) ) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @return VARCHAR(MAX) SET @return = (SELECT c.COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS c WHERE c.TABLE_NAME = @TableName GROUP BY c.COLUMN_NAME ORDER BY MAX(c.ORDINAL_POSITION) FOR XML PATH('')) SET @return = REPLACE(@return, '<COLUMN_NAME>', '[') SET @return = RTRIM(REPLACE(@return, '</COLUMN_NAME>', '], ')) SET @return = SUBSTRING(@return, 0, LEN(@return)) RETURN @return END GO ----------------------------------------------------------------------------- -- print dbo.fn_GetTableColumnsSPParams('Users') ----------------------------------------------------------------------------- IF OBJECT_ID('fn_GetTableColumnsSPParams', 'FN') > 0 DROP FUNCTION dbo.fn_GetTableColumnsSPParams GO CREATE FUNCTION dbo.fn_GetTableColumnsSPParams( @TableName VARCHAR(MAX) ) RETURNS VARCHAR(MAX) AS BEGIN -- Construct the SP parameters, omitting the Primary Key (represented by @UniqueIdentifier in the CRUD procs) DECLARE @return VARCHAR(MAX) SET @return = (SELECT c.COLUMN_NAME, c.DATA_TYPE, c.CHARACTER_MAXIMUM_LENGTH FROM INFORMATION_SCHEMA.COLUMNS c LEFT OUTER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE cu ON ( c.TABLE_NAME = cu.TABLE_NAME AND c.COLUMN_NAME = cu.COLUMN_NAME ) LEFT OUTER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc ON ( cu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME ) WHERE c.TABLE_NAME = @TableName AND ISNULL(tc.CONSTRAINT_TYPE, '') <> 'PRIMARY KEY' GROUP BY c.COLUMN_NAME, c.DATA_TYPE, c.CHARACTER_MAXIMUM_LENGTH ORDER BY MAX(c.ORDINAL_POSITION) FOR XML PATH('')) SET @return = REPLACE(@return, '<COLUMN_NAME>', '@') SET @return = RTRIM(REPLACE(@return, '</COLUMN_NAME><DATA_TYPE>', ' ')) SET @return = RTRIM(REPLACE(@return, '</DATA_TYPE><CHARACTER_MAXIMUM_LENGTH>', '(')) SET @return = RTRIM(REPLACE(@return, '</DATA_TYPE>', ', ')) SET @return = RTRIM(REPLACE(@return, '</CHARACTER_MAXIMUM_LENGTH>', '), ')) SET @return = SUBSTRING(@return, 0, LEN(@return)) -- Fix MAX data type SET @return = REPLACE(@return, 'varchar(-1)', 'varchar(MAX)') RETURN @return END GO ----------------------------------------------------------------------------- -- print dbo.fn_GetTableColumnsForUPDATE('Genres') ----------------------------------------------------------------------------- IF OBJECT_ID('fn_GetTableColumnsForUPDATE', 'FN') > 0 DROP FUNCTION dbo.fn_GetTableColumnsForUPDATE GO CREATE FUNCTION dbo.fn_GetTableColumnsForUPDATE( @TableName VARCHAR(MAX) ) RETURNS VARCHAR(MAX) AS BEGIN -- Construct the column part of an UPDATE statement, omitting the Primary Key (represented by @UniqueIdentifier in the CRUD procs) DECLARE @return VARCHAR(MAX) SET @return = (SELECT c.COLUMN_NAME, c.COLUMN_NAME AS PARAMETER FROM INFORMATION_SCHEMA.COLUMNS c LEFT OUTER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE cu ON ( c.TABLE_NAME = cu.TABLE_NAME AND c.COLUMN_NAME = cu.COLUMN_NAME ) LEFT OUTER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc ON ( cu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME ) WHERE c.TABLE_NAME = @TableName AND ISNULL(tc.CONSTRAINT_TYPE, '') <> 'PRIMARY KEY' GROUP BY c.COLUMN_NAME ORDER BY MAX(c.ORDINAL_POSITION) FOR XML PATH('')) SET @return = REPLACE(@return, '<COLUMN_NAME>', '[') SET @return = REPLACE(@return, '</COLUMN_NAME><PARAMETER>', '] = @') SET @return = RTRIM(REPLACE(@return, '</PARAMETER>', ', ')) SET @return = SUBSTRING(@return, 0, LEN(@return)) RETURN @return END GO ----------------------------------------------------------------------------- -- SELECT dbo.fn_GetTableColumnsForINSERTColumns('Users') ----------------------------------------------------------------------------- IF OBJECT_ID('fn_GetTableColumnsForINSERTColumns', 'FN') > 0 DROP FUNCTION dbo.fn_GetTableColumnsForINSERTColumns GO CREATE FUNCTION dbo.fn_GetTableColumnsForINSERTColumns( @TableName VARCHAR(MAX) ) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @return VARCHAR(MAX) SET @return = (SELECT c.COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS c LEFT OUTER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE cu ON ( c.TABLE_NAME = cu.TABLE_NAME AND c.COLUMN_NAME = cu.COLUMN_NAME ) LEFT OUTER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc ON ( cu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME ) WHERE c.TABLE_NAME = @TableName AND ISNULL(tc.CONSTRAINT_TYPE, '') <> 'PRIMARY KEY' GROUP BY c.COLUMN_NAME ORDER BY MAX(c.ORDINAL_POSITION) FOR XML PATH('')) SET @return = REPLACE(@return, '<COLUMN_NAME>', '[') SET @return = RTRIM(REPLACE(@return, '</COLUMN_NAME>', '], ')) SET @return = SUBSTRING(@return, 0, LEN(@return)) RETURN @return END GO ----------------------------------------------------------------------------- -- SELECT dbo.fn_GetTableColumnsForINSERTParams('Genres') ----------------------------------------------------------------------------- IF OBJECT_ID('fn_GetTableColumnsForINSERTParams', 'FN') > 0 DROP FUNCTION dbo.fn_GetTableColumnsForINSERTParams GO CREATE FUNCTION dbo.fn_GetTableColumnsForINSERTParams( @TableName VARCHAR(MAX) ) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @return VARCHAR(MAX) SET @return = (SELECT c.COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS c LEFT OUTER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE cu ON ( c.TABLE_NAME = cu.TABLE_NAME AND c.COLUMN_NAME = cu.COLUMN_NAME ) LEFT OUTER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc ON ( cu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME ) WHERE c.TABLE_NAME = @TableName AND ISNULL(tc.CONSTRAINT_TYPE, '') <> 'PRIMARY KEY' GROUP BY c.COLUMN_NAME ORDER BY MAX(c.ORDINAL_POSITION) FOR XML PATH('')) SET @return = REPLACE(@return, '<COLUMN_NAME>', '@') SET @return = RTRIM(REPLACE(@return, '</COLUMN_NAME>', ', ')) SET @return = SUBSTRING(@return, 0, LEN(@return)) RETURN @return END GO
What these functions do is generate a list of column names (and data types in the instance of 'fn_GetTableColumnsSPParams') in different formats, to be used to create the various CRUD stored procedures. They are assigned to variables so the functions aren't executed multiple times during the cursor loop.
Any suggestions for a better way to remove / replace the column names / data types would be appreciated - I haven't thought that much about it, but there must be a T-SQL XML guru that knows a better way to manipulate them than I do.
The last point to mention is the @PKField variable. This is only set if a table has a single column PRIMARY KEY - and if it doesn't, the CRUD procedures are not created (and a simple message is output instead). If a table does not have a primary key defined, the script assumes it has no way of uniquely identifying a row. This of course doesn't take in to account UNIQUE constraints, but it should be punishable by law if a PRIMARY KEY hasn't been declared on a table - I have yet to come across a situation where it is acceptable to ignore this fundamental rule of RDBMS design.
On a related matter, many developers state that all tables must always have an identity column that generates a unique key, used to identify an individual record. This is simply wrong; in 4th Normal Form you must be able to uniquely identify a record, but that doesn't mean you have to create additional data by adding a superfluous IDENTITY column. I would say that in storage and look-up tables (e.g. "user", "news" and "news category") an IDENTITY is 110% required - each row needs to be uniquely identifiable and easy to reference via a FOREIGN KEY and in SQL Server, IDENTITY is the most efficient way to do it. Obviously the scale of your application determines the data type; INT being the norm, BIGINT for larger scale databases / tables, and GUID for insanely large systems. However, when it comes to link tables (tables created for the purpose of "many to many" relationships), a multiple-value PRIMARY KEY will usually suffice. This reduces storage by 4 bytes per row (for INT identities) and completely eradicates the possibility of duplicate link table rows.
Again, a lack of RDBMS understanding usually leads to sloppy database design... if a news post can be tagged with multiple categories, a "NewsCategories" table should be created with a minimum of two fields - the NewsID and the CategoryID. These fields represent the PRIMARY KEY for the table, and one should be created in order to force developers to adhere to these data storage rules. I would even forgive the creation of a multi-column UNIQUE constraint instead of a PRIMARY KEY, the only difference being that UNIQUE constraints can contain NULL values, whereas any columns used in a PRIMARY KEY constraint cannot.
Anyway - I hope this comes in handy. I've used a similar approach in setting up my initial tables (the tables are coded individually, but instead of having multiple DROP statements I remove constraints and drop tables in a loop) and also creating default audit triggers on all of my tables.