Saturday, November 24, 2007

Managing SQL Server Extensions (Functions, Types, Etc.)

[This post is part of a series where I'm exploring how to add data mining functionality to the SQL language; this is an extension of my most recent book Data Analysis With SQL and Excel. The first post is available here.]

Up to now, I have discussed adding functions, types, and aggregates into SQL Server. The code is created in C# and loaded into SQL Server using as an assembly. Placing the code into SQL Server has four steps:
  1. Deleting all objects already defined in the assembly (if any).
  2. Deleting the assembly (if present).
  3. Loading a new version of the assembly.
  4. Redefining the objects in the assembly.
For readers who are familiar with the process of compiling and linking code, this process is similar to linking. The references in the assembly have to be "linked" into the database, so the database knows what the references refer to.

I am doing this process manually for two reasons. First, because this is how I originally set up this project for adding data mining functionality into SQL Server (even though Visual Studio does have options for doing this automatically). Second, this approach provides an opportunity to start to understand how SQL Server manages user defined types.

This post discusses how to manage the first of these steps automatically. That is, it describes how to delete all objects in a database referenced by a particular assembly.


A Common Error
The following code drops the user defined function CreateBasicMarginalValueModel():

DROP AGGREGATE CreateBasicMarginalValueModel
GO

This expression is quite simple. However, if it is executed twice, it returns the error:

Msg 3701, Level 11, State 5, Line 1
Cannot drop the aggregate function 'createbasicmarginalvaluemodel', because it does not exist or you do not have permission.

This is inelegant, because it means that we cannot run the same code to drop a function twice in a row. Even if it works the first time, the second time it runs, the same code will return an error. Furthermore, when we see this error, we do not know if the problem is the non-existance of the object or inadequate database permissions.

To fix this, we use the T-SQL IF construct:

IF OBJECT_ID('CreateBasicMarginalValueModel') IS NOT NULL
....DROP AGGREGATE CreateBasicMarginalValueModel
GO

This simply says that if the object exists, then drop the aggregate. However, this is inelegant, because it mentions the name of the aggregate function twice, once in the "if" clause and once when dropping it. In addition, we do not want to have to explicitly mention every object by name, since we may not know which objects in the assembly were actually declared in the database.

Handling Dependencies
Another problem occurs when we try to drop a type. The following statement:

IF (SELECT COUNT(*) FROM sys.types
....WHERE UPPER(name) = 'BASICMARGINALVALUEMODEL') > 0

....DROP TYPE BasicMarginalValueModel
GO


Returns the enigmatic error:

Msg 3732, Level 16, State 1, Line 3
Cannot drop type 'BasicMarginalValueModel' because it is currently in use.

This error does not, unfortunately, tell us who or what is using the type. In this case, it is the set of functions, procedures, aggregates, and other types that use the type as an argument or return value. We have to remove all these objects before we can remove the type.

In general, we need to remove functions, aggregates, and procedures before we remove types. This ensures that the types have no dependencies on them, so they can be removed cleanly from the database.

This problem with dependencies is actually an advantage. It ensures that code loaded into the database all refers to the proper set of definitions. If a function uses a type, we cannot simply replace the type. We need to drop the function, drop the type, and then re-declare the type and function. This ensures that the function refers to the proper code when using the type.

Finding All User-Defined Functions in an Assembly

The first step to removing a certain class of objects, say functions, is to find all them in the database. They are conveniently located in the sys.objects table, so the following query returns all user-defined functions:

SELECT o.name
FROM sys.objects o
WHERE o.type in ('FS', 'FT')

The sys.objects table contains references to many different objects in the database (almost everything except user defined types). The specific type abbreviations 'FS' and 'FT' refer to scalar functions and table functions, respectively.

The only problem with this code fragment is that returns all user defined functions, there might be user defined functions from different assemblies. What we really want are only user defined functions in the "ud" assembly. To find this, we have to use two more reference tables. To get all the functions in "ud", the query looks like:

SELECT o.name
FROM sys.objects o JOIN
.....sys.assembly_modules am
.....ON o.object_id = am.object_id JOIN
.....sys.assemblies a
.....ON am.assembly_id = a.assembly_id
WHERE a.name = 'ud' and
......o.type in ('FS', 'FT')


This finds all user defined functions only in the desired assembly. The code for procedures and aggregates is quite similar. The only difference is that the type in the WHERE clause matches 'PC' and 'AF', respectively.

User defined types are somewhat different. They are stored in the table sys.types, rather than sys.objects. The query to find all of them is similar, requiring looking up assembly information in additional tables:

SELECT t.name
FROM sys.types t JOIN
.....sys.type_assembly_usages tau
.....ON t.user_type_id = tau.user_type_id JOIN
.....sys.assemblies a
.....ON tau.assembly_id = a.assembly_id
WHERE a.name = 'ud'


Although the query is somewhat different, it returns the name of the user defined types in the given assembly.


Deleting All User Defined Functions
Going from a query that returns a list of user-defined functions (or whatever) to actions on those functions (such as dropping them) requires using the T-SQL command language. In particular, we need to define cursors on the query, so we can do something to each row.

Code that uses cursors has the following structure:

DECLARE @name VARCHAR(2000)
DECLARE the_cursor CURSOR FOR
....QUERY

OPEN the_cursor
....FETCH next FROM the_cursor INTO @name
....WHILE @@fetch_status = 0
....BEGIN
........DO ACTION HERE
........FETCH NEXT FROM the_cursor INTO @name
....END

CLOSE the_cursor
DEALLOCATE the_cursor


The first two lines of the code declare two variables. The first is a standard scalar variable, which is used to store each value returned by the query. The second is a cursor, which is used to cycle through the rows. Notice that the cursor variable is not preceded by an at sign.

Most of the remaining code is the framework used to manage the cursor. It is important to handle cursors correctly. A simple mistake -- such as leaving out the FETCH NEXT FROM -- can result in an infinite loop. We do not want that to happen.

Opening the cursor runs the query and the FETCH NEXT statement gets the next value, which is placed in the local variable @name. When there are no more values, the cursor is closed an deallocated.

The full code for dropping all functions is a bit longer, because the query and action portions are filled in:

DECLARE @function_name VARCHAR(2000)
DECLARE function_cursor CURSOR FOR
....SELECT o.name
....FROM sys.objects o JOIN
.........sys.assembly_modules am
.........ON o.object_id = am.object_id JOIN
.........sys.assemblies a
.........ON a.assembly_id = a.assembly_id
....WHERE a.name = 'udf' AND
..........o.type in ('FS', 'FT')

OPEN function_cursor
....FETCH next FROM function_cursor INFO @function_name
....WHILE @@fetch_status = 0
....BEGIN
........EXEC('DROP FUNCTION '+@function_name)
........FETCH NEXT FROM function_cursor INTO @function_name
....END

CLOSE function_cursor
DEALLOCATE function_cursor
PRINT 'DROPPED FUNCTIONS'


The cursor is defined over the query that returns all the functions in the "ud" assembly. For each of these function, the action is to drop the function. The action uses the EXEC() function rather than just the DROP FUNCTION statement. The EXEC() function takes a string as an argument, and executes the string as a T-SQL statement. This makes it possible to incorporate the name of the function into the command.

The Full Code
The code for dropping aggregates, procedures, and types follows the same structure as the code for dropping functions. The only differences are to the query that defines the cursor and the string passed to the EXEC() function (DROP AGGREGATE, DROP PROCEDURE, or DROP TYPE).

The only important aspect to the code is that types need to be dropped last, because of the dependency problem.

This entry does not include the T-SQL code for this example. The next entry discusses naive Bayesian models. The entry after that will include code that has these enhancements.




0 Comments:

Post a Comment

<< Home