Friday, September 28, 2007

.NET Architecture for Extending SQL Server

I have chosen the .NET architecture for this exploration primarily because it is convenient -- I have ready access to it. It is worth pointing out that many of the ideas are independent of .NET and should be implementable in any database that has comparable facilities for extending functionality.

This post is divided into three sections: the good things about .NET; the bad things about .NET, and a brief overview of .NET archicture for my purposes.

Advantages of .NET
.NET is the Microsoft architecture for developing multi-faceted applications. This is what makes it possible to extend the database to include data mining and statistics functionality. .NET includes SQL Server, components of the operating system, and various programming languages (C# is the primary one, although it also supports Java and C++).

The biggest strength of .NET is that it supports much of what I want to do. It enables extending SQL Server with my own data types, functions, procedures, and aggregations. In addition, SQL Server is a very reasonable database for approaching analysis of moderately large sized data sets.

One big advantage of .NET is that it is single source. That is, all components are developed by Microsoft, which provides evidence that they can work together. This is a big advantage, because I want to think about databases and functionality, rather than APIs (application programming interfaces), version compatibility, installing multiple applications, and so on.

A second advantage of .NET is that it is readily available with the SQL Server database. It has a standard database platform that it interacts with.

The C# language provided with .NET is a powerful, object-oriented language. I view it as a dialect of C++, probably because I've forgotten many of the nuances of C++. The syntax is similar, although .NET provides many more built-in classes.

A fourth advantage is that .NET makes it possible to distribute both the source code and the compiled code (as dynamic load libraries better known as DLLs) regardless of the platform. Of course, .NET only runs on Microsoft operating systems. For true operating system independence, I would need to use a platform such as Java.

Less important but still notable is the fact that it has a consistent documentation interface (both on the web and with the product) and has many books written about it. For my purposes, I have extensively used C# 2005 (published by Wrox and available here) as well as the online and product documentation.

Disadvantages of .NET
.NET is sufficiently powerful for extending the functionality of a database to include data mining and statistical functionality. However, it does have some challenges.

Of course, this approach only works with SQL Server on Microsoft platforms. It is not a general purpose solution, although I do think the ideas should be readily extendible to other platforms.

In addition, there are multiple technical issues that I need to work around:

  • The maximum size of a user defined data type is 8000 bytes.
  • SQL Server does not allow overloading of function names. (That is, I cannot define two functions with the same name where one takes an integer argument and one takes a character argument; I have to give the two functions different names.)
  • SQL Server does not allow updating or modifying the database within a user defined function.
  • SQL Server does not allow aggregations to take more than one argument.
  • SQL Server does not allow functions or aggregations to take variable numbers of arguments.
  • C# has quite a few compiler directives which are needed for building libraries for SQL Server.
  • In order to be loaded as "safe" code into the database, C# is not allowed to use unions, pointers, modify the database, or do file I/O.
  • Loading assemblies (DLLs) into SQL Server is a multi-part process. The assembly is loaded and then individual elements have to be defined (although this is automated using Microsoft Visual Studio).
  • C# is not optimized for processing performance. That said, the performance seems quite reasonable to me.
  • Function calls in SQL Server seem to require a three part name . . . ... This is a minor irritation.
In many cases, there are ways around these limitations.

Overview of .NET
From my perspective, there are four components to the .NET framework:

  • SQL Server;

  • SQL Server Management Studio (for accessing the database);

  • C#; and,

  • Visual Studio (for writing C# code).
Ironically, though, the most important part is the hidden fifth component that ties these together, the .NET framework. The following picture captures my understanding of the relationships between these components. In actual fact, both SQL Server Management Studio and Visual Studio talk through the .NET interface. However, I am concerned with the interface between the database and C# for adding new functionality into the database.


Note that there are two ways that C# code can interact with the database. A C# application has full abilities to SELECT, UPDATE, and INSERT items into the database, as well as changing database parameters, creating new databases, and so on (so long as the user has the appropriate rights). Much database programming is done through such applications (which can also be written in other languagates). That is not what is happening here.

The second method is that C# can extend the functionality of the database through the creation of user defined types, functions, procedures, and aggregations. This is what I am interested in. Alas, we are not able to change the syntax of SQL. However, these elements are great building blocks for what I want to do.


Extending the functionality is simply a matter of doing the following:

  1. Creating a project in Microsoft Visual Studio to produce a DLL.

  2. Load the DLL into SQL Server as an assembly.

  3. Define the elements in the assembly.

(These last two steps can be automated when your version of Visual Studio supports SQL Server assemblies.)

The project in Visual Studio is simply a set of C# classes, with a few well placed compiler directives. Steps (2) and (3) can be done automatically, but I have a script that does the work.

The next post assumes that the Visual Studio and SQL Server environment is available and describes how to add some useful functions.

0 Comments:

Post a Comment

<< Home