Archive for February, 2008

February
21st 2008
How to Generate Test Data in Bulk

Posted under Code & SQL

Performance tuning in SQL Server is unendingly complex. At the basic end of the scale are indexes, query plans, disc-based I/O distribution, and the like; more complex issue demand a lot more investigation and care. Because the query optimizer’s job is most heavily centered in disc access - unlike the CLR optimizer, which is best at putting functions inline, and unwinding loops - performance and scalability code can only be realistically tested against data. Lots and lots of data.

Remote joins are a situation SQL Server could handle better. Generally, an entire table is copied from one server to the other, which could potentially flood tempdb, then the join is performed locally. You can use a query hint to force which table is sent and which server the join is delegated to. A better solution will come in a later post, although this is just one situation that requires gigabytes of data to properly test.

By custom, this test data often comes from Cartesian joins. An example is:

Select * From sysObjects, sysComments, sysColumns

My test server doesn’t have enough space to execute this code. If sysComments is removed and the query is executed in a new, empty database, 20,000 rows will be returned ( using SQL Server 2005, Enterprise Edition ). The trouble is, the rows look very much alike, can easily be mistaken for one another, and make poor test data. Let’s try a better approach. Continue Reading »

No Comments »