This is a placeholder post, I’ll update it as time goes by.
Currently I’m building a presentation to show to the IT community how SQL Server can be used to build Document Management systems.
I have built (me and my team) many applications on SQL Server and several for DM. So i need to structure my experience a bit and give back to the community while researching what anyone else did similar and what the new version of SQL 2008 brings to the table.
If you whish to share your thought, feel free..
Of course I could not update the post as I researched…. but here are the outcomes:
Main topics of interest when trying to build a DMS solution on top of SQL 2008:
- Integrated Fulltext Search
- FILESTREAM data
- Remote Blob Store (RBS)
Other significant SQL Server 2008 functionalities:
- Backup compression
- Data compression
- Data encryption
- New DATE/TIME field (UTC)
- Improved XML processing (with Lax validation)
- Improved reporting services (who doesn’t need reports ? 🙂 )
- last, but not least: Sparse Columns
- more here
Full Text search
Now being integrated (and rewritten), the FTS engine provides more functions to the user and developer. The performance is kept somehow like in 2005 but some areas show significant improvements.
Fot the brave enough to use FTS in 2005 and previous versions, the migration options need to be considered (3 in total: rebuild, import, reset). Rebuild is needed especially if you want to take advantage of the new stemming and word-breaking rules and languages.
Nice things: stop words are now in the database. So they are accesible, programmable and transportable. They are also not only language dependent but you can also define other “set building” rules.
The thesaurus is still in XML but now is lazy cached and can be updated without restarting the server (yey!). Note that it behaves a little different then in 2005. So you need to take care when migrating your XML files.
Cool stuff: troubleshooting functions! Something always needed to look into the FT “magic”. baing able to see what keywords were indexed for a particular document / collection is very nice. To see it from SQL is even nicer. To be able to see how a query is parsed and transformed is great. I’m also happy since I can see how the stemmer and thesaurus work for a particular case.
Some advice: take care if you have many keywords (x 10 million). Use fast disks, IO is very important. Use 64 bits: 3 GB of RAM is usually not enough. Don’t confuse FREETEXT and CONTAINS, use them wisely.
BLOB related news
First of all, please don’t use IMAGE and TEXT/NTEXT fields anymore. They will no longer be supported / encouraged by Microsoft.
You can use VARBINARY(MAX), but you hit the 2 GB limit with it. Use the FILESTREAM modifier (new in 2008) to kill that limit.
FILESTREAM makes content to be stored in the NTFS drive. Nice. And tricky at the same time. Good for streaming, not so good for frequent updates. Good for big files, not so good for many files (especially when having short backup windows).
Nice: works from TSQL as well as Win32. Not so nice: behaves a little differently in TSQL vs. Win32 (transaction isolation level, performance – not necessarly better in Win32).
So, you really have to understand it before using. You can get in some not so obvious pitfalls. But is a good thing.
Remote Blob Store – RBS
Who does not know what CAS (Content Addressable Storage) is probably does not need it.
Is not another column type, it’s an API to be implemented by CAS vendors mainly and used by applications.
Somehow, it’s similar with EBS on SharePoint. In fact, there is a competition between the two (some nice cover is here), and I also feel that RBS is the way to go (regardless of the current limitation about accesing the context of the Blob).
EMC already has a RBS connector for Centera. Nice.
So, 2008 brings a lot of nice things on the table. Let me know when you use them.