Is a Content Management System basically yet another search engine?
From what I have learned so far, it might seem so. I think most of the requests a CM system has to handle are requests to find and retrieve content. Updates and deletions usually address individual items (I’m generalizing here, please forgive me).
What was extremely funny is that last year I was at a technical event of one major ECM provider, and a quite highly ranked person said out in the open: “We never understood until now how important search is” (the quote is approximate). Despite the tragic situation of having to say this after building “top” ECM systems for many, many years, it really showed me that there are others who see it the same way I do.
Actually, this post was also triggered by a comment ldallas left on my previous one. Without my saying much (if anything) about how I see answering search requests as a CM system’s primary function, he probably saw from my approach that I might be trying to reinvent the wheel.
This is not far from the truth. I’m indeed thinking of a CM architecture in which search is almost the most important function of all. What makes it different from common search engines is that in a CM environment you need to take care of complex security rules.
It is not enough to build a perfect “Google”-like engine. One needs to quickly filter the results based on user permissions. And when user permissions are based on multiple hierarchies of groups and roles this becomes tricky.
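To make the "tricky" part concrete, here is a minimal Python sketch of filtering hits when permissions come through nested groups. Everything in it is an assumption for illustration: the `MEMBER_OF` and `READ_ACL` tables, the user and group names, and the function names are hypothetical, not taken from any real CM product.

```python
from collections import deque

# Hypothetical membership table: principal -> groups it directly belongs to.
MEMBER_OF = {
    "alice": {"legal"},
    "legal": {"staff"},   # groups can be nested inside other groups
    "bob": {"staff"},
    "staff": set(),
}

# Hypothetical ACLs: item id -> principals that may read the item.
READ_ACL = {
    "contract-1": {"legal"},
    "memo-7": {"staff"},
    "payroll-3": {"hr"},
}

def effective_principals(user):
    """Return the user plus every group reachable through nested membership."""
    seen = {user}
    queue = deque([user])
    while queue:
        current = queue.popleft()
        for group in MEMBER_OF.get(current, ()):
            if group not in seen:      # guards against membership cycles
                seen.add(group)
                queue.append(group)
    return seen

def filter_readable(user, hits):
    """Keep only the hits whose read ACL intersects the user's principals."""
    principals = effective_principals(user)
    return [hit for hit in hits if READ_ACL.get(hit, set()) & principals]

print(filter_readable("alice", ["contract-1", "memo-7", "payroll-3"]))
```

The hierarchy walk happens once per request, but the ACL check still runs once per hit, which is why where that check lives in the architecture matters so much.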
This is why I believe that the search engine (including full-text search) needs to be a core part of the CM architecture. This is the only way it can provide quick and adequate responses.
In a system I work with (many of you reading this will recognize it), the search request is forwarded to an external search engine, which returns the result list in chunks (e.g. 200 hits at a time). These are then stored in a temporary table in the RDBMS and joined with the security information to find out whether the user actually has rights to any of them! Plain ugly! Imagine the search matches 1 million records but I only have the rights to see one of them.
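A rough Python simulation of that worst case shows the cost. It is only a sketch under my own assumptions: the temporary table and the join are replaced by a plain set lookup, and `post_filter` and `chunks` are hypothetical names, not the API of the system I mentioned.

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

def post_filter(matches, readable_ids, chunk_size=200):
    """Check permissions only after the engine has returned its hits."""
    visible = []
    chunks_fetched = 0
    rows_checked = 0
    for chunk in chunks(matches, chunk_size):
        chunks_fetched += 1
        rows_checked += len(chunk)
        # Stand-in for "load into a temp table and join with security info".
        visible.extend(doc_id for doc_id in chunk if doc_id in readable_ids)
    return visible, chunks_fetched, rows_checked

# One million matches, of which the user may see only the very last one.
visible, n_chunks, n_rows = post_filter(range(1_000_000), {999_999})
print(visible, n_chunks, n_rows)  # [999999] 5000 1000000
```

In this simulation, 5000 round trips and a million permission checks are spent to deliver a single visible document.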
What I’m building (I bet I’m not the only one) is a system with an embedded search function that natively knows how to handle security. The security model is the common one: item level, based on users/groups with hierarchical permissions (read<…<delete). If any of you know of a similar system and can provide some more technical details, I’d appreciate it.
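As a toy illustration of the idea (not my actual implementation), here is a Python sketch of an index that keeps the ACL next to the postings and checks it while walking them, so hits the user cannot see never leave the engine. The `SecureIndex` class, the `Level` enum, and the numeric values are all assumptions; only the two ends of the permission scale are named, and the levels in between are deliberately left out.

```python
from collections import defaultdict
from enum import IntEnum

class Level(IntEnum):
    """Ordered permissions; higher values imply the lower ones."""
    READ = 1
    # Intermediate levels would sit here; they are not spelled out above.
    DELETE = 9

class SecureIndex:
    """Toy inverted index that stores each item's ACL beside its postings."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> item ids
        self._acl = {}                     # item id -> {principal: Level}

    def add(self, doc_id, text, acl):
        self._acl[doc_id] = dict(acl)
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, term, principals, needed=Level.READ):
        """Return matching ids on which some principal holds at least `needed`."""
        hits = []
        for doc_id in sorted(self._postings.get(term.lower(), ())):
            acl = self._acl[doc_id]
            granted = max((acl[p] for p in principals if p in acl), default=0)
            if granted >= needed:
                hits.append(doc_id)
        return hits

index = SecureIndex()
index.add("c1", "supplier contract", {"legal": Level.DELETE})
index.add("c2", "employment contract", {"hr": Level.READ})
index.add("c3", "contract template", {"staff": Level.READ})

# `principals` is the user plus all groups already resolved through the hierarchy.
print(index.search("contract", {"alice", "legal", "staff"}))
```

Because the permissions are ordered, a single comparison answers "may this user at least read it?" as well as "may this user delete it?".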
Last but not least, the search functionality should know the business purpose of its content. I’m not sure right now whether this should be a core function, or whether it sits closer to the system front-end or is even application specific. What I do know is that it would be a real pleasure to have a CM system that ranks and groups my results based on their business role (e.g. grouping contracts and related documents together in a result, then log files, then SOPs…), not only on word-matching rankings. This looks a little like dynamic taxonomies and result clustering… but not really. I think this topic needs a dedicated post later, anyway.
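Just to show the kind of output I have in mind, here is a small Python sketch that orders hits by business role first and by match score second. The `ROLE_ORDER` list, the `role` and `score` fields, and the sample hits are hypothetical; where the role information would really come from is exactly the open question.

```python
from itertools import groupby

# Hypothetical business ordering: contracts first, then log files, then SOPs.
ROLE_ORDER = ["contract", "logfile", "sop"]

def group_by_role(hits):
    """Group hits by business role, then by descending score inside each group."""
    rank = {role: position for position, role in enumerate(ROLE_ORDER)}

    def sort_key(hit):
        # Unknown roles go last, kept together by sorting on the role name.
        return (rank.get(hit["role"], len(ROLE_ORDER)), hit["role"], -hit["score"])

    ordered = sorted(hits, key=sort_key)
    return [
        (role, [hit["id"] for hit in group])
        for role, group in groupby(ordered, key=lambda hit: hit["role"])
    ]

hits = [
    {"id": "d1", "role": "sop", "score": 0.9},
    {"id": "d2", "role": "contract", "score": 0.4},
    {"id": "d3", "role": "logfile", "score": 0.7},
    {"id": "d4", "role": "contract", "score": 0.8},
]
print(group_by_role(hits))
```

Note that the SOP with the best word-matching score still comes last, because the business ordering wins over the text ranking.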
In conclusion: a Content Management system is like a Search Engine in the same way it is like a Database Management System: it can be built as one, but it deserves a specific implementation in order to do it right.