CM Architecture – How to index

While building my CM engine, I took a deep breath and plunged into the still-unimplemented area of “a new object is created, what to do with it?”.

The reason is that my CM is built like this: when a client application creates a persistent object, it is quickly stored to disk (well... “storage”) in a portable and self-consistent manner. After making sure it’s there for keeps, a task is added to a background queue for “indexing”, i.e. inserting the new information into the indexing system so that the object can be found in searches.
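A minimal sketch of that store-then-index flow, in Python. Everything here is hypothetical: `ContentStore`, the one-JSON-file-per-object layout and the toy dictionary index stand in for the real storage and indexing system. The object is written atomically (temp file plus rename) before its ID goes onto a queue, and a background thread drains the queue and updates the index.

```python
import json
import queue
import tempfile
import threading
import uuid
from pathlib import Path


class ContentStore:
    """Hypothetical CM store: persist first, index later in the background."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.index = {}              # toy index: (attribute, value) -> set of object IDs
        self._lock = threading.Lock()
        self.tasks = queue.Queue()   # background "indexing" queue
        threading.Thread(target=self._index_worker, daemon=True).start()

    def create(self, metadata: dict) -> str:
        obj_id = uuid.uuid4().hex
        path = self.root / f"{obj_id}.json"
        tmp = path.with_suffix(".tmp")
        # Write a self-contained record to a temp file, then rename it into place,
        # so readers never see a half-written object. (No fsync here: a real store
        # would also force the data to disk before acknowledging.)
        tmp.write_text(json.dumps({"id": obj_id, "metadata": metadata}))
        tmp.replace(path)
        # Only once the object is stored is it queued for indexing.
        self.tasks.put(obj_id)
        return obj_id

    def get(self, obj_id: str) -> dict:
        # Retrieval by ID works immediately, even before indexing has run.
        return json.loads((self.root / f"{obj_id}.json").read_text())

    def find(self, attr, value) -> set:
        # Searches only see objects the background worker has already indexed.
        with self._lock:
            return set(self.index.get((attr, value), ()))

    def _index_worker(self):
        while True:
            obj_id = self.tasks.get()
            try:
                record = self.get(obj_id)
                with self._lock:
                    for attr, value in record["metadata"].items():
                        self.index.setdefault((attr, value), set()).add(obj_id)
            finally:
                self.tasks.task_done()


if __name__ == "__main__":
    store = ContentStore(Path(tempfile.mkdtemp()))
    oid = store.create({"author": "alice", "type": "invoice"})
    store.tasks.join()  # wait until the background worker has caught up
    print(oid in store.find("author", "alice"))
```

The trade-off is eventual consistency: `get` by ID succeeds as soon as `create` returns, while `find` only sees the object after the worker has processed it.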

The architecture allows for a virtually unlimited number of index provider types (e.g. hashes, B-trees, blingy-blingy, whatever). So my task now was to implement at least some default index providers; otherwise my content would only be nicely stored and retrievable by ID.
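One way to keep the set of index types open-ended is a small plug-in contract that every provider implements, plus a registry that routes each metadata attribute to a provider. The sketch below is hypothetical (`IndexProvider`, `HashIndexProvider` and `IndexRegistry` are illustrative names, not the engine’s actual API) and uses a hash index as the default provider.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class IndexProvider(ABC):
    """Hypothetical plug-in contract that every index type implements."""

    @abstractmethod
    def add(self, obj_id: str, value) -> None: ...

    @abstractmethod
    def lookup(self, value) -> set: ...


class HashIndexProvider(IndexProvider):
    """Exact-match index backed by a dict: fast equality lookups, no range queries."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, obj_id, value):
        self._buckets[value].add(obj_id)

    def lookup(self, value):
        # .get() does not create empty buckets for unknown values.
        return set(self._buckets.get(value, ()))


class IndexRegistry:
    """Routes each metadata attribute to whichever provider was registered for it."""

    def __init__(self, default_factory=HashIndexProvider):
        self._providers = {}
        self._default_factory = default_factory

    def register(self, attr: str, provider: IndexProvider) -> None:
        self._providers[attr] = provider

    def index_object(self, obj_id: str, metadata: dict) -> None:
        for attr, value in metadata.items():
            if attr not in self._providers:
                # Attributes nobody configured fall back to the default provider.
                self._providers[attr] = self._default_factory()
            self._providers[attr].add(obj_id, value)

    def search(self, attr: str, value) -> set:
        provider = self._providers.get(attr)
        return provider.lookup(value) if provider else set()
```

For example, after `index_object("doc-1", {"type": "invoice", "year": 2008})` and `index_object("doc-2", {"type": "contract", "year": 2008})`, `search("year", 2008)` returns both IDs. A B-tree or any other structure plugs in by implementing the same two methods and calling `register`.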

Sleeves up... I found some nice B-tree variants discussed on the web, added some spice of my own for multithreading optimization, and there I was, diving into design (and, I admit, some coding too; let’s call it the “agile” approach). By the time index persistence was implemented and a disk cache was being considered, I had my hands full. It worked, and had reasonable performance. Not as stable as I would have liked, but come on, nothing is bug-free on its first release.
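The post doesn’t say which B-tree variant was used or what the multithreading spice was, so the following is only a textbook (CLRS-style) B-tree in Python, with a single coarse lock as the simplest possible concurrency story. `BTreeIndex`, the minimum degree `t` and the set-of-IDs-per-key layout are assumptions. Unlike a hash index, it keeps keys sorted, so it can answer range queries.

```python
import bisect
import threading


class _Node:
    def __init__(self):
        self.keys = []      # sorted metadata values
        self.ids = []       # ids[i] is the set of object IDs carrying keys[i]
        self.children = []  # empty for leaves; otherwise len(keys) + 1 subtrees


class BTreeIndex:
    """Textbook B-tree of minimum degree t, guarded by one coarse lock (hypothetical)."""

    def __init__(self, t: int = 32):
        if t < 2:
            raise ValueError("minimum degree must be >= 2")
        self.t = t
        self.root = _Node()
        self._lock = threading.Lock()

    def insert(self, key, obj_id) -> None:
        with self._lock:
            if len(self.root.keys) == 2 * self.t - 1:
                # Splitting a full root is the only way the tree grows taller.
                new_root = _Node()
                new_root.children.append(self.root)
                self._split_child(new_root, 0)
                self.root = new_root
            self._insert_nonfull(self.root, key, obj_id)

    def range(self, lo, hi) -> set:
        """Object IDs whose key k satisfies lo <= k <= hi."""
        with self._lock:
            out = set()
            self._collect(self.root, lo, hi, out)
            return out

    def _split_child(self, parent, i):
        # Move the median of a full child up into parent; halves become siblings.
        t, full, right = self.t, parent.children[i], _Node()
        median_key, median_ids = full.keys[t - 1], full.ids[t - 1]
        right.keys, full.keys = full.keys[t:], full.keys[: t - 1]
        right.ids, full.ids = full.ids[t:], full.ids[: t - 1]
        if full.children:
            right.children, full.children = full.children[t:], full.children[:t]
        parent.keys.insert(i, median_key)
        parent.ids.insert(i, median_ids)
        parent.children.insert(i + 1, right)

    def _insert_nonfull(self, node, key, obj_id):
        while True:
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.ids[i].add(obj_id)  # duplicate value: record another object
                return
            if not node.children:
                node.keys.insert(i, key)
                node.ids.insert(i, {obj_id})
                return
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)  # split on the way down so leaves have room
                continue  # re-examine this node, which just gained the median key
            node = node.children[i]

    def _collect(self, node, lo, hi, out):
        i = bisect.bisect_left(node.keys, lo)
        while True:
            if node.children:
                self._collect(node.children[i], lo, hi, out)
            if i == len(node.keys) or node.keys[i] > hi:
                return
            out.update(node.ids[i])
            i += 1
```

A single lock serializes readers and writers alike; finer-grained schemes (for example lock coupling while descending the tree) would be the natural next step, with persistence and a disk cache layered on top.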

What to compare it with? I don’t feel it’s fair to dive right into a head-to-head comparison with Documentum/SharePoint/CM/FileNet. Soo…

My approach is to use as much memory as I can get my hands on, which sounds like TimesTen. Also, I address each metadata item individually, so it is something like a column-oriented database.
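To make the “column-oriented” remark concrete, here is a hypothetical sketch of what addressing each metadata item individually could look like: every attribute gets its own in-memory column mapping object IDs to values, so a filter on one attribute reads only that column. `ColumnarMetadataStore` and its methods are illustrative names; this is neither TimesTen’s data model nor the author’s engine.

```python
from collections import defaultdict


class ColumnarMetadataStore:
    """Hypothetical in-memory store with one 'column' per metadata attribute."""

    def __init__(self):
        # attribute name -> {object ID -> value}; objects that lack an
        # attribute simply have no entry in that column.
        self.columns = defaultdict(dict)

    def put(self, obj_id: str, metadata: dict) -> None:
        for attr, value in metadata.items():
            self.columns[attr][obj_id] = value

    def get(self, obj_id: str, attrs) -> dict:
        # Reassemble a partial "row" by reading only the requested columns.
        return {a: self.columns[a][obj_id]
                for a in attrs if obj_id in self.columns.get(a, {})}

    def scan(self, attr: str, predicate) -> set:
        # A filter touches a single column and never the other attributes.
        return {oid for oid, v in self.columns.get(attr, {}).items() if predicate(v)}
```

For instance, after `put("d1", {"author": "alice", "size": 10})` and `put("d2", {"author": "bob"})`, `scan("size", lambda v: v > 5)` returns `{"d1"}` without looking at any author values.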

Assuming TimesTen is not a poorly written DBMS (a highly unscientific approach, I know, but Oracle usually acquires good tech), I would like to give it a spin.

That being said, I’ll probably try putting TimesTen to work as column-oriented storage for my metadata.

Let’s see what happens. I’ll start with several million objects. On my laptop.

Anybody want to bet how fast it will ingest 1 million new objects with an average of 3 metadata items each (yes, I know that’s small)?
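For anyone placing a bet: 1 million objects × 3 metadata items on average is roughly 3 million values to store and index. The harness below is a hypothetical pure-Python baseline for that workload (`synthetic_objects` and `ingest_benchmark` are made-up names, and the attribute values are random 8-letter strings). It only measures inserts into in-memory dicts, not TimesTen or any disk I/O, and its timing includes generating the synthetic data.

```python
import random
import string
import time
from collections import defaultdict


def synthetic_objects(n, avg_attrs=3, seed=42):
    """Yield (object ID, metadata) pairs; attribute count averages avg_attrs."""
    rng = random.Random(seed)  # fixed seed keeps runs comparable
    for i in range(n):
        k = rng.randint(avg_attrs - 1, avg_attrs + 1)  # uniform, so the mean is avg_attrs
        meta = {f"attr{j}": "".join(rng.choices(string.ascii_lowercase, k=8))
                for j in range(k)}
        yield f"obj-{i}", meta


def ingest_benchmark(n):
    """Return (metadata values stored, elapsed seconds, objects per second)."""
    columns = defaultdict(dict)  # attribute -> {object ID -> value}
    values = 0
    start = time.perf_counter()
    for obj_id, meta in synthetic_objects(n):
        for attr, value in meta.items():
            columns[attr][obj_id] = value
            values += 1
    elapsed = time.perf_counter() - start
    rate = n / elapsed if elapsed > 0 else float("inf")
    return values, elapsed, rate


if __name__ == "__main__":
    # Raise to 1_000_000 to match the bet (needs a few hundred MB of RAM).
    values, secs, rate = ingest_benchmark(100_000)
    print(f"{values} metadata values in {secs:.2f}s ({rate:,.0f} objects/s)")
```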

Hardware config: 2 GB RAM, Core 2 Duo 2 GHz, lame HDD

Small disclaimer: these test results (which I’ll probably publish in part) should not be considered an objective comparison of the two systems, but rather an attempt to see how they perform in very particular situations, which may not even be close to real-world ones.
