Are programming skills fading?

For several years I've had the feeling that the programming skills of most people around me were fading.

I talked with some college professors and with other managers and IT professionals, and they confirmed the same. Then I thought: “Maybe I'm (we're) wrong, and looking at it through the lens of my own generation (I really believe we could have moved mountains).” I pushed the idea to the back of my head.

Recently a news item on Slashdot caught my attention. It basically points to an article written by two professors, and it discusses exactly what I was experiencing.

I won't go into detail, since the Slashdot post has a lot of comments and the article is very good in itself. There's no particular value I can add there.

One thing I can say: I am not in the US. I am in Europe, in a country which is quite proud of its programming professionals, and I have a lot of success stories on that side. We tend to ‘export’ a lot of engineers, and almost half (really!) of my college generation is now working for Microsoft, Google and others in top positions. And the problem happens here too. So it must be universal.

What happened? Well, since I'm a C++/Assembler kind of guy, I think Java is one problem :). Or similar programming environments. They make things too easy. Generally, programmers will forget (or never learn) the fundamentals of computer programming (yes: assembler, memory, pointers, CPU time, pipelines, IPC…) and they will end up programming poorly. Yes, there are excellent Java developers out there. I know some. I'm not talking about them (praise to you, guys and gals)! I'm talking about the majority (I might have a storm coming at me right now).

I keep wondering: where are the programs which ran in 48 KB of memory? Or even 640 KB? I remember playing 3D games on a Spectrum computer. Why do I now need 4 GB of memory for a full-text index server? I remember building a similar one that ran in less than 8 MB.

Why does it take ages for a Java portal to start up? On a dual-core server? With gigs of memory to spare?

Why do I see NullPointerException so many times? Who the f… is writing that code? (In my opinion a NullPointerException is clearly a developer error and should never reach the user. Ever!)
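To make the point concrete, here is a minimal sketch (class and method names are invented for illustration) of the difference between code that lets a null escape and code that makes absence explicit, so the compiler forces the caller to handle it:

```java
import java.util.Map;
import java.util.Optional;

// A NullPointerException reaching the user means the developer skipped a
// check somewhere. Making "absent" part of the return type avoids that.
class UserLookup {
    private final Map<String, String> emailsByUser;

    UserLookup(Map<String, String> emailsByUser) {
        this.emailsByUser = emailsByUser;
    }

    // Bad: a caller will eventually dereference this null, and the NPE
    // will blow up far away from the actual mistake.
    String emailOfUnsafe(String user) {
        return emailsByUser.get(user); // may return null
    }

    // Better: the caller cannot ignore the "no such user" case.
    Optional<String> emailOf(String user) {
        return Optional.ofNullable(emailsByUser.get(user));
    }
}
```

A caller then writes `lookup.emailOf("bob").orElse("no email on file")` instead of crashing in production.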

I’ll stop ranting right now. I simply want to know if you feel the same thing.

Are the programming skills fading? Why?

CM Architecture – yet another search engine?

Is a Content Management System basically yet another search engine?

From what I've learned so far… it might seem so. I consider that most of the requests a CM system needs to handle are requests to find and retrieve. Updates and deletions usually address individual items (I'm generalizing now, please forgive me).

What was extremely funny is that last year I was at a technical event of one major ECM provider, and something was said out in the open by a quite highly ranked person: “We never understood until now how important search is” (the quote is approximate). Leaving aside the tragedy of having to say this after building “top” ECM systems for many, many years… it really showed me that there are others who see it the same way I do.

Actually, this post was also triggered by a comment ldallas made on my previous one. Without me saying much (if anything) about how I see that a CM system's primary function is to answer search requests, he probably saw from my approach that I might be trying to reinvent the wheel.

This is not far from the truth. I'm indeed thinking of a CM architecture in which search is just about the most important function of all. What makes it different from common search engines is that in a CM environment you need to take care of complex security rules.

It is not enough to build a perfect “Google”-like engine. One needs to quickly filter the results based on user permissions. And when user permissions are based on multiple hierarchies of groups and roles this becomes tricky.

This is why I believe that the search engine (including full-text) needs to be a core part of the CM architecture. This is the only way it can provide quick and adequate responses.

In a system I work with (many of you reading this will recognize it), the search request is forwarded to an external search engine which returns chunks of result lists (e.g. 200 at a time). These are then stored in a temporary table in the RDBMS and joined with the security information to find out whether the user actually has rights to any of them! Plain ugly! Imagine the search matching 1 million records when I have the rights to see only one of them.

What I'm building (I bet I'm not the only one) is a system which embeds a search function that natively knows how to handle security.  The security model is the common one: item-level, based on users/groups with hierarchical permissions (read<…<delete). If any of you knows of a similar system and can provide some more technical details, I'll appreciate it.
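The idea can be sketched like this (a toy, not my actual implementation: the in-memory index, the `Perm` ordering and all names are illustrative assumptions). The key point is that the ACL check happens while results are being collected, so an unauthorized hit never leaves the engine and there is no temp-table join afterwards:

```java
import java.util.*;
import java.util.stream.*;

// Search with security evaluated at collection time, item-level ACLs,
// hierarchical permission levels, and group-based principals.
class SecureIndex {
    enum Perm { NONE, READ, WRITE, DELETE } // ordered: READ < WRITE < DELETE

    static class Item {
        final String id;
        final Set<String> terms;        // toy stand-in for a full-text entry
        final Map<String, Perm> acl;    // principal (user or group) -> permission
        Item(String id, Set<String> terms, Map<String, Perm> acl) {
            this.id = id; this.terms = terms; this.acl = acl;
        }
    }

    private final List<Item> items = new ArrayList<>();

    void add(Item item) { items.add(item); }

    // Effective permission = the highest level granted to the user directly
    // or to any group the user belongs to.
    static Perm effective(Item item, String user, Set<String> groups) {
        Perm best = item.acl.getOrDefault(user, Perm.NONE);
        for (String g : groups) {
            Perm p = item.acl.getOrDefault(g, Perm.NONE);
            if (p.compareTo(best) > 0) best = p;
        }
        return best;
    }

    // "1 million matches, 1 visible" costs no extra join: unauthorized
    // matches are dropped inside the engine, one by one.
    List<String> search(String term, String user, Set<String> groups) {
        return items.stream()
                .filter(i -> i.terms.contains(term))
                .filter(i -> effective(i, user, groups).compareTo(Perm.READ) >= 0)
                .map(i -> i.id)
                .collect(Collectors.toList());
    }
}
```

In a real engine the filter would sit inside the posting-list traversal rather than a stream over all items, but the principle is the same.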

Last but not least, the search functionality should know the business purpose of its content. I'm not sure right now whether I should make it a core function, or whether it belongs closer to the system front-end or is even application-specific. What I do know is that it would be a real pleasure to have a CM system which ranks/groups my results based on their business role (e.g. group contracts and related documents together in one result, then log files, then SOPs…), not only on word-matching rankings. This looks a little like dynamic taxonomies and result clustering… but not really. I think this topic needs a dedicated post later, anyway.
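As a rough sketch of what "group by business role, then by relevance" could mean (the `Hit` record, role names and ordering are all hypothetical):

```java
import java.util.*;
import java.util.stream.*;

// Business-aware result ordering: results carry a business role, the
// engine orders by role first and keeps relevance order within each role.
class RoleGroupedResults {
    record Hit(String id, String role, double score) {}

    // Desired business ordering; anything unknown goes last.
    static final List<String> ROLE_ORDER = List.of("contract", "sop", "logfile");

    static List<Hit> group(List<Hit> hits) {
        Comparator<Hit> byRole = Comparator.comparingInt(h -> {
            int i = ROLE_ORDER.indexOf(h.role());
            return i < 0 ? ROLE_ORDER.size() : i;
        });
        return hits.stream()
                .sorted(byRole.thenComparing(
                        Comparator.comparingDouble(Hit::score).reversed()))
                .collect(Collectors.toList());
    }
}
```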

As a conclusion: a Content Management system is akin to a Search Engine in the same way it is akin to a Database Management System: it could be built like one, but it deserves a specific implementation in order to do it right.

CM Architecture – content storage

One of the ‘strange’ ideas I have in my research is to try and completely remove the RDBMS from the equation.

Sure, having an RDBMS back-end brings a lot of advantages and speeds up the time-to-market. But since I'm not building a CM system that has to go on sale by Christmas, I have plenty of time to experiment. So, why not take a different road?

My approach is also based on an old idea: that a CM system is a data management system in its own right, with its own specific requirements. Sure, it is similar to existing DBMSs (notice I removed the R from the acronym), but that's normal, and it's also an extra argument for NOT using a prebuilt system but building one instead.

So, I have long thought about how to model and implement such a core system. My work started in the late '90s with testing and benchmarking some RDBMSs. I used the newly created (back then) TPC tests (oh, memories…), as well as Wisconsin and similar older benchmarks. This gave me a glimpse of what performance means and what can be expected when you try to analyze the impact design has on it.

To end the digression, my conclusion was that DBMS systems (mainly the mainstream ones) are simply not built to handle “content”. They are good at handling “data”, as “pure” as possible. Throw some high transaction volume and concurrency into the soup, and there is your Oracle / MSSQL / DB2 / Postgres / MySQL… whatever.

Content, on the other hand, is special. It's small (think .ini files 😉)… It's big (think imaging)… It's huge (think movies). All at the same time. Also, it has versions… renditions… annotations…

Of course, this can be modeled using a normal database, but it just doesn't seem right. I would like to see all of those implemented natively as core functions. Imagine having versions and renditions for a data row in an RDBMS table.
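To show what I mean by "natively", here is a hypothetical sketch (the class and its operations are invented, not any real product's API) of a content item that exposes versions and renditions as first-class operations instead of extra tables and joins:

```java
import java.util.*;

// Versions and renditions as core operations of the content item itself.
class ContentItem {
    private final String id;
    private final List<byte[]> versions = new ArrayList<>();        // version 0, 1, 2...
    private final Map<String, byte[]> renditions = new HashMap<>(); // e.g. "pdf" -> bytes

    ContentItem(String id, byte[] initial) {
        this.id = id;
        versions.add(initial);
    }

    // Checking in new content creates a version; returns the version number.
    int checkIn(byte[] newContent) {
        versions.add(newContent);
        return versions.size() - 1;
    }

    byte[] version(int n)        { return versions.get(n); }
    int currentVersion()         { return versions.size() - 1; }

    void addRendition(String format, byte[] bytes) { renditions.put(format, bytes); }
    Optional<byte[]> rendition(String format) {
        return Optional.ofNullable(renditions.get(format));
    }
}
```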

My idea is to give it another shot and rethink the storage concept. And build on that thought.

So, let's have content (which for me also includes metadata) stored as a unitary piece of data. Let's say in a compound file on the filesystem. Self-contained, self-sufficient. Maybe the versions/renditions can be stored in parallel physical files, since they may need to reside in other filesystems than the original.

This has a nice advantage I am quite fond of: if the compound file structure is openly described, then a tool to process it can easily be built at any time, in any technology. So, if I archive that piece of content on a tape and throw it away for 20 years… when I go back, I don't care if my original software is lost or can't run. I simply build another tool. (Side note: frankly, how many of you really believe that a records management software will not change from the ground up before such records are due for disposal? The content won't change.)
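A toy illustration of the "openly described" point (the layout here — a magic marker plus length-prefixed metadata and content blocks — is invented for the example, not my actual format): once the byte layout is documented, anyone can rebuild a reader in an afternoon, in any language.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// A tiny self-describing container: magic + length-prefixed metadata
// + length-prefixed content. Trivially re-implementable from its spec.
class CompoundFile {
    static final byte[] MAGIC = "CMF1".getBytes(StandardCharsets.US_ASCII);

    static byte[] write(String metadata, byte[] content) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        byte[] meta = metadata.getBytes(StandardCharsets.UTF_8);
        out.write(MAGIC);
        out.writeInt(meta.length);    // length-prefixed metadata block
        out.write(meta);
        out.writeInt(content.length); // length-prefixed content block
        out.write(content);
        return bos.toByteArray();
    }

    // Returns { metadata, content-as-string } for the sake of the example.
    static String[] read(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        byte[] magic = new byte[4];
        in.readFully(magic);
        byte[] meta = new byte[in.readInt()];
        in.readFully(meta);
        byte[] content = new byte[in.readInt()];
        in.readFully(content);
        return new String[] { new String(meta, StandardCharsets.UTF_8),
                              new String(content, StandardCharsets.UTF_8) };
    }
}
```

The real structure would of course carry versions, renditions and annotations as further sections, but the principle — self-contained, openly specified — is what matters.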

OK, what about processing this stuff? I'm thinking of building a system which works on top of this storage and builds up an index of everything. How does it do it? Well… that's the secret recipe, until either I publish my work or I get bored enough to discuss it here. Or somebody else comes up with a smarter idea.

So, that's one of my PhD thesis thoughts. Feel free to trash it… I've only been thinking about it for the last 7 years. Seriously, any comment is highly appreciated, and I'll share my thoughts and results openly.