Wednesday, 30 November 2011

How difficult is it to design a database?

Well actually, it is not that difficult at all. There is a straightforward set of procedures which,
combined with a bit of common sense and experience, almost form a foolproof recipe.


1 Identify the entities of your system (products, customers, orders etc). Each of these forms a table in your database.
2 Identify the attributes of each entity, and the type of data each represents - Name, Address,
Description, Price etc. These form the columns and column types in the database.
3 Identify any existing unique identifiers for individual instances of entities, eg customer numbers, products codes etc. These form the primary key of the table. If there are't any, create them.
4 Work out how the entities are related, eg one customer can place many orders, one order can contain many products. These form the relationships in the database, implemented as foreign key constraints.

If you follow this recipe properly then your database will automatically be pretty well normalised at least to 3NF or BCNF, or at least any deviations from this will be minor and easily rectified. The design might not be great but will be a lot better than the majority of databases out there.

Obviously I have massively simplified the process that thousand-page books have been written on, but, in essence, that is it. But despite that many very good software developers who can write superb code end up cobbling together appalingly bad database schemas. The most common problems are

1 Repeating columns ("Phone1", "Phone2"). This causes major problems in querying and updating, and what if somebody has 3 phone numbers?
2 The wrong column types, such as as char type for a number. You then have to scatter your SQL and application code with conversions back and forth, and run the risk of somebody entering Fred as a price.

3 No primary keys. No primary keys = no foreign keys. No foreign keys = no relationships. No

relationships = no truly relational database, no data integrity, no reliable and efficient queries.
4 No unique constraints. Some widget or other gets entered twice, chaos and bafflement all around.

5 No domain tables. Somebody enters all their American transaction with currency USD. Somebody else tends to prefer US$. A third person comes along to add up all the USD transactions and only gets half of them. You don't need to be an accountant to see the disadvantages of this!


The principles of relational databases and their design have been around since the early 1970s. That's almost prehistoric in computer terms, but they have stood the test of time and I see no reason why they shouldn't last another 40 years. And that is because they work, when properly implemented. They give us data integrity and fast, efficient and straightforward inserts, updates, deletes and queries, as well as forming a solid foundation for creating denormalised data warehouses when needed.

But far too often they are not properly implemented, not because it is too hard or considered unimportant, but because the expert C#, Java or PHP developer is hardly aware of their existence, and nothing in their training or experience has led them to believe that database design is anything more than something you just get over and done with quickly so you can get on with the real task of writing your application.

Of course, in an ideal world the software architect or project manager would appoint a database designer to do the job properly before a single line of application code is written. But this is the real world and application developers who really only know how to develop applications are expected to do everything including databases design.

Friday, 18 November 2011

Why should you Blog?

If you happen to be an archeologist you probably like to classify human history in terms of what the stuff you dig up is made of - stone age, bronze age, iron age, Coke can age etc. This is all very well but only gives an extremely limited and even misleading picture of the development of mankind - for example we use more iron now (in the form of steel) than they ever did in the Iron Age.

I think a much more accurate and profound way of categorising the epochs of human history would be to use the technologies used to store and pass on thoughts, ideas and facts. They could be something like:

The Gestures and Grunts Age
The Speech Age
The Handwritten Text Age
The Printed Text Age
The Internet Age

What defines us as humans isn't what we do, but how we teach and learn, each generation benefiting from the accumulated knowledge of our ancestors. Each major advance in the technologies used to do this represents an increase of one, two, perhaps three orders of magnitude in our capabilities.

A central thread of philosophy since Classical times has been, how do we know what we know, how do we learn facts, how do we find out new information? The Ancient Greeks believed we were born knowing everything, so to gain knowledge and wisdom one just had to sit back and think. (Whether they really believed this, or just used it as an excuse to sit around doing nothing I couldn't say).

The more Enlightened philosophers such as John Locke believed the mind was a blank slate, formed and shaped by each person's unique combinations of experiences. This is obviously much closer to the truth, but the mechanisms involved in turning the simple firing of synapses into a mind capable of original thought remain a mystery. One thing is, I believe, clear though - pour enough knowledge and experience into a human brain and it will chop them up, mix them, and stick them back together in ways which, in just a few thousand years, has transformed us from hunter-gatherers with a lifestyle little different from mere beasts, to members of the incredibly complex society of today.

A couple of decades ago, only a small elite of writers and journalists enjoyed the privilege of having their knowledge, thoughts and ideas added to the melting pot of human knowledge. Now everyone can - it's not a privilege any more. But should you? Is it worth it? There is so much text sloshing round in blogs and elsewhere that most of it is hardly read. A few people might read a given post in full, more might skim it, the vast majority will remain ignorant of its existence. But it still serves a purpose if just a single person reads or even skims it and picks up one or two thoughts to add to their own personal mindset. Even if nobody reads it, maybe it benefits the writer by helping to clarify the jumble of thoughts in their own mind.

But what of the future, the Sixth Age? Well that, IMHO, will be when computers go beyond just storing text and delivering it to people to read, and start reading it themselves. And not just reading it, but chopping it up, mixing it, and recombining it into something original just as our brains do.

I am not going to get into futurology and try to predict the consequences of all that, or whether some HAL-like supercomputer will take over the world. I'll just say that what you think, believe, imagine and then type will in some small way form part of the inherited wisdom of mankind.

So that is why you should blog.