HCi Journal of Information Development



Neglecting systems documentation courts disaster

By Stuart Lecky

When fire broke out in the computer room at the headquarters of Britain’s Open University in Milton Keynes in 1987, it destroyed the VAX system used by the faculty to store their research work, along with all of the back-up tapes, which were, some would say foolishly, stored in the same room.  The media reported that “years of research work” had been lost.

Imagine that this happened to your business and it was your company’s data that was lost.  Would it be a problem if you were to lose the last two months’ records of your financial transactions?  Of course it would.  But that’s not going to happen, is it, because you’ve got adequate, documented systems in place to cater for such eventualities, haven’t you?  Have you?

Computer systems are vital to the running of all businesses and organisations these days.  From the smallest one-person operation to the largest multi-national corporations, the data held on their computer systems is vital to the efficient and profitable running of those organisations.  These can be any kind of organisation: educational institutions, businesses, government departments, charities, hospitals, etc.  Management of those systems, including maintenance of comprehensive systems documentation and an accurate and up-to-date disaster recovery plan, is one of the key elements in ensuring that “mission critical” systems continue to function efficiently regardless of what disasters may occur.

One major risk factor in particular, systems management documentation, is often neglected because it can be costly to create and maintain.  It’s only when something goes wrong that people begin to ask questions like:

  • “Where are the procedures for doing this?”

  • “Why did this happen, weren’t they following the procedure?”

  • “How can I change this system without any documentation on what’s in it?”

Systems management documentation covers a multitude of different areas.  Everything about any computer system, and the people who use it, can and should be documented so that, when things go wrong (and something always goes wrong), remedial action can be taken to minimise the consequences.

Systems management documentation covers:

  • system configuration

  • system administration procedures

  • data dictionaries

  • backup procedures

  • disaster recovery plan

  • help desk knowledge bases

  • document management systems

Each of these areas, and the consequences of neglecting it, is described below.

System configuration covers the details of how all of the systems are set up, both software and hardware.  Software includes operating systems, system tools, software packages and software you have developed in-house.  Hardware includes not only PCs and servers but also any peripheral device connected to them.  If any of these is normally run with a non-standard configuration and it fails, you will have difficulty setting it up again unless that configuration has been documented and the documentation kept up to date.
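Part of this record can be captured automatically rather than typed up by hand.  The following is a minimal sketch in Python, not a complete inventory tool: it records a handful of facts about the machine it runs on and saves them as a dated JSON file.  The function names `collect_config` and `write_snapshot`, and the choice of fields, are illustrative assumptions; a real configuration record would also need installed packages, peripherals and in-house settings.

```python
import json
import platform
from datetime import datetime, timezone


def collect_config() -> dict:
    """Gather a few basic facts about the machine this runs on."""
    return {
        # When the record was taken, so stale snapshots are easy to spot.
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "hostname": platform.node(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "architecture": platform.machine(),
        "python_version": platform.python_version(),
    }


def write_snapshot(path: str) -> dict:
    """Save the collected facts as a JSON file and return them."""
    config = collect_config()
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2, sort_keys=True)
    return config
```

Calling `write_snapshot("config-snapshot.json")` (a hypothetical file name) on a schedule, and keeping the results with the rest of the documentation, would give a dated trail of how each machine was set up.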

System administration procedures can include details of how systems run day to day, how things like new user accounts are to be set up, and how occasional situations (e.g. disks becoming full) are dealt with: anything to do with the day-to-day running of your IT systems.  If these are not documented, or not kept up to date, dealing with new problems is hard enough, but every time an intermittent problem turns up it also has to be solved by re-inventing the wheel.  Good system administration documentation saves time and therefore money.  You might say that experienced system managers and operators know this stuff and don’t need the documentation, but what if they are off sick, or on holiday, or (God forbid) encounter the underside of the proverbial bus?

Data dictionaries detail the way in which your databases and associated applications work: what is in a table, what is in a field, how this data is used by the system, and what programs do with the data.  Without adequate and up-to-date data dictionary information, the maintenance and enhancement of those systems takes much longer, and is less likely to be effective.  Again, time costs money.
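The structural half of a data dictionary can often be read straight from the database, which helps to keep it in step with the real system.  As a sketch only, and assuming a SQLite database purely for illustration, the Python below lists each table with its columns, types and constraints.  The function name `build_data_dictionary` is hypothetical, and the descriptions of what each field means and how programs use it would still have to be written by people.

```python
import sqlite3


def build_data_dictionary(conn: sqlite3.Connection) -> dict:
    """Return {table name: [column details]} for every user table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    dictionary = {}
    for (table,) in rows:
        # Skip SQLite's own internal bookkeeping tables.
        if table.startswith("sqlite_"):
            continue
        # Quote the table name so unusual names cannot break the statement.
        quoted = '"' + table.replace('"', '""') + '"'
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        dictionary[table] = [
            {
                "column": name,
                "type": col_type,
                "required": bool(notnull),
                "primary_key": bool(pk),
            }
            for _cid, name, col_type, notnull, _default, pk in columns
        ]
    return dictionary
```

Run against a live database, the result could be written out alongside the hand-written descriptions, so that a missing or renamed field shows up as a difference between the two.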

Backup procedures include schedules of tapes to be used, where they go, what goes on them, what to do with them when the backup is finished, how to check that the backup worked, how to restore things from backup, where tapes are stored (on-site and off), and how to get tapes back from off-site storage.  Most IT departments have their backup procedures documented, but things change and if the documentation is not kept up to date, problems can arise.  IT department staff come and go.  What happens in the middle of the night when a relatively new system operator is working alone, something goes wrong that they haven’t run across before, and there’s no documentation to help them?  They can’t do their job and those vital backups don’t get done.
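The step “how to check that the backup worked” is one that lends itself to a documented, repeatable check.  The sketch below is one possible approach, not a description of any particular backup product: it assumes the backup is a copy of a directory tree held on disk (not on tape) and compares SHA-256 checksums of every file in the original against the copy.  The names `file_digest` and `verify_backup` are illustrative.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: str, backup_dir: str) -> list:
    """List every source file that is missing from, or differs in, the backup."""
    source, backup = Path(source_dir), Path(backup_dir)
    problems = []
    for original in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = original.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(original) != file_digest(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every file in the source was found, unchanged, in the backup.  Anything else gives the operator a concrete list to act on, which is exactly what the written procedure should tell them how to do.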

Disaster recovery plans deal with every eventuality that may befall a computer system, from minor problems to its complete destruction, and set out how the entire system is to be restored.  Most large organisations have long since recognised the need for a disaster recovery plan and have implemented one.  Keeping it up to date is essential.  The details of new hardware and software and their configurations (see above) must be added whenever they are introduced.  Hopefully a disaster recovery plan will never be needed, but if it is, and it hasn’t been kept up to date, you may find that you restore your system as it was three years ago, instead of last week.
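Keeping a plan current is easier if something flags when it has gone too long without review.  As a rough sketch, and assuming you keep a record of when each document was last reviewed, the Python below lists those that are overdue.  The function name `overdue_documents` and the 180-day default interval are arbitrary assumptions for illustration, not a recommended policy.

```python
from datetime import date, timedelta


def overdue_documents(last_reviewed: dict, today: date, max_age_days: int = 180) -> list:
    """Return the names of documents last reviewed more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name for name, reviewed in last_reviewed.items() if reviewed < cutoff
    )


# Example record: document name -> date of last review (made-up dates).
reviews = {
    "disaster recovery plan": date(2000, 5, 1),
    "backup procedures": date(2003, 4, 1),
}
print(overdue_documents(reviews, today=date(2003, 5, 1)))
```

With the made-up dates shown, only the disaster recovery plan, last reviewed three years earlier, is reported.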

Help desk knowledge bases are immensely useful, for those who have them, for storing information about how problems have been dealt with in the past and how to deal with them if they arise again.  Keeping these up to date is vital to ensure that mission-critical systems keep running.

Document management systems are often used as the repositories for all of the information upon which a business depends to keep it running.  Ensuring that everyone has access to them, recognises their importance and keeps them up to date (though this needs to be controlled) is an essential part of ensuring the continued efficient running of any organisation.

While the hardware and software that an organisation uses are important, they are useless without the people who use them and the people who ensure that they run smoothly and keep on running.  Implementing, and most importantly maintaining a continuing commitment to, a quality management system such as ISO 9000:2000 is one means of ensuring that those people recognise the need to maintain things like systems management documentation and to continually review and improve it.

The people who run IT departments are always busy.  There are always more immediate and seemingly more important things to do, like developing new projects.  But when those new projects go in, there is the need for the accompanying documentation to ensure that those systems run smoothly and keep on running and that if problems occur, something can be done about them swiftly.  People leave organisations.  Problems occur infrequently.  Some things are only done once a year and can be forgotten.  Systems change.  Making sure that those systems are always clearly and accurately documented is vital in ensuring that they are managed properly so that the entire organisation benefits from their efficient running.

May 2003.

This article may be reproduced only with the permission of HCi (email HCi). Copyright HCi, 2001-3.

