HCi Journal of Information Development
By Stuart Lecky
When fire broke out in the computer room at the
headquarters of Britain’s Open University in Milton Keynes in 1987, it
destroyed the VAX system used by the faculty to store their research work, and
all of the back-up tapes that were, some would say foolishly, stored in the same
room. The media reported that “years of research work” had been lost.
Imagine that this happened to your business and
it was your company’s data that was lost. Would it be a problem if you
were to lose the last two months’ records of your financial transactions?
Of course it would. But that’s not going to happen, is it, because
you’ve got adequate, documented systems in place to cater for such
eventualities, haven’t you? Have you?
Computer systems are vital to the running of all businesses
and organisations these days. From the smallest one-person operation to
the largest multi-national corporations, the data held on their computer systems
is vital to the efficient and profitable running of those organisations.
These can be any kind of organisation: educational institutions, businesses,
government departments, charities, hospitals, etc. Management of those
systems, including maintenance of comprehensive systems documentation and an
accurate and up-to-date disaster recovery plan, is one of the key elements in
ensuring that “mission critical” systems continue to function efficiently
regardless of what disasters may occur.
One major risk area in particular, systems management
documentation, is often neglected because it can be costly to create and
maintain. It’s only when something goes wrong that people begin to ask
questions like:
“Where are the procedures for doing this?”
“Why did this happen? Weren’t they following the procedure?”
“How can I change this system without any documentation on what’s in it?”
Systems management documentation covers a multitude of
different areas. Everything about any computer system, and the people who
use it, can and should be documented so that, when things go wrong (and
something always goes wrong), remedial action can be taken to minimise the
consequences.
The main areas are:
system configuration
system administration procedures
data dictionaries
backup procedures
disaster recovery plan
help desk knowledge bases
document management systems
Each area, and the consequences of neglecting it, is described below.
System configuration covers the details of how all of the
systems are set up, both software and hardware.
Software includes operating systems, system tools, software packages and
software you have developed in-house. Hardware includes not only PCs and
servers but also any peripheral device connected to them. If any of
these is normally used with a non-standard configuration and it fails,
you will have difficulty setting it up again unless the configuration
information is documented and that documentation is kept up to date.
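As a rough illustration of keeping configuration records current, part of the job can be automated. The following is a minimal sketch in Python, not a complete inventory tool: it records only a handful of facts that the standard library can report about the machine it runs on. The function names, the field names and the free-text `non_standard_settings` field are assumptions made for this example.

```python
import json
import platform
from datetime import date


def snapshot_configuration(notes=""):
    """Collect a few basic facts about the machine this script runs on."""
    return {
        "recorded_on": date.today().isoformat(),
        "hostname": platform.node(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "architecture": platform.machine(),
        # Free-text field (hypothetical) for anything non-standard
        # about the way this machine is set up.
        "non_standard_settings": notes,
    }


def write_snapshot(path, notes=""):
    """Save the snapshot as JSON so it can be filed with the documentation."""
    record = snapshot_configuration(notes)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2)
    return record
```

Run on a schedule, something like this gives a dated record for each machine. Details it cannot discover for itself, such as peripheral settings, still have to be written down by hand.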
System administration procedures can include details of how
systems run day to day, how things like new user accounts are to be set up, how
occasional situations (e.g. disks becoming full) are dealt with - anything to do
with the day-to-day running of your IT systems. If these are not
documented, or not kept up to date, it is not only new problems that
are hard to deal with: whenever an intermittent problem turns up, it
has to be solved by reinventing the wheel again and again. Good system
administration documentation saves time and therefore money. You might say
that experienced system managers and operators know this stuff and don’t need
the documentation, but what if they are off sick, or on holiday, or (god forbid)
encounter the underside of the proverbial bus?
Data dictionaries detail the way in which your databases
and associated applications work: what is in a table, what is in a field, how is
this data used by the system, what do programs do with the data? Without
adequate and up-to-date data dictionary information, the maintenance and
enhancement of those systems takes much longer, and is less likely to be
effective. Again, time costs money.
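The structural part of a data dictionary (which tables exist and what fields they contain) can often be generated from the database itself, leaving people to document what the data means and how programs use it. The following is a minimal sketch in Python, assuming an SQLite database purely for illustration. The function name and the layout of the output are assumptions made for this example, and other database products expose their schema in different ways.

```python
import sqlite3


def build_data_dictionary(conn):
    """Return {table name: [field descriptions]} for an SQLite database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    dictionary = {}
    for table in tables:
        # PRAGMA table_info yields one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        dictionary[table] = [
            {
                "field": name,
                "type": col_type,
                "required": bool(notnull),
                "primary_key": bool(pk),
            }
            for _cid, name, col_type, notnull, _default, pk in columns
        ]
    return dictionary


if __name__ == "__main__":
    # Hypothetical example table, created in memory for demonstration only.
    demo = sqlite3.connect(":memory:")
    demo.execute(
        "CREATE TABLE invoices ("
        "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL)"
    )
    for table_name, fields in build_data_dictionary(demo).items():
        print(table_name)
        for field in fields:
            print("  ", field)
```

A listing like this covers only "what is in a table" and "what is in a field". The questions of how the system uses the data still need a written answer.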
Backup procedures include schedules of tapes to be used,
where they go, what goes on them, what to do with them when the backup is
finished, how to check that the backup worked, how to restore things from
backup, where tapes are stored (on-site and off), how to get tapes back from
offsite storage. Most IT departments have their backup procedures
documented, but things change and if the documentation is not kept up to date,
problems can arise. IT department staff come and go. What happens in
the middle of the night when a relatively new system operator is working
alone, something they haven’t run across before goes wrong, and there’s
no documentation to help them? They can’t do their job and those vital
backups don’t get done.
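One item on that list, how to check that the backup worked, lends itself to a short example. The following is a minimal sketch in Python, assuming the backup is a plain copy of a directory tree that can be read back from disk. Tape systems and commercial backup products have their own verification tools, and the function names here are assumptions made for this example. It compares a SHA-256 checksum of each source file with that of its backup copy.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """List files that are missing from the backup or differ from the source."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue  # skip directories
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every source file has an identical copy in the backup. Anything else is a prompt for the operator to turn to the documented procedure.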
Disaster recovery plans deal with every eventuality that
may befall a computer system, from minor problems to its complete
destruction, and set out how the system is to be restored in each
case. Most large organisations have long since recognised the need for a
disaster recovery plan and have implemented one. Keeping it up to date is
essential. The details of new hardware and software and their
configurations (see above) must be included whenever they come along.
Hopefully a disaster recovery plan will never be needed, but if it is, and it
hasn’t been kept up to date, you may find that you restore your system as it
was three years ago, instead of last week.
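A simple safeguard is to record when each document was last reviewed and to flag any that are overdue. The following is a minimal sketch in Python. The 180-day review interval, the function name and the document names used with it are assumptions made for this example, not a recommended standard.

```python
from datetime import date, timedelta


def find_stale_documents(last_reviewed, today, max_age_days=180):
    """Return the names of documents not reviewed within max_age_days.

    last_reviewed maps a document name to the date of its last review.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name for name, reviewed in last_reviewed.items() if reviewed < cutoff
    )
```

Given a record such as `{"disaster recovery plan": date(2000, 5, 1), "backup procedures": date(2003, 4, 1)}` and a `today` of `date(2003, 5, 1)`, only the disaster recovery plan would be reported as overdue.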
Help desk knowledge bases are immensely useful, for those
who have them, for storing information about how problems have been dealt with
in the past and how to deal with them if they arise again. Keeping these
up to date is vital to ensure that mission-critical systems keep running.
Document management systems are often used as the
repositories for all of the information upon which a business depends to keep it
running. Ensuring that everyone has access to them, recognises their
importance and keeps them up to date (though this needs to be controlled) is an
essential part of ensuring the continued efficient running of any organisation.
While the hardware and software that an organisation uses
is important, it is useless without the people who use it and the people who
ensure that it runs smoothly and keeps on running. Implementing, and most
importantly maintaining a continuing commitment to, a quality management
system such as ISO 9000:2000 is one way to ensure that those people
recognise the need to maintain things like systems management
documentation and to review and improve it continually.
May 2003.
This article may be reproduced only with the permission of HCi. Copyright HCi Consulting, 2001-3.
HCi information development - www.hci.com.au
(technical writing, quality management, knowledge management)