HCi
Journal of Information Development

Neglecting systems documentation courts disaster

By Stuart Lecky

When fire broke out in the computer room at the
headquarters of Britain’s Open University in Milton Keynes in 1987 it
destroyed the VAX system used by the faculty to store their research work,
and all of the back-up tapes that were, some would say foolishly, stored
in the same room. The media reported that “years of research
work” had been lost. Imagine that this happened to your business
and that it was your company’s data that was lost. Would it be a
problem if you were to lose the last two months’ records of your financial
transactions? Of course it would. But that’s not going to
happen, is it, because you’ve got adequate, documented systems in place
to cater for such eventualities, haven’t you? Have you?

Computer systems are vital to the running of all
businesses and organisations these days. From the smallest
one-person operation to the largest multinational corporation, the data
held on their computer systems is vital to the efficient and profitable
running of those organisations. These can be any kind of
organisation: educational institutions, businesses, government
departments, charities, hospitals, etc. Management of those systems,
including maintenance of comprehensive systems documentation and an
accurate and up-to-date disaster recovery plan, is one of the key elements
in ensuring that “mission critical” systems continue to function
efficiently regardless of what disasters may occur.

One major risk factor in particular, systems
management documentation, is often neglected because it can be costly to
create and maintain. It’s only when something goes wrong that
people begin to ask questions about it.

Systems management documentation covers a multitude
of different areas. Everything about any computer system, and the
people who use it, can and should be documented so that, when things go
wrong (and something always goes wrong), remedial action can be taken to
minimise the consequences.

System management documentation covers system configuration, system
administration procedures, data dictionaries, backup procedures, disaster
recovery plans, help desk knowledge bases and document management systems.
The consequences of neglecting each of these are considered in turn.

System configuration includes the details of how all of
the systems are set up. This includes both software and hardware.
Software includes operating systems, system tools, software packages and
software you have developed in-house. Hardware includes not only PCs
and servers but also any peripheral device connected to them. If any of
these is normally used with a non-standard configuration and that
configuration is not documented, or the documentation is not kept up to
date, you will have difficulty setting it up again should it fail.

System administration procedures can include details
of how systems run day to day, how things like new user accounts are to be
set up, how occasional situations (e.g. disks becoming full) are dealt with:
anything to do with the day-to-day running of your IT systems. If
these are not documented, or not kept up to date, dealing with new
problems becomes harder, and whenever an intermittent problem turns up
it has to be solved by re-inventing the wheel
again and again. Good system administration documentation saves time
and therefore money. You might say that experienced system managers
and operators know this stuff and don’t need the documentation, but what
if they are off sick, or on holiday, or (god forbid) encounter the
underside of the proverbial bus?

Data dictionaries detail the way in which your
databases and associated applications work: what is in a table, what is in
a field, how is this data used by the system, what do programs do with the
data? Without adequate and up-to-date data dictionary information,
the maintenance and enhancement of those systems take much longer and are
less likely to be effective. Again, time costs money.

Backup procedures include schedules of tapes to be
used, where they go, what goes on them, what to do with them when the
backup is finished, how to check that the backup worked, how to restore
things from backup, where tapes are stored (on-site and off), how to get
tapes back from off-site storage. Most IT departments have their
backup procedures documented, but things change and if the documentation
is not kept up to date, problems can arise. IT department staff come
and go. What happens in the middle of the night when a relatively new
system operator is alone, something goes wrong that they haven’t run
across before, and there’s no documentation to help them?
They can’t do their job and those vital backups don’t get done.

Disaster recovery plans deal with every possible
eventuality that may befall a computer system, from minor problems to
its complete destruction, and set out how the entire system is to be
restored. Most large organisations have long since recognised
the need for a disaster recovery plan and have implemented one.
Keeping it up to date is essential. The details of new hardware and
software and their configurations (see above) must be added as they
are introduced. Hopefully a disaster recovery plan will never be
needed, but if it is, and it hasn’t been kept up to date, you may find
that you restore your system as it was three years ago, instead of last
week.

Help desk knowledge bases are immensely useful, for
those who have them, for storing information about how problems have been
dealt with in the past and how to deal with them if they arise again.
Keeping these up to date is vital to ensure that mission-critical systems
keep running.

Document management systems are often used as the
repositories for all of the information upon which a business depends to
keep it running. Ensuring that everyone has access to them,
recognises their importance and keeps them up to date (though this needs
to be controlled) is an essential part of ensuring the continued efficient
running of any organisation.

While the hardware and software that an organisation
uses are important, they are useless without the people who use them and the
people who ensure that they run smoothly and keep on running.
Implementing, and most importantly maintaining a continuing commitment to, a
quality management system such as ISO 9000:2000 is one means of ensuring that
those people recognise the need to maintain things like system management
documentation and to continually review and improve it.

May 2003.

This article may be reproduced only with the permission of HCi (email HCi). Copyright HCi, 2001-3.