Collected ramblings about overview analysis


Contents List

Thoughts on attributes of an ideal overview database and analysis scenario (13 June 1991)


Desirable features

Information which needs to be stored

Questions which the scenario needs to answer on request

Other functions needed



The next step

Notes added later

Things which we should have done differently! (7 June 2000)

Don't change centre/group numbers

Once a centre/group number has been allocated, it should never be changed, even when it becomes evident that the centre/group has additional numbers already. It is a continuing nuisance to go back to old files and to find that trial numbers have been altered in the past, rendering some of the old paperwork quite difficult to use.

The most absurd example is the Hodgkin's Disease overview, where someone decided to change all of the centre/group numbers to correspond with one of the other overviews (probably Breast Cancer). This meant that all of the paperwork and filing had to be changed, and that it bore no relationship to the patient data, overview and project structure in the computer any more. "Fortunately" that overview was abandoned for other reasons before the full consequences had to be dealt with.
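One way to honour this rule while still recording that a centre/group turns out to hold more than one number is to link the numbers rather than renumber anything. The following is only a minimal sketch in Python; the class `CentreRegistry` and its methods are hypothetical names invented for illustration, not part of GALAXY.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CentreRegistry:
    """Hypothetical registry: allocated numbers are never changed, only linked."""

    names: dict[int, str] = field(default_factory=dict)     # number -> centre/group name
    alias_of: dict[int, int] = field(default_factory=dict)  # duplicate number -> primary number

    def allocate(self, number: int, name: str) -> None:
        """Allocate a number once; re-allocating an existing number is refused."""
        if number in self.names:
            raise ValueError(f"centre/group number {number} is already allocated")
        self.names[number] = name

    def link(self, duplicate: int, primary: int) -> None:
        """Record that two allocated numbers refer to the same centre/group."""
        if duplicate not in self.names or primary not in self.names:
            raise KeyError("both numbers must already be allocated")
        if self.resolve(primary) == duplicate:
            raise ValueError("link would create a cycle")
        self.alias_of[duplicate] = primary

    def resolve(self, number: int) -> int:
        """Follow alias links to the primary number; old numbers stay valid."""
        while number in self.alias_of:
            number = self.alias_of[number]
        return number
```

Old paperwork quoting either number can still be looked up, because neither number is ever altered or withdrawn; only the link between them is new information.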

Don't force a quart into a pint pot

"Data form" encoding schemes inevitably impose prejudices which can lead to the loss or perversion of valuable information. Whereas it is difficult to avoid shedding things when a standard structure is imposed on other people's data, some unfortunate things have happened. For example, in the Breast Cancer overview the coding of recurrences, death causes and second malignancies suffered from early assumptions (e.g. that death causes were so unreliable as to be of little interest, could just be lumped into a few categories and were irrelevant if any recurrence ever happened; that local events occurring after distant ones should be ignored; that second malignancies were of little interest unless in the contralateral breast, etc).

With hindsight, we should in the early days have made far more use of the "comments field" even for such elementary things as the patient's name and hospital number (to aid identification for flagging later) and, where supplied, additional events, ICD codes, measurements of ER and PR, tumour details etc. Also, we shouldn't have forced patient identifiers into a 6-digit field when they wouldn't fit - unpacking them again later was a nuisance.
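As an illustration of the point, a record structure can carry the coded fields needed for standard analyses while keeping everything else exactly as supplied. This is a sketch under assumptions: `PatientRecord`, `CODED_FIELDS` and `encode` are hypothetical names, and the field names and values in the usage note are made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical standard structure: only these fields are coded.
CODED_FIELDS = {"patient_id", "death_cause_code"}


@dataclass
class PatientRecord:
    patient_id: str                          # kept verbatim as text, at whatever length was supplied
    death_cause_code: Optional[str] = None   # coarse category used by the standard analyses
    supplied: dict[str, str] = field(default_factory=dict)  # everything else, as supplied
    comments: str = ""                       # free-text notes


def encode(raw: dict[str, str]) -> PatientRecord:
    """Fill the coded fields and keep every other supplied item instead of shedding it."""
    extras = {key: value for key, value in raw.items() if key not in CODED_FIELDS}
    return PatientRecord(
        patient_id=raw["patient_id"],
        death_cause_code=raw.get("death_cause_code"),
        supplied=extras,
    )
```

For example, `encode({"patient_id": "H-0012345/B", "icd": "174.9"})` keeps the long identifier intact and files the ICD code under `supplied`, so nothing has to be unpacked or reconstructed later.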

Keep an audit trail

It is now quite difficult to unravel some of the early changes made to some of the data sets. Sometimes one would like to know why a change was made, which entails identifying when it was made and thence locating the source of the information. It is important to keep a system which records the point at which each change occurred and, ideally, makes it possible to roll the overview back to well-defined past states. The analysis system needs to retain the ability to reproduce past results: people keep asking for them years and years later. It is also important not to keep duplicate copies of data, which cause much uncertainty about what is 'definitive', nor to keep large amounts of stuff of uncertain provenance on an ad hoc basis.
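The requirements in this paragraph (when did a change occur, where did the information come from, what did the data look like at a past date) can be met by an append-only change log. The sketch below is one possible shape in Python, not a description of how GALAXY does it; `Change`, `AuditedStore` and the key naming are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class Change:
    when: datetime   # when the change was made
    key: str         # which item changed (hypothetical naming, e.g. "trial/patient/field")
    old: Any         # value before the change (None if the item was new)
    new: Any         # value after the change
    source: str      # where the information came from


class AuditedStore:
    """Hypothetical store in which every change is appended to a log, never overwritten."""

    def __init__(self) -> None:
        self._log: list[Change] = []
        self._data: dict[str, Any] = {}

    def set(self, key: str, new: Any, source: str, when: datetime) -> None:
        """Apply a change and log it; entries must arrive in time order."""
        if self._log and when < self._log[-1].when:
            raise ValueError("changes must be logged in time order")
        self._log.append(Change(when, key, self._data.get(key), new, source))
        self._data[key] = new

    def history(self, key: str) -> list[Change]:
        """Every logged change to one item, oldest first."""
        return [change for change in self._log if change.key == key]

    def state_at(self, when: datetime) -> dict[str, Any]:
        """Rebuild the data as they stood at a past moment by replaying the log."""
        state: dict[str, Any] = {}
        for change in self._log:
            if change.when > when:
                break
            state[change.key] = change.new
        return state
```

Because the log is the single definitive record, a past analysis can be rerun on `state_at(date_of_that_analysis)` without keeping separate copies of the data set for each occasion.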

... but thank goodness we did this!

Keep your nerve

Don't lose sight of the advantage of a single, unified approach to everything. It has been possible to achieve this in the past by integrating all of the overviews in GALAXY so that they can be managed and analysed using a single set of tools, with the great benefit that every new step forward is automatically available for every overview. It takes only a little care to keep this asset by remembering the 'big picture', however great the pressure for a quick fix, and with hindsight the flexibility of having everything integrated has been wonderful. A future dream is that the project management side of overviews can once again be integrated with the data and analysis - although impossible in 2000, this should be feasible by 2005.

The Grand Reunification of GALAXY (5 September 2000)

Until about 1993 the project management, data processing and scientific analysis facets of GALAXY were developed as an integrated whole on the VAXen (later Alphas). Until 1990 or so, indeed, all the document processing applications had to be written from scratch or dumped onto overworked secretarial staff using small DEC Rainbows or Macs - PCs were rare and limited in ability.

As PCs improved, it was natural to hive many of the project management and word processing aspects of GALAXY off to them, using good off-the-shelf software. For the 1995 Breast Cancer overview meeting a major part of the organisational work was done on networked PCs, although serious efforts were made to keep everything in the GALAXY project system complete and up to date. Some weirdnesses occurred at first, but the need for proper attention to synchronisation between the live parallel databases on the VAXen/Alphas and numerous PCs was quickly learnt. Naturally, the whole idea of parallel databases is a horror story born of expediency.

It was hoped that it would be possible to move all of the GALAXY stuff into a SQL server long before the 2000 Breast Cancer meeting. Most of the project management work was indeed moved to Ingres, at the cost of some difficulties in implementing changes quickly (and the mysterious loss of all accented letters, a rather unprofessional gaffe for a high-profile international project). However, experimentation during 1998 revealed that, when trying to access patient data and to conduct analyses, the transfer rate was truly abysmal, and the system was unreliable to the point where many transactions failed to complete properly at all (or 'ports were busy' etc). It was difficult to say how the blame should have been apportioned between the serving machine, the clients and the interconnexions, but it was evidently essential to keep any such system right off the critical path.

Before 2005, however, it is very much hoped that the performance of the SQL systems will have improved enough to make it possible to store, manage and analyse the patient data and associated information in a single, integrated database once again. The yardsticks are that the access speed, the reliability of transactions and the ease and flexibility of making changes should be at least as good as with the present suite of Alpha-based FORTRAN-95 software.

As a side issue, the Alpha-based GALAXY system has been upgraded to run equally well on PCs, either standalone or networked. Its performance on PCs provides serious competition for that on the Alphas. There might be a middle way forward, by adding SQL capability to the existing system, rather than by starting again from scratch and losing much of the accumulated honing of 15 years' continuous critical use. On the other hand, it might be time to move on and to adapt other packages for future use.


[End of document, updated to 5 September 2000]