GALAXY at CTSU: information about overviews


GALAXY is a multi-user software environment that evolved as the Oxford Galaxy Group used computers in astrophysics research. It has a long history (1971 ff.) and many people share the responsibility for what it has become today. The point of it all is to provide slick ways of doing cutting-edge research without sinking into bureaucracy or nerdism. At CTSU, the 'main' machine is currently A6::, where GALAXY is used for several overviews, project management, miscellaneous analysis, graphics, archiving etc.

Each user is given a private subdirectory and is 'put' there when identifying themself after login. A 'visitor' can use ZW:, the 'scratch' directory. A focused set of instructions for an impatient overview user is in JG$DK:EMMA.DAT. One day in the not very distant future someone hopes to make time to write a proper manual: it could be quite large.

Data for each overview are constructed in a uniform way. A set of 'universal' software is available for processing and analysis. Once the user has become familiar with the system, it is not necessary to perform craniotomies when moving from one subject to another. Further overviews can be incorporated without limit. Brief information about files and software follows.

General reference files

Combinations of treatment descriptions for GAZEBO JG$DK:COMBI.DAT
Impatient user's guide to overviews JG$DK:EMMA.DAT
Master list of overview endpoints JG$DK:ENDPOINT.DAT
GALAXY information (also in HELP) KK:OVERKILL.DAT
This document (also in HELP) JG$DK:OVERVIEW.DAT
Master list of treatment agents, synonyms and abbreviations JG$DK:PLANET.DAT
Master list of overview strata JG$DK:STRATA.DAT
How to update data JG$DK:UPDATE.DAT

Overview directories

The data for each overview are stored in a separate directory tree. These are currently held in A6::UDV5:[] except for FB: which is in V2::UDEV:[GALAXY.FB]. To select an overview, type its name (e.g. BC to select the breast cancer overview; all software will then operate on the files in BC):
ALLC: childhood acute lymphoblastic leukaemia trials
APT: anti-platelet trials
BC: breast cancer trials
BC$AK: pre-1990 'frozen' version of breast cancer trials (held in A6::UDV5:[GALAXY.BC.ARCHIVE])
BC$KK: pre-1995 'frozen' version of breast cancer trials (held in A6::UDV5:[GALAXY.BC.CLAMP])
BC$KK2: pre-2000 'frozen' version of breast cancer trials (held in A6::UDV5:[GALAXY.BC.CLAMP2])
BC$KK3: pre-2005 'frozen' version of breast cancer trials (held in A6::UDV5:[GALAXY.BC.CLAMP3])
CLL: chronic lymphocytic leukaemia trials
CRC: colorectal cancer trials
CRC$KK: pre-2000 'frozen' version of colorectal cancer trials (held in A6::UDV5:[GALAXY.CRC.CLAMP])
EB: breast cancer epidemiology
FB: fibrinolytic trials
HD: Hodgkin's disease trials
PC: prostate cancer trials
PC$KK: pre-1997 'frozen' version of prostate cancer trials (held in A6::UDV5:[GALAXY.PC.CLAMP])
PC$KK2: pre-2000 'frozen' version of prostate cancer trials (held in A6::UDV5:[GALAXY.PC.CLAMP2])
PC$KK3: pre-2002 'frozen' version of prostate cancer trials (held in A6::UDV5:[GALAXY.PC.CLAMP3])

General files in each overview directory (e.g. BC:BRIEF.DAT)

Master list of abbreviated centre/group and trial/stratum names (e.g. SWOG, SWOG 8313 resp.) BRIEF.DAT
Recipes for data conditioning and run-time assumptions DIKTAT.DAT
Master list of endpoint definitions ENDPOINT.DAT
Master list of trial/stratum peculiarities, data and collaboration status etc EXTRAS.DAT
Master list of finish years for trials/strata FINISH.DAT
Condensed list of problems, peculiarities and special cases (full information is in LBCx files) GROUCH.DAT
Master list of entry criteria for trials/strata (BC only at present) GUESS.DAT
Synopsis of data processing procedure INFO.DAT
Labyrinth-friendly list of all centres/groups, trials/strata and treatment arms (please note that JLIST goes out of date when LIST is altered, and needs to be re-created using the JLIST program)

Master list of all centres/groups, trials/strata and treatment arms (condensed version YLIST; laser-friendly ZLIST) LIST.DAT
Master list of deliberately unequal randomisation allocations LURCHE.DAT
Master numerical key of treatments (BC only at present) PLANET.DAT
Linked list of abbreviated centre/group and trial/stratum names together with year mnemonics (please note that RGBRIEF goes out of date when BRIEF or SERIAL are altered, and needs to be re-created using the RGBRIEF program)

Master list of year mnemonics (e.g. 84J2) for trials/strata SERIAL.DAT
Master list of sizes of trials/strata (please note that TOTAL goes out of date when data are altered, and needs to be re-created using TOTALS)

Master list of trial/stratum update years UPDATE.DAT
Master list of numbers of patients in no-data trials XTOTAL
Condensed list of all centres/groups, trials/strata and treatment arms, with drug names abbreviated (please note that YLIST goes out of date when LIST or SERIAL are altered, and needs to be re-created using TERSE)

Master list of names, postal and E-mail addresses, telephone and FAX numbers ZAA.DAT
Laser-friendly list of all centres/groups, trials/strata and treatment arms (please note that ZLIST goes out of date when LIST is altered, and needs to be re-created using CUTTER)


Files for trial centre/group no. x in overview directory (e.g. BC:LBC123.DAT)

Original input file more or less as supplied from source (unique name)
Standard format ('pink form' style) file of all patient records x.DAT
'Verbose' copy of standard format file with full text of ICD codes appended x.ICD
(Optional) update mask for patient records, in 'pink form' style format xBIT.DAT
(Optional) second update mask for patient records, in 'pink form' style format xGRIP.DAT
Errors and omissions detected by STARE program DBAx.DAT
Imbalances detected by INQUIST program DBCx.DAT
Information, bibliography, diary etc LBCx.DAT
N.B. In some cases, x.NEW, xBIT.NEW and xGRIP.NEW are also present; these are more up-to-date than the .DAT versions and should be used in preference, except when the out-of-date versions are needed to 'chase audit trails'. The software recognises the latest versions of files automatically. Similarly, .OLD, .OLDER and .OLDEST versions sometimes also exist. Where 'frozen' versions exist, they are suffixed .ICE, .ICE2 etc for successive ice ages. To use a frozen dataset, append $ICE (or $ICE2 etc) to the relevant name when selecting an overview, e.g. type BC$ICE2 to use the second 'frozen' dataset in the breast cancer overview.
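
The preference for the newest file version can be pictured with a small sketch. This is only an illustration, not the GALAXY software itself: the function name is hypothetical, and the relative order of .OLD, .OLDER and .OLDEST is an assumption inferred from the suffix names (the text above states only that .NEW is more recent than .DAT).

```python
# Assumed precedence, newest first. Only ".NEW is newer than .DAT" is
# stated in the text; the rest of the order is inferred from the names.
PRECEDENCE = ["NEW", "DAT", "OLD", "OLDER", "OLDEST"]


def latest_version(stem, available):
    """Return the newest file name for `stem` among `available`, or None.

    `stem` is the part before the dot, e.g. "123" or "123BIT".
    `available` is any iterable of file names present in the directory.
    """
    stem = stem.upper()
    present = {name.upper() for name in available}
    for suffix in PRECEDENCE:
        candidate = f"{stem}.{suffix}"
        if candidate in present:
            return candidate
    return None


files = ["123.DAT", "123.NEW", "123BIT.DAT", "123.OLD"]
print(latest_version("123", files))     # 123.NEW
print(latest_version("123BIT", files))  # 123BIT.DAT
```

To chase an audit trail, a caller would bypass this lookup and name the older file explicitly.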

Subdirectories (e.g. BC$AK:)

$AK: safekeeping place for outdated files
$DK: transient files, e.g. print-outs, overviews, graphics
$HB: steering files


Each trial centre or group is given an arbitrary serial number, e.g. 123. Some centres or groups may have more than one serial number, e.g. because data were received via different routes.

Each trial (or trial stratum, if the trial is not monolithic) from a given centre or group is given a serial number formed as 100 x (trial centre/group) + (2-digit number), e.g. 12301, 12302.

Each treatment arm of each trial or stratum is given a serial number formed as 100 x (trial/stratum number) + (2-digit number), e.g. 1230101, 1230102.

The serial numbers are 'permanent' and are attached to each standard format ('pink form' style) patient record. Trials are also given year mnemonics, e.g. 84J (the 'J'th trial listed as beginning in 1984). Where a trial is divided into strata the stratum numbers are appended, e.g. 84J1, 84J2. The year mnemonics are subject to change as new information is received.
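
The serial number arithmetic above can be sketched as follows. The function names are hypothetical and are not part of the GALAXY library; the range check assumes that the '2-digit number' runs from 0 to 99.

```python
def trial_serial(centre, trial_index):
    """Trial/stratum serial number: 100 x (centre/group) + 2-digit number."""
    if not 0 <= trial_index <= 99:  # assumed range of the 2-digit part
        raise ValueError("trial index must fit in two digits")
    return 100 * centre + trial_index


def arm_serial(trial, arm_index):
    """Treatment arm serial number: 100 x (trial/stratum) + 2-digit number."""
    if not 0 <= arm_index <= 99:  # assumed range of the 2-digit part
        raise ValueError("arm index must fit in two digits")
    return 100 * trial + arm_index


def split_arm_serial(arm):
    """Recover (centre, trial index, arm index) from an arm serial number."""
    trial, arm_index = divmod(arm, 100)
    centre, trial_index = divmod(trial, 100)
    return centre, trial_index, arm_index


print(trial_serial(123, 1))                 # 12301
print(arm_serial(trial_serial(123, 1), 2))  # 1230102
print(split_arm_serial(1230102))            # (123, 1, 2)
```

Because each level adds exactly two decimal digits, the centre/group and trial/stratum can always be read back from an arm number by integer division.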


There is a comprehensive suite of programs to condition, test, revise and analyse trial information; these are described in HELP. Most programs need some instructions to run; these are typed after the program name as a series of 'parameters' separated by spaces. Parameters can be names, letters, numbers etc depending on circumstances.

To find out what parameters to give a program, type a question mark after its name - e.g. to find out what parameters to give BLACKBOX, type BLACKBOX ?. If a 'blank' is needed in the sequence, use a dot - e.g. to give GAZEBO a first and third parameter only type GAZEBO firstparameter . thirdparameter.
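
As an illustration of the parameter convention only (this is not how GALAXY itself reads its command lines, and the function name is hypothetical), a line can be split into a program name and positional parameters, with a lone dot standing for a blank:

```python
def parse_command(line):
    """Split a command line into (program, parameters).

    A lone "." marks a blank parameter and is returned as None.
    A lone "?" after the program name is treated as a request for
    parameter help, signalled here by returning None for the parameters.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty command")
    program, args = tokens[0], tokens[1:]
    if args == ["?"]:
        return program, None
    return program, [None if arg == "." else arg for arg in args]


print(parse_command("GAZEBO firstparameter . thirdparameter"))
# ('GAZEBO', ['firstparameter', None, 'thirdparameter'])
print(parse_command("BLACKBOX ?"))
# ('BLACKBOX', None)
```

The dot keeps later parameters in their correct positions, which is why it is needed whenever an earlier parameter is left blank.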

Please don't alter the database without warning Jon Godwin (hold a séance if dead). Briefly, the programs are:

  1. Data conditioning - every data set received has idiosyncrasies ranging from the lack of a trial/stratum identifier at the beginning of each patient record to the need for complete conversion from some alien format or even just tabular results.
    1. Universal program which calls an ad hoc subroutine for each data set received to convert it into 'pink form' style format:
      1. TRAUMA
      If updating information has previously been received, there will also be a 'difference mask' (Section 3).
    2. CTSU diktats about the interpretation of death causes etc vary continually. They can be applied uniformly, in a removable way, to the 'pink form' style format file(s). Universal program to apply diktats of the moment:
      1. DIKTAT (HD only at present)
  2. Data testing - having converted into 'pink form' style format using TRAUMA, there are four main components to the testing process. The results are usually returned to trialists for checking:
    1. Universal program to check for logical errors in the data (e.g. missing items, meaningless codes and impossible dates):
      1. STARE
    2. Universal program to tabulate a breakdown of variables and check for imbalances between groups and prognostic categories:
      1. INQUIST
    3. Universal program to list patients with lapsed follow-up, uncertain cause of death and certain other questions requiring special attention:
      1. LBASIC
    4. Universal program to create cumulative curves of patients randomised, proportion still alive on-study and raw life-table curves:
      1. LIFE
  3. Revision - when corrected or updated information is received, a 'difference' is formed (in 'pink form' style format) against the previous version and is used to create a new one. This makes it possible to audit and possibly remove changes which sometimes prove deleterious:
    1. Universal program to compare two data sets and form a 'difference mask':
      1. DIFFFE
    2. Universal program to update a data set using a 'difference mask':
      1. UPDATE
    3. Difference mask amalgamator:
      1. LAYBIT
  4. Analysis - having built the necessary data sets, having formed a list of comparisons for a particular scientific question and created a steering file, the final step is to create and present the 'overview':
    1. Universal program to calculate annual (O - E) and V etc for a particular overview:
      1. BLACKBOX
    2. Universal supplementary program to form subtracted annual (O - E) and V etc for certain endpoints of a particular overview:
      1. POSTER
    3. Universal program to create 'forest' plots:
      1. GAZEBO
    4. Universal program to create life-table curves:
      1. ABSEIL
    5. Universal program to generate histograms of prognostic variables for overview patients:
      1. HISTER
  5. In addition, there is a project management system to keep track of centre/group and trial/stratum information, data status and checks, steering files, personnel, addresses, correspondence, bibliography, the diary of events and action items.
    1. Universal program to view the status of the whole project:
      1. OVERLOOK
    2. Universal program to produce a detailed summary of a particular centre/group and its personnel, trials/strata, data checks, flags, steering files, bibliography, diary etc:
      1. RUGRAT
    3. Universal program to review the full names, prognostic and treatment allocation details etc for one or more trials/strata:
      1. WORLDE
    4. Universal program to review the year code, brief name and linked trials/strata for a particular trial/stratum:
      1. SERBIA
    5. Universal program to check the data flags for a particular trial/stratum:
      1. WRECKE
    6. 'Abbreviated trial list' generator (creates file YLIST from files LIST and SERIAL):
      1. TERSE
    7. Cosmeticiser for overview trial list prior to laser printing or export to IBM system:
      1. CUTTER
    8. 'RGBRIEF' generator (creates file RGBRIEF from files BRIEF and SERIAL):
      1. RGBRIEF
    9. Universal program to list steering file details:
      1. TITLES
    10. Universal program to list centre/group and trial/stratum details:
      1. TLIST
    11. Universal program to summarise data and check EXTRAS file:
      1. WCHECK (BC only at present)
    12. Universal program to generate a version of the data file(s) with ICD code text appended to records, if required:
      1. ICDADD
    13. Universal program to list year codes, serial numbers and brief names for a particular steering file:
      1. GROWLE
    14. Universal program to check for entries missing from the BRIEF and SERIAL files:
      1. SBCHECK
    15. Universal program to check for various anomalies in the LIST file:
      1. LCHECK
    16. Universal program to build TOTAL and summarise overview details:
      1. TOTALS
    17. Universal program to build a web-ready version of the LIST file:
      1. ARACHNE
    18. 'Labyrinth-friendly' overview trial list generator:
      1. JLIST (BC only at present)
  6. Finally, there are various useful 'interactive' miscellanea:
    1. Laser printing for documents:
      1. BURGER (and variants ABURGER ... HBURGER for different printers)
    2. Logrank evaluator:
      1. FOREST
    3. Heterogeneity evaluator:
      1. HETERO
    4. ICD code evaluator:
      1. ICD
    5. Laser printing for graphics:
      1. LASER (and variants ALASER ... HLASER for different printers)
    6. Evaluator for various distribution functions:
      1. NAGBAG
    7. Odds ratio and significance evaluator:
      1. ZOE
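
The (O - E) and V quantities mentioned under Analysis can be illustrated with a minimal sketch. It assumes the conventional logrank definitions: O is the observed number of events in the treated group, E is its expectation if treatment had no effect, and V is the hypergeometric variance of O. Whether BLACKBOX, FOREST or ZOE use exactly these formulae is not stated in this document, and all function names here are hypothetical.

```python
import math


def o_minus_e_and_v(d_treat, n_treat, d_ctrl, n_ctrl):
    """Return (O - E, V) for one trial, stratum or year of follow-up.

    d_* are numbers of events and n_* are numbers at risk in the
    treated and control groups.
    """
    n = n_treat + n_ctrl
    d = d_treat + d_ctrl
    if n < 2:
        raise ValueError("need at least two patients at risk")
    expected = n_treat * d / n
    variance = n_treat * n_ctrl * d * (n - d) / (n * n * (n - 1))
    return d_treat - expected, variance


def combine(pairs):
    """Sum (O - E, V) pairs and derive the usual summary statistics.

    The odds ratio is the one-step estimate exp(sum(O - E) / sum(V));
    z is the standardised logrank statistic.
    """
    total_oe = sum(oe for oe, _ in pairs)
    total_v = sum(v for _, v in pairs)
    if total_v <= 0:
        raise ValueError("total variance must be positive")
    odds_ratio = math.exp(total_oe / total_v)
    z = total_oe / math.sqrt(total_v)
    return total_oe, total_v, odds_ratio, z


# Invented numbers, for illustration only.
pairs = [
    o_minus_e_and_v(10, 100, 20, 100),
    o_minus_e_and_v(8, 50, 12, 50),
]
total_oe, total_v, odds_ratio, z = combine(pairs)
print(total_oe, total_v, odds_ratio, z)
```

A negative total (O - E) corresponds to fewer events than expected in the treated group, and so to an odds ratio below 1.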

Database interface

There is a library of FORTRAN-callable functions which retrieve information from the database, e.g. CBRIEF, COMBI, CSERIAL, CTOTAL, CWORLD, GUESS, LFINIS, LURCHE, LWRECK etc. These are described separately. A graphics interface is also available (calls resemble Culham GHOST).


[End of document, updated to 13 MAY 2005]