BC 2005 - breast cancer - data structure and protocols



BC 2005 data preparation: brief written protocol

Strict confidentiality of trial results is observed. Information is held in the Clinical Trial Service Unit computers in a form which can be accessed only by known individuals.

All patient records are converted into 'pink form' (2005) format (described below) if not already supplied in it. Results received as tables are converted into sets of synthetic records. The following routine checks (where appropriate) are performed on every compilation:

The total numbers of patients, and the distributions of randomisation age, menopausal status, axillary nodal status, oestrogen receptor status and progesterone receptor status, are checked for any significant imbalance between treatment groups. These five distributions are compared as follows. Patients are grouped into three categories according to randomisation age (below 50 years; 50 - 69 years or unknown; 70 years or above), and a chi-squared test is applied to the numbers of patients in each of the three categories in each treatment group. Similarly, three categories are formed for menopausal status (pre- or perimenopausal; unknown; postmenopausal), for axillary nodal status (negative; unknown; positive), for oestrogen receptor status (poor; unknown; positive) and for progesterone receptor status (poor; unknown; positive), and these are tested in the same way as the randomisation age categories.
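The age-balance check can be sketched in Python as follows. This is a minimal illustration, not the Clinical Trial Service Unit's own software: it assumes each patient is available as a hypothetical (treatment group, randomisation age) pair, treats an age of 0 or blank as unknown, and computes the Pearson chi-squared statistic directly from the group-by-category counts. With two treatment groups and three categories there are 2 degrees of freedom, for which the upper-tail probability is exp(-x/2). The function names are hypothetical; the same `chi_squared` function applies unchanged to the menopausal, nodal, ER and PR categories.

```python
import math
from collections import Counter

def age_category(age):
    """Map randomisation age (years) to one of the three balance-check categories.

    Assumes 0 or None means unknown (missing pink-form items are left blank
    or set to zero); unknown ages are pooled with the 50-69 category.
    """
    if not age:
        return "50-69 or unknown"
    if age < 50:
        return "below 50"
    if age < 70:
        return "50-69 or unknown"
    return "70 or above"

def chi_squared(counts_by_group):
    """Pearson chi-squared statistic for a treatment-group x category table.

    counts_by_group maps each treatment group to a Counter of category counts.
    Returns (statistic, degrees of freedom).
    """
    groups = list(counts_by_group)
    categories = sorted({c for counts in counts_by_group.values() for c in counts})
    row_totals = {g: sum(counts_by_group[g].values()) for g in groups}
    col_totals = {c: sum(counts_by_group[g][c] for g in groups) for c in categories}
    n = sum(row_totals.values())
    statistic = 0.0
    for g in groups:
        for c in categories:
            expected = row_totals[g] * col_totals[c] / n
            if expected > 0:  # skip empty treatment groups
                statistic += (counts_by_group[g][c] - expected) ** 2 / expected
    return statistic, (len(groups) - 1) * (len(categories) - 1)

def p_value_two_df(statistic):
    """Upper-tail probability of a chi-squared statistic with 2 degrees of freedom."""
    return math.exp(-statistic / 2)

# Hypothetical example: (treatment group, randomisation age) pairs.
patients = [(1, 45), (1, 72), (1, 0), (2, 38), (2, 61), (2, 55)]
table = {}
for group, age in patients:
    table.setdefault(group, Counter())[age_category(age)] += 1
statistic, df = chi_squared(table)
```

With very small trials the expected counts can be tiny, in which case the chi-squared approximation is only a rough screen for imbalance.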

If an event such as the recurrence of disease is reported at a date later than the quoted last follow-up date, the last follow-up date is automatically changed to the later date. The completeness of follow-up is then calculated for the end of each calendar year. The distributions of randomisation dates, randomisation ages and time elapsed since last follow-up are checked for any significant imbalance between treatment groups in two ways. Firstly, for each treatment group, a t-test compares the mean value of each distribution for patients in that group with the corresponding mean for patients in the remaining groups. Secondly, an F-ratio is calculated for each distribution by comparing the variance between the groups with the variance within the groups. The distribution of time elapsed since last follow-up is also checked in these two ways for any significant imbalance between patients with and patients without a recorded recurrence of disease. Finally, the distribution of time elapsed since last follow-up is checked in the same two ways for any significant imbalance between the two categories of menopausal status (pre- or perimenopausal; postmenopausal), of axillary nodal status (negative; positive), of oestrogen receptor status (poor; positive) and of progesterone receptor status (poor; positive).
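A sketch of the follow-up adjustment and the two imbalance tests, under stated assumptions. The text does not say which form of t-test is used, so the pooled-variance (Student) two-sample t statistic is assumed here, comparing one group with all the others; the F-ratio is the usual one-way analysis-of-variance ratio of between-group to within-group mean squares. Function names are hypothetical, and times elapsed since last follow-up are assumed to have been computed beforehand against a common reference date. Only the statistics are returned; p-values would come from the t and F distributions (for example `scipy.stats.t` and `scipy.stats.f`).

```python
import math
from datetime import date

def adjusted_last_follow_up(last_follow_up, event_dates):
    """Move the last follow-up date forward to the latest reported event, if later."""
    return max([last_follow_up] + [d for d in event_dates if d is not None])

def _mean(values):
    return sum(values) / len(values)

def t_group_vs_rest(values_by_group, group):
    """Pooled-variance two-sample t statistic: one group against all the others."""
    x = values_by_group[group]
    y = [v for g, vs in values_by_group.items() if g != group for v in vs]
    mx, my = _mean(x), _mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    df = len(x) + len(y) - 2
    se = math.sqrt(ss / df * (1 / len(x) + 1 / len(y)))
    return (mx - my) / se, df

def f_ratio(values_by_group):
    """One-way ANOVA F-ratio: between-group mean square over within-group mean square."""
    groups = [vs for vs in values_by_group.values() if vs]
    pooled = [v for vs in groups for v in vs]
    grand = _mean(pooled)
    k, n = len(groups), len(pooled)
    between = sum(len(vs) * (_mean(vs) - grand) ** 2 for vs in groups) / (k - 1)
    within = sum((v - _mean(vs)) ** 2 for vs in groups for v in vs) / (n - k)
    return between / within, (k - 1, n - k)

# Hypothetical examples.
follow_up = adjusted_last_follow_up(date(2003, 5, 1), [date(2004, 2, 9), None])
elapsed = {1: [0.5, 1.0, 2.5, 0.0], 2: [0.2, 3.0, 1.5, 0.8]}  # years, by group
t, t_df = t_group_vs_rest(elapsed, 1)
f, f_df = f_ratio(elapsed)
```

With exactly two treatment groups the F-ratio equals the square of the group-versus-remainder t statistic, which is a convenient consistency check.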

Where patient serial numbers form an obvious sequence it is checked for missing numbers.
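A minimal sketch of the serial-number check, assuming identifiers are taken from the A12 patient-identifier field and that only purely numeric identifiers form the sequence (the helper names are hypothetical):

```python
def numeric_serials(identifiers):
    """Keep identifiers that are plain digit strings (e.g. '0017') as integers."""
    return [int(s) for s in (i.strip() for i in identifiers) if s.isdigit()]

def missing_serials(serials):
    """List the numbers absent from the run between the smallest and largest serial."""
    present = set(serials)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]

# Hypothetical identifiers; the non-numeric one is ignored.
ids = ["0101", "0102", "0105", "0104", "X-7"]
gaps = missing_serials(numeric_serials(ids))
```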

A tabulated breakdown of variables is produced for each trial, together (where relevant) with lists of patients in 'problematical' categories, such as those with lapsed follow-up, an uncertain cause of death or an uncertain site of second malignancy. Graphs by treatment allocation are also produced of accrual date and of the proportion of living patients still on follow-up as a function of time from randomisation, together with Kaplan-Meier life-table curves. Before trial data are finally incorporated into the overview, the analyses described above are sent to the participating trialist(s) for checking and approval.
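The Kaplan-Meier (product-limit) estimate behind the life-table curves can be sketched as below. This is a generic textbook estimator, not necessarily the exact CTSU implementation; it assumes one follow-up time per patient and a flag saying whether the event of interest (for example recurrence or death) was observed or the patient was censored. At tied times, events are counted before censorings, the usual convention. A separate curve per treatment allocation is obtained by calling the function on each group's patients.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the proportion still free of the event.

    times: follow-up time for each patient (any consistent unit).
    events: True if the event was observed at that time, False if censored.
    Returns [(time, survival), ...] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = removed_at_t = 0
        while i < len(data) and data[i][0] == t:  # gather all patients tied at time t
            events_at_t += data[i][1]
            removed_at_t += 1
            i += 1
        if events_at_t:
            survival *= 1 - events_at_t / at_risk
            curve.append((t, survival))
        at_risk -= removed_at_t  # events and censorings both leave the risk set
    return curve

# Hypothetical example: years from randomisation, True = recurrence observed.
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```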

Contact

Please address inquiries concerning data preparation and checking to:

Specification of BC 2005 'pink form' format

Item Description FORTRAN Columns Details Abbreviation
Trial/stratum identifying code  I6  1 - 6  Trial 
Patient identifier (or sequence number)  A12 8 - 19  Patient Identifier
Randomisation date  I8  21 - 28  DDMMYYYY  Random.
Treatment group allocated (as on master list)  I1  30  Group
Randomisation age  I3  32 - 34  years  Age 
5 Menopausal status at randomisation  I1  36  Meno
Value Description Abbreviation
1  Pre-menopausal  Pre
2  Peri-menopausal  Peri
3  Post-menopausal  Post
4  Artificial  Arti
6 Surgery: first mastectomy  I2  38 - 39  Surg.
Value Description Abbreviation
1  Radical  Radical
2  Total with clearance  Total Y
3  Simple without clearance  SimpleN
4  Partial with clearance  Partl.Y
5  Partial without clearance  Partl.N
6  Lumpectomy with clearance  LumpecY
7  Lumpectomy without clearance  LumpecN
8  Partial, clearance unknown  Partl.?
9  Lumpectomy, clearance unknown  Lumpec?
10  Subcutaneous  SubcutN
11  Simple with clearance  SimpleY
12  Other  Other
13  None  None
14  Wide local excision  WideLoc
15  Patey mastectomy  Patey
16  Mastectomy NOS  MastNOS
17  Simple, clearance unknown  Simple?
7 Axillary status at randomisation  I2  41 - 42  Axilla
Value Description Abbreviation
1  N0 (clearance)  pN0
2  N1-3 (clearance)  pN1-3
3  N4+ (clearance)  pN4+
4  N- (sample only)  sN-
5  N+ (sample only)  sN+
6  N- (clinical)  cN-
7  N+ (clinical)  cN+
8  N- (method unknown)  ?N-
9  N+ (method unknown)  ?N+
10  N+ (clearance)  pN+
11  Benign lesion  Benign
12  N- (clinical), N0 (clearance)  cN-pN0
13  N- (clinical), N+ (clearance)  cN-pN+
14  N+ (clinical), N0 (clearance)  cN+pN0
15  N+ (clinical), N+ (clearance)  cN+pN+
16  Not breast cancer  Not BC
17  N1+ (clearance)  pN1+
8 Oestrogen receptor (ER) coding  I1  44  Units
Value Description Abbreviation
1  fmol/mg cytosol protein  f/m
2  Percent cells staining  %cst
3  Other codes
4  Categorical data, fmol/mg cytosol protein  f/m
5  Qualitative data, percent cells staining  %cst
6  Qualitative data, other codes
7  Remmele IRS score  IRS
8  fmol/µg DNA × 100 (isoelectric focusing)  f/m
9  fmol/µg DNA × 100 (Abbott EIA)  f/m
10  "Histoscore" - NEW  Scor
9 Oestrogen receptor measurement (> 0: as (8))  I4  46 - 49  ER
Value Description Abbreviation
> 0  Measurement, in the units coded in item 8 (e.g. fmol/mg protein)
-1  Negative  ERpoor
-2  Marginal  ERpoor
-3  Positive  ER+
-4  < 10 fmol/mg protein  ERpoor
-5  10 - 19 fmol/mg protein  ER+
-6  20 - 29 fmol/mg protein  ER+
-7  30 - 49 fmol/mg protein  ER+
-8  50 - 99 fmol/mg protein  ER+
-9  100+ fmol/mg protein  ER++
-10  10 - 29 fmol/mg protein  ER+
-11  30 - 100 fmol/mg protein  ER+
-12  10 - 99 fmol/mg protein  ER+
-13  0 fmol/mg protein / 0% cells staining  ER0
-14  10 - 49 fmol/mg protein  ER+
-15  < 20 fmol/mg protein  ERpoor
-16  < 5 fmol/mg protein  ERpoor
-17  5 - 9 fmol/mg protein  ERpoor
-18  1 - 10% cells staining - NEW  ERpoor
-19  11 - 33% cells staining - NEW  ER+
-20  34 - 66% cells staining - NEW  ER+
-21  67 - 100% cells staining - NEW  ER+
10 Progesterone receptor (PR) coding  I1  51  Units
Value Description Abbreviation
1  fmol/mg cytosol protein  f/m
2  Percent cells staining  %cst
3  Other codes
4  Categorical data, fmol/mg cytosol protein  f/m
5  Qualitative data, percent cells staining  %cst
6  Qualitative data, other codes
7  Remmele IRS score  IRS
8  fmol/µg DNA × 100 (isoelectric focusing)  f/m
9  fmol/µg DNA × 100 (Abbott EIA)  f/m
10  "Histoscore" - NEW  Scor
11 Progesterone receptor measurement (> 0: as (10))  I4  53 - 56  PR
Value Description Abbreviation
> 0  Measurement, in the units coded in item 10 (e.g. fmol/mg protein)
-1  Negative  PRpoor
-2  Marginal  PRpoor
-3  Positive  PR+
-4  < 10 fmol/mg protein  PRpoor
-5  10 - 19 fmol/mg protein  PR+
-6  20 - 29 fmol/mg protein  PR+
-7  30 - 49 fmol/mg protein  PR+
-8  50 - 99 fmol/mg protein  PR+
-9  100+ fmol/mg protein  PR++
-10  10 - 29 fmol/mg protein  PR+
-11  30 - 100 fmol/mg protein  PR+
-12  10 - 99 fmol/mg protein  PR+
-13  0 fmol/mg protein / 0% cells staining  PR0
-14  10 - 49 fmol/mg protein  PR+
-15  < 20 fmol/mg protein  PRpoor
-16  < 5 fmol/mg protein  PRpoor
-17  5 - 9 fmol/mg protein  PRpoor
-18  1 - 10% cells staining - NEW  PRpoor
-19  11 - 33% cells staining - NEW  PR+
-20  34 - 66% cells staining - NEW  PR+
-21  67 - 100% cells staining - NEW  PR+
12 Laterality of primary tumour(s)  I1  58  Side
Value Description Abbreviation
1  Left  Left
2  Right  Right
3  Bilateral  Bilat
4  Not assessable  N/A
13 Size of primary tumour (> 0: mm)  I3  60 - 62  Size
Value Description Abbreviation
> 0  Size in mm
-1  1 - 20 mm  T1
-2  21 - 50 mm  T2
-3  51 - 100 mm  T3
-4  > 100 mm  T3+
-5  T1: < 21 mm  T1
-6  T1a: < 6 mm  T1a
-7  T1b: 6 - 10 mm  T1b
-8  T1c: 11 - 20 mm  T1c
-9  T2: 21 - 50 mm  T2
-10  T2a: 21 - 30 mm  T2a
-11  T2b: 31 - 40 mm  T2b
-12  T2c: 41 - 50 mm  T2c
-13  T3: > 50 mm  T3
-14  T4: any size, direct extension to chest wall or skin  T4
-15  T4a: extension to chest wall  T4a
-16  T4b: oedema or ulceration of breast skin  T4b
-17  T4c: T4a + T4b  T4c
-18  T4d: inflammatory carcinoma  T4d
-19  Multiple tumours  Multi
-20  Tis: carcinoma in situ  Tis
-21  1 - 30 mm  <31mm
-22  > 30 mm  >30mm
-23  1 - 50 mm  <51mm
-24  > 50 mm  >50mm
-25  > 30 mm  >30mm
-26  > 20 mm  >20mm
-27  < 10 mm - NEW  <10mm
-28  10 mm + - NEW  >9mm
14 Histological grade of primary tumour  I2  63 - 64  Grade
Value Description Abbreviation
1  Well differentiated  Well-
2  Moderately differentiated  Modt.
3  Poorly differentiated  Poor.
4  G1: well differentiated  G1
5  G2: moderately differentiated  G2
6  G3: poorly differentiated  G3
7  GX: not assessable  GX
8  G4: undifferentiated  G4
9  Bloom-Richardson grade 1 (well)  BR 1
10  Bloom-Richardson grade 2 (moderately)  BR 2
11  Bloom-Richardson grade 3 (poor)  BR 3
12  Moderately or poorly differentiated  Md/Pr
13  "Two out of risk factors"  2 RF
14  "Three out of risk factors"  3 RF
15  Nottingham grade 1 - NEW  NG 1
16  Nottingham grade 2 - NEW  NG 2
17  Nottingham grade 3 - NEW  NG 3
18  Nottingham grade 4 - NEW  NG 4
19  Nottingham grade 5 - NEW  NG 5
15 Contralateral breast cancer  I1  66  Contra.
Value Description Abbreviation
1  No  No
2  Yes  Yes
3  Carcinoma in situ, N- - NEW  CIS N-
4  Carcinoma in situ, N+ - NEW  CIS N+
5  Carcinoma in situ, N? - NEW  CIS N?
6  Invasive - NEW  Invasiv
7  Invasive +N - NEW  Inv +N
16 Date of contralateral breast cancer I8 68 - 75 DDMMYYYY Contra. Date
17 Second malignancy (ICD) A5  77 - 81    2Ca. ICD
18 ICD revision for second malignancy I2 82 - 83 7-10
19 Date of second malignancy  I8  85 - 92 DDMMYYYY 2Ca. Date 
20 No. of additional malignancies I1 94
21 Distant/unknown-site recurrence  I1  96  Distant
Value Description Abbreviation
1  No  No
2  Distant  Distant
3  Unknown site  Unknown
4  Uncertain  Uncert.
5  CIS, unknown - NEW  CIS unk
22  Date of first distant/unknown-site recurrence  I8  98 - 105 DDMMYYYY  Distant Date 
23 Prior local recurrence  I1  107  Local
Value Description Abbreviation
1  No  No
2  Yes  Yes
3  Ipsilateral breast  Ipsilat
4  Other locoregional  Other
5  Ipsilateral axilla (where available) - NEW  Ipsi.Ax
6  Recurrent carcinoma in situ, N- - NEW  CIS N-
7  Recurrent carcinoma in situ, N+ - NEW  CIS N+
8  Recurrent carcinoma in situ, N? - NEW  CIS N?
9  Invasive - NEW  Invasiv
24  Date of prior local recurrence  I8  109 - 116 DDMMYYYY Local Date 
25 State when last traced  I2  118 - 119  Status
Value Description Abbreviation
Alive  Alive
Dead  Dead
Lost  Lost
Utterly lost  Lost+
Alive, ineligible for protocol  Alive/in
Dead, ineligible for protocol  Dead/in.
Lost, presumed dead  Lost/D?
10  Lost and ineligible  Lost/in
11  Utterly lost and ineligible  Los+/in
12  Last trace date from flagging  Flagged
26  Date died or last traced  I8  121 - 128 DDMMYYYY  Status Date 
27  ICD revision for death cause I2  130 - 131 7-10
28  Cause of death (ICD) A5  133 - 137  ICD  D. ICD
29 Cause of death  I2  138 - 139  "Cause"
Value Description Abbreviation
-2  Code not understood  ??
-1  Further decision required  Wait?
1  Iatrogenic  Iatro.
2  Pneumonia  Pneumon
3  Lymphatic and haematopoietic, non-AML  Leukaem
4  Other second malignancy, site specified  Neopl.2
5  Ischaemic heart disease  IHeartD
6  DVT and pulmonary embolism  DVT/PE
7  Cerebrovascular  Cervasc
8  External cause  Extra.
9  Other specified disease  Not BC
10  Unknown, but not breast cancer  Not BC
11  Breast cancer or its metastases  BC/Mets
12  Unascertainable cause  Unknown
13  Non-pneumonia respiratory  Respira
14  Hepatic disease  Hepatic
15  Infectious/parasitic (excluding viral hepatitis)  Infective
16  Other circulatory  OthCirc
17  AML (definite / probable / possible)  AML
18  Second malignancy, primary unspecified  Neopl.2
19  Endometrial cancer  Endo.Ca
20  Colorectal cancer  Colo.Ca
21  Primary liver cancer  Hepa.Ca
22  Gastric cancer  Gast.Ca
23  Ovarian cancer  Ovar.Ca
24  Unknown second malignancy, non-breast  Neopl.2
25  Lung cancer  Lung Ca
26  Uterine cancer, part unspecified  UterCa?
27  Liver cancer, unspecified  HepatC?
28  Heart failure  HeartF
29  Chronic obstructive lung disease  COPD
30  Name (if given) and comments  141 - end  Extra information 
Missing or unknown items are left blank or set to zero.
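Because every item occupies fixed columns, a record can be read with plain string slicing. The sketch below is a hypothetical reader covering only a subset of the items above; the field names are invented for illustration, dates are assumed to be complete DDMMYYYY values, and blank integer fields are read as 0, matching the rule that missing or unknown items are left blank or set to zero.

```python
from datetime import date

# (hypothetical field name, first column, last column, kind) for a subset of items.
# Kinds: "I" integer, "A" text, "D" DDMMYYYY date held in an I8 field.
FIELDS = [
    ("trial", 1, 6, "I"),
    ("patient_id", 8, 19, "A"),
    ("randomisation_date", 21, 28, "D"),
    ("group", 30, 30, "I"),
    ("age", 32, 34, "I"),
    ("menopausal", 36, 36, "I"),
    ("surgery", 38, 39, "I"),
    ("axilla", 41, 42, "I"),
    ("er_coding", 44, 44, "I"),
    ("er_value", 46, 49, "I"),
    ("pr_coding", 51, 51, "I"),
    ("pr_value", 53, 56, "I"),
    ("status", 118, 119, "I"),
    ("status_date", 121, 128, "D"),
]

def parse_int(text):
    """Blank integer fields are read as 0 (missing or unknown)."""
    text = text.strip()
    return int(text) if text else 0

def parse_date(text):
    """Read a DDMMYYYY date; a blank or zero field means no date recorded."""
    value = parse_int(text)
    if value == 0:
        return None
    digits = f"{value:08d}"  # restore a leading zero dropped from the day
    return date(int(digits[4:]), int(digits[2:4]), int(digits[:2]))

def parse_record(line):
    """Split one pink-form record into a dict using 1-based, inclusive columns."""
    record = {}
    for name, first, last, kind in FIELDS:
        raw = line[first - 1:last]
        if kind == "A":
            record[name] = raw.strip()
        elif kind == "D":
            record[name] = parse_date(raw)
        else:
            record[name] = parse_int(raw)
    record["comments"] = line[140:].rstrip()  # item 30: column 141 to the end
    return record
```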

Death cause codes in BC 2005 data format

 
Value Description (ICD-7, ICD-8, ICD-9 and ICD-10 codes listed beneath each entry)
2 Pneumonia
  ICD-7: 490-493
  ICD-8: 480-486
  ICD-9: 480-486
  ICD-10: J12-J18
3 Lymphatic and haematopoietic, non-AML
  ICD-7: 200-203, 204.0-204.1, 205, 206 - NEW, 207.1 - NEW
  ICD-8: 200-204, 205.1, 206.1, 207.1, 275.5
  ICD-9: 200-204, 205.1, 206.1, 207.1, 208.1, 273.3
  ICD-10: C81-C91, C92.1, C93.1, C94.1, C95.1, C96
4 Other second malignancy, site specified
  ICD-7: 140-197,199 - NEW; excluding 151, 153-156, 162 (except 162.2), 170, 172, 174-175, 199.9
  ICD-8: 140-195 - NEW; excluding 151, 153-155, 162, 174, 182-183
  ICD-9: 140-195 - NEW; excluding 151, 153-155, 162, 174, 179, 182-183
  ICD-10: C00-C76 - NEW, C97; excluding C16, C18-C22, C33-C34, C50, C54-C57
5 Ischaemic heart disease
  ICD-7: 420, 422.1
  ICD-8: 410-414
  ICD-9: 410-414
  ICD-10: I20-I25
6 DVT and pulmonary embolism
  ICD-7: 463-466, 684
  ICD-8: 450-451, 453, 673
  ICD-9: 415, 451, 453, 673
  ICD-10: I26, I80, I82, O88.2
7 Cerebrovascular
  ICD-7: 330-334
  ICD-8: 430-438
  ICD-9: 430-438
  ICD-10: I60-I69
8 External cause
  ICD-7: E800-E999, N800-N999
  ICD-8: E800-E999, N800-N999
  ICD-9: 800-999, E800-E999
  ICD-10: S00-Y98
9 Other specified disease
  ICD-7, ICD-8, ICD-9, ICD-10: all other ICD codes
10 Unknown, but not breast cancer (Note 6)
  ICD-7, ICD-8, ICD-9, ICD-10: -
11 Breast cancer or its metastases
  ICD-7: 170
  ICD-8: 174
  ICD-9: 174
  ICD-10: C50
12 Unascertainable cause (Note 6)
  ICD-7, ICD-8, ICD-9, ICD-10: -
13 Non-pneumonia respiratory
  ICD-7: 470-527, 783; excluding 490-493, 502, 527
  ICD-8: 460-519, 783; excluding 480-486, 491-492
  ICD-9: 460-519, 786; excluding 480-486, 491-492, 496
  ICD-10: J00-J99, R04-R09; excluding J12-J18, J41-J44
14 Hepatic disease
  ICD-7: 092, 580-583
  ICD-8: 070, 570-573
  ICD-9: 070, 570-573
  ICD-10: B15-B19, K70-K77
15 Infectious/parasitic (excluding viral hepatitis)
  ICD-7: 001-138; excluding 092
  ICD-8: 000-136; excluding 070
  ICD-9: 001-139, 279.1; excluding 070
  ICD-10: A00-A99, B00-B99; excluding B15-B19
16 Other circulatory
  ICD-7: 400-468, 782, 795.2; excluding 420, 422.1, 434.1-434.2, 463-466
  ICD-8: 390-458, 782, 795; excluding 410-414, 427.0-427.1, 430-438, 450-451, 453
  ICD-9: 390-459, 785, 798; excluding 410-415, 428, 430-438, 451, 453
  ICD-10: I00-I99, R00-R02, R57, R96; excluding I20-I26, I50, I60-I69, I80, I82
17 AML (definite / probable / possible)
  ICD-7: 204.2-204.4, 207.0 - NEW
  ICD-8: 205-207; excluding 205.1, 206.1, 207.1
  ICD-9: 205-208; excluding 205.1, 206.1, 207.1, 208.1
  ICD-10: C92-C95; excluding C92.1, C93.1, C94.1, C95.1
18 Second malignancy, primary unspecified
  ICD-7: 156, 198 - NEW
  ICD-8: 196-199 - NEW; excluding 197.8, 199.9
  ICD-9: 196-199 - NEW; excluding 199.9
  ICD-10: C77-C80 - NEW; excluding C80.9
19 Endometrial cancer
  ICD-7: 172
  ICD-8: 182; excluding 182.9
  ICD-9: 182
  ICD-10: C54
20 Colorectal cancer
  ICD-7: 153-154
  ICD-8: 153-154
  ICD-9: 153-154
  ICD-10: C18-C21
21 Primary liver cancer
  ICD-7: 155
  ICD-8: 155
  ICD-9: 155.0-155.1
  ICD-10: C22 - NEW; excluding C22.9
22 Gastric cancer
  ICD-7: 151
  ICD-8: 151
  ICD-9: 151
  ICD-10: C16
23 Ovarian cancer
  ICD-7: 175
  ICD-8: 183
  ICD-9: 183
  ICD-10: C56-C57
24 Unknown second malignancy, non-breast (Note 6)
  ICD-7: 199.9 - NEW
  ICD-8: 199.9 - NEW
  ICD-9: 199.9 - NEW
  ICD-10: C80.9 - NEW
25 Lung cancer
  ICD-7: 162; excluding 162.2
  ICD-8: 162
  ICD-9: 162
  ICD-10: C33-C34
26 Uterine cancer, part unspecified
  ICD-7: 174
  ICD-8: 182.9
  ICD-9: 179
  ICD-10: C55
27 Liver cancer, unspecified
  ICD-7: -
  ICD-8: 197.8
  ICD-9: 155.2
  ICD-10: C22.9 - NEW, D37.6 - NEW
28 Heart failure
  ICD-7: 434.1-434.2
  ICD-8: 427.0-427.1
  ICD-9: 428
  ICD-10: I50
29 Chronic obstructive lung disease
  ICD-7: 502, 527
  ICD-8: 491-492
  ICD-9: 491-492, 496
  ICD-10: J41-J44
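As an illustration of how the ICD-10 column can be applied, the sketch below maps an ICD-10 code (with or without its point) to a "cause" value. It covers only a subset of the rows above, chosen for brevity; codes outside those rows return None rather than defaulting to cause 9, because the omitted rows (for example 3, 4, 8 and 13-17, and the I80, I82 and O88.2 codes of row 6) would otherwise be misclassified. The table and function names are hypothetical.

```python
# Subset of the ICD-10 column above: (first category, last category, cause value).
ICD10_RANGES = [
    ("C16", "C16", 22),  # Gastric cancer
    ("C18", "C21", 20),  # Colorectal cancer
    ("C22", "C22", 21),  # Primary liver cancer (C22.9 handled separately)
    ("C33", "C34", 25),  # Lung cancer
    ("C50", "C50", 11),  # Breast cancer or its metastases
    ("C54", "C54", 19),  # Endometrial cancer
    ("C55", "C55", 26),  # Uterine cancer, part unspecified
    ("C56", "C57", 23),  # Ovarian cancer
    ("C77", "C80", 18),  # Second malignancy, primary unspecified (not C80.9)
    ("I20", "I25", 5),   # Ischaemic heart disease
    ("I26", "I26", 6),   # Pulmonary embolism (other row-6 codes omitted here)
    ("I50", "I50", 28),  # Heart failure
    ("I60", "I69", 7),   # Cerebrovascular
    ("J12", "J18", 2),   # Pneumonia
    ("J41", "J44", 29),  # Chronic obstructive lung disease
]
# Four-character codes that override their three-character category.
ICD10_EXACT = {"C809": 24, "C229": 27, "D376": 27}

def cause_from_icd10(code):
    """Return the "cause" value for an ICD-10 code, or None if this subset lacks it."""
    compact = code.strip().upper().replace(".", "")
    if compact in ICD10_EXACT:
        return ICD10_EXACT[compact]
    category = compact[:3]
    for first, last, cause in ICD10_RANGES:
        if first <= category <= last:  # same-length codes compare correctly as strings
            return cause
    return None
```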

Notes

1. If a textual description of the cause of death is given, this should be put in the comments field using the (d. ....) convention. If the description includes labels (1a, 1b, etc.), these should also be entered, as they help to determine the underlying cause of death.

2. If a patient does not have a 'recurrence' recorded before her death (i.e. she has had neither an isolated local recurrence, nor a distant recurrence or recurrence of unknown site, nor a contralateral breast cancer), the "cause" (item 29) needs to be put into the record. This code should be set to the underlying cause of death using the standard ICD rules for selecting the underlying cause, except that pulmonary embolism should take precedence over the cause of the pulmonary embolism. For deaths after a 'recurrence', the item-29 "cause" does not need to be put into the record.

3. If the trialist supplies a single ICD code for the underlying cause of death, that code and the ICD revision should be put into the record (items 28 and 27 respectively). ICD codes are left-justified and written without the decimal point ('.'), where relevant. If the trialist has supplied more than one ICD code for a patient's death and the underlying cause of death is not clear, all the ICD codes should be put into the comments field using the ICDx[xxx,xxx,...] convention.

4. ICD7[199.9], ICD8[199.9], ICD9[199.9] and ICD10[C80.9] are special flags to indicate that the trialist has tried and failed to obtain information on the site of a malignancy.

5. External causes of death should be coded according to the cause rather than the nature of the injury (i.e. E codes in ICD revisions 7-9 and V-Y codes in revision 10), where possible.

6. "Cause" codes 10, 12 and 24 are used to indicate that the trialist has tried and failed to obtain more specific information.

7. "Cause" code 1 (iatrogenic) is no longer used. Such deaths are now coded to their appropriate place according to ICD rules (e.g. in ICD-9, death due to toxicity from antineoplastic drugs is E933.1, death due to acute myeloid leukaemia probably caused by antineoplastic drugs is 205.0, and death due to pulmonary fibrosis that is possibly radiation-induced is 515).
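Notes 2 to 4 can be expressed as small helpers. This is a sketch under assumptions: a prior 'recurrence' is judged from the dates of contralateral breast cancer (item 16), distant/unknown-site recurrence (item 22) and local recurrence (item 24); a recurrence dated on the day of death is treated as preceding it; and the helper names are hypothetical.

```python
from datetime import date

def cause_required(death_date, recurrence_dates):
    """Note 2: item 29 is needed for a death with no recurrence recorded before it.

    recurrence_dates holds the dates from items 16, 22 and 24 (None if absent).
    A recurrence on the day of death is treated as preceding it (an assumption).
    """
    if death_date is None:
        return False
    return not any(d is not None and d <= death_date for d in recurrence_dates)

def icd_field(code, width=5):
    """Note 3: write an ICD code left-justified, without its point, in the A5 field."""
    compact = code.strip().replace(".", "")
    if len(compact) > width:
        raise ValueError(f"ICD code {code!r} does not fit in {width} columns")
    return compact.ljust(width)

def icd_comment(revision, codes):
    """Note 3: several candidate codes go in the comments as ICDx[xxx,xxx,...]."""
    return f"ICD{revision}[{','.join(codes)}]"

def site_unobtainable(revision, code):
    """Note 4: True for the special 'site could not be obtained' flags."""
    flag = "C809" if revision == 10 else "1999"
    return revision in (7, 8, 9, 10) and code.replace(".", "") == flag
```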



[End of document, updated to 9 September 2009]