- Introduced `prepare_data.R` for merging disease and other data from CSV files.
- Added `prepare_data.py` for processing UK Biobank data, including:
  - Mapping field IDs to human-readable names.
  - Handling date variables and converting them to offsets.
  - Processing disease events and constructing tabular features.
  - Splitting data into training, validation, and test sets.
  - Saving processed data to binary and CSV formats.
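As an illustration of the date-to-offset step mentioned in the commit message, here is a minimal sketch, not the code in `prepare_data.py`. It assumes that offsets are counted in days from an approximate birth date built from the year and month of birth, with the day fixed to the first of the month. The function names `approx_birth_date` and `days_since_birth` are hypothetical.

```python
from datetime import date
from typing import Optional


def approx_birth_date(year: int, month: int) -> date:
    # Assumption: only year and month of birth are available,
    # so the first day of the month stands in for the birth day.
    return date(year, month, 1)


def days_since_birth(year: int, month: int, event_iso: str) -> Optional[int]:
    """Offset in days from the approximate birth date to an ISO-formatted event date.

    Returns None when the event date is missing (empty string).
    """
    if not event_iso:
        return None
    event = date.fromisoformat(event_iso)
    return (event - approx_birth_date(year, month)).days
```

Using a single integer offset per event keeps every date variable on the same per-participant time axis, which may be convenient when disease events and assessment dates are combined later.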
| field_instance | full_name | var_name |
|---|---|---|
| 31-0.0 | Sex | sex |
| 34-0.0 | Year of birth | year |
| 48-0.0 | Waist circumference | waist_circumference |
| 49-0.0 | Hip circumference | hip_circumference |
| 50-0.0 | Standing height | standing_height |
| 52-0.0 | Month of birth | month |
| 53-0.0 | Date of attending assessment centre | date_of_assessment |
| 74-0.0 | Fasting time | fasting_time |
| 102-0.0 | Pulse rate automated reading | pulse_rate |
| 1239-0.0 | Current tobacco smoking | smoking |
| 1558-0.0 | Alcohol intake frequency. | alcohol |
| 4079-0.0 | Diastolic blood pressure automated reading | dbp |
| 4080-0.0 | Systolic blood pressure automated reading | sbp |
| 20150-0.0 | Forced expiratory volume in 1-second (FEV1) Best measure | fev1_best |
| 20151-0.0 | Forced vital capacity (FVC) Best measure | fvc_best |
| 20258-0.0 | FEV1/ FVC ratio Z-score | fev1_fvc_ratio |
| 21001-0.0 | Body mass index (BMI) | bmi |
| 21003-0.0 | Age when attended assessment centre | age_at_assessment |
| 30000-0.0 | White blood cell (leukocyte) count | WBC |
| 30010-0.0 | Red blood cell (erythrocyte) count | RBC |
| 30020-0.0 | Haemoglobin concentration | hemoglobin |
| 30030-0.0 | Haematocrit percentage | hematocrit |
| 30040-0.0 | Mean corpuscular volume | MCV |
| 30050-0.0 | Mean corpuscular haemoglobin | MCH |
| 30060-0.0 | Mean corpuscular haemoglobin concentration | MCHC |
| 30080-0.0 | Platelet count | Pc |
| 30100-0.0 | Mean platelet (thrombocyte) volume | MPV |
| 30120-0.0 | Lymphocyte count | LymC |
| 30130-0.0 | Monocyte count | MonC |
| 30140-0.0 | Neutrophill count | NeuC |
| 30150-0.0 | Eosinophill count | EosC |
| 30160-0.0 | Basophill count | BasC |
| 30170-0.0 | Nucleated red blood cell count | nRBC |
| 30250-0.0 | Reticulocyte count | RC |
| 30260-0.0 | Mean reticulocyte volume | MRV |
| 30270-0.0 | Mean sphered cell volume | MSCV |
| 30280-0.0 | Immature reticulocyte fraction | IRF |
| 30300-0.0 | High light scatter reticulocyte count | HLSRC |
| 30500-0.0 | Microalbumin in urine | MicU |
| 30510-0.0 | Creatinine (enzymatic) in urine | CreaU |
| 30520-0.0 | Potassium in urine | PotU |
| 30530-0.0 | Sodium in urine | SodU |
| 30600-0.0 | Albumin | Alb |
| 30610-0.0 | Alkaline phosphatase | ALP |
| 30620-0.0 | Alanine aminotransferase | Alanine |
| 30630-0.0 | Apolipoprotein A | ApoA |
| 30640-0.0 | Apolipoprotein B | ApoB |
| 30650-0.0 | Aspartate aminotransferase | AA |
| 30660-0.0 | Direct bilirubin | DBil |
| 30670-0.0 | Urea | Urea |
| 30680-0.0 | Calcium | Calcium |
| 30690-0.0 | Cholesterol | Cholesterol |
| 30700-0.0 | Creatinine | Creatinine |
| 30710-0.0 | C-reactive protein | CRP |
| 30720-0.0 | Cystatin C | CystatinC |
| 30730-0.0 | Gamma glutamyltransferase | GGT |
| 30740-0.0 | Glucose | Glu |
| 30750-0.0 | Glycated haemoglobin (HbA1c) | HbA1c |
| 30760-0.0 | HDL cholesterol | HDL |
| 30770-0.0 | IGF-1 | IGF1 |
| 30780-0.0 | LDL direct | LDL |
| 30790-0.0 | Lipoprotein A | LpA |
| 30800-0.0 | Oestradiol | Oestradiol |
| 30810-0.0 | Phosphate | Phosphate |
| 30820-0.0 | Rheumatoid factor | Rheu |
| 30830-0.0 | SHBG | SHBG |
| 30840-0.0 | Total bilirubin | TotalBil |
| 30850-0.0 | Testosterone | Testosterone |
| 30860-0.0 | Total protein | TotalProtein |
| 30870-0.0 | Triglycerides | Tri |
| 30880-0.0 | Urate | Urate |
| 30890-0.0 | Vitamin D | VitaminD |
| 40000-0.0 | Date of death | Death |
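A mapping file like the one above could be used to rename raw UK Biobank columns to their short names. The following is a rough sketch of that idea using only the standard library; it is not taken from `prepare_data.py`. The helper names `load_mapping` and `rename_fields`, the inline three-row CSV excerpt, and the sample record (including the `eid` key and all values) are made up for illustration.

```python
import csv
import io

# Illustrative excerpt of the mapping file; the real file has more rows.
MAPPING_CSV = """field_instance,full_name,var_name
31-0.0,Sex,sex
34-0.0,Year of birth,year
21001-0.0,Body mass index (BMI),bmi
"""


def load_mapping(handle):
    """Build a dict from field_instance (e.g. '31-0.0') to var_name (e.g. 'sex')."""
    return {row["field_instance"]: row["var_name"] for row in csv.DictReader(handle)}


def rename_fields(record, mapping):
    """Return a copy of record with mapped keys renamed; unmapped keys are kept as-is."""
    return {mapping.get(key, key): value for key, value in record.items()}


mapping = load_mapping(io.StringIO(MAPPING_CSV))

# Hypothetical raw record keyed by field-instance IDs.
record = {"eid": "1000001", "31-0.0": "1", "34-0.0": "1950", "21001-0.0": "27.4"}
renamed = rename_fields(record, mapping)
```

Keeping unmapped keys unchanged means that columns missing from the mapping file pass through rather than being silently dropped.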