migraine-risk-prediction

Data Dictionary

This document describes the raw variables and derived features used in the migraine study dataset. It includes variable types, definitions, ranges, units, and relevant notes for interpretation and modelling.


1. Raw Variables

Variable Type Description Range Units Notes
rownames Integer Record index 1–4152 Unique per row; not a clinical identifier
id Integer Unique patient identifier 1–133 Each patient has multiple longitudinal records
time Integer Temporal index relative to migraine treatment −? to 23 days Negative values: days before treatment start; 0 = treatment start; positive = days after
dos Integer Study day count 98–1239 days Days elapsed since study start (Jan 1 of first year)
hatype Categorical Migraine subtype Aura, No Aura, Mixed Nominal variable with 3 categories
age Integer Patient age 18–66 years Adult population
airq Numeric Air quality index 3–73 AQI Likely PM2.5 or general pollution index
medication Categorical Medication regimen continuing, reduced, none Treatment status
headache Categorical Headache occurrence yes, no Binary target variable
sex Categorical Patient sex female, male Binary

2. Derived Variables

Variable Type Description Range Units Notes
phase Categorical Treatment phase indicator no treatment / treatment Derived from time
month_study Numeric Calendar month 1–12 month Computed from dos using origin date 1997‑01‑01
season_study Categorical Season of the year winter, spring, summer, autumn Month-to-season mapping
days_since_first_visit Numeric Days since patient’s first visit 0–128 days Longitudinal difference per patient
age_band Categorical Age groups 18–30, 31–40, 41–50, 51–66 years Binned