Heart Disease NLP
Transformer comparison on medical reports
A transformer comparison on medical report classification — DistilBERT, DistilRoBERTa, and BETO evaluated honestly on the same preprocessing pipeline.
What it had to solve.
The interesting question wasn't whether a transformer can classify medical text — it was which transformer family generalises well when you hold preprocessing, tokenisation, and the evaluation split constant. Comparison papers in this space often quietly vary all three; the goal here was to vary only the model.
Pipeline, end-to-end.
Shared preprocessing (cleaning, per-model tokenisation, attention-mask construction) feeds three fine-tuned encoders. Each model was trained within the same hyperparameter envelope, and results are reported on a single held-out test split.
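The cleaning stage is the part every model shares, so it has to be model-agnostic. Below is a minimal sketch of what such a step could look like. The function name `clean_report` and its specific rules (Unicode NFC normalisation, control-character removal, whitespace collapsing) are assumptions for illustration; the write-up does not list the actual cleaning steps. Case is deliberately preserved, since lowercasing here would throw away information that cased tokenisers use.

```python
import re
import unicodedata

# Matches any run of whitespace (spaces, tabs, newlines).
_WHITESPACE = re.compile(r"\s+")


def clean_report(text: str) -> str:
    """Model-agnostic cleaning applied identically before every tokeniser.

    HYPOTHETICAL rules: the write-up says cleaning and normalisation were
    shared, not which steps they involved. Case is left untouched because
    cased tokenisers would lose information if reports were lowercased here.
    """
    # NFC turns "e" + combining acute accent into the single code point "é",
    # so every tokeniser sees the same character sequence.
    text = unicodedata.normalize("NFC", text)
    # Remove control characters (Unicode category "Cc") other than whitespace.
    text = "".join(
        ch for ch in text if ch.isspace() or unicodedata.category(ch) != "Cc"
    )
    # Collapse whitespace runs to a single space and trim the ends.
    return _WHITESPACE.sub(" ", text).strip()
```

Each model's own tokeniser would run only after this step, so the text reaching all three tokenisers is byte-for-byte identical.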
What the comparison actually said.
Model comparison
All three transformers were fine-tuned on identical preprocessing and the same split. DistilRoBERTa came out ahead, with DistilBERT and BETO tied behind it.
Choices that shaped the comparison.
Same preprocessing for every model
The tokeniser differs per model (BERT, RoBERTa, and BETO use different subword vocabularies), but the cleaning, normalisation, and split were identical. The comparison only means something if everything except the model is held constant.
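The write-up states the split was identical across models but not how that was guaranteed. One way to make it checkable, sketched here under the assumption that each report has a stable string ID, is to derive the partition from a hash of the ID and log a fingerprint of the test set on every run. `assign_split`, `split_fingerprint`, and the 0.2 test fraction are hypothetical.

```python
import hashlib
from typing import Iterable


def assign_split(report_id: str, test_fraction: float = 0.2) -> str:
    """Assign a report to 'train' or 'test' using only its ID.

    The result does not depend on dataset order or on which model is being
    trained, so every run sees the same partition. The 0.2 default is hypothetical.
    """
    digest = hashlib.sha256(report_id.encode("utf-8")).digest()
    # First 6 bytes (48 bits) as a fraction in [0, 1); 48 bits convert to float exactly.
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return "test" if bucket < test_fraction else "train"


def split_fingerprint(report_ids: Iterable[str], test_fraction: float = 0.2) -> str:
    """SHA-256 of the sorted test-set IDs; log it and compare it across model runs."""
    test_ids = sorted(r for r in report_ids if assign_split(r, test_fraction) == "test")
    return hashlib.sha256("\n".join(test_ids).encode("utf-8")).hexdigest()
```

If the fingerprint logged by the DistilRoBERTa, DistilBERT, and BETO runs matches, all three were scored on the same test reports, whatever order the data was loaded in.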
Attention masks built explicitly
Padding masks, special-token masks, and segment IDs were built to match each model's expectations rather than left to tokeniser defaults. A few percentage points of accuracy came down to this detail alone.
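To show what "built explicitly" can mean, here is a minimal sketch that pads already-tokenised ID sequences and constructs the three masks by hand. It is a hypothetical helper, not the Hugging Face tokeniser API, and it assumes the sequences already carry each model's own special tokens. The IDs in the test (101, 102, 0) are BERT-style values used purely for illustration; `use_segment_ids=False` is meant for encoders whose forward pass takes no segment IDs.

```python
from typing import Dict, List, Sequence, Set


def build_masks(
    sequences: Sequence[List[int]],
    pad_id: int,
    special_ids: Set[int],
    max_len: int,
    use_segment_ids: bool = True,
) -> Dict[str, List[List[int]]]:
    """Pad already-tokenised sequences and build every mask explicitly.

    `sequences` must already include the model's own special tokens
    (e.g. [CLS] ... [SEP] or <s> ... </s>) and fit within `max_len`.
    """
    batch: Dict[str, List[List[int]]] = {
        "input_ids": [], "attention_mask": [], "special_tokens_mask": []
    }
    if use_segment_ids:
        batch["token_type_ids"] = []
    for seq in sequences:
        if len(seq) > max_len:
            raise ValueError(f"sequence of length {len(seq)} exceeds max_len={max_len}")
        n_pad = max_len - len(seq)
        batch["input_ids"].append(list(seq) + [pad_id] * n_pad)
        # Padding mask: 1 = real token the model attends to, 0 = padding.
        batch["attention_mask"].append([1] * len(seq) + [0] * n_pad)
        # Special-token mask: 1 for special tokens and padding, 0 for report text.
        batch["special_tokens_mask"].append(
            [1 if tok in special_ids else 0 for tok in seq] + [1] * n_pad
        )
        if use_segment_ids:
            # Single-segment input: every position belongs to segment 0.
            batch["token_type_ids"].append([0] * max_len)
    return batch
```

Raising on over-long input, rather than silently truncating, avoids cutting off a closing special token, which is exactly the kind of quiet default this step is meant to avoid.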
Reported the comparison, not just the winner
DistilRoBERTa led, but DistilBERT and BETO were close. The honest takeaway is 'three lightweight encoders are all viable on this task' — not 'DistilRoBERTa is the answer'.
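The write-up calls DistilBERT and BETO "close" without saying how closeness was judged. One standard way to check whether an accuracy gap on a single test split is distinguishable from resampling noise is a paired percentile bootstrap over the test examples. This technique is swapped in here for illustration and is not taken from the source; `paired_bootstrap_gap` and its defaults are hypothetical.

```python
import random
from typing import List, Tuple


def paired_bootstrap_gap(
    preds_a: List[int],
    preds_b: List[int],
    labels: List[int],
    n_resamples: int = 2000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Observed accuracy gap (A - B) and a 95% percentile-bootstrap interval.

    Both models are re-scored on the same resampled test indices each round
    (a paired bootstrap), since they were evaluated on one shared test split.
    """
    n = len(labels)
    correct_a = [int(p == y) for p, y in zip(preds_a, labels)]
    correct_b = [int(p == y) for p, y in zip(preds_b, labels)]
    observed = (sum(correct_a) - sum(correct_b)) / n
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    gaps = []
    for _ in range(n_resamples):
        # Resample test examples with replacement; both models share the draw.
        idx = [rng.randrange(n) for _ in range(n)]
        gaps.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    gaps.sort()
    lo = gaps[int(0.025 * n_resamples)]
    hi = gaps[int(0.975 * n_resamples) - 1]
    return observed, lo, hi
```

If the interval for a pair of models straddles zero, the test split alone cannot separate them, which is the situation the "all three are viable" reading describes.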
Held the core hyperparameters fixed
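The learning rate (5e-6) and batch sizes were the same across all three models. The epoch budget was not: DistilRoBERTa ran for 100 training epochs, while DistilBERT and BETO ran for 50, and no early stopping was applied to any of them. That difference is worth keeping in mind when reading DistilRoBERTa's lead. Per-model tuning could shift the ranking; the goal was to read base behaviour, not to optimise a leaderboard.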
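A frozen config object makes "only these settings differ" something the code can report rather than something the write-up has to assert. The sketch below encodes the stated values (learning rate 5e-6, no early stopping, 100 vs 50 epochs); the batch size of 16 and the short model keys are hypothetical placeholders, since the write-up gives neither.

```python
from dataclasses import asdict, dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class TrainConfig:
    model_name: str
    learning_rate: float = 5e-6   # shared by all three models (from the write-up)
    batch_size: int = 16          # HYPOTHETICAL value: the write-up does not state it
    epochs: int = 50
    early_stopping: bool = False  # not used for any model


BASE = TrainConfig(model_name="base")

# Hypothetical short names, not model-hub identifiers. Each config inherits
# everything from BASE and overrides only what the write-up says differed.
CONFIGS: Dict[str, TrainConfig] = {
    "distilroberta": replace(BASE, model_name="distilroberta", epochs=100),
    "distilbert": replace(BASE, model_name="distilbert"),
    "beto": replace(BASE, model_name="beto"),
}


def differing_fields(configs: Dict[str, TrainConfig]) -> List[str]:
    """Names of settings (other than the model itself) not shared by all configs."""
    dicts = [asdict(c) for c in configs.values()]
    return sorted(
        key for key in dicts[0]
        if key != "model_name" and len({d[key] for d in dicts}) > 1
    )
```

Here `differing_fields(CONFIGS)` returns `['epochs']`, so logging it alongside results surfaces the one setting that was not held fixed.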