Prediction of Drug-Induced Autoimmunity Using X Gradient Boost Machine Learning

Keywords

Drug-induced autoimmunity
XGBoost
molecular descriptors
SMOTE
preclinical screening
feature importance
gradient boosting
cross-validation

Abstract

Drug-induced autoimmunity (DIA) comprises immune-mediated adverse events such as lupus, hepatitis, and uveitis that can arise after extended drug exposure, complicating prospective risk assessment. We built a gradient-boosted tree (XGBoost) classifier using 196 RDKit-derived molecular descriptors for 477 compounds[1] and addressed class imbalance with SMOTE. On a held-out test set, the model achieved a ROC-AUC of 0.888, with 66.7% recall and 57.1% precision for the positive class; five-fold cross-validation indicated strong generalization (mean ROC-AUC 0.974 ± 0.067). Gain-based feature importance highlighted descriptors related to topological complexity, aromaticity, and polarity as the most salient. The framework enables rapid, cost-effective screening for autoimmune risk during early discovery, so that compounds can be prioritized for deeper evaluation.
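
The SMOTE step mentioned above can be illustrated with a minimal sketch of the core idea: each synthetic minority-class sample is an interpolation between an existing minority sample and one of its nearest minority neighbours. This is not the study's implementation, which is not given here. The function name smote, its parameters, and the toy two-dimensional "descriptor" vectors are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built from the minority class.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper for illustration, not the study's code.)
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy descriptor vectors (hypothetical values, not taken from the study).
positives = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(positives, n_new=6, k=2)
```

In a workflow like the one described, oversampling of this kind is usually applied to the training portion only (and inside each cross-validation fold), so that synthetic points derived from a compound never appear in the data used to score the model.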
