Abstract
Drug-induced autoimmunity (DIA) comprises immune-mediated adverse events, such as lupus, hepatitis, and uveitis, that can arise after extended drug exposure, which complicates prospective risk assessment. We built a gradient-boosted tree (XGBoost) classifier using 196 RDKit-derived molecular descriptors for 477 compounds[1] and addressed class imbalance with SMOTE. On a held-out test set, the model achieved a ROC-AUC of 0.888, with 66.7% recall and 57.1% precision for the positive class; five-fold cross-validation yielded a ROC-AUC of 0.974 ± 0.067, suggesting good generalization. Gain-based feature importance highlighted descriptors related to topological complexity, aromaticity, and polarity as the most influential. The framework enables rapid, cost-effective screening for autoimmune risk during early discovery, helping to prioritize compounds for deeper evaluation.
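As a rough illustration of the class-imbalance step, the following is a minimal, standard-library sketch of the interpolation idea behind SMOTE: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. It is a simplified stand-in, not the implementation used in this work; the function name `smote_oversample`, its parameters, and the two-descriptor toy data are all hypothetical.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    Simplified sketch: `minority` is a list of equal-length feature vectors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbor.
        u = rng.random()
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical two-descriptor toy data, not taken from the study.
minority_toy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(minority_toy, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it stays within the range spanned by the observed descriptors. Oversampling of this kind is normally applied to the training data only, so that synthetic points do not enter the evaluation set.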
