Machine Learning–Based Predictors for RA Relapse Evaluated

Researchers found that the extreme gradient boosting (XGBoost) predictor could more accurately predict rheumatoid arthritis (RA) relapse than logistic regression and random forest.

When evaluating machine learning (ML)-based predictors of relapse among patients with rheumatoid arthritis (RA), researchers found that the extreme gradient boosting (XGBoost) predictor performed the best.

XGBoost had higher predictive performance (area under the receiver operating characteristic curve [AUC] = 0.747) than the 2 other classifiers, logistic regression (AUC = 0.701) and random forest (AUC = 0.719).
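For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen patient who relapsed receives a higher predicted risk than a randomly chosen patient who stayed in remission. The following is a minimal sketch of that definition only; the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = relapse, 0 = remission.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.833 (10 of 12 pairs ranked correctly)
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect ranking, which is the scale on which the three reported values sit.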

The logistic regression model is a traditional generalized linear model used for binary classification in clinical prediction, and the random forest model is an ensemble algorithm that combines multiple decision trees. Like random forest, XGBoost is a decision tree–based ensemble algorithm, but it uses gradient boosting, in which each new tree is fitted to correct the errors of the trees before it, to achieve more accurate predictions.
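The boosting idea can be illustrated with a toy example. The sketch below is not XGBoost and not the study's model: it is a bare-bones gradient boosting loop with squared-error loss and one-split trees ("stumps") on a single made-up feature, and every name and number in it is hypothetical. XGBoost itself adds regularization and other refinements that are omitted here.

```python
def fit_stump(xs, residuals):
    """Pick the one-threshold split on x that best fits the residuals."""
    best = None
    # The largest x is skipped so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return t, lmean, rmean


def predict_stump(stump, x):
    t, lmean, rmean = stump
    return lmean if x <= t else rmean


def gradient_boost(xs, ys, n_rounds, learning_rate=0.5):
    """Each round fits a stump to the current residuals and adds it in."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical toy data: one feature, binary outcome coded 0/1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

base, stumps, preds1 = gradient_boost(xs, ys, n_rounds=1)
_, _, preds20 = gradient_boost(xs, ys, n_rounds=20)
mse0 = mse(ys, [base] * len(ys))
print(mse0, mse(ys, preds1), mse(ys, preds20))
```

The training error falls as rounds are added because each stump targets what the previous ones got wrong. A random forest, by contrast, grows its trees independently and averages them.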

“XGBoost algorithm selects one feature when there is a high correlation between variables, whereas random forest randomly selects a feature and learns the correlations of different features across the model,” the study authors wrote. “Therefore, XGBoost was considered more accurate in feature selection because it could select a smaller number and more efficient features.”

According to the authors, whose research was published in Scientific Reports, these results suggest ML-based predictors can accurately predict RA relapse, and therefore similar predictive algorithms can potentially facilitate personalized treatment plans for patients.

After exclusions, the study included 210 patients with RA who were enrolled in the KURAMA cohort in 2015 and had available follow-up and ultrasound data in 2017. These patients were divided into 2 groups, with 150 patients who achieved remission in 2017 in the “remission” group and 60 patients with RA relapse in 2017 in the “relapse” group.

Using ultrasound examination and blood test data, the study authors found that several clinical and biological markers associated with RA disease activity were significantly higher among patients with relapse compared with markers in patients with remission:

  • Disease Activity Score in 28 joints using C-reactive protein (DAS28-CRP)
  • Simplified disease activity index
  • Clinical disease activity index
  • Health Assessment Questionnaire
  • Patient global assessment with visual analog scale

They then applied a recursive feature elimination algorithm to improve accuracy, using gender, disease duration, age, wrist superb microvascular imaging (SMI) score, metatarsophalangeal (MTP) SMI score, erythrocyte sedimentation rate (ESR), C-reactive protein, rheumatoid factor, anti–cyclic citrullinated peptide, and matrix metalloproteinase-3.
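Recursive feature elimination works by fitting a model, discarding the feature the model relies on least, and repeating until a target number of features remains. The sketch below illustrates that loop under strong simplifying assumptions: it uses a small least-squares linear model fitted by gradient descent rather than any of the study's classifiers, and the data, feature names, and helper functions are all hypothetical.

```python
def fit_linear(rows, ys, n_steps=200, lr=0.1):
    """Least-squares linear fit (no intercept) by plain gradient descent."""
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    for _ in range(n_steps):
        grad = [0.0] * d
        for row, y in zip(rows, ys):
            err = sum(wj * xj for wj, xj in zip(w, row)) - y
            for j in range(d):
                grad[j] += 2.0 * err * row[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def rfe(rows, ys, names, n_keep):
    """Refit, drop the feature with the smallest |weight|, repeat."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in rows]
        w = fit_linear(sub, ys)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [names[j] for j in active]


# Hypothetical, deliberately artificial data: four uncorrelated +/-1
# features, of which only "strong" and "weak" influence the outcome.
strong = [1, 1, 1, 1, -1, -1, -1, -1]
weak = [1, 1, -1, -1, 1, 1, -1, -1]
noise_1 = [1, -1, 1, -1, 1, -1, 1, -1]
noise_2 = [1, -1, -1, 1, 1, -1, -1, 1]

names = ["noise_1", "strong", "noise_2", "weak"]
rows = [list(r) for r in zip(noise_1, strong, noise_2, weak)]
ys = [3 * s + 1 * w for s, w in zip(strong, weak)]

kept = rfe(rows, ys, names, n_keep=2)
print(kept)  # ['strong', 'weak']
```

Real clinical features are correlated and on different scales, so in practice the ranking depends on the model used and features are usually standardized first. This example sidesteps both issues on purpose.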

When comparing all 10 features’ values between the 2 groups, wrist and MTP SMI scores were significantly higher in patients in the relapse group compared with patients in the remission group. The authors noted, however, that height and alanine aminotransferase were significantly lower in patients with relapse. No other significant differences were noted.

According to the authors, these findings reflect an improved model for predicting relapse in RA patients through ML.

“The combination of data on US [ultrasound] examination and blood test was a unique approach of this study, and US data were shown to be essential for prediction,” they said. “The findings may lead to a better assessment of relapse risk and enable the selection of personalized treatment strategies for RA patients.”


Matsuo H, Kamada M, Imamura A, et al. Machine learning-based prediction of relapse in rheumatoid arthritis patients using data on ultrasound examination and blood test. Sci Rep. Published online May 4, 2022. doi:10.1038/s41598-022-11361-y
