A machine learning approach for predicting bone metastases and its three-month prognostic risk factors in hepatocellular carcinoma patients using SEER data
Hepatocellular carcinoma (HCC) frequently spreads to the bones, and patients with bone metastasis have a worse prognosis than those without. The purpose of this study is to identify the predictors and three-month prognostic indicators of bone metastasis (BM) in patients with HCC and to identify the best-performing machine learning (ML) model for predicting them, in order to assist clinical practice.
Data from the Surveillance, Epidemiology, and End Results (SEER) database on HCC patients diagnosed between 2010 and 2017 were retrospectively analyzed. Univariate and multivariate logistic regression analyses were used to identify independent predictors of BM and the 3-month mortality risk factors of HCC patients with bone metastases. ML, which uses algorithmic methods, is a rapidly expanding field that is frequently used in the biomedical industry. Random forest (RF) and artificial neural network (ANN) models were built as predictors. Both ML models were evaluated with 5-fold cross-validation, and the best-performing model was chosen based on the area under the curve (AUC) and F score. Because the data were significantly class-imbalanced, we used the Easy Ensemble (EE) approach for resampling instead of SMOTE-like methods, whose validity has been questioned.
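The resampling idea behind Easy Ensemble can be illustrated with a minimal sketch. It shows only the subset-construction step: each subset keeps every minority-class row and adds an equally sized random draw of majority-class rows. In the full method, one learner is trained per subset and their outputs are combined. The function name `balanced_subsets`, the toy labels, and the number of subsets are illustrative assumptions, not details taken from the study.

```python
import random


def balanced_subsets(labels, n_subsets, seed=0):
    """Return n_subsets lists of row indices.

    Each list holds every minority-class row plus an equally sized
    random draw (without replacement) of majority-class rows.
    Labels are assumed to be 0/1; this helper is hypothetical.
    """
    rng = random.Random(seed)
    ones = [i for i, y in enumerate(labels) if y == 1]
    zeros = [i for i, y in enumerate(labels) if y == 0]
    # Whichever class is smaller is treated as the minority class.
    minority, majority = (ones, zeros) if len(ones) <= len(zeros) else (zeros, ones)

    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(majority, len(minority))
        subsets.append(sorted(minority + drawn))
    return subsets


# Toy labels (not SEER data): 5 positives among 100 rows.
labels = [1] * 5 + [0] * 95
subsets = balanced_subsets(labels, n_subsets=10)

for idx in subsets[:3]:
    n_pos = sum(labels[i] for i in idx)
    print(len(idx), "rows,", n_pos, "positive")
```

Because every subset is balanced, a learner trained on it is not dominated by the majority class, and no synthetic minority samples need to be generated.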
Among 34,861 patients, 1,265 (3.62%) had bone metastasis, and 841 were eligible for the three-month prognostic model. Independent risk factors for BM in HCC patients were sex, race, lung metastasis, tumor size, stage of disease, and N stage. Independent three-month prognostic factors for HCC patients with bone metastasis were radiotherapy, chemotherapy, surgery, and lung metastasis. In our experiments, RF trained on the original data (i.e., without resampling) outperformed ANN (RF: AUC=0.9999, accuracy=0.9999, F1=0.9992; ANN: AUC=0.99107, accuracy=0.94027, F1=0.54876). With the EE classifier, however, the prediction metrics approached one, particularly with one-hot encoding rather than ordinary label encoding. Similar results were achieved for the 3-month mortality risk model.
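The gap between the ANN's accuracy and its F1 score is typical of class-imbalanced data, and a short worked example may make this concrete. The sketch below uses only the cohort counts reported above (34,861 patients, 1,265 with BM) and a hypothetical classifier that predicts "no BM" for everyone. It is not a model from the study, and the helper names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Cohort counts as reported in the abstract.
n_patients, n_bm = 34861, 1265
y_true = [1] * n_bm + [0] * (n_patients - n_bm)

# Hypothetical baseline: predict "no bone metastasis" for every patient.
y_all_negative = [0] * n_patients

baseline_accuracy = accuracy(y_true, y_all_negative)
baseline_f1 = f1_score(y_true, y_all_negative)
print(f"accuracy={baseline_accuracy:.4f}, F1={baseline_f1:.4f}")
```

This baseline reaches an accuracy of roughly 0.96 while detecting no BM cases at all, so its F1 is zero. This is why F1 and AUC, rather than accuracy alone, are the more informative criteria for comparing models on such data.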
Our findings may help clinicians make an early diagnosis and improve patients' short-term prognosis. The EE classifier performed best for predicting the risk of bone metastasis and three-month mortality in HCC patients using class-imbalanced SEER data. This approach may be applicable to other class-imbalanced medical datasets as an alternative to questionable SMOTE-like methods.
The authors.
This study has not received any funding.
All authors have declared no conflicts of interest.