Journal of Inflammation Research, Volume 17

Development of Biomarkers and Prognosis Model of Mortality Risk in Patients with COVID-19

Authors Zhang Z, Tang L, Guo Y, Guo X, Pan Z, Ji X, Gao C

Received 12 November 2023

Accepted for publication 4 April 2024

Published 22 April 2024 Volume 2024:17 Pages 2445–2457

DOI https://doi.org/10.2147/JIR.S449497

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 3

Editor who approved publication: Dr Adam D Bachstetter



Zhishuo Zhang,1,* Lujia Tang,1,* Yiran Guo,1 Xin Guo,2 Zhiying Pan,2 Xiaojing Ji,1,* Chengjin Gao1,*

1Department of Emergency, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200092, China; 2School of Information Science and Technology, Sanda University, Shanghai, Pudong District, 201209, China

*These authors contributed equally to this work

Correspondence: Xiaojing Ji; Chengjin Gao, Email [email protected]; [email protected]

Background: As of 30 April 2023, the COVID-19 pandemic has resulted in over 6.9 million deaths worldwide. The virus continues to spread and mutate, leading to continuously evolving pathological and physiological processes. It is imperative to reevaluate predictive factors for identifying the risk of early disease progression.
Methods: A retrospective study was conducted on a cohort of 1379 COVID-19 patients who were discharged from Xinhua Hospital affiliated with Shanghai Jiao Tong University School of Medicine between 15 December 2022 and 15 February 2023. Patient symptoms, comorbidities, demographics, vital signs, and laboratory test results were systematically documented. The dataset was split into training and testing sets, and 15 different machine learning algorithms were employed to construct prediction models. These models were assessed for accuracy and area under the receiver operating characteristic curve (AUROC), and the best-performing model was selected for further analysis.
Results: The AUROC of the models generated by all 15 machine learning algorithms exceeded 90%, and the accuracy of 10 of them also surpassed 90%. The Light Gradient Boosting model emerged as the optimal choice, with an accuracy of 0.928 ± 0.0006 and an AUROC of 0.976 ± 0.0028. In this model, the factors with the greatest impact on in-hospital mortality were growth stimulation expressed gene 2 (ST2, 19.3%), interleukin-8 (IL-8, 17.2%), interleukin-6 (IL-6, 6.4%), age (6.1%), NT-proBNP (5.1%), interleukin-2 receptor (IL-2R, 5%), troponin I (TNI, 4.6%), and congestive heart failure (3.3%).
Conclusion: ST2, IL-8, IL-6, NT-proBNP, IL-2R, TNI, age and congestive heart failure were significant predictors of in-hospital mortality among COVID-19 patients.

Keywords: COVID-19, machine learning, prognosis model, ST2, IL-8, TNI, IL-6, IL-2R, congestive heart failure

Introduction

It has been three years since the World Health Organization announced the definition of Coronavirus Disease 2019 (COVID-19) on 30 January 2020. The main clinical manifestations of COVID-19 include fever, cough, and pharyngalgia. Some patients progress to acute respiratory distress syndrome (ARDS) and multiple organ dysfunction syndrome (MODS), which can be fatal, especially in elderly patients and those with underlying diseases. As of 30 April 2023, the COVID-19 pandemic had resulted in over 6.9 million deaths worldwide. In December 2022, the Chinese government issued a new policy for the management of COVID-19. This policy led to a sudden increase in the number of newly infected patients, posing significant challenges to clinical work.1 Statistics on COVID-19 can be downloaded at https://covid19.who.int/. Low lymphocyte (LYM) and white blood cell (WBC) counts, together with high CRP and ferritin levels, were effective in the diagnosis of COVID-19.2,3 High ESR, international normalized ratio (INR), prothrombin time (PT), CRP, D-dimer, ferritin and red cell distribution width (RDW) values were found to be the most effective biomarkers for predicting the mortality risk of COVID-19.4 Accurate and reliable estimation of oxidant/antioxidant levels in COVID-19 patients, using biomarkers such as LYM, ferritin, D-dimer, WBC, and CRP, can facilitate diagnosis and prognosis.5

Most studies on COVID-19 were conducted during the initial outbreak. Since then, the virus has mutated frequently as it spread, and genetic recombination occurs when different subtypes infect the human body. Gene mutation or recombination can affect the virus's biological characteristics,6,7 influencing the whole pathophysiological process. Severe pneumonia caused by COVID-19 infection can be difficult to reverse once it develops into ARDS, so early identification of patients who may progress to severe disease is crucial. This study aims to statistically analyze the clinical data of infected patients during this special period, summarize clinical treatment experience, identify factors that may affect prognosis, characterize the current course of viral infection, and provide evidence-based guidance on early key risk factors and later treatment.

In recent years, the use of artificial intelligence technologies has increased significantly in various fields, especially in medicine.8 Artificial intelligence technologies are now frequently used in the diagnosis, prognosis and treatment of diseases. The most important reason is that machine learning (ML) algorithms can reveal hidden relationships between features.9–11 The literature contains many attempts to use ML methods to predict the diagnosis and mortality of COVID-19.12–15 Considering that many clinical data are non-linear, complex and heterogeneous, this study uses machine learning to identify essential prognostic factors. Ultimately, the results may alert physicians to the patients who need closer attention and help guide clinical treatment.

Materials and Methods

Participants

This study collected clinical data of all patients diagnosed with COVID-19 and hospitalized in Xinhua Hospital Affiliated to Shanghai Jiao Tong University from December 15, 2022 to January 15, 2023. The inclusion criteria were (1) a positive nucleic acid result or a positive antigen test at the first visit; (2) hospitalization; (3) sufficient clinical data. The exclusion criteria were (1) hospitalization for reasons other than the respiratory system; (2) failure to complete treatment or voluntary discharge; (3) age under 18 years. Finally, a total of 1379 cases were included. The flowchart is shown in Figure 1.

Figure 1 (a) Flowchart showing the exclusion and enrolment of COVID-19 patients; (b) Flowchart of model development and performance by machine learning method. XH hospital: Xinhua Hospital Affiliated to Shanghai Jiao Tong University.

Abbreviations: Labs, Laboratory tests; SHAP, SHapley Additive exPlanations.

Data Collection

The study collected clinical data under the guidance of a multidisciplinary team of experienced clinicians and informaticists. Data collection involved 45 characteristics, including age, sex, past medical history (Charlson comorbidity index, congestive heart failure, chronic pulmonary disease, rheumatic disease, renal disease, liver disease, diabetes), length of stay, prognosis and laboratory tests including C-reactive protein (CRP), white blood cell (WBC), neutrophil number (ne_num), lymphocyte number (ly_num), monocyte number (mo_num), hemoglobin (Hgb), red blood cell distribution width (rdw), thrombocyte (plt), thrombocyte distribution width (pdw), partial pressure of carbon dioxide (pCO2), apolipoprotein E (APOE), low-density lipoprotein (LDLC), high-density lipoprotein (HDLC), total cholesterol (TCH), triglyceride (TG), troponin I (TNI), N-terminal pro-brain natriuretic peptide (NT-proBNP), international normalized ratio (INR), activated partial thromboplastin time (APTT), fibrinogen (Fb), D-dimer (DD), total protein (tp), albumin (alb), total bilirubin (tbil), alanine transaminase (alt), aspartate transaminase (ast), creatinine (crea), interleukin-8 (IL-8), interleukin-1β (IL-1β), interleukin-6 (IL-6), tumor necrosis factor (TNF), interleukin-2 receptor (IL-2R), and growth stimulation expressed gene 2 (ST2). Biochemical tests were analyzed by the Roche Cobas E702 Fully Automatic Biochemical Analyzer (Roche, Germany). The Sysmex XS-1000i Hematology System (Sysmex Corporation, Kobe, Japan) was used to carry out the blood cell count. Researchers independently entered and double-checked the data. For examinations performed more than once, the first test result at admission was selected as the feature value to ensure the accuracy of prognosis prediction.

Data Preprocessing

The K-Nearest Neighbor method (R version 4.2.1, package: DMwR2) was employed to fill in the missing data.16 The variable “age” was then used as the reference for outlier screening: patients whose age deviated from the mean by more than three standard deviations were considered outliers and excluded from the analysis. In total, 1369 patients were incorporated into the dataset.
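The preprocessing described above can be sketched in Python as follows. This is only an illustration, not the authors' pipeline: the study used the R package DMwR2, whereas this sketch swaps in scikit-learn's `KNNImputer`, and it assumes the outlier rule means "age more than three standard deviations from the mean". The function name `impute_and_filter`, its parameters, and the toy data are hypothetical.

```python
import numpy as np
from sklearn.impute import KNNImputer


def impute_and_filter(X, age_col=0, n_neighbors=5, n_sd=3.0):
    """Fill missing values with KNN, then drop rows whose age is an outlier."""
    X = np.asarray(X, dtype=float)
    # Each missing entry is replaced by the mean of that feature over the
    # n_neighbors most similar rows (similarity measured on observed features).
    filled = KNNImputer(n_neighbors=n_neighbors).fit_transform(X)
    age = filled[:, age_col]
    # Assumed rule: keep rows whose age lies within n_sd standard deviations
    # of the mean age.
    keep = np.abs(age - age.mean()) <= n_sd * age.std()
    return filled[keep]


# Toy data (hypothetical): 19 plausible ages plus one implausible entry.
ages = np.append(np.arange(50, 69, dtype=float), 500.0)
labs = ages * 0.1                      # a made-up laboratory value
toy = np.column_stack([ages, labs])
toy[3, 1] = np.nan                     # simulate one missing laboratory value

cleaned = impute_and_filter(toy)
print(cleaned.shape)
```

With a single extreme value among twenty rows, the implausible age is removed and the missing laboratory value is filled from the rows with the most similar ages.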

Model Development

We divided the patients' final outcome at discharge into two groups: death or survival. The whole data set was randomly divided into a training set (80%) and a test set (20%). Because the split is random, we repeated the modeling of each algorithm 10 times to avoid chance findings, and took the average and variance of the 10 results as the final result. We built a total of 15 models, briefly introduced as follows:

(1) Light Gradient Boosting Machine (LightGBM) is a gradient boosting framework based on decision trees, utilizing a histogram-based algorithm and a leaf-wise tree growth strategy to reduce memory consumption and enhance computational efficiency. Its features include efficient handling of categorical features and support for large-scale data training through distributed learning.

(2) Ridge Regression is a technique that incorporates an L2 norm regularization term into the linear regression loss function to address multicollinearity problems and control model complexity. This method balances the ability to fit data with the need to keep model coefficients small by adjusting the regularization parameter.

(3) Logistic Regression is a widely used linear model for binary classification problems, employing a sigmoid function to map the output of the linear model to the (0,1) interval, representing probabilities. The parameters of the logistic regression model are estimated using the Maximum Likelihood Estimation (MLE) method.

(4) The Adaptive Boosting (AdaBoost) algorithm iteratively trains a series of weak learners, increasing the weight of samples that were misclassified in the previous round, thereby focusing subsequent learners on these difficult-to-classify samples to improve the overall model performance.

(5) CatBoost is a gradient boosting decision tree algorithm optimized for categorical feature handling. It reduces overfitting and enhances model accuracy through unique ordered boosting techniques and efficient encoding of categorical features.

(6) Linear Discriminant Analysis is a classic linear classification method that finds the best linear classification plane by maximizing the between-class distance and minimizing the within-class variance. It also serves as a dimensionality reduction technique, representing data through the most discriminative linear combination.

(7) Decision Trees are a non-parametric supervised learning method used for classification and regression tasks. They recursively partition the dataset to build a tree structure, where each node represents a decision rule based on an attribute, and leaf nodes represent decision outcomes.

(8) The Extremely Randomized Trees (Extra-Trees) algorithm introduces randomness in the training process of decision trees by choosing random features and split points to grow trees, thereby increasing model diversity and reducing the risk of overfitting.

(9) Random Forest is an ensemble learning method that improves prediction accuracy and stability by constructing multiple decision trees and aggregating their predictions. It introduces bootstrap sampling of the samples and random selection of features to increase the independence of decision trees.

(10) Extreme Gradient Boosting (XGBoost) is an efficient gradient boosting algorithm that prevents overfitting through optimized computational resource usage and advanced regularization techniques, such as L1 and L2 regularization. It offers advanced features for parallel processing, tree pruning, and automatic handling of missing values.

(11) Multi-Layer Perceptron (MLP) is a type of feedforward artificial neural network, modeling complex nonlinear relationships through multiple layers of nodes (or neurons) and nonlinear activation functions. MLPs consist of at least three layers: an input layer, one or more hidden layers, and an output layer.

(12) The K-Nearest Neighbor algorithm is an instance-based learning method that classifies or regresses by measuring distances between different feature values. For a given query point, the algorithm identifies the K nearest neighbors in the training data and predicts the outcome based on these neighbors’ information.

(13) Gradient Boosting is an ensemble technique that incrementally adds predictors and optimizes them to reduce the overall model’s loss. It models the residuals at each step to gradually enhance the model’s predictive capability.

(14) The naive Bayes classifier is based on Bayes’ theorem and assumes independence among features. Despite this assumption often being unrealistic in practice, naive Bayes can still deliver strong performance in various scenarios, especially in text classification and spam detection.

(15) Support Vector Machine (SVM) is a powerful classification technique that finds the optimal hyperplane for separating different categories. SVMs with Radial Basis Function (RBF) kernels address classification problems in nonlinear feature spaces by mapping the input space to a higher-dimensional feature space to find the optimal separating hyperplane.15,17
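The repeated 80/20 evaluation described at the start of this section can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic (generated with `make_classification`), only two scikit-learn classifiers stand in for the 15 models, stratified splitting is an added assumption, and the helper name `repeated_holdout` is hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def repeated_holdout(model, X, y, n_repeats=10, test_size=0.2):
    """Refit the model on n_repeats random 80/20 splits; report mean and variance."""
    accs, aucs = [], []
    for seed in range(n_repeats):
        # Assumption: stratify so every test set contains both outcomes.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed)
        fitted = clone(model).fit(X_tr, y_tr)   # fresh, unfitted copy each time
        accs.append(accuracy_score(y_te, fitted.predict(X_te)))
        aucs.append(roc_auc_score(y_te, fitted.predict_proba(X_te)[:, 1]))
    return {"acc_mean": np.mean(accs), "acc_var": np.var(accs),
            "auc_mean": np.mean(aucs), "auc_var": np.var(aucs)}


# Synthetic stand-in data with roughly 11% positive (death) labels.
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           weights=[0.89, 0.11], random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}
results = {name: repeated_holdout(m, X, y) for name, m in models.items()}
for name, r in results.items():
    print(name, round(r["acc_mean"], 3), round(r["auc_mean"], 3))
```

Reporting the variance alongside the mean shows how sensitive each model is to the particular random split, which is the purpose of repeating the procedure.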

Model Evaluation

Accuracy is used to evaluate the model in the prediction task; it is the ratio of the number of samples correctly predicted by the model to the total number of samples. For example, if a decision tree model correctly predicts 80 out of 100 samples, the accuracy is 80%. The ROC curve is a visual tool for evaluating model performance; it shows how well a classifier discriminates between samples at each decision threshold. AUC is the area under the ROC curve. The higher the AUC value, the better the classification performance of the machine learning algorithm.
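Both quantities can be illustrated with a short dependency-free sketch. It relies on the standard equivalence between the AUC and the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count one half). The function names and toy scores are hypothetical, and the study itself used the scikit-learn implementations.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A pair counts 1 if the positive scores higher, 0.5 if the scores tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# The example from the text: 80 of 100 samples predicted correctly.
acc = accuracy([1] * 100, [1] * 80 + [0] * 20)

# Toy scores: three of the four positive/negative pairs are ordered correctly.
auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(acc, auc)  # 0.8 0.75
```

Unlike accuracy, the AUC does not depend on a single decision threshold, which is why the two metrics can rank models differently.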

In machine learning and data mining tasks, by ranking the importance of features, we can determine which features play a key role in the prediction of target variables, thereby simplifying the model and reducing computational complexity, so that we can better understand potential risk factors and incorporate them into the decision-making process.

SHapley Additive exPlanations (SHAP) is a method used to interpret the prediction results of machine learning models. It is based on the Shapley value in game theory: for each sample, every feature is treated as a player in a cooperative game, each combination of features is regarded as a coalition of players, and the marginal contribution of each feature to the prediction can be measured. We plotted the SHAP value of each feature of each sample to visually show how these variables affect the prognosis of COVID-19 patients.
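The Shapley value underlying SHAP can be computed exactly for a small number of features, as in the sketch below. It is a textbook illustration rather than the SHAP library's algorithm, which relies on faster model-specific approximations. The toy model, the sample, the all-zero baseline, and every name are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a coalition value function value(frozenset)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that does not contain feature i.
            weight = (factorial(k) * factorial(n_features - k - 1)
                      / factorial(n_features))
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i when it joins coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(x):
    """Toy model with one interaction term between features 0 and 2."""
    return 2 * x[0] + x[1] + x[0] * x[2]


sample = (1.0, 1.0, 2.0)
baseline = (0.0, 0.0, 0.0)


def coalition_value(s):
    """Model output when only the features in s take the sample's values."""
    x = [sample[j] if j in s else baseline[j] for j in range(len(sample))]
    return model(x) - model(baseline)


phi = shapley_values(coalition_value, 3)
print([round(v, 6) for v in phi])  # [3.0, 1.0, 1.0]
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, and the interaction term is shared equally between the two features involved.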

The confusion matrix allows us to further evaluate the performance of the model. TP (True Positive) is the number of positive cases that the model predicts correctly, FN (False Negative) is the number of positive cases that the model incorrectly predicts as negative, FP (False Positive) is the number of negative cases that the model incorrectly predicts as positive, and TN (True Negative) is the number of negative cases that the model predicts correctly. On this basis, the F1 score, sensitivity and specificity are calculated to comprehensively evaluate the model, using the following formulas:

(1) accuracy = (TP + TN)/(TP + FP + FN + TN), (2) sensitivity/R = TPR = TP/(TP + FN), (3) specificity = TN/(FP + TN), (4) precision/P = TP/(TP + FP), (5) F1 score = 2PR/(P + R).
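Formulas (1) to (5) translate directly into code. The sketch below is a plain transcription; the function name `confusion_metrics` and the counts in the example are hypothetical and are not taken from Table 4. It does not guard against empty classes, where a denominator would be zero.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute formulas (1)-(5) from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)        # formula (1)
    sensitivity = tp / (tp + fn)                      # formula (2), recall R
    specificity = tn / (fp + tn)                      # formula (3)
    precision = tp / (tp + fp)                        # formula (4), P
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # formula (5)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


# Hypothetical counts for a test set of 100 patients, 10 of whom died.
metrics = confusion_metrics(tp=8, fp=2, fn=2, tn=88)
for name, value in metrics.items():
    print(name, round(value, 3))
```

In this example accuracy is 0.96 while sensitivity is only 0.8, which illustrates why accuracy alone can be misleading when deaths are a small minority of cases.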

Statistical Analysis

Continuous variables are presented as median (interquartile range [IQR]), and categorical variables as counts and percentages (%). Python version 3.7.16 was used for programming. The train_test_split function from the sklearn.model_selection module was used to split the data proportionally into training and test sets. The accuracy_score, roc_auc_score, f1_score, and confusion_matrix functions from the sklearn.metrics module were used to obtain accuracy, AUC, F1 score, precision, specificity, and sensitivity.

Results

Characteristics of Participants

The study collected the data of patients who were diagnosed with COVID-19 infection and hospitalized in Xinhua Hospital affiliated to Shanghai Jiao Tong University from December 15, 2022 to January 15, 2023. Patients with a negative nucleic acid test at admission were excluded. Finally, 1379 cases were included in the study. The median length of stay was 10 days, the longest stay was 58 days, and the mortality rate was 11% (Table 1). We additionally extracted the data on deceased patients into a separate table (Table 2). Among the deceased patients, the proportion with heart disease was significantly higher, while the male-to-female ratio remained essentially unchanged.

Table 1 Baseline Characteristics of All Individuals

Table 2 Baseline Characteristics of the Non-Survivors

Model Performance

Fifteen different machine learning algorithms were employed to construct prediction models, and the dataset was split into training and testing sets to evaluate the models. The AUROC of the models generated by all 15 machine learning algorithms exceeded 90%, and the accuracy of 10 of them also surpassed 90% (Table 3, Figure 2). This shows that these 15 algorithms can accurately predict the prognosis of COVID-19 patients. The Light Gradient Boosting model emerged as the optimal choice, with an accuracy of 0.928 ± 0.0006 and an AUROC of 0.976 ± 0.0028. In addition, naive Bayes (0.946 ± 0.0043) and Extreme Gradient Boosting (0.918 ± 0.0013) also showed excellent predictive ability. To better judge the performance of the models, we used the confusion matrix to evaluate them. The results are shown in Figure 3 and Table 4.

Boosting, a pivotal category of ensemble learning algorithms, operates on the principle of sequentially constructing a series of “weak learners.” These learners are then aggregated to develop a robust final model. The process involves the successive generation of weak learners, each aiming to address the residuals of the cumulative model by aligning with the negative gradient of its loss function. This approach ensures that the introduction of each new learner reduces the overall model’s loss, enhancing predictive accuracy. XGBoost builds upon the foundational concept of boosting, incorporating several enhancements to improve its efficiency and accuracy. These include the use of second-order derivatives for a more precise approximation of the loss function, regularization terms to prevent overfitting of trees, and block storage structures that facilitate parallel computations. Similarly, LightGBM’s primary advantage lies in its innovative modifications to the training algorithm, which significantly expedite the process and often lead to the development of more effective models. Notably, XGBoost incorporates built-in algorithms for variable ranking, whereas both LightGBM and the naive Bayes model employ a feature_importance function to determine variable significance.

Table 3 Predictive Performance of Models

Table 4 Confusion Matrix Evaluation of Three Models

Figure 2 The receiver operating characteristic curves of 15 different models. X-axis represents false positive rate, y-axis represents true positive rate.

Figure 3 Confusion matrix of the three models, where the number of true positives is 9, 14 and 4 in the naive Bayes model (a), Light Gradient Boosting model (b) and Extreme Gradient Boosting model (c), respectively.

Analyzing Importance of Features Included in Models

A total of 44 features are included in the construction of the machine learning algorithms. The relative importance rank of all 43 variables is analyzed through the best three machine learning algorithms: the Light Gradient Boosting model, naive Bayes and Extreme Gradient Boosting. In the Light Gradient Boosting model, ST2 contributed most to prognosis (19.3%), followed by IL-8 (17.2%), IL-6 (6.4%), age (6.1%), NT-proBNP (5.1%), IL-2R (5%), TNI (4.6%), and congestive heart failure (3.3%); the variables with little effect on prognosis were tumor, rheumatic disease, and diabetes. In the naive Bayes model, ST2 again contributed most to prognosis (18.4%), followed by IL-8 (17.2%), age (7.8%), IL-2R (6.6%), TNI (5%), NT-proBNP (4.1%), IL-6 (3.8%), and D-dimer (3.8%); the variables with a weight of 0 were tumor, rheumatic disease, chronic pulmonary disease, kidney disease and diabetes. In the Extreme Gradient Boosting model, ST2 had the highest weight on prognosis (25.6%), followed by IL-8 (14.3%), age (6.6%), TNI (5.7%), congestive heart failure (4.3%), IL-6 (3.5%), AST (3.1%), and NT-proBNP (3%); at the end of the ranking, chronic pulmonary disease, rheumatic disease and tumor made almost no contribution to prognosis (Figure 4). The features with high weights were largely the same across the three algorithms, indicating that the models were stable and that inflammatory factors and indexes related to cardiac function had a great influence on prognosis.

Figure 4 Rank of importance to prognosis of COVID-19 patients in the Light Gradient Boosting model (a), naive Bayes model (b), and Extreme Gradient Boosting model (c). The x-axis represents the proportion of weight and the y-axis represents the features. ST2 is the most important feature, accounting for 19.3%, 18.4% and 25.6% in the three models, respectively. IL-8 is the second most important, accounting for 17.2%, 17.2% and 14.3% in the three models, respectively. SHAP value of each feature of every sample in the Light Gradient Boosting model (d), naive Bayes model (e), and Extreme Gradient Boosting model (f); red dots represent a positive effect and blue dots represent a negative effect.

Abbreviations: SHAP, SHapley Additive exPlanations; crp, c-reactive protein; WBC, white blood cell; ne_num, neutrophil number; ly_num, lymphocyte number; mo_num, monocyte number; hgb, hemoglobin; rdw, red blood cell distribution width; plt, thrombocyte; pdw, thrombocyte distribution width; pco2, partial pressure of carbon dioxide; apoe, apolipoprotein E; ldlc, low-density lipoprotein; hdlc, high-density lipoprotein; tch, total cholesterol; tg, triglyceride; tni, troponin I; nt-probnp, n-terminal pro-brain natriuretic peptide; INR, international normalized ratio; APTT, activated partial thromboplastin time; fb, fibrinogen; dd, d-dimer; tp, total protein; alb, albumin; tbil, total bilirubin; alt, alanine transaminase; ast, aspartate transaminase; crea, creatinine; il-8, interleukin-8; il-1β, interleukin-1β; il-6, interleukin-6; tnf, tumor necrosis factor; il-2r, interleukin-2 receptor; st2, growth stimulation expressed gene 2.

The SHAP honeycomb (beeswarm) diagram shows 20 feature variables in our models; red dots represent a positive effect and blue dots represent a negative effect. ST2, IL-8, IL-6, heart disease, TNI, and NT-proBNP have significant effects on the poor prognosis of COVID-19 patients.

Discussion

This study collected data during a major COVID-19 outbreak in Shanghai, a special period, so the data are more current than those from the cases in Wuhan. Using machine learning algorithms rather than traditional retrospective analysis improved accuracy and stability. The clinical information used in this study is easy to obtain, making the approach highly practical and suitable for broader application. Most blood tests can be completed easily when patients are admitted to hospital, even in the emergency department. Physicians can collect all the data, predict patients’ outcomes, and adjust the treatment plan upon hospital admission. A limitation of the study is that all cases came from a single-center cohort (Xinhua Hospital affiliated to Shanghai Jiao Tong University); the models were validated internally only and lacked external validation, so the findings may be limited by regional factors. If conditions permit, we will collect multi-center data for external validation in the future.

Key findings of the study included the significance of “ST2”, “IL-8”, “TNI”, “d-dimer”, “age”, “congestive heart failure”, “NT-proBNP”, “IL-6”, “IL-2R” in predicting COVID-19 patients’ prognosis. During a COVID-19 infection, the immune system’s response can sometimes become dysregulated, leading to an excessive release of inflammatory cytokines. This overactive immune response, often referred to as a “cytokine storm”, can result in widespread inflammation in the body and lead to severe respiratory distress and other complications.18,19 IL-8 and IL-6 are interleukins, a class of cytokines, and IL-2R is the interleukin-2 receptor; all three were identified as significant factors in predicting COVID-19 patient outcomes in our study. These molecules exert profound biological effects even at low concentrations and play a pivotal role in modulating immune responses and the growth of immune cells.20,21 Diane Marie found that elevated serum concentrations of IL-6 and IL-8 served as independent predictors of survival in 1484 patients with COVID-19;22 Bo Diao found that COVID-19 patients requiring ICU treatment exhibited lower IL-6 and IL-10 levels;23 Aiping Ma’s retrospective study further showed that IL-8 and IL-2R were related to the duration and severity of COVID-19.24 Based on these studies, IL-8, IL-6 and IL-2R are important risk factors to which physicians should pay attention. Our study underscores the importance of monitoring interleukin levels in clinical work, with a particular emphasis on the pronounced significance of IL-8.

IL-8, also known as CXCL8, exerts chemotaxis on neutrophils by binding to the G protein-coupled receptors CXCR1 and CXCR2.25 Researchers observed that neutrophil extracellular traps (NETs) in the interstitium and peribronchiolar epithelium were positively correlated with the IL-8 mRNA level in lung tissues of patients who died of COVID-19, and were significantly higher than in patients with non-COVID-19-related ARDS.26 Furthermore, abnormal regulation of the CXCL8 axis formed by IL-8 (CXCL8) and its receptors CXCR1/CXCR2 was associated with respiratory diseases in other studies. The CXCL8 axis can enhance airway permeability to inflammatory responses by recruiting neutrophils and stimulating airway epithelium27 and can also increase airway hyperresponsiveness.28 We believe that the role played by IL-8 in COVID-19 deserves further exploration. IL-6 has garnered significant attention because it has been shown to be associated with the hyperinflammatory response that can occur in severe cases of COVID-19.29–31 Tocilizumab has been employed in the management of COVID-19 patients; it disrupts the interaction between IL-6 and its receptor. However, because tocilizumab was difficult to obtain, we were only able to administer it in two cases, and regrettably it did not reverse their fatal outcomes. Further clinical studies are needed to ascertain the appropriate timing and actual effect of tocilizumab.

In this study, the prognostic significance of congestive heart failure in COVID-19 patients surpasses that of the other comorbidities. Congestive heart failure, together with crucial biomarkers such as ST2, NT-proBNP, and TNI, is a pivotal indicator of cardiac function. These parameters suggest that patients with underlying heart disease, or patients in whom the virus has affected the heart, are at a heightened risk of adverse outcomes. These indexes carry greater clinical significance than indicators of other organ functions, which strongly resonates with our experience during clinical treatment. This perspective is further supported by existing research.32,33 Rather than focusing only on the history of heart disease, we believe that physicians should not ignore the specific damage caused by the virus itself to the cardiac system. Physicians must try to reduce the risk factors associated with death during the course of the disease. Currently, the precise mechanisms through which the virus triggers or exacerbates myocardial injury remain unclear. Some researchers believe that the increased basal energy consumption caused by viral infection, together with the hypoxic environment, increases the burden on the heart, which may be related to angiotensin-converting enzyme 2 (ACE2).34,35 Moreover, electrolyte imbalances and adverse drug reactions may also promote the process of heart failure. Therefore, in clinical treatment, physicians should ensure adequate oxygen supply for patients, use sedatives to reduce energy consumption when necessary, and avoid using negative inotropic drugs, so as to protect cardiac function and avoid a poor prognosis.

Conclusions

The machine learning models can accurately predict the prognosis of COVID-19 patients. The Light Gradient Boosting model was identified as the optimal choice, and “ST-2”, “IL-8”, “TNI”, “d-dimer”, “Age” and “congestive heart failure” were significant predictors of in-hospital mortality among COVID-19 patients.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Xinhua Hospital Affiliated to Shanghai Jiao Tong University (protocol code XHEC-D-2023-194).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

Key Supporting Subject Researching Project of Shanghai Municipal Health Commission (No. 2023ZDFC0106).

Disclosure

The authors declare no conflicts of interest in this work.

Creative Commons License © 2024 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.