
Lingnan Modern Clinics in Surgery ›› 2020, Vol. 20 ›› Issue (03): 273-279. DOI: 10.3969/j.issn.1009-976X.2020.03.002

• Original Articles and Clinical Research •

Development and validation of a prognostic nomogram to predict overall survival in patients with de novo stage Ⅳ breast cancer: a study based on machine learning algorithms

TAN Yu-jie, HE Zi-fan, YU Yun-fang, YAO He-rui   

  1. Department of Medical Oncology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China
  • Contact: YAO He-rui, yaoherui@mail.sysu.edu.cn

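  • Funding:
    National Science and Technology Major Project (2020ZX09201021); Key Cultivation Program for Medical Artificial Intelligence, Sun Yat-sen Memorial Hospital, Sun Yat-sen University (YXRGZN201902); National Natural Science Foundation of China (81572596, 81972471, U1601223); Natural Science Foundation of Guangdong Province (2017A030313828); Guangzhou Science and Technology Program (201704020131); Sun Yat-sen University 5010 Program Clinical Research Project (2018007); Sun Yat-sen University Clinical Research Cultivation Fund (SYS-C-201801); Department of Science and Technology of Guangdong Province (2017B030314026)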

Abstract: Objective To construct a prognostic nomogram for patients with de novo stage Ⅳ breast cancer and to identify those who could benefit from locoregional surgery. Methods The clinicopathologic characteristics of 7379 patients diagnosed with de novo stage Ⅳ breast cancer between 1973 and 2015 in the Surveillance, Epidemiology, and End Results (SEER) database were analyzed. Overall survival (OS) was estimated with the Kaplan-Meier method and compared with the log-rank test. Least absolute shrinkage and selection operator (LASSO) regression was used to select the clinicopathologic characteristics related to prognosis. A risk score equation was then established by multivariate Cox regression, and a prognostic risk model was constructed. The predictive accuracy of the nomogram was assessed by receiver operating characteristic (ROC) curve analysis, the area under the curve (AUC), and the concordance index (C-index). Results Of the 7379 patients included, 2703 (36.6%) received locoregional surgery and 4676 (63.4%) did not. LASSO regression selected 10 clinicopathologic characteristics (age, histologic type, clinical tumor stage, ER status, PR status, HER-2 status, bone metastasis, liver metastasis, lung metastasis, and lymph node metastasis) as independent prognostic factors, and these were used to construct the risk model. The model showed good accuracy for 1-year and 3-year OS in the development cohort (AUC 0.75 and 0.73, respectively) and in the validation cohort (AUC 0.72 and 0.75, respectively). The C-index was 0.700 (95%CI: 0.69-0.71) in the development cohort and 0.695 (95%CI: 0.67-0.71) in the validation cohort. Using optimal cutoff values of the risk score, patients were divided into low-risk, medium-risk, and high-risk groups. Kaplan-Meier analyses showed that patients in the low-risk and medium-risk groups benefited from locoregional surgery (low-risk group: development cohort: HR=0.49, 95%CI: 0.42-0.57, P<0.001; validation cohort: HR=0.43, 95%CI: 0.34-0.55, P<0.001; medium-risk group: development cohort: HR=0.75, 95%CI: 0.65-0.86, P<0.001; validation cohort: HR=0.72, 95%CI: 0.57-0.90, P=0.003), whereas locoregional surgery did not significantly improve OS in the high-risk group (development cohort: HR=0.65, 95%CI: 0.41-1.02, P=0.066; validation cohort: HR=0.83, 95%CI: 0.41-1.69, P=0.610). Conclusion A prognostic nomogram for patients with de novo stage Ⅳ breast cancer was constructed based on machine learning algorithms; it effectively distinguished low-risk, medium-risk, and high-risk patients. Locoregional surgery is not recommended for patients in the high-risk group (risk score > 360).

Key words: surgery, breast cancer, LASSO regression analysis, stage Ⅳ
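
The abstract reports model discrimination as a concordance index (C-index). As a rough illustration of what that number measures, the sketch below computes Harrell's C-index for right-censored survival data in plain Python. It is not the authors' code: the function name `concordance_index`, the simplification of skipping pairs with tied follow-up times, and the toy follow-up times, event flags, and risk scores are all assumptions made for this example and are not taken from the SEER cohort.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed follow-up time for each patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means a worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the earlier death had the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (months, event flag, risk score); not from the study.
toy_times = [5, 10, 15, 20, 25]
toy_events = [1, 1, 0, 1, 0]
toy_scores = [400, 300, 350, 320, 100]

print(concordance_index(toy_times, toy_events, toy_scores))  # 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which gives some context for the reported values of 0.700 and 0.695.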

