ABSTRACT
BACKGROUND & AIMS: Current hepatocellular carcinoma (HCC) risk scores do not reflect changes in HCC risk resulting from liver disease progression/regression over time. We aimed to develop and validate two novel prediction models using multivariate longitudinal data, with or without cell-free DNA (cfDNA) signatures. METHODS: A total of 13,728 patients from two nationwide multicenter prospective observational cohorts, the majority of whom had chronic hepatitis B, were enrolled. The aMAP score, one of the most promising HCC prediction models, was evaluated for each patient. Low-pass whole-genome sequencing was used to derive multi-modal cfDNA fragmentomics features. A longitudinal discriminant analysis algorithm was used to model longitudinal profiles of patient biomarkers and estimate the risk of HCC development. RESULTS: We developed and externally validated two novel HCC prediction models with greater accuracy, termed the aMAP-2 and aMAP-2 Plus scores. The aMAP-2 score, calculated from longitudinal data on the aMAP score and alpha-fetoprotein values over up to 8 years of follow-up, performed superbly in the training and external validation cohorts (AUC 0.83-0.84). The aMAP-2 score showed further improvement over the aMAP score and accurately divided aMAP-defined high-risk patients into two groups with 5-year cumulative HCC incidences of 23.4% and 4.1%, respectively (p = 0.0065). The aMAP-2 Plus score, which incorporates cfDNA signatures (nucleosome, fragment and motif scores), optimized the prediction of HCC development, especially for patients with cirrhosis (AUC 0.85-0.89). Importantly, the stepwise approach (aMAP -> aMAP-2 -> aMAP-2 Plus) stratified patients with cirrhosis into two groups, comprising 90% and 10% of the cohort, with annual HCC incidences of 0.8% and 12.5%, respectively (p <0.0001). CONCLUSIONS: The aMAP-2 and aMAP-2 Plus scores are highly accurate in predicting HCC. The stepwise application of aMAP scores provides an improved enrichment strategy that identifies patients at high risk of HCC and could effectively guide individualized HCC surveillance. IMPACT AND IMPLICATIONS: In this multicenter nationwide cohort study, we developed and externally validated two novel hepatocellular carcinoma (HCC) risk prediction models (the aMAP-2 and aMAP-2 Plus scores), using a longitudinal discriminant analysis algorithm and longitudinal data (i.e., aMAP and alpha-fetoprotein) with or without the addition of cell-free DNA signatures, based on 13,728 patients from 61 centers across mainland China. Our findings demonstrated that the aMAP-2 and aMAP-2 Plus scores performed markedly better than the original aMAP score and any other existing HCC risk score across all subsets, especially for patients with cirrhosis. More importantly, the stepwise application of aMAP scores (aMAP -> aMAP-2 -> aMAP-2 Plus) provides an improved enrichment strategy that identifies patients at high risk of HCC and could effectively guide individualized HCC surveillance.
Subjects
Hepatocellular Carcinoma, Cell-Free Nucleic Acids, Chronic Hepatitis B, Liver Neoplasms, Humans, Hepatocellular Carcinoma/diagnosis, Hepatocellular Carcinoma/epidemiology, Hepatocellular Carcinoma/etiology, Liver Neoplasms/diagnosis, Liver Neoplasms/epidemiology, Liver Neoplasms/etiology, alpha-Fetoproteins, Cohort Studies, Liver Cirrhosis/diagnosis, Liver Cirrhosis/genetics, Liver Cirrhosis/complications, Chronic Hepatitis B/complications
ABSTRACT
Longitudinal discriminant analysis (LoDA) can be used to classify patients into prognostic groups based on their clinical history, which often involves longitudinal measurements of various clinically relevant markers. Patients' longitudinal data are first modelled using multivariate generalised linear mixed models, allowing markers of different types (e.g. continuous, binary, counts) to be modelled simultaneously. We describe three approaches to calculating a patient's posterior group membership probabilities that have been outlined in previous studies, based respectively on the marginal distribution of the longitudinal markers, their conditional distribution, and the distribution of the random effects. Here we compare the three approaches, first using data from the Mayo Primary Biliary Cirrhosis study and then by way of a simulation study, to explore in which situations each of the three approaches is expected to give the best prediction. We demonstrate situations in which the marginal or random-effects approaches perform well, but find that the conditional approach adds little information beyond the random-effects and marginal approaches.
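As an illustration of the marginal approach only, the following minimal Python sketch computes posterior group membership probabilities with Bayes' rule. It is not the method of the study: it assumes a single continuous marker and a random-intercept linear mixed model per group rather than a multivariate generalised linear mixed model, and the function names, group labels and parameter values are all hypothetical.

```python
import math

def marginal_log_density(y, times, beta0, beta1, tau2, sigma2):
    """Log marginal density of one patient's measurements y at the given times.

    Assumed model (illustrative): y_j = beta0 + beta1 * t_j + b + e_j,
    with random intercept b ~ N(0, tau2) and error e_j ~ N(0, sigma2).
    Integrating out b gives y ~ N(mean, sigma2 * I + tau2 * 11'),
    whose inverse and determinant have closed forms.
    """
    n = len(y)
    resid = [yj - (beta0 + beta1 * tj) for yj, tj in zip(y, times)]
    sum_sq = sum(r * r for r in resid)
    sum_r = sum(resid)
    total = sigma2 + n * tau2
    quad = (sum_sq - tau2 / total * sum_r * sum_r) / sigma2
    log_det = (n - 1) * math.log(sigma2) + math.log(total)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)

def posterior_group_probs(y, times, group_params, priors):
    """Bayes' rule: P(group | y) is proportional to prior * marginal density."""
    log_terms = {
        g: math.log(priors[g]) + marginal_log_density(y, times, **params)
        for g, params in group_params.items()
    }
    largest = max(log_terms.values())  # subtract the max for numerical stability
    weights = {g: math.exp(v - largest) for g, v in log_terms.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}

# Hypothetical fitted parameters for two prognostic groups.
group_params = {
    "good": {"beta0": 1.0, "beta1": 0.0, "tau2": 0.25, "sigma2": 0.1},
    "poor": {"beta0": 1.0, "beta1": 0.5, "tau2": 0.25, "sigma2": 0.1},
}
priors = {"good": 0.8, "poor": 0.2}

# A patient whose marker rises over four visits.
probs = posterior_group_probs([1.1, 1.6, 2.0, 2.6], [0, 1, 2, 3], group_params, priors)
print(probs)
```

Because the probabilities are recomputed from all measurements observed so far, calling posterior_group_probs again after each new visit gives an updated allocation for the same patient.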
Subjects
Biometry/methods, Discriminant Analysis, Humans, Biliary Liver Cirrhosis/diagnosis, Biliary Liver Cirrhosis/therapy, Longitudinal Studies, Statistical Models, Multivariate Analysis, Prognosis
ABSTRACT
Recently developed methods of longitudinal discriminant analysis allow for classification of subjects into prespecified prognostic groups using the longitudinal history of both continuous and discrete biomarkers. The classification uses Bayesian estimates of the group membership probabilities for each prognostic group. These estimates are derived from a multivariate generalised linear mixed model of the biomarkers' longitudinal evolution in each of the groups and can be updated each time new data are available for a patient, providing a dynamic (over time) allocation scheme. However, the precision of the estimated group probabilities differs for each patient and also over time. This precision can be assessed by looking at credible intervals for the group membership probabilities. In this paper, we propose a new allocation rule that incorporates credible intervals for use in the context of a dynamic longitudinal discriminant analysis and show that this can decrease the number of false positives in a prognostic test, improving the positive predictive value. We also establish that by leaving some patients unclassified for a certain period, the classification accuracy of those patients who are classified can be improved, giving clinicians increased confidence in their decision-making. Finally, we show that determining a stopping rule dynamically can be more accurate than specifying a set time point at which to decide on a patient's status. We illustrate our methodology using data from patients with epilepsy and show how patients who fail to achieve adequate seizure control are more accurately identified using credible intervals compared to existing methods.
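One way such a credible-interval rule could work is sketched below in Python. The abstract does not specify the rule, so several choices here are assumptions: an equal-tailed interval computed from posterior draws of the group membership probability, a 0.5 cutoff, and a decision only when the whole interval lies on one side of the cutoff. The function names and labels are hypothetical.

```python
import math

def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def allocate(prob_draws, cutoff=0.5, level=0.95):
    """Classify only when the credible interval excludes the cutoff.

    prob_draws: posterior draws (e.g. from MCMC) of the probability that
    the patient belongs to the poor-prognosis group at the current visit.
    """
    draws = sorted(prob_draws)
    alpha = (1.0 - level) / 2.0
    lower = quantile(draws, alpha)
    upper = quantile(draws, 1.0 - alpha)
    if lower > cutoff:
        return "poor prognosis"
    if upper < cutoff:
        return "good prognosis"
    return "unclassified"  # wait for more data before deciding

# Deterministic stand-ins for posterior draws at three different visits.
confident_high = [0.80 + 0.001 * i for i in range(100)]
uncertain = [0.30 + 0.004 * i for i in range(100)]
confident_low = [0.05 + 0.001 * i for i in range(100)]

print(allocate(confident_high), allocate(uncertain), allocate(confident_low))
```

A patient returned as "unclassified" is simply re-evaluated when the next measurement arrives, so in this sketch the stopping time is determined by the data rather than fixed in advance.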
Subjects
Bayes Theorem, Classification/methods, Probability, Computer Simulation, Decision Making, Discriminant Analysis, Epilepsy/diagnosis, Epilepsy/therapy, Humans, Linear Models, Longitudinal Studies, Multivariate Analysis, Prognosis, Remission Induction, Sensitivity and Specificity
ABSTRACT
Mixed models are a useful way of analysing longitudinal data. Random-effects terms allow modelling of patient-specific deviations from the overall trend over time. Correlation between repeated measurements is captured by specifying a joint distribution for all random effects in a model. Typically, this joint distribution is assumed to be a multivariate normal distribution. For Gaussian outcomes, misspecification of the random-effects distribution usually has little impact. However, when the outcome is discrete (e.g. counts or binary outcomes), generalised linear mixed models (GLMMs) are used to analyse longitudinal trends. Opinion is divided about how robust GLMMs are to misspecification of the random effects. Previous work explored the impact of random-effects misspecification on the bias of model parameters in single-outcome GLMMs. Accepting that these model parameters may be biased, we investigate whether this affects our ability to classify patients into clinical groups using a longitudinal discriminant analysis. We also consider multiple outcomes, which can significantly increase the dimension of the random-effects distribution when modelled simultaneously. We show that when there is severe departure from normality, more flexible mixture distributions can give better classification accuracy. However, in many cases, wrongly assuming a single multivariate normal distribution has little impact on classification accuracy.
Subjects
Longitudinal Studies, Bias, Humans, Linear Models
ABSTRACT
BACKGROUND: Early diagnosis of necrotising enterocolitis (NEC) may improve prognosis but there are no proven biomarkers. OBJECTIVE: To investigate changes in faecal volatile organic compounds (VOCs) as potential biomarkers for NEC. DESIGN: Multicentre prospective study. SETTINGS: 8 UK neonatal units. PATIENTS: Preterm infants <34 weeks gestation. METHODS: Daily faecal samples were collected prospectively from 1326 babies, of whom 49 subsequently developed definite NEC. Faecal samples from 32 NEC cases were compared with samples from frequency-matched controls without NEC. Headspace solid-phase microextraction gas chromatography/mass spectrometry was performed and VOCs were identified from reference libraries. VOC samples from cases and controls were compared using both discriminant and factor analysis methods. RESULTS: VOCs were found to cluster into nine groups (factors), three of which were associated with NEC and indicated the possibility of disease up to 3-4 days before the clinical diagnosis was established. For one factor, a 1 SD increase raised the odds of developing NEC 1.6-fold; a similar decrease in the two other factors was associated with a reduced risk (OR 0.5 or 0.7, respectively). Discriminant analyses identified five individual VOCs associated with NEC in babies at risk, each with an area under the receiver operating characteristic curve of 0.75-0.76, up to 4 days before the clinical diagnosis was made. CONCLUSIONS: Faecal VOCs are altered in preterm infants with NEC. These data are currently insufficient to enable reliable cotside detection of babies at risk of developing NEC, and further work is needed to investigate the role of VOCs in clarifying the aetiology of NEC.
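To make the reported odds ratio concrete, the short Python sketch below converts a baseline risk into the risk implied by a 1.6-fold increase in odds. It is a worked illustration under stated assumptions, not an analysis from the study: the baseline of 49/1326 is taken from the cohort counts above purely for illustration (the odds ratio itself comes from a comparison with frequency-matched controls), the effect is assumed to be log-linear in the factor score, and the function name is hypothetical.

```python
def risk_after_sd_shift(baseline_risk, odds_ratio_per_sd, sd_shift=1.0):
    """Risk implied by shifting a factor score by sd_shift standard deviations.

    Converts risk to odds, multiplies by odds_ratio_per_sd ** sd_shift
    (assumed log-linear effect), and converts back to a risk.
    """
    odds = baseline_risk / (1.0 - baseline_risk)
    shifted_odds = odds * odds_ratio_per_sd ** sd_shift
    return shifted_odds / (1.0 + shifted_odds)

# Illustrative baseline: 49 definite NEC cases among 1326 babies.
baseline = 49 / 1326
print(f"baseline risk: {baseline:.3f}")
print(f"risk after +1 SD (OR 1.6): {risk_after_sd_shift(baseline, 1.6):.3f}")
```

At a baseline risk this low, a 1.6-fold rise in odds corresponds to a rise in absolute risk of only about two percentage points, which is consistent with the conclusion that these markers are not yet sufficient for reliable cotside detection.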