1.
PLoS Comput Biol ; 19(5): e1011001, 2023 05.
Article in English | MEDLINE | ID: mdl-37126495

ABSTRACT

The number of published metagenome assemblies is rapidly growing due to advances in sequencing technologies. However, sequencing errors, variable coverage, repetitive genomic regions, and other factors can produce misassemblies, which are challenging to detect for taxonomically novel genomic data. Assembly errors can affect all downstream analyses of the assemblies. Accuracy for the state of the art in reference-free misassembly prediction does not exceed an AUPRC of 0.57, and it is not clear how well these models generalize to real-world data. Here, we present the Residual neural network for Misassembled Contig identification (ResMiCo), a deep learning approach for reference-free identification of misassembled contigs. To develop ResMiCo, we first generated a training dataset of unprecedented size and complexity that can be used for further benchmarking and developments in the field. Through rigorous validation, we show that ResMiCo is substantially more accurate than the state of the art, and the model is robust to novel taxonomic diversity and varying assembly methods. ResMiCo estimated 7% misassembled contigs per metagenome across multiple real-world datasets. We demonstrate how ResMiCo can be used to optimize metagenome assembly hyperparameters to improve accuracy, instead of optimizing solely for contiguity. The accuracy, robustness, and ease-of-use of ResMiCo make the tool suitable for general quality control of metagenome assemblies and assembly methodology optimization.
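
The AUPRC figure quoted above summarizes a precision-recall curve in one number. As a rough illustration only (not the ResMiCo evaluation code), the sketch below computes it with the average-precision sum. The function name, the toy labels, and the toy scores are all assumptions made for this example, and it assumes label 1 means "misassembled" and that scores are distinct.

```python
def average_precision(labels, scores):
    """Approximate the area under the precision-recall curve.

    labels: 1 for a misassembled contig, 0 otherwise (assumed encoding).
    scores: model scores, higher meaning more likely misassembled.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive label")

    # Rank contigs from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where recall increases.
            precision_sum += true_positives / rank
    return precision_sum / total_positives


# Hypothetical toy data: five contigs, two of them misassembled.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_auprc = average_precision(toy_labels, toy_scores)
print(round(toy_auprc, 4))  # 0.8333
```

Because misassembled contigs are a small minority, a precision-recall summary of this kind is less flattering than ROC-based metrics, which is one reason it suits this task.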


Subjects
Deep Learning , Metagenome , Metagenome/genetics , Genomics/methods , Sequence Analysis, DNA/methods , Metagenomics , Software
2.
Bioinformatics ; 36(10): 3011-3017, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32096824

ABSTRACT

MOTIVATION: Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies. RESULTS: We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state of the art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications. CONCLUSIONS: DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is re-training the model with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. AVAILABILITY AND IMPLEMENTATION: DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Metagenome , Software , Bacteria , Computer Simulation , Metagenomics , Sequence Analysis, DNA
3.
J Bone Miner Res ; 39(8): 1103-1112, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-38836468

ABSTRACT

Fracture prediction is essential in managing patients with osteoporosis and is an integral component of many fracture prevention guidelines. We aimed to identify the most relevant clinical fracture risk factors in contemporary populations by training and validating short- and long-term fracture risk prediction models in 2 cohorts. We used traditional and machine learning survival models to predict risks of vertebral, hip, and any fractures on the basis of clinical risk factors, T-scores, and treatment history among participants in a nationwide Swiss Osteoporosis Registry (N = 5944 postmenopausal women, median follow-up of 4.1 yr between January 2015 and October 2022; a total of 1190 fractures during follow-up). The independent validation cohort comprised 5474 postmenopausal women from the UK Biobank with 290 incident fractures during follow-up. Uno's C-index and the time-dependent area under the receiver operating characteristics curve were calculated to evaluate the performance of different machine learning models (Random survival forest and eXtreme Gradient Boosting). In the independent validation set, the C-index was 0.74 [0.58, 0.86] for vertebral fractures, 0.83 [0.7, 0.94] for hip fractures, and 0.63 [0.58, 0.69] for any fractures at year 2, and these values further increased for longer estimations of up to 7 yr. In comparison, the C-index of the 10-yr fracture probability calculated with FRAX Switzerland was 0.60 [0.55, 0.64] for major osteoporotic fractures and 0.62 [0.49, 0.74] for hip fractures. The most important variables identified with Shapley additive explanations values were age, T-scores, and prior fractures, while number of falls was an important predictor of hip fractures. Traditional and machine learning models achieved similar C-indices.
We conclude that fracture risk prediction can be improved by including the lumbar spine T-score, trabecular bone score, numbers of falls and recent fractures, and that treatment information has a significant impact on fracture prediction.


Fracture prediction is essential in managing patients with osteoporosis. We developed and validated traditional and machine learning models to predict short- and long-term fracture risk and identify the most relevant clinical fracture risk factors for vertebral, hip, and any fractures in contemporary populations. We used data from 5944 postmenopausal women in a Swiss Osteoporosis Registry and validated our findings with 5474 women from the UK Biobank. Our machine learning models performed well, with C-index values of 0.74 for vertebral fractures, 0.83 for hip fractures, and 0.63 for any fractures at year 2, and these values further increased for longer estimations of up to 7 years. In contrast, FRAX Switzerland had lower C-index values (0.60 for major fractures and 0.62 for hip fracture probabilities over 10 years). Key predictors identified included age, T-scores, prior fractures, and number of falls. We conclude that incorporating a broader range of clinical factors, as well as lumbar spine T-scores, fall history, recent fractures, and treatment information, can improve fracture risk assessments in osteoporosis management. Both traditional and machine learning models showed similar effectiveness in predicting fractures.
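
The C-index values reported above measure how often a model ranks the earlier-fracturing patient as higher risk. The study used Uno's C-index, which additionally reweights pairs to correct for censoring. The sketch below deliberately swaps in the simpler Harrell form to show the underlying idea, so it is not the study's method. The function name and the toy data are assumptions made for this example.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times:  follow-up time per patient.
    events: 1 if the fracture was observed, 0 if censored.
    risks:  predicted risk scores, higher meaning earlier expected event.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            # Comparable pair: i had the event strictly before j's time.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.3, 0.5, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
print(toy_c)  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the 0.63 to 0.83 range reported above.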


Subjects
Machine Learning , Postmenopause , Humans , Female , United Kingdom/epidemiology , Switzerland/epidemiology , Aged , Middle Aged , Prospective Studies , Risk Factors , Biological Specimen Banks , Risk Assessment , Fractures, Bone/epidemiology , Osteoporotic Fractures/epidemiology , UK Biobank
4.
JAMA Cardiol ; 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39292486

ABSTRACT

Importance: Risk estimation is an integral part of cardiovascular care. Local recalibration of guideline-recommended models could address the limitations of existing tools. Objective: To provide a machine learning (ML) approach to augment the performance of the American Heart Association's Predicting Risk of Cardiovascular Disease Events (AHA-PREVENT) equations when applied to a local population while preserving clinical interpretability. Design, Setting, and Participants: This cohort study used a New England-based electronic health record cohort of patients without prior atherosclerotic cardiovascular disease (ASCVD) who had the data necessary to calculate the AHA-PREVENT 10-year risk of developing ASCVD in the event period (2007-2016). Patients with prior ASCVD events, death prior to 2007, or age 79 years or older in 2007 were subsequently excluded. The final study population of 95 326 patients was split into 3 nonoverlapping subsets for training, testing, and validation. The AHA-PREVENT model was adapted to this local population using the open-source ML model (MLM) Extreme Gradient Boosting model (XGBoost) with minimal predictor variables, including age, sex, and AHA-PREVENT. The MLM was monotonically constrained to preserve known associations between risk factors and ASCVD risk. Along with sex, race and ethnicity data from the electronic health record were collected to validate the performance of ASCVD risk prediction in subgroups. Data were analyzed from August 2021 to February 2024. Main Outcomes and Measures: Consistent with the AHA-PREVENT model, ASCVD events were defined as the first occurrence of either nonfatal myocardial infarction, coronary artery disease, ischemic stroke, or cardiovascular death. Cardiovascular death was coded via government registries. 
Discrimination, calibration, and risk reclassification were assessed using the Harrell C index, a modified Hosmer-Lemeshow goodness-of-fit test and calibration curves, and reclassification tables, respectively. Results: In the test set of 38 137 patients (mean [SD] age, 64.8 [6.9] years; 22 708 [59.5%] women and 15 429 [40.5%] men; 935 [2.5%] Asian, 2153 [5.6%] Black, 1414 [3.7%] Hispanic, 31 400 [82.3%] White, and 2235 [5.9%] other, including American Indian, multiple races, unspecified, and unrecorded, consolidated owing to small numbers), MLM-PREVENT had improved calibration (modified Hosmer-Lemeshow P > .05) compared to the AHA-PREVENT model across risk categories in the overall cohort (χ²₃ = 2.2; P = .53 vs χ²₃ > 16.3; P < .001) and sex subgroups (men: χ²₃ = 2.1; P = .55 vs χ²₃ > 16.3; P < .001; women: χ²₃ = 6.5; P = .09 vs χ²₃ > 16.3; P < .001), while also surpassing a traditional recalibration approach. MLM-PREVENT maintained or improved AHA-PREVENT's calibration in Asian, Black, and White individuals. Both MLM-PREVENT and AHA-PREVENT performed equally well in discriminating risk (approximate ΔC index, ±0.01). Using a clinically significant 7.5% risk threshold, MLM-PREVENT reclassified a total of 11.5% of patients. We visualize the recalibration through MLM-PREVENT ASCVD risk charts that highlight preserved risk associations of the original AHA-PREVENT model. Conclusions and Relevance: The interpretable ML approach presented in this article enhanced the accuracy of the AHA-PREVENT model when applied to a local population while still preserving the risk associations found by the original model. This method has the potential to recalibrate other established risk tools and is implementable in electronic health record systems for improved cardiovascular risk assessment.
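
The reclassification figure above counts patients whose risk category changes when one model's estimate replaces the other's at the 7.5% threshold. A minimal sketch of that tally follows. The function name, the choice to treat a risk equal to the threshold as high risk, and the toy risk values are all assumptions for illustration and are not taken from the study.

```python
def reclassification_counts(original_risks, recalibrated_risks, threshold=0.075):
    """Count patients who cross a risk threshold between two models.

    Risks are 10-year probabilities in [0, 1]. A risk at or above the
    threshold is treated as high risk (an assumed convention).
    """
    if len(original_risks) != len(recalibrated_risks):
        raise ValueError("both models must score the same patients")
    if not original_risks:
        raise ValueError("need at least one patient")

    moved_up = 0    # low risk originally, high risk after recalibration
    moved_down = 0  # high risk originally, low risk after recalibration
    for old, new in zip(original_risks, recalibrated_risks):
        old_high = old >= threshold
        new_high = new >= threshold
        if new_high and not old_high:
            moved_up += 1
        elif old_high and not new_high:
            moved_down += 1

    total = len(original_risks)
    return {
        "moved_up": moved_up,
        "moved_down": moved_down,
        "fraction_reclassified": (moved_up + moved_down) / total,
    }


# Hypothetical risks for four patients under the two models.
toy_original = [0.05, 0.08, 0.10, 0.02]
toy_recalibrated = [0.09, 0.06, 0.12, 0.03]
toy_result = reclassification_counts(toy_original, toy_recalibrated)
print(toy_result)
# {'moved_up': 1, 'moved_down': 1, 'fraction_reclassified': 0.5}
```

Splitting the count by direction matters clinically, since moving above the threshold and moving below it have different treatment implications.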
