1.
Thorax ; 79(4): 307-315, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38195644

ABSTRACT

BACKGROUND: Low-dose CT screening can reduce lung cancer-related mortality. However, most screen-detected pulmonary abnormalities do not develop into cancer and it often remains challenging to identify malignant nodules, particularly among indeterminate nodules. We aimed to develop and assess prediction models based on radiological features to discriminate between benign and malignant pulmonary lesions detected on a baseline screen. METHODS: Using four international lung cancer screening studies, we extracted 2060 radiomic features for each of 16 797 nodules (513 malignant) among 6865 participants. After filtering out low-quality radiomic features, 642 radiomic and 9 epidemiological features remained for model development. We used cross-validation and grid search to assess three machine learning (ML) models (eXtreme Gradient Boosted Trees, random forest, least absolute shrinkage and selection operator (LASSO)) for their ability to accurately predict risk of malignancy for pulmonary nodules. We report model performance based on the area under the curve (AUC) and calibration metrics in the held-out test set. RESULTS: The LASSO model yielded the best predictive performance in cross-validation and was fit in the full training set based on optimised hyperparameters. Our radiomics model had a test-set AUC of 0.93 (95% CI 0.90 to 0.96) and outperformed the established Pan-Canadian Early Detection of Lung Cancer model (AUC 0.87, 95% CI 0.85 to 0.89) for nodule assessment. Our model performed well among both solid (AUC 0.93, 95% CI 0.89 to 0.97) and subsolid nodules (AUC 0.91, 95% CI 0.85 to 0.95). CONCLUSIONS: We developed highly accurate ML models based on radiomic and epidemiological features from four international lung cancer screening studies that may be suitable for assessing indeterminate screen-detected pulmonary nodules for risk of malignancy.
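The abstract reports discrimination as the area under the ROC curve (AUC). As a minimal sketch of what that number measures, and not the authors' evaluation code, the AUC can be computed as the probability that a randomly chosen malignant nodule receives a higher predicted risk than a randomly chosen benign one. The function name `roc_auc` and the toy labels and scores below are hypothetical.

```python
def roc_auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U statistic); tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # scores of malignant cases
    neg = [s for y, s in zip(labels, scores) if y == 0]  # scores of benign cases
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (positive, negative) pairs that the model ranks correctly.
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = malignant, 0 = benign, scores = predicted risk.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is acceptable for illustration; a sort-based implementation would likely be preferable at the scale of a screening cohort.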


Subjects
Lung Neoplasms; Multiple Pulmonary Nodules; Humans; Lung Neoplasms/diagnosis; Early Detection of Cancer; Radiomics; Tomography, X-Ray Computed; Canada; Multiple Pulmonary Nodules/pathology; Machine Learning; Retrospective Studies
2.
Nat Commun ; 15(1): 1014, 2024 Feb 03.
Article in English | MEDLINE | ID: mdl-38307875

ABSTRACT

A crucial step in the analysis of single-cell data is annotating cells to cell types and states. While a myriad of approaches has been proposed, manual labeling of cells to create training datasets remains tedious and time-consuming. In the field of machine learning, active and self-supervised learning methods have been proposed to improve the performance of a classifier while reducing both annotation time and label budget. However, the benefits of such strategies for single-cell annotation have yet to be evaluated in realistic settings. Here, we perform a comprehensive benchmarking of active and self-supervised labeling strategies across a range of single-cell technologies and cell type annotation algorithms. We quantify the benefits of active learning and self-supervised strategies in the presence of cell type imbalance and variable similarity. We introduce adaptive reweighting, a heuristic procedure tailored to single-cell data (including a marker-aware version) that shows competitive performance with existing approaches. In addition, we demonstrate that having prior knowledge of cell type markers improves annotation accuracy. Finally, we summarize our findings into a set of recommendations for those implementing cell type annotation procedures or platforms. An R package implementing the heuristic approaches introduced in this work may be found at https://github.com/camlab-bioml/leader .
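For readers unfamiliar with active learning, one common strategy is uncertainty sampling: the cells whose predicted cell type probabilities have the highest entropy are sent to the annotator first. The sketch below illustrates that generic idea only; it is not the adaptive reweighting heuristic from the paper nor the API of the R package, and the names `entropy`, `select_most_uncertain` and the toy probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predicted_probs, budget):
    """Return indices of the `budget` cells with the highest predictive entropy."""
    ranked = sorted(
        range(len(predicted_probs)),
        key=lambda i: entropy(predicted_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical classifier output: one row per cell, one column per cell type.
cell_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low priority for labeling
    [0.34, 0.33, 0.33],  # nearly uniform, highest priority
    [0.60, 0.30, 0.10],
    [0.50, 0.50, 0.00],
]
to_label = select_most_uncertain(cell_probs, budget=2)
print(to_label)
```

In a full active learning loop, the selected cells would be labeled, added to the training set, and the classifier refit before the next round of selection.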


Subjects
Algorithms; Machine Learning; Technology; Awareness; Supervised Machine Learning; Single-Cell Analysis
3.
Genome Biol ; 25(1): 159, 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38886757

ABSTRACT

BACKGROUND: The advent of single-cell RNA-sequencing (scRNA-seq) has driven significant computational methods development for all steps in the scRNA-seq data analysis pipeline, including filtering, normalization, and clustering. The large number of methods and their resulting parameter combinations has created a combinatorial set of possible pipelines to analyze scRNA-seq data, which leads to the obvious question: which is best? Several benchmarking studies compare methods but frequently find variable performance depending on dataset and pipeline characteristics. Alternatively, the large number of scRNA-seq datasets along with advances in supervised machine learning raise a tantalizing possibility: could the optimal pipeline be predicted for a given dataset? RESULTS: Here, we begin to answer this question by applying 288 scRNA-seq analysis pipelines to 86 datasets and quantifying pipeline success via a range of measures evaluating cluster purity and biological plausibility. We build supervised machine learning models to predict pipeline success given a range of dataset and pipeline characteristics. We find that prediction performance is significantly better than random and that in many cases pipelines predicted to perform well provide clustering outputs similar to expert-annotated cell type labels. We identify characteristics of datasets that correlate with strong prediction performance that could guide when such prediction models may be useful. CONCLUSIONS: Supervised machine learning models have utility for recommending analysis pipelines and therefore the potential to alleviate the burden of choosing from the near-infinite number of possibilities. Different aspects of datasets influence the predictive performance of such models which will further guide users.


Subjects
RNA-Seq; Single-Cell Gene Expression Analysis; Animals; Humans; Cluster Analysis; Computational Biology/methods; Machine Learning; RNA-Seq/methods; Sequence Analysis, RNA/methods; Supervised Machine Learning
4.
Nat Biotechnol ; 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38429430

ABSTRACT

Computational methods for integrating single-cell transcriptomic data from multiple samples and conditions do not generally account for imbalances in the cell types measured in different datasets. In this study, we examined how differences in the cell types present, the number of cells per cell type and the cell type proportions across samples affect downstream analyses after integration. The Iniquitate pipeline assesses the robustness of integration results after perturbing the degree of imbalance between datasets. Benchmarking of five state-of-the-art single-cell RNA sequencing integration techniques in 2,600 integration experiments indicates that sample imbalance has substantial impacts on downstream analyses and the biological interpretation of integration results. Imbalance perturbation led to statistically significant variation in unsupervised clustering, cell type classification, differential expression and marker gene annotation, query-to-reference mapping and trajectory inference. We quantified the impacts of imbalance through two newly introduced properties: aggregate cell type support and minimum cell type center distance. To better characterize and mitigate impacts of imbalance, we introduce balanced clustering metrics and imbalanced integration guidelines for integration method users.

5.
Genome Biol ; 25(1): 191, 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39026273

ABSTRACT

BACKGROUND: The encoding of cell intrinsic drug resistance states in breast cancer reflects the contributions of genomic and non-genomic variations and requires accurate estimation of clonal fitness from co-measurement of transcriptomic and genomic data. Somatic copy number (CN) variation is the dominant mutational mechanism leading to transcriptional variation and notably contributes to platinum chemotherapy resistance cell states. Here, we deploy time series measurements of triple negative breast cancer (TNBC) single-cell transcriptomes, along with co-measured single-cell CN fitness, identifying genomic and transcriptomic mechanisms in drug-associated transcriptional cell states. RESULTS: We present scRNA-seq data (53,641 filtered cells) from serial passaging TNBC patient-derived xenograft (PDX) experiments spanning 2.5 years, matched with genomic single-cell CN data from the same samples. Our findings reveal distinct clonal responses within TNBC tumors exposed to platinum. Clones with high drug fitness undergo clonal sweeps and show subtle transcriptional reversion, while those with weak fitness exhibit dynamic transcription upon drug withdrawal. Pathway analysis highlights convergence on epithelial-mesenchymal transition and cytokine signaling, associated with resistance. Furthermore, pseudotime analysis demonstrates hysteresis in transcriptional reversion, indicating generation of new intermediate transcriptional states upon platinum exposure. CONCLUSIONS: Within a polyclonal tumor, clones with strong genotype-associated fitness under platinum remained fixed, minimizing transcriptional reversion upon drug withdrawal. Conversely, clones with weaker fitness display non-genomic transcriptional plasticity. This suggests CN-associated and CN-independent transcriptional states could both contribute to platinum resistance. The dominance of genomic or non-genomic mechanisms within polyclonal tumors has implications for drug sensitivity, restoration, and re-treatment strategies.


Subjects
Drug Resistance, Neoplasm; Single-Cell Analysis; Transcriptome; Triple Negative Breast Neoplasms; Triple Negative Breast Neoplasms/genetics; Triple Negative Breast Neoplasms/drug therapy; Humans; Animals; Drug Resistance, Neoplasm/genetics; Female; Mice; DNA Copy Number Variations; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Gene Expression Regulation, Neoplastic/drug effects; Epithelial-Mesenchymal Transition/genetics
6.
NEJM Evid ; 1(5): EVIDe2200062, 2022 May.
Article in English | MEDLINE | ID: mdl-38319201

ABSTRACT

The Basics of Machine Learning

When a person is pregnant, a key question is how to establish the "date" of the pregnancy. Classically, the date was based on the last menstrual period (LMP). For the past 3 decades or more, in high-resource countries, this has been done using "hospital-grade" ultrasound machines, with testing performed by trained sonographers. In many parts of the world, neither the machines nor the trained sonographers are accessible. In an article published in NEJM Evidence, Pokaprakarn et al. [1] asked whether a low-cost handheld ultrasound device combined with artificial intelligence (AI) could substitute for the expensive machines and trained sonographers.
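The classical LMP-based dating mentioned above is simple date arithmetic, sketched below. This assumes the standard convention that gestational age is counted from the first day of the LMP and that the estimated due date falls 280 days (40 weeks) later; that convention is not stated in the abstract, and the function names and dates are hypothetical.

```python
from datetime import date, timedelta


def gestational_age(lmp, on_date):
    """Return (completed weeks, extra days) elapsed since the first day of the LMP."""
    elapsed = (on_date - lmp).days
    if elapsed < 0:
        raise ValueError("on_date is before the last menstrual period")
    return divmod(elapsed, 7)


def estimated_due_date(lmp):
    """Assumed convention: due date is 280 days (40 weeks) after the LMP."""
    return lmp + timedelta(days=280)


# Hypothetical example dates.
lmp = date(2024, 1, 1)
weeks, days = gestational_age(lmp, date(2024, 3, 15))
edd = estimated_due_date(lmp)
print(f"{weeks} weeks + {days} days; estimated due date {edd.isoformat()}")
```

Ultrasound-based dating, whether read by a sonographer or by an AI model, replaces this calendar estimate with one derived from fetal measurements.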
