Results 1 - 18 of 18
1.
BMC Bioinformatics ; 25(Suppl 1): 153, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627615

ABSTRACT

BACKGROUND: With the rapid increase in throughput of long-read sequencing technologies, recent studies have explored their potential for taxonomic classification by using alignment-based approaches to reduce the impact of higher sequencing error rates. While alignment-based methods are generally slower, k-mer-based taxonomic classifiers can overcome this limitation, potentially at the expense of lower sensitivity for strains and species that are not in the database. RESULTS: We present MetageNN, a memory-efficient long-read taxonomic classifier that is robust to sequencing errors and missing genomes. MetageNN is a neural network model that uses short k-mer profiles of sequences to reduce the impact of distribution shifts on error-prone long reads. Benchmarking MetageNN against another machine learning approach for taxonomic classification (GeNet) showed substantial improvements with long-read data (20% improvement in F1 score). On nanopore sequencing data, MetageNN exhibits improved sensitivity in situations where the reference database is incomplete. It surpasses the alignment-based tools MetaMaps and MEGAN-LR, as well as the k-mer-based tool Kraken2, with improvements of 100%, 36%, and 23%, respectively, in read-level analysis. Notably, at the community level, MetageNN consistently demonstrated higher sensitivities than the previously mentioned tools. Furthermore, MetageNN requires less than 1/4 of the database storage used by Kraken2, MEGAN-LR and MMseqs2, and is more than 7× faster than MetaMaps and GeNet and more than 2× faster than MEGAN-LR and MMseqs2. CONCLUSION: This proof-of-concept work demonstrates the utility of machine-learning-based methods for taxonomic classification using long reads. MetageNN can be used on sequences not classified by conventional methods and offers an alternative approach for memory-efficient classifiers that can be optimized further.


Subjects
Metagenomics , Viverridae , Animals , Metagenomics/methods , Neural Networks, Computer , Metagenome , Machine Learning , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
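The abstract says MetageNN classifies reads from short k-mer profiles. A minimal sketch of what such a profile can look like, assuming a plain normalized count vector over all 4^k DNA k-mers; the value of k and the skipping of non-ACGT characters are assumptions here, not MetageNN's documented featurization. One intuition for preferring a short k: a single sequencing error alters at most k overlapping k-mers, so the profile shifts less when k is small.

```python
from itertools import product


def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Return the normalized frequency of every DNA k-mer (4**k entries) in seq.

    k-mers containing characters other than A/C/G/T (e.g. 'N') are skipped.
    The default k is an illustrative assumption, not taken from MetageNN.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers that contain ambiguous bases
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

For example, `kmer_profile("ACGTACGT", k=2)` returns a 16-element vector in which "AC", "CG" and "GT" each have frequency 2/7 and "TA" has 1/7; a classifier such as a neural network would then take this fixed-length vector as input regardless of read length.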
2.
Am J Infect Control ; 51(4): 413-419, 2023 04.
Article in English | MEDLINE | ID: mdl-37010998

ABSTRACT

BACKGROUND: Temporary isolation wards have been introduced to meet the demand for airborne-infection-isolation-rooms (AIIRs) during the COVID-19 pandemic. Environmental sampling and outbreak investigation were conducted in temporary isolation wards converted from general wards and/or built from prefabricated containers, in order to evaluate the ability of such wards to safely manage COVID-19 cases over a period of sustained use. METHODS: Environmental sampling for SARS-CoV-2 RNA was conducted in temporary isolation ward rooms constructed from prefabricated containers (N = 20) or converted from normal-pressure general wards (N = 47). Whole genome sequencing (WGS) was used to ascertain health care-associated transmission when clusters were reported among health care workers (HCWs) working in isolation areas from July 2020 to December 2021. RESULTS: A total of 355 environmental swabs were collected; 22.4% (15/67) of patients had at least one positive environmental sample. Patients housed in temporary isolation ward rooms constructed from prefabricated containers had greater odds of detectable environmental contamination (adjusted odds ratio, aOR = 10.46, 95% CI = 3.89-58.91, P = .008), with positive environmental samples obtained from the toilet area (12/20, 60.0%) and from patient equipment, including electronic devices used for patient communication (8/20, 40.0%). A single HCW cluster was reported among staff working in the temporary isolation ward constructed from prefabricated containers; however, health care-associated transmission was deemed unlikely based on WGS and/or epidemiological investigations. CONCLUSION: Environmental contamination with SARS-CoV-2 RNA was observed in temporary isolation wards, particularly in the toilet area and on smartphones used for patient communication. However, despite intensive surveillance, no health care-associated transmission was detected in temporary isolation wards over 18 months of prolonged use, demonstrating their capacity for sustained use during subsequent pandemic waves.


Subjects
COVID-19 , Humans , SARS-CoV-2 , Pandemics , RNA, Viral , Hospitals
3.
Nucleic Acids Res ; 51(D1): D1242-D1248, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36259664

ABSTRACT

Extensive in vitro cancer drug screening datasets have enabled scientists to identify biomarkers and develop machine learning models for predicting drug sensitivity. While most advancements have focused on omics profiles, cancer drug sensitivity scores precalculated by the original sources are often used as-is, without consideration for variability between studies. It is well known that significant inconsistencies exist between drug sensitivity scores across datasets due to differences in experimental setups and in the preprocessing methods used to obtain the scores. As a result, many studies focus on a single dataset, leading to underutilization of available data and a limited interpretation of cancer pharmacogenomics analyses. To overcome these caveats, we have developed CREAMMIST (https://creammist.mtms.dev), an integrative database that provides integrative dose-response curves capturing uncertainty (or high certainty when multiple datasets align well) across five widely used cancer cell-line drug-response datasets. We used a Bayesian framework to systematically integrate all available dose-response values across datasets (>14 million dose-response data points). CREAMMIST provides easy-to-use statistics derived from the integrative dose-response curves for various downstream analyses such as identifying biomarkers, selecting drug concentrations for experiments, and training robust machine learning models.


Subjects
Antineoplastic Agents , Databases, Factual , Neoplasms , Humans , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Bayes Theorem , Biomarkers , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics
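CREAMMIST integrates dose-response values from several datasets into one curve per cell line and drug. The sketch below only illustrates the idea of a single dose-response curve fitted to measurements pooled across datasets. It assumes a four-parameter log-logistic (Hill) model and ordinary least squares via SciPy's `curve_fit`, which is a simpler, swapped-in technique and not the Bayesian integration described in the abstract; the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def log_logistic(dose, bottom, top, ec50, hill):
    """Four-parameter log-logistic curve: response falls from `top` to `bottom`
    as dose increases, reaching the midpoint at dose == ec50."""
    return bottom + (top - bottom) / (1.0 + (dose / ec50) ** hill)


def fit_pooled_curve(datasets):
    """Fit one curve to (doses, responses) pairs pooled from several datasets.

    Plain least squares on the pooled points: a simplified stand-in,
    NOT the Bayesian integration described for CREAMMIST.
    """
    doses = np.concatenate([np.asarray(d, dtype=float) for d, _ in datasets])
    resp = np.concatenate([np.asarray(r, dtype=float) for _, r in datasets])
    p0 = [resp.min(), resp.max(), float(np.median(doses)), 1.0]
    # Keep ec50 positive and the Hill slope within a plausible range.
    lower = [-np.inf, -np.inf, 1e-12, 0.01]
    upper = [np.inf, np.inf, np.inf, 10.0]
    params, _ = curve_fit(log_logistic, doses, resp, p0=p0, bounds=(lower, upper))
    return dict(zip(["bottom", "top", "ec50", "hill"], params))
```

A Bayesian treatment would typically place priors on these curve parameters and report posterior distributions rather than point estimates, which is how disagreement between datasets can surface as wider uncertainty.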
5.
Infect Control Hosp Epidemiol ; 44(6): 1014-1018, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35473629

ABSTRACT

Sporadic clusters of healthcare-associated coronavirus disease 2019 (COVID-19) occurred despite intense rostered routine surveillance and a highly vaccinated healthcare worker (HCW) population, during a community surge of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) B.1.617.2 δ (delta) variant. Genomic analysis facilitated timely cluster detection and uncovered additional linkages via HCWs moving between clinical areas and among HCWs sharing a common lunch area, enabling early intervention.


Subjects
COVID-19 , Virus Diseases , Humans , SARS-CoV-2/genetics , Hospitals
6.
Front Med (Lausanne) ; 9: 964640, 2022.
Article in English | MEDLINE | ID: mdl-35979220

ABSTRACT

Shigella flexneri is a major diarrhoeal pathogen, and the emergence of multidrug-resistant S. flexneri is of public health concern. We report the detection of a clonal cluster of multidrug-resistant serotype 1c (7a) S. flexneri in Singapore in April 2022. Long-read whole-genome sequence analysis found five S. flexneri isolates to be clonal and harboring the extended-spectrum β-lactamases blaCTX-M-15 and blaTEM-1. The isolates were phenotypically resistant to ceftriaxone and had intermediate susceptibility to ciprofloxacin. The S. flexneri clonal cluster was first detected in a tertiary hospital diagnostic laboratory (sentinel site), to which S. flexneri isolates were sent from other hospitals for routine serogrouping. Long-read whole-genome sequence analysis was performed at the sentinel site in near real time in view of the unusually high number of S. flexneri isolates received within a short time frame. This study demonstrates that near real-time sentinel-site sequence-based surveillance of convenience samples can detect possible clonal outbreak clusters and may provide alerts useful for public health mitigations at the earliest possible opportunity.

7.
Microbiol Spectr ; 10(3): e0079122, 2022 06 29.
Article in English | MEDLINE | ID: mdl-35543562

ABSTRACT

Immunocompromised hosts with prolonged severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections have been implicated in the emergence of highly mutated SARS-CoV-2 variants. Spike mutations are of particular concern because the spike protein is a key target for vaccines and therapeutics for SARS-CoV-2. Here, we report the emergence of spike mutations in two immunocompromised patients with persistent SARS-CoV-2 reverse transcription (RT)-PCR positivity (>90 days). Whole-genome sequence analysis of samples obtained before and after coronavirus disease 2019 (COVID-19) treatment demonstrated the development of partial therapeutic escape mutations and increased intrahost SARS-CoV-2 genome diversity over time. This case series thus adds to the accumulating evidence that immunocompromised hosts with persistent infections are important sources of SARS-CoV-2 genome diversity and, in particular, clinically important spike protein diversity. IMPORTANCE: The emergence of clinically important mutations described in this report highlights the need for sustained vigilance and containment measures when managing immunocompromised patients with persistent COVID-19. Even as jurisdictions across the globe start lifting pandemic control measures, immunocompromised patients with persistent COVID-19 constitute a unique group that requires close genomic monitoring and enhanced infection control measures, to ensure early detection and containment of mutations and variants of therapeutic and public health importance.


Subjects
COVID-19 , SARS-CoV-2 , Spike Glycoprotein, Coronavirus , COVID-19/virology , Humans , Immunocompromised Host , Mutation , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics
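The abstract reports increasing intrahost SARS-CoV-2 genome diversity over time but does not name the measure. One common way to quantify within-host diversity is the Shannon entropy of the base calls at each genome position, averaged over positions. The minimal sketch below assumes that measure and a simple `pileup` dictionary (position to a string of observed read bases); both are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter


def site_entropy(bases: str) -> float:
    """Shannon entropy (bits) of the A/C/G/T calls observed at one genome position."""
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_site_entropy(pileup: dict) -> float:
    """Average per-site entropy; `pileup` maps position -> string of read bases."""
    if not pileup:
        return 0.0
    return sum(site_entropy(b) for b in pileup.values()) / len(pileup)
```

A site where all reads agree scores 0, a 50/50 split between two bases scores 1 bit, and an even mix of all four bases scores 2 bits; comparing the mean across samples taken at different time points gives one simple view of diversity over time.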
8.
Microbiol Spectr ; 10(1): e0222321, 2022 02 23.
Article in English | MEDLINE | ID: mdl-35019683

ABSTRACT

Rapid onsite whole-genome sequencing of two suspected severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) N gene diagnostic escape samples revealed a previously unreported N gene point mutation at genome position 29195. Because the G29195T mutation occurs within a region probed by a commonly referenced U.S. CDC N gene reverse transcription (RT)-PCR assay, we hypothesize that the G29195T mutation rendered the N gene target of a proprietary commercial assay undetectable. The putative diagnostic escape G29195T mutation demonstrates the need for near real-time surveillance, as the emergence of novel SARS-CoV-2 variants with the potential to escape diagnostic tests continues to be a threat. IMPORTANCE: Accurate diagnostic detection of SARS-CoV-2 currently depends on the large-scale deployment of RT-PCR assays. SARS-CoV-2 RT-PCR assays target predetermined regions of the viral genome through complementary binding of primers and probes to nucleic acid sequences in clinical samples. Potential diagnostic escapes, such as clinical samples harboring the G29195T mutation, may result in false-negative SARS-CoV-2 RT-PCR results. The rapid detection and sharing of potential diagnostic escapes are essential for diagnostic laboratories and manufacturers around the world to optimize their assays as SARS-CoV-2 continues to evolve.


Subjects
COVID-19/diagnosis , Point Mutation , SARS-CoV-2/genetics , Reverse Transcriptase Polymerase Chain Reaction
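The hypothesis in this abstract rests on a simple positional check: does the substitution fall inside a region bound by a primer or probe? A minimal sketch follows. The interval coordinates in `HYPOTHETICAL_TARGETS` are placeholders, not the real CDC or commercial assay coordinates, and positions are assumed to be 1-based and inclusive on the reference genome.

```python
import re

# Hypothetical primer/probe intervals (1-based, inclusive); NOT real assay coordinates.
HYPOTHETICAL_TARGETS = {
    "probe_X": (29180, 29200),
    "primer_Y": (29300, 29320),
}


def parse_snv(label: str) -> tuple:
    """Split a substitution label such as 'G29195T' into (ref, position, alt)."""
    m = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if m is None:
        raise ValueError(f"not a simple nucleotide substitution: {label!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def targets_hit(label: str, targets: dict) -> list:
    """Names of the target intervals that contain the substitution's position."""
    _, pos, _ = parse_snv(label)
    return [name for name, (start, end) in targets.items() if start <= pos <= end]
```

With these placeholder intervals, `targets_hit("G29195T", HYPOTHETICAL_TARGETS)` returns `["probe_X"]`; in practice the intervals would come from the published primer and probe sequences mapped to the reference genome.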
9.
Genome Med ; 13(1): 189, 2021 12 16.
Article in English | MEDLINE | ID: mdl-34915921

ABSTRACT

While understanding molecular heterogeneity across patients underpins precision oncology, there is increasing appreciation for taking intra-tumor heterogeneity into account. Based on large-scale analysis of cancer omics datasets, we highlight the importance of intra-tumor transcriptomic heterogeneity (ITTH) for predicting clinical outcomes. Leveraging single-cell RNA-seq (scRNA-seq) with a recommender system (CaDRReS-Sc), we show that heterogeneous gene-expression signatures can predict drug response with high accuracy (80%). Using patient-proximal cell lines, we established the validity of CaDRReS-Sc's monotherapy predictions (Pearson r > 0.6) and of its combinatorial predictions targeting clone-specific vulnerabilities (>10% improvement). Applying CaDRReS-Sc to rapidly expanding scRNA-seq compendiums can serve as an in silico screen to accelerate drug-repurposing studies. Availability: https://github.com/CSB5/CaDRReS-Sc .


Subjects
Neoplasms , Transcriptome , Clone Cells , Gene Expression Profiling , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Precision Medicine , Sequence Analysis, RNA , Single-Cell Analysis , Software
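CaDRReS-Sc predicts drug response from scRNA-seq data in which a tumor consists of several transcriptomic clusters. One simple way to turn cluster-level predictions into a tumor-level score is to weight each cluster by its fraction of cells. The sketch below assumes that weighting, a response value in [0, 1] read as the fraction of cells killed, and, for drug pairs, a highest-single-agent rule applied per cluster. These are illustrative assumptions, not CaDRReS-Sc's actual scoring, and the toy numbers are hypothetical.

```python
def weighted_response(cluster_sizes: dict, cluster_response: dict) -> float:
    """Tumor-level score: each cluster's predicted response weighted by its share of cells."""
    total = sum(cluster_sizes.values())
    return sum(n / total * cluster_response[c] for c, n in cluster_sizes.items())


def combination_response(cluster_sizes: dict, resp_a: dict, resp_b: dict) -> float:
    """Drug pair: each cluster takes its better single-agent response (highest single agent)."""
    best = {c: max(resp_a[c], resp_b[c]) for c in cluster_sizes}
    return weighted_response(cluster_sizes, best)


# Toy example (hypothetical numbers): two clones, each vulnerable to a different drug.
sizes = {"clone1": 75, "clone2": 25}
drug_a = {"clone1": 0.8, "clone2": 0.2}
drug_b = {"clone1": 0.1, "clone2": 0.9}
```

Here drug A alone scores 0.65 and drug B alone 0.30, while the pair scores 0.825, which illustrates why combinations that target clone-specific vulnerabilities can outperform either drug on a heterogeneous tumor.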
10.
Front Med (Lausanne) ; 8: 790662, 2021.
Article in English | MEDLINE | ID: mdl-34970567

ABSTRACT

Background: The ongoing COVID-19 pandemic is a global health crisis caused by the spread of SARS-CoV-2. Establishing links between known cases is crucial for the containment of COVID-19. In the healthcare setting, the ability to rapidly identify potential healthcare-associated COVID-19 clusters is critical for healthcare worker and patient safety. Increasing accessibility of sequencing technology has allowed routine clinical diagnostic laboratories to sequence SARS-CoV-2 in clinical samples. However, these laboratories often lack the specialized informatics skills required for sequence analysis. Therefore, an on-site, intuitive sequence analysis tool that enables clinical laboratory users to analyze multiple genomes and derive clinically relevant information within an actionable timeframe is needed. Results: We propose CalmBelt, an integrated framework for on-site whole genome characterization and outbreak tracking. Nanopore sequencing technology enables on-site sequencing and construction of draft genomes for multiple SARS-CoV-2 samples within 12 h. CalmBelt's interactive interface allows users to analyze multiple SARS-CoV-2 genomes using whole genome information, collection date, and additional information such as predefined potential clusters from epidemiological investigations. CalmBelt also integrates established SARS-CoV-2 nomenclature assignments (GISAID clades and PANGO lineages), allowing users to visualize relatedness between samples together with these nomenclatures. We demonstrated multiple use cases, including investigation of potential hospital transmission, mining of transmission patterns in a large outbreak, and monitoring of possible diagnostic escape. Conclusions: This paper presents an on-site rapid framework for SARS-CoV-2 whole genome characterization. The CalmBelt interactive web application allows non-technical users, such as routine clinical laboratory users in hospitals, to determine SARS-CoV-2 variants of concern and to investigate the presence of potential transmission clusters. The framework is designed to be compatible with routine use in clinical laboratories, as it only requires readily available sample data and generates information that impacts immediate infection control mitigations.
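CalmBelt lets users visualize relatedness between samples using whole genome information. A common building block for this kind of outbreak view is the pairwise SNP distance between aligned genomes, with samples linked when the distance is at or below a threshold. The sketch below assumes pre-aligned consensus sequences of equal length, ignores 'N' positions, and uses single-linkage grouping with a user-chosen threshold; none of this is taken from CalmBelt's code.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned genomes differ, ignoring ambiguous 'N' calls."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x != "N" and y != "N")


def snp_clusters(genomes: dict, threshold: int) -> list:
    """Single-linkage clusters: samples within `threshold` SNPs of each other are linked."""
    parent = {name: name for name in genomes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        if snp_distance(genomes[a], genomes[b]) <= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for name in genomes:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Single linkage is a deliberately simple choice: two samples end up in the same putative cluster if a chain of close pairs connects them, which is easy to explain to non-technical users but can merge clusters when the threshold is loose.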

11.
Bioinformatics ; 37(Supplement_1): i76-i83, 2021 Aug 04.
Article in English | MEDLINE | ID: mdl-34000002

ABSTRACT

MOTIVATION: Large-scale cancer omics studies have highlighted the diversity of patient molecular profiles and the importance of leveraging this information to deliver the right drug to the right patient at the right time. Key challenges in learning predictive models for this include the high dimensionality of omics data and heterogeneity in biological and clinical factors affecting patient response. The use of multi-task learning techniques has been widely explored to address dataset limitations for in vitro drug response models, while domain adaptation (DA) has been employed to extend them to predict in vivo response. In both of these transfer learning settings, noisy data for some tasks (or domains) can substantially reduce the performance for others compared to single-task (domain) learners, i.e. lead to negative transfer (NT). RESULTS: We describe a novel multi-task unsupervised DA method (TUGDA) that addresses these limitations in a unified framework by quantifying uncertainty in predictors and weighting their influence on shared feature representations. TUGDA's ability to rely more on predictors with low uncertainty allowed it to notably reduce cases of NT for in vitro models (94% overall) compared to state-of-the-art methods. For DA to in vivo settings, TUGDA improved over previous methods for patient-derived xenografts (9 out of 14 drugs) as well as patient datasets (significant associations in 9 out of 22 drugs). TUGDA's ability to avoid NT thus provides a key capability as we try to integrate diverse drug-response datasets to build consistent predictive models with in vivo utility. AVAILABILITY AND IMPLEMENTATION: https://github.com/CSB5/TUGDA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
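TUGDA weights each predictor's influence on the shared representation by its uncertainty. A generic way to express that idea is inverse-variance weighting: tasks whose predictions are more uncertain contribute less to the shared objective. The sketch below assumes per-task variance estimates are already available and simply combines task losses; it is not TUGDA's architecture or training procedure, and the function names are hypothetical.

```python
def inverse_uncertainty_weights(variances: dict, eps: float = 1e-8) -> dict:
    """Weight each task by 1/variance, normalized so the weights sum to 1."""
    inv = {task: 1.0 / (v + eps) for task, v in variances.items()}
    total = sum(inv.values())
    return {task: w / total for task, w in inv.items()}


def weighted_shared_loss(task_losses: dict, variances: dict) -> float:
    """Combine per-task losses so that low-uncertainty tasks dominate the shared update."""
    weights = inverse_uncertainty_weights(variances)
    return sum(weights[task] * loss for task, loss in task_losses.items())
```

For example, with variances of 1.0 and 3.0 the weights are 0.75 and 0.25, so a noisy task with loss 10.0 contributes 2.5 to the shared loss instead of dominating it; this is the intuition behind limiting negative transfer.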

13.
Bioinformatics ; 34(22): 3907-3914, 2018 11 15.
Article in English | MEDLINE | ID: mdl-29868820

ABSTRACT

Motivation: As we move toward an era of precision medicine, the ability to predict patient-specific drug responses in cancer based on molecular information such as gene expression data represents both an opportunity and a challenge. In particular, methods are needed that can accommodate the high-dimensionality of data to learn interpretable models capturing drug response mechanisms, as well as providing robust predictions across datasets. Results: We propose a method based on ideas from 'recommender systems' (CaDRReS) that predicts cancer drug responses for unseen cell-lines/patients based on learning projections for drugs and cell-lines into a latent 'pharmacogenomic' space. Comparisons with other proposed approaches for this problem based on large public datasets (CCLE and GDSC) show that CaDRReS provides consistently good models and robust predictions even across unseen patient-derived cell-line datasets. Analysis of the pharmacogenomic spaces inferred by CaDRReS also suggests that they can be used to understand drug mechanisms, identify cellular subtypes and further characterize drug-pathway associations. Availability and implementation: Source code and datasets are available at https://github.com/CSB5/CaDRReS. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Antineoplastic Agents/therapeutic use , Neoplasms , Humans , Neoplasms/drug therapy , Pharmacogenetics , Precision Medicine , Software
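The recommender-system idea in CaDRReS can be summarized as projecting each cell line (from its molecular features) and each drug into a shared latent "pharmacogenomic" space, then reading off predicted sensitivity as a dot product plus bias terms. The NumPy sketch below assumes already-learned matrices `P` and `Q` and bias vectors (training is omitted), and the exact parametrization in CaDRReS may differ.

```python
import numpy as np


def predict_sensitivity(cell_features, P, Q, cell_bias, drug_bias):
    """Predicted sensitivity for every (cell line, drug) pair.

    cell_features: (n_cells, n_features) molecular features, e.g. expression-based
    P:             (n_features, k) projection of cell lines into the latent space
    Q:             (n_drugs, k) drug vectors in the same latent space
    cell_bias:     (n_cells,) per-cell-line offsets
    drug_bias:     (n_drugs,) per-drug offsets
    """
    cell_latent = np.asarray(cell_features) @ np.asarray(P)   # (n_cells, k)
    scores = cell_latent @ np.asarray(Q).T                    # (n_cells, n_drugs)
    return scores + np.asarray(cell_bias)[:, None] + np.asarray(drug_bias)[None, :]
```

Because a new cell line or patient sample only needs its feature vector to be projected through `P`, the same function covers unseen samples, which is what makes this formulation attractive for predicting responses beyond the training cell lines.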
14.
Cell ; 173(2): 305-320.e10, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29625049

ABSTRACT

The Cancer Genome Atlas (TCGA) has catalyzed systematic characterization of diverse genomic alterations underlying human cancers. At this historic junction marking the completion of genomic characterization of over 11,000 tumors from 33 cancer types, we present our current understanding of the molecular processes governing oncogenesis. We illustrate our insights into cancer through synthesis of the findings of the TCGA PanCancer Atlas project on three facets of oncogenesis: (1) somatic driver mutations, germline pathogenic variants, and their interactions in the tumor; (2) the influence of the tumor genome and epigenome on transcriptome and proteome; and (3) the relationship between tumor and the microenvironment, including implications for drugs targeting driver events and immunotherapies. These results will anchor future characterization of rare and common tumor types, primary and relapsed tumors, and cancers across ancestry groups and will guide the deployment of clinical genomic sequencing.


Subjects
Carcinogenesis/genetics , Genomics , Neoplasms/pathology , DNA Repair/genetics , Databases, Genetic , Genes, Neoplasm , Humans , Metabolic Networks and Pathways/genetics , Microsatellite Instability , Mutation , Neoplasms/genetics , Neoplasms/immunology , Transcriptome , Tumor Microenvironment/genetics
15.
Cell ; 173(2): 371-385.e18, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29625053

ABSTRACT

Identifying molecular cancer drivers is critical for precision oncology. Multiple advanced algorithms to identify drivers now exist, but systematic attempts to combine and optimize them on large datasets are few. We report a PanCancer and PanSoftware analysis spanning 9,423 tumor exomes (comprising all 33 of The Cancer Genome Atlas projects) and using 26 computational tools to catalog driver genes and mutations. We identify 299 driver genes with implications regarding their anatomical sites and cancer/cell types. Sequence- and structure-based analyses identified >3,400 putative missense driver mutations supported by multiple lines of evidence. Experimental validation confirmed 60%-85% of predicted mutations as likely drivers. We found that >300 MSI tumors are associated with high PD-1/PD-L1, and 57% of tumors analyzed harbor putative clinically actionable events. Our study represents the most comprehensive discovery of cancer genes and mutations to date and will serve as a blueprint for future biological and clinical endeavors.


Subjects
Neoplasms/pathology , Algorithms , B7-H1 Antigen/genetics , Computational Biology , Databases, Genetic , Entropy , Humans , Microsatellite Instability , Mutation , Neoplasms/genetics , Neoplasms/immunology , Principal Component Analysis , Programmed Cell Death 1 Receptor/genetics
16.
Cancer Res ; 78(1): 290-301, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29259006

ABSTRACT

Existing cancer driver prediction methods are based on very different assumptions, and each of them can detect only a particular subset of driver genes. Here we perform a comprehensive assessment of 18 driver prediction methods on more than 3,400 tumor samples from 15 cancer types to determine their suitability in guiding precision medicine efforts. We categorized these methods into five groups: functional impact on proteins in general (FI) or specific to cancer (FIC), cohort-based analysis for recurrent mutations (CBA), mutations with expression correlation (MEC), and methods that use gene interaction network-based analysis (INA). The performance of driver prediction methods varied considerably, with concordance with a gold standard ranging from 9% to 68%. FI methods showed relatively poor performance (concordance <22%), while CBA methods provided conservative results but required large sample sizes for high sensitivity. INA methods, through the integration of genomic and transcriptomic data, and FIC methods, by training cancer-specific models, provided the best trade-off between sensitivity and specificity. As the methods were found to predict different subsets of driver genes, we propose a novel consensus-based approach, ConsensusDriver, which significantly improves the quality of predictions (20% increase in sensitivity) in patient subgroups or even individual patients. Consensus-based methods like ConsensusDriver promise to harness the strengths of different driver prediction paradigms. Significance: These findings assess state-of-the-art cancer driver prediction methods and develop a new and improved consensus-based approach for use in precision oncology. Cancer Res; 78(1); 290-301. ©2017 AACR.


Subjects
Algorithms , Neoplasms/genetics , Precision Medicine/methods , Computational Biology/methods , Genes , Humans , Mutation , Transcriptome
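ConsensusDriver combines the outputs of several driver prediction methods. As a minimal sketch of consensus over ranked gene lists, the code below uses a Borda count, a standard rank-aggregation scheme; it is a swapped-in illustration and not necessarily the aggregation ConsensusDriver uses. The gene lists in the test are toy inputs, not results from the study.

```python
def consensus_rank(rankings: list) -> list:
    """Aggregate ranked gene lists (best first) with a Borda count.

    In a list of length n, the gene at position p earns n - p points;
    a gene absent from a method's list earns 0 points from that method.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = {}
    for ranked in rankings:
        n = len(ranked)
        for position, gene in enumerate(ranked):
            scores[gene] = scores.get(gene, 0) + (n - position)
    return sorted(scores, key=lambda g: (-scores[g], g))
```

Genes ranked highly by several methods rise to the top even if one method misses them, which matches the motivation in the abstract that different methods detect different subsets of drivers.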
17.
BMC Genomics ; 16 Suppl 11: S4, 2015.
Article in English | MEDLINE | ID: mdl-26576648

ABSTRACT

BACKGROUND: Traditional approaches to studying molecular networks are based on linking genes or proteins. Higher-level networks linking gene sets or pathways have been proposed recently. Several types of gene set networks have been used to study complex molecular networks, such as co-membership gene set networks (M-GSNs) and co-enrichment gene set networks (E-GSNs). Gene set networks are useful for studying the biological mechanisms of diseases and drug perturbations. RESULTS: In this study, we proposed a new approach for constructing directed, regulatory gene set networks (R-GSNs) to reveal novel relationships among gene sets or pathways. We collected several gene set collections and high-quality gene regulation data in order to construct R-GSNs in a comparative study with co-membership gene set networks (M-GSNs). We described a method for constructing both global and disease-specific R-GSNs and determining their significance. To demonstrate potential applications to disease biology studies, we constructed and analysed an R-GSN specifically built for Alzheimer's disease. CONCLUSIONS: R-GSNs can provide new biological insights complementary to those derived at the protein regulatory network level or from M-GSNs. When properly integrated with functional genomics data, R-GSNs can help enable future research on systems biology and translational bioinformatics.


Subjects
Gene Regulatory Networks , Systems Biology/methods , Alzheimer Disease/genetics , Databases, Genetic , Humans
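The two network types in this abstract differ in how edges are defined. A minimal sketch, assuming gene sets are Python sets of gene symbols and regulations are (regulator, target) pairs: a co-membership network (M-GSN) links two sets by gene overlap (here the Jaccard index with a hypothetical cutoff), while a directed regulatory network (R-GSN) links set A to set B by counting regulations from A's genes to B's genes. The significance testing described in the paper is not reproduced.

```python
from itertools import combinations


def comembership_edges(gene_sets: dict, min_jaccard: float = 0.1) -> list:
    """Undirected (set_a, set_b, jaccard) edges for gene sets that share enough members."""
    edges = []
    for (a, ga), (b, gb) in combinations(sorted(gene_sets.items()), 2):
        union = ga | gb
        if not union:
            continue
        jaccard = len(ga & gb) / len(union)
        if jaccard >= min_jaccard:
            edges.append((a, b, jaccard))
    return edges


def regulatory_edges(gene_sets: dict, regulations: list) -> dict:
    """Directed edges (set_a, set_b) -> number of regulator->target pairs from A into B."""
    counts = {}
    for a, ga in gene_sets.items():
        for b, gb in gene_sets.items():
            if a == b:
                continue
            n = sum(1 for src, dst in regulations if src in ga and dst in gb)
            if n:
                counts[(a, b)] = n
    return counts
```

In a toy example, two pathways that share no genes get no M-GSN edge but can still be joined by an R-GSN edge when regulators in one pathway target genes in the other, which is the kind of complementary relationship the abstract describes.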
18.
Bioinformatics ; 31(12): i250-7, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072489

ABSTRACT

In this article, we describe a new database framework to perform integrative "gene-set, network, and pathway analysis" (GNPA). In this framework, we integrated heterogeneous data on pathways, annotated lists, and gene-sets (PAGs) into a PAG electronic repository (PAGER). PAGs in the PAGER database are organized into P-type, A-type and G-type PAGs with a three-letter-code standard naming convention. The PAGER database currently compiles 44,313 genes from 5 species including human, 38,663 PAGs, 324,830 gene-gene relationships, and 3,174,323 PAG-PAG regulatory relationships of two types: co-membership based and regulatory relationship based. To help users assess each PAG's biological relevance, we developed a cohesion measure called Cohesion Coefficient (CoCo), which is capable of distinguishing biologically significant PAGs from random PAGs with an area-under-curve performance of 0.98. The PAGER database was set up to help users search and retrieve PAGs from its online web interface. PAGER enables advanced users to build PAG-PAG regulatory networks that provide complementary biological insights not found in gene set analysis or individual gene network analysis. We provide a case study using cancer functional genomics data sets to demonstrate how integrative GNPA helps improve network biology data coverage and therefore biological interpretability. The PAGER database is openly accessible at http://discovery.informatics.iupui.edu/PAGER/.


Subjects
Databases, Genetic , Gene Regulatory Networks , Humans , Software