Results 1 - 20 of 113
1.
Nature ; 623(7987): 608-615, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37938768

ABSTRACT

Cell therapies have yielded durable clinical benefits for patients with cancer, but the risks associated with the development of therapies from manipulated human cells are understudied. For example, we lack a comprehensive understanding of the mechanisms of toxicities observed in patients receiving T cell therapies, including recent reports of encephalitis caused by reactivation of human herpesvirus 6 (HHV-6) [1]. Here, through petabase-scale viral genomics mining, we examine the landscape of human latent viral reactivation and demonstrate that HHV-6B can become reactivated in cultures of human CD4+ T cells. Using single-cell sequencing, we identify a rare population of HHV-6 'super-expressors' (about 1 in 300-10,000 cells) that possess high viral transcriptional activity, among research-grade allogeneic chimeric antigen receptor (CAR) T cells. By analysing single-cell sequencing data from patients receiving cell therapy products that are approved by the US Food and Drug Administration [2] or are in clinical studies [3-5], we identify the presence of HHV-6-super-expressor CAR T cells in patients in vivo. Together, the findings of our study demonstrate the utility of comprehensive genomics analyses in implicating cell therapy products as a potential source contributing to the lytic HHV-6 infection that has been reported in clinical trials [1,6-8] and may influence the design and production of autologous and allogeneic cell therapies.


Subjects
CD4-Positive T-Lymphocytes , Human Herpesvirus 6 , Adoptive Immunotherapy , Chimeric Antigen Receptors , Virus Activation , Virus Latency , Humans , CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/virology , Clinical Trials as Topic , Viral Gene Expression Regulation , Genomics , Human Herpesvirus 6/genetics , Human Herpesvirus 6/isolation & purification , Human Herpesvirus 6/physiology , Adoptive Immunotherapy/adverse effects , Adoptive Immunotherapy/methods , Infectious Encephalitis/complications , Infectious Encephalitis/virology , Chimeric Antigen Receptors/immunology , Roseolovirus Infections/complications , Roseolovirus Infections/virology , Single-Cell Gene Expression Analysis , Viral Load
2.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36869848

ABSTRACT

Sampling circulating tumor DNA (ctDNA) using liquid biopsies offers clinically important benefits for monitoring cancer progression. A single ctDNA sample represents a mixture of shed tumor DNA from all known and unknown lesions within a patient. Although shedding levels have been suggested to hold the key to identifying targetable lesions and uncovering treatment resistance mechanisms, the amount of DNA shed by any one specific lesion is still not well characterized. We designed the Lesion Shedding Model (LSM) to order lesions from the strongest to the poorest shedding for a given patient. By characterizing the lesion-specific ctDNA shedding levels, we can better understand the mechanisms of shedding and more accurately interpret ctDNA assays to improve their clinical impact. We verified the accuracy of the LSM under controlled conditions using a simulation approach as well as testing the model on three cancer patients. The LSM obtained an accurate partial order of the lesions according to their assigned shedding levels in simulations, and its accuracy in identifying the top shedding lesion was not significantly impacted by the number of lesions. Applying LSM to three cancer patients, we found that indeed there were lesions that consistently shed more than others into the patients' blood. In two of the patients, the top shedding lesion was one of the only clinically progressing lesions at the time of biopsy, suggesting a connection between high ctDNA shedding and clinical progression. The LSM provides a much-needed framework with which to understand ctDNA shedding and to accelerate discovery of ctDNA biomarkers. The LSM source code is available in the IBM BioMedSciAI GitHub repository (https://github.com/BiomedSciAI/Geno4SD).


Subjects
Circulating Tumor DNA , Neoplasms , Humans , Tumor Biomarkers/genetics , Neoplasms/drug therapy , Neoplasm DNA/genetics , Circulating Tumor DNA/genetics , Biopsy , Mutation
3.
Bioinformatics ; 40(Supplement_1): i199-i207, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940159

ABSTRACT

MOTIVATION: The emergence of COVID-19 (C19) created incredible worldwide challenges but offers unique opportunities to understand the physiology of its risk factors and their interactions with complex disease conditions, such as metabolic syndrome. To address the challenges of discovering clinically relevant interactions, we employed a unique approach for epidemiological analysis powered by redescription-based topological data analysis (RTDA). RESULTS: Here, RTDA was applied to Explorys data to discover associations between severe C19 and metabolic syndrome. This approach was able to further explore the probative value of drug prescriptions to capture the involvement of RAAS and hypertension with C19, as well as modification of risk factor impact by hyperlipidemia (HL) on severe C19. RTDA found higher-order relationships between the RAAS pathway and severe C19, along with the demographic variables of age and gender and comorbidities such as obesity, statin prescriptions, HL, and chronic kidney failure, with Black individuals disproportionately affected. RTDA combined with CuNA (cumulant-based network analysis) yielded a higher-order interaction network derived from cumulants that further supported the central role that RAAS plays. TDA techniques can provide a novel outlook beyond typical logistic regressions in epidemiology. From an observational cohort of electronic medical records, they can reveal how RAAS drugs interact with comorbidities, such as hypertension and HL, of patients with severe bouts of C19. Where single-variable association tests with outcome can struggle, TDA's higher-order interaction network between different variables enables the discovery of how the comorbidities of a disease such as C19 work in concert. AVAILABILITY AND IMPLEMENTATION: Code for performing TDA/RTDA is available at https://github.com/IBM/Matilda, and code for CuNA can be found at https://github.com/BiomedSciAI/Geno4SD/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
COVID-19 , Hyperlipidemias , Metabolic Syndrome , Renin-Angiotensin System , SARS-CoV-2 , Humans , Metabolic Syndrome/epidemiology , COVID-19/epidemiology , Hyperlipidemias/epidemiology , Male , Female , Middle Aged , Aged , Comorbidity , Hypertension/epidemiology , Risk Factors
4.
Blood ; 142(5): 421-433, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37146250

ABSTRACT

Although BCL2 mutations are reported as later occurring events leading to venetoclax resistance, many other mechanisms of progression have been reported though remain poorly understood. Here, we analyze longitudinal tumor samples from 11 patients with disease progression while receiving venetoclax to characterize the clonal evolution of resistance. All patients tested showed increased in vitro resistance to venetoclax at the posttreatment time point. We found the previously described acquired BCL2-G101V mutation in only 4 of 11 patients, with 2 patients showing a very low variant allele fraction (0.03%-4.68%). Whole-exome sequencing revealed acquired loss(8p) in 4 of 11 patients, of which 2 patients also had gain (1q21.2-21.3) in the same cells affecting the MCL1 gene. In vitro experiments showed that CLL cells from the 4 patients with loss(8p) were more resistant to venetoclax than cells from those without it, with the cells from 2 patients also carrying gain (1q21.2-21.3) showing increased sensitivity to MCL1 inhibition. Progression samples with gain (1q21.2-21.3) were more susceptible to the combination of MCL1 inhibitor and venetoclax. Differential gene expression analysis comparing bulk RNA sequencing data from pretreatment and progression time points of all patients showed upregulation of proliferation, B-cell receptor (BCR), and NF-κB gene sets including MAPK genes. Cells from progression time points demonstrated upregulation of surface immunoglobulin M and higher pERK levels compared with those from the preprogression time point, suggesting an upregulation of BCR signaling that activates the MAPK pathway. Overall, our data suggest several mechanisms of acquired resistance to venetoclax in CLL that could pave the way for rationally designed combination treatments for patients with venetoclax-resistant CLL.


Subjects
Antineoplastic Agents , B-Cell Chronic Lymphocytic Leukemia , Humans , Antineoplastic Agents/pharmacology , Heterocyclic Bridged Bicyclo Compounds/pharmacology , Neoplasm Drug Resistance/genetics , Exome Sequencing , B-Cell Chronic Lymphocytic Leukemia/drug therapy , B-Cell Chronic Lymphocytic Leukemia/genetics , B-Cell Chronic Lymphocytic Leukemia/pathology , Myeloid Cell Leukemia Sequence 1 Protein/genetics , Proto-Oncogene Proteins c-bcl-2
5.
Genome Res ; 31(11): 2131-2137, 2021 11.
Article in English | MEDLINE | ID: mdl-34479875

ABSTRACT

The number of publicly available microbiome samples is continually growing. As data set size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity (Faith's PD) is a highly utilized phylogenetic alpha diversity metric that has thus far failed to effectively scale to trees with millions of vertices. Stacked Faith's phylogenetic diversity (SFPhD) enables calculation of this widely adopted diversity metric at a much larger scale by implementing a computationally efficient algorithm. The algorithm reduces the amount of computational resources required, resulting in more accessible software with a reduced carbon footprint, as compared to previous approaches. The new algorithm produces identical results to the previous method. We further demonstrate that the phylogenetic aspect of Faith's PD provides increased power in detecting diversity differences between younger and older populations in the FINRISK study's metagenomic data.


Subjects
Microbiota , Microbiota/genetics , Phylogeny
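
For readers unfamiliar with the metric in record 5, the following is a minimal sketch of Faith's PD for a single sample under its usual definition: the sum of the branch lengths on the paths from the observed tips to the root. It is the naive computation, not the SFPhD algorithm described in the abstract. The tree encoding (a child-to-parent map), the function name `faith_pd`, and the toy tree are all assumptions made for illustration.

```python
def faith_pd(parent, observed_tips):
    """Sum the lengths of all branches lying on a path from an observed tip to the root.

    parent maps each non-root node to a (parent_node, branch_length) pair.
    """
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk toward the root; stop at the root or at a branch already counted,
        # because every branch above a counted branch has been counted too.
        while node in parent and node not in visited:
            visited.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Hypothetical toy tree: the root has children X and C; X has children A and B.
toy_tree = {
    "A": ("X", 1.0),
    "B": ("X", 2.0),
    "X": ("root", 0.5),
    "C": ("root", 3.0),
}
```

Calling `faith_pd(toy_tree, ["A", "B"])` counts the shared branch above X only once, which is what distinguishes the metric from a plain sum of tip-to-root distances.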
6.
Nat Methods ; 18(6): 618-626, 2021 06.
Article in English | MEDLINE | ID: mdl-33986544

ABSTRACT

Accurate microbial identification and abundance estimation are crucial for metagenomics analysis. Various methods for classification of metagenomic data and estimation of taxonomic profiles, broadly referred to as metagenomic profilers, have been developed. Nevertheless, benchmarking of metagenomic profilers remains challenging because some tools are designed to report relative sequence abundance while others report relative taxonomic abundance. Here we show how misleading conclusions can be drawn by neglecting this distinction between relative abundance types when benchmarking metagenomic profilers. Moreover, we show compelling evidence that interchanging sequence abundance and taxonomic abundance will influence both per-sample summary statistics and cross-sample comparisons. We suggest that the microbiome research community pay attention to potentially misleading biological conclusions arising from this issue when benchmarking metagenomic profilers, by carefully considering the type of abundance data that were analyzed and interpreted and clearly stating the strategy used for metagenomic profiling.


Subjects
Benchmarking/methods , Metagenomics , Computational Biology/methods , Gene Expression Profiling , Microbiota/genetics , DNA Sequence Analysis/methods
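
The distinction drawn in record 6 can be illustrated with a small sketch. It assumes the simplest possible relationship between the two abundance types: a taxon's share of reads is proportional to its genome copies multiplied by its genome length, so dividing by genome length and renormalizing gives relative taxonomic abundance. Ploidy, marker copy number, and mapping biases are ignored. The function name and the example taxa are hypothetical and are not taken from the paper.

```python
def taxonomic_from_sequence(seq_abundance, genome_length):
    """Convert relative sequence abundance to relative taxonomic abundance.

    seq_abundance: taxon -> fraction of reads (or bases) assigned to it.
    genome_length: taxon -> genome length in base pairs.
    """
    # Reads per base pair approximates the number of genome copies of each taxon.
    copies = {t: seq_abundance[t] / genome_length[t] for t in seq_abundance}
    total = sum(copies.values())
    return {t: c / total for t, c in copies.items()}


# Two hypothetical taxa receive the same share of reads, but one genome is
# three times longer, so it contributes fewer cells for the same read count.
example = taxonomic_from_sequence(
    {"taxon_small": 0.5, "taxon_large": 0.5},
    {"taxon_small": 2_000_000, "taxon_large": 6_000_000},
)
```

Under these assumptions the equal read shares correspond to an unequal split in cell counts, which is why comparing a sequence-abundance profiler against a taxonomic-abundance ground truth can mislead.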
7.
Mol Biol Evol ; 38(5): 1809-1819, 2021 05 04.
Article in English | MEDLINE | ID: mdl-33481022

ABSTRACT

India represents an intricate tapestry of population substructure shaped by geography, language, culture, and social stratification. Although geography closely correlates with genetic structure in other parts of the world, the strict endogamy imposed by the Indian caste system and the large number of spoken languages add further levels of complexity to understanding Indian population structure. To date, no study has attempted to model and evaluate how these factors have interacted to shape the patterns of genetic diversity within India. We merged all publicly available data from the Indian subcontinent into a data set of 891 individuals from 90 well-defined groups. Bringing together geography, genetics, and demographic factors, we developed Correlation Optimization of Genetics and Geodemographics to build a model that explains the observed population genetic substructure. We show that shared language along with social structure have been the most powerful forces in creating paths of gene flow in the subcontinent. Furthermore, we discover the ethnic groups that best capture the diverse genetic substructure using a ridge leverage score statistic. Integrating data from India with a data set of an additional 1,323 individuals from 50 Eurasian populations, we find that Indo-European and Dravidian speakers of India show shared genetic drift with Europeans, whereas the Tibeto-Burman speaking tribal groups have maximum shared genetic drift with East Asians.


Subjects
Ethnicity/genetics , Genetic Variation , Language , Genetic Models , Sociological Factors , Geography , Humans , India
8.
Biometrics ; 78(3): 1155-1167, 2022 09.
Article in English | MEDLINE | ID: mdl-33914902

ABSTRACT

Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging as microbiome data sets are high dimensional, underdetermined, sparse and compositional. Great efforts have recently been made on developing new methods for feature selection that handle the above data characteristics, but almost all methods were evaluated based on performance of model predictions. However, little attention has been paid to address a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods often control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated simply based on the prediction accuracy. If tiny changes to the data would lead to large changes in the chosen feature subset, then many selected features are likely to be a data artifact rather than real biological signal. This crucial need of identifying relevant and reproducible features motivated the reproducibility evaluation criterion such as Stability, which quantifies how robust a method is to perturbations in the data. In our paper, we compare the performance of popular model prediction metrics (MSE or AUC) with proposed reproducibility criterion Stability in evaluating four widely used feature selection methods in both simulations and experimental microbiome applications with continuous or binary outcomes. We conclude that Stability is a preferred feature selection criterion over model prediction metrics because it better quantifies the reproducibility of the feature selection method.


Subjects
Microbiota , Algorithms , Reproducibility of Results
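
The abstract of record 8 does not give a formula for Stability. As a sketch of what such a criterion can look like, the code below implements a frequency-based stability estimator of the kind proposed by Nogueira and colleagues: one minus the average per-feature selection variance, scaled by the variance expected if the same number of features were chosen at random. Whether the paper uses exactly this form is an assumption here, and the function name and input layout are hypothetical.

```python
def stability(selections):
    """Frequency-based stability of a feature selection method.

    selections: list of M runs (M >= 2), each a list of p binary flags where
    1 means the feature was selected in that resampled run.
    Returns a value of at most 1; 1 means every run chose the same subset.
    """
    m = len(selections)
    p = len(selections[0])
    # Selection frequency of each feature across the M runs.
    freq = [sum(run[f] for run in selections) / m for f in range(p)]
    # Unbiased sample variance of each feature's selection indicator.
    var = [m / (m - 1) * q * (1 - q) for q in freq]
    k_bar = sum(freq)  # average number of features selected per run
    # Variance expected when k_bar of p features are picked at random.
    # This is zero (and the estimator undefined) if no or all features are selected.
    denom = (k_bar / p) * (1 - k_bar / p)
    return 1 - (sum(var) / p) / denom
```

In practice each run would be the feature subset chosen on a bootstrap or subsample of the data; a method whose chosen subset changes a lot between runs scores low even if its prediction error is good.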
9.
BMC Genomics ; 22(Suppl 5): 518, 2021 Nov 16.
Article in English | MEDLINE | ID: mdl-34789161

ABSTRACT

BACKGROUND: All diseases involving genetic material, including cancer and infection, undergo genetic evolution and give rise to heterogeneity. Although these illnesses are biologically very different, the ability for phylogenetic retrodiction based on the genomic reads is common between them and thus tree-based principles and assumptions are shared. Just as the different frequencies of tumor genomic variants presuppose the existence of multiple tumor clones and provide a handle to computationally infer them, we postulate that the different variant frequencies in viral reads offer the means to infer multiple co-infecting sublineages. RESULTS: We present a common methodological framework to infer the phylogenomics from genomic data, be it reads of SARS-CoV-2 of multiple COVID-19 patients or bulk DNAseq of the tumor of a cancer patient. We describe the Concerti computational framework for inferring phylogenies in each of the two scenarios. To demonstrate the accuracy of the method, we reproduce some known results in both scenarios. We also make some additional discoveries. CONCLUSIONS: Concerti successfully extracts and integrates information from multi-point samples, enabling the discovery of clinically plausible phylogenetic trees that capture the heterogeneity known to exist both spatially and temporally. These models can have direct therapeutic implications by highlighting "birth" of clones that may harbor resistance mechanisms to treatment, "death" of subclones with drug targets, and acquisition of functionally pertinent mutations in clones that may have seemed clinically irrelevant. Specifically in this paper we uncover new potential parallel mutations in the evolution of the SARS-CoV-2 virus. In the context of cancer, we identify new clones harboring resistant mutations to therapy.


Subjects
COVID-19 , Neoplasms , Clone Cells , Humans , Mutation , Neoplasms/genetics , Phylogeny , SARS-CoV-2
10.
Int J Mol Sci ; 22(17)2021 Sep 06.
Article in English | MEDLINE | ID: mdl-34502564

ABSTRACT

Papillomaviruses (PVs) are a heterogeneous group of DNA viruses that can infect fish, birds, reptiles, and mammals. PVs infecting humans (HPVs) phylogenetically cluster into five genera (Alpha-, Beta-, Gamma-, Mu- and Nu-PV), with differences in tissue tropism and carcinogenicity. The evolutionary features associated with the divergence of Papillomaviridae are not well understood. Using a combination of k-mer distributions, genetic metrics, and phylogenetic algorithms, we sought to evaluate the characteristics and differences of Alpha-, Beta- and Gamma-PVs constituting the majority of HPV genomes. A total of 640 PVs including 442 HPV types, 27 non-human primate PV types, and 171 non-primate animal PV types were evaluated. Our analyses revealed the highest genetic diversity amongst Gamma-PVs compared to the Alpha and Beta PVs, suggesting reduced selective pressures on Gamma-PVs. Using a sequence alignment-free trimer (k = 3) phylogeny algorithm, we reconstructed a phylogeny that grouped most HPV types into a monophyletic clade that was further split into three branches similar to alignment-based classifications. Interestingly, a subset of low-risk Alpha HPVs (the species Alpha-2, 3, 4, and 14) split from other HPVs and were clustered with non-human primate PVs. Surprisingly, the trimer-constructed phylogeny grouped the Gamma-6 species types originally isolated from the cervicovaginal region with the main Alpha-HPV clade. These data indicate that characterization of papillomavirus heterogeneity via orthogonal approaches reveals novel insights into the biological understanding of HPV genomes.


Subjects
Viral DNA/genetics , Molecular Evolution , Genetic Variation , Viral Genome/genetics , Papillomaviridae/genetics , Algorithms , Animals , Cluster Analysis , Codon/genetics , CpG Islands/genetics , DNA Methylation , Viral DNA/analysis , Humans , Papillomaviridae/classification , Papillomaviridae/physiology , Papillomavirus Infections/virology , Phylogeny , DNA Sequence Analysis/methods
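
Record 10 relies on alignment-free comparison of genomes through trimer (k = 3) distributions. The sketch below shows one way such a comparison can be set up: each sequence is reduced to a 64-entry vector of normalized trimer frequencies, and two vectors are compared with a distance. The abstract does not specify the distance or the tree-building step, so the Euclidean distance and the function names used here are assumptions for illustration only.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=3):
    """Return normalized counts of all 4**k DNA k-mers in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases such as N
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

A pairwise distance matrix built this way could then be passed to any standard distance-based tree method; that step is omitted because the source does not say which one was used.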
11.
Bioinformatics ; 35(18): 3279-3286, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30689725

ABSTRACT

SUMMARY: Haplotype assembly of polyploids is an open issue in plant genomics. Recent experimental studies on highly heterozygous autotetraploid potato have shown that available methods do not deliver satisfying results in practice. We propose an optimal method to assemble haplotypes of highly heterozygous polyploids from Illumina short-sequencing reads. Our method is based on a generalization of the existing minimum fragment removal model to the polyploid case and on new integer linear programs to reconstruct optimal haplotypes. We validate our methods experimentally by means of a combined evaluation on simulated and experimental data based on 83 previously sequenced autotetraploid potato cultivars. Results on simulated data show that our methods produce highly accurate haplotype assemblies, while results on experimental data confirm a sensible improvement over the state of the art. AVAILABILITY AND IMPLEMENTATION: Executables for Linux at http://github.com/ComputationalGenomics/HaplotypeAssembler. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Solanum tuberosum , Algorithms , Haplotypes , Linear Programming , DNA Sequence Analysis , Software
12.
Bull World Health Organ ; 98(7): 495-504, 2020 Jul 01.
Article in English | MEDLINE | ID: mdl-32742035

ABSTRACT

OBJECTIVE: To analyse genome variants of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). METHODS: Between 1 February and 1 May 2020, we downloaded 10 022 SARS-CoV-2 genomes from four databases. The genomes were from infected patients in 68 countries. We identified variants by extracting pairwise alignment to the reference genome NC_045512, using the EMBOSS needle. Nucleotide variants in the coding regions were converted to corresponding encoded amino acid residues. For clade analysis, we used the open source software Bayesian evolutionary analysis by sampling trees, version 2.5. FINDINGS: We identified 5775 distinct genome variants, including 2969 missense mutations, 1965 synonymous mutations, 484 mutations in the non-coding regions, 142 non-coding deletions, 100 in-frame deletions, 66 non-coding insertions, 36 stop-gained variants, 11 frameshift deletions and two in-frame insertions. The most common variants were the synonymous 3037C > T (6334 samples), P4715L in the open reading frame 1ab (6319 samples) and D614G in the spike protein (6294 samples). We identified six major clades (that is, basal, D614G, L84S, L3606F, D448del and G392D) and 14 subclades. Regarding the base changes, the C > T mutation was the most common with 1670 distinct variants. CONCLUSION: We found that several variants of the SARS-CoV-2 genome exist and that the D614G clade has become the most common variant since December 2019. The evolutionary analysis indicated structured transmission, with the possibility of multiple introductions into the population.


Subjects
Betacoronavirus/genetics , Coronavirus Infections/epidemiology , Viral Pneumonia/epidemiology , COVID-19 , Global Health , Humans , Pandemics , Viral RNA/genetics , SARS-CoV-2
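
The variant-calling step in record 12 starts from a pairwise alignment of each genome to the reference. A minimal sketch of how substitutions can be read off such an alignment is given below. It assumes the two aligned sequences are already available as equal-length gapped strings (for example, parsed from the aligner's output, which is not shown), and it handles substitutions only, not insertions or deletions. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def substitutions(ref_aln, query_aln):
    """List (ref_position, ref_base, query_base) for mismatched alignment columns.

    ref_aln and query_aln are equal-length aligned strings with '-' for gaps.
    Positions are 1-based coordinates on the ungapped reference.
    """
    found = []
    pos = 0
    for r, q in zip(ref_aln, query_aln):
        if r != "-":
            pos += 1  # only reference bases advance the reference coordinate
        if r != "-" and q != "-" and r != q:
            found.append((pos, r, q))
    return found


# Toy alignment with one insertion in the query and two C > T substitutions.
calls = substitutions("ACG-TC", "ATGATT")
# Tally base changes by type, as in the abstract's count of C > T variants.
by_type = Counter((r, q) for _, r, q in calls)
```

Tallying the calls by base-change type across many genomes is the kind of summary behind statements such as "the C > T mutation was the most common".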
13.
PLoS Comput Biol ; 15(8): e1007332, 2019 08.
Article in English | MEDLINE | ID: mdl-31469830

ABSTRACT

The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best of machine learning (ML) algorithms. Here we set out to answer whether the dark matter of the genome encompasses signals that can distinguish the fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes even from the same 'cell of origin'. Analogous to heritability, implicitly defined on the whole genome, we use predictability (F1 score) definable on portions of the genome. In an effort to distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset from 727 patient samples with seven forms of hematological cancers and assessed the predictivity over several genomic regions including genic, non-dark, non-coding, non-genic, and dark. ReVeaL enabled improved discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding and dark-matter regions had the highest F1 scores, with dark matter having the highest level of predictability. Based on ReVeaL's predictability of different genomic regions, dark matter contains enough signal to significantly discriminate fine subtypes of disease. Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in the disease etiology and deserves much more attention.


Subjects
Algorithms , Neoplasm DNA/genetics , Hematologic Neoplasms/classification , Hematologic Neoplasms/genetics , Genetic Models , Computational Biology , Nucleic Acid Databases , Gene Frequency , Human Genome , High-Throughput Nucleotide Sequencing , Humans , Machine Learning , Single Nucleotide Polymorphism , Untranslated RNA/genetics , Stochastic Processes , Whole Genome Sequencing
14.
BMC Genomics ; 20(Suppl 2): 194, 2019 Apr 04.
Article in English | MEDLINE | ID: mdl-30967115

ABSTRACT

BACKGROUND: A metagenome is a collection of genomes, usually in a micro-environment, and sequencing a metagenomic sample en masse is a powerful means for investigating the community of the constituent microorganisms. One of the challenges is in distinguishing between similar organisms due to rampant multiple possible assignments of sequencing reads, resulting in false positive identifications. We map the problem to a topological data analysis (TDA) framework that extracts information from the geometric structure of data. Here the structure is defined by multi-way relationships between the sequencing reads using a reference database. RESULTS: Based primarily on the patterns of co-mapping of the reads to multiple organisms in the reference database, we use two models: one a subcomplex of a Barycentric subdivision complex and the other a Cech complex. The Barycentric subcomplex allows a natural mapping of the reads along with their coverage of organisms while the Cech complex takes simply the number of reads into account to map the problem to homology computation. Using simulated genome mixtures we show not just enrichment of signal but also microbe identification with strain-level resolution. CONCLUSIONS: In particular, in the most refractory of cases where alternative algorithms that exploit unique reads (i.e., mapped to unique organisms) fail, we show that the TDA approach continues to show consistent performance. The Cech model that uses less information is equally effective, suggesting that even partial information when augmented with the appropriate structure is quite powerful.


Subjects
Algorithms , Bacteria/classification , Bacteria/genetics , Data Analysis , Metagenome , Metagenomics/methods , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing
15.
BMC Cancer ; 19(1): 114, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-30709382

ABSTRACT

BACKGROUND: Significant numbers of variants detected in cancer patients are often left labeled only as variants of unknown significance (VUS). In order to expand precision medicine to a wider population, we need to extend our knowledge of pathogenicity and drug response in the context of VUSs. METHODS: In this study, we analyzed variants from the AACR Project GENIE Consortium (Cancer Discov 7:818-831, 2017) and compared them to the COSMIC database (Forbes et al., Nucleic Acids Res 43:D805-811, 2015) to identify recurrent variants that would merit further study. We filtered out known hotspot variants, inactivating variants in tumor suppressors, and likely benign variants by comparing with COSMIC and ExAC (Lee et al., Science 337:967-971, 2012). RESULTS: We have identified 45,933 novel variants with unknown significance unique to GENIE. In our analysis, we found on average six variants per patient, of which two could be considered pathogenic or likely pathogenic and the majority are VUSs. More importantly, we have discovered 730 recurrent variants that appear more than 3 times in GENIE but fewer than 3 times in COSMIC. If we combine the recurrences of GENIE and COSMIC for all variants, 2586 variants are newly identified as occurring more than 3 times, compared with using COSMIC alone. CONCLUSIONS: Although it would be inappropriate to blindly accept these recurrent variants as pathogenic, they may warrant higher priority than other observed VUSs. These newly identified recurrent variants might affect the molecular profiles of approximately 1 in 6 patients. Further analysis and characterization of these variants in both research and clinical contexts will improve patient treatments and the development of new therapeutics.


Subjects
Tumor Biomarkers/genetics , Genetic Databases , Genetic Variation , Neoplasms/genetics , Precision Medicine/methods , Gene Frequency , Neoplasm Genes/genetics , Humans , Neoplasms/diagnosis , Neoplasms/pathology , Precision Medicine/trends
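
The recurrence filter described in record 15 (variants seen more than 3 times in GENIE but fewer than 3 times in COSMIC) can be sketched as a simple count comparison. The input format (one variant identifier per observation), the function name, and the placeholder variant labels below are assumptions for illustration; the study's actual pipeline, including its hotspot and benign-variant filters, is not reproduced.

```python
from collections import Counter


def newly_recurrent(genie_calls, cosmic_calls, threshold=3):
    """Variants seen more than `threshold` times in GENIE but fewer than `threshold` in COSMIC."""
    genie = Counter(genie_calls)
    cosmic = Counter(cosmic_calls)  # a Counter returns 0 for variants it has never seen
    return sorted(
        v for v, n in genie.items() if n > threshold and cosmic[v] < threshold
    )


# Placeholder identifiers, not real variants.
genie_obs = ["GENE_A:v1"] * 4 + ["GENE_B:v2"] * 4 + ["GENE_C:v3"] * 2
cosmic_obs = ["GENE_B:v2"] * 5 + ["GENE_A:v1"]
hits = newly_recurrent(genie_obs, cosmic_obs)
```

Here only `GENE_A:v1` passes: `GENE_B:v2` is already recurrent in the second database, and `GENE_C:v3` is not recurrent in the first.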
16.
Ann Bot ; 124(4): 717-730, 2019 10 29.
Article in English | MEDLINE | ID: mdl-31241131

ABSTRACT

BACKGROUND AND AIMS: Perennial grasses are a global resource as forage, and for alternative uses in bioenergy and as raw materials for the processing industry. Marginal lands can be valuable for perennial biomass grass production, if perennial biomass grasses can cope with adverse abiotic environmental stresses such as drought and waterlogging. METHODS: In this study, two perennial grass species, reed canary grass (Phalaris arundinacea) and cocksfoot (Dactylis glomerata) were subjected to drought and waterlogging stress to study their responses for insights to improving environmental stress tolerance. Physiological responses were recorded, reference transcriptomes established and differential gene expression investigated between control and stress conditions. We applied a robust non-parametric method, RoDEO, based on rank ordering of transcripts to investigate differential gene expression. Furthermore, we extended and validated vRoDEO for comparing samples with varying sequencing depths. KEY RESULTS: This allowed us to identify expressed genes under drought and waterlogging whilst using only a limited number of RNA sequencing experiments. Validating the methodology, several differentially expressed candidate genes involved in the stage 3 step-wise scheme in detoxification and degradation of xenobiotics were recovered, while several novel stress-related genes classified as of unknown function were discovered. CONCLUSIONS: Reed canary grass is a species coping particularly well with flooding conditions, but this study adds novel information on how its transcriptome reacts under drought stress. We built extensive transcriptomes for the two investigated C3 species cocksfoot and reed canary grass under both extremes of water stress to provide a clear comparison amongst the two species to broaden our horizon for comparative studies, but further confirmation of the data would be ideal to obtain a more detailed picture.


Subjects
Droughts , Phalaris , Biomass , Dactylis , Physiological Stress , Transcriptome
18.
BMC Genomics ; 19(Suppl 2): 110, 2018 May 09.
Article in English | MEDLINE | ID: mdl-29764364

ABSTRACT

BACKGROUND: Inference of haplotypes, or the sequence of alleles along the same chromosomes, is a fundamental problem in genetics and is a key component for many analyses including admixture mapping, identifying regions of identity by descent and imputation. Haplotype phasing based on sequencing reads has attracted a lot of attention. Diploid haplotype phasing, where the two haplotypes are complementary, has been studied extensively. In this work, we focused on polyploid haplotype phasing, where we aim to phase more than two haplotypes at the same time from sequencing data. The problem is much more complicated as the search space becomes much larger and the haplotypes do not need to be complementary any more. RESULTS: We proposed two algorithms: (1) Poly-Harsh, a Gibbs sampling based algorithm which alternately samples haplotypes and the read assignments to minimize the mismatches between the reads and the phased haplotypes, and (2) an efficient algorithm to concatenate haplotype blocks into contiguous haplotypes. CONCLUSIONS: Our experiments showed that our method is able to improve the quality of the phased haplotypes over the state-of-the-art methods. To our knowledge, our algorithm for haplotype block concatenation is the first algorithm that leverages the shared information across multiple individuals to construct contiguous haplotypes. Our experiments showed that it is both efficient and effective.


Subjects
Genomics/methods , Haplotypes , Polyploidy , Algorithms , Genome , Sequence Analysis, DNA
19.
Oncologist ; 23(2): 179-185, 2018 02.
Article in English | MEDLINE | ID: mdl-29158372

ABSTRACT

BACKGROUND: Using next-generation sequencing (NGS) to guide cancer therapy has created challenges in analyzing and reporting large volumes of genomic data to patients and caregivers. Specifically, providing current, accurate information on newly approved therapies and open clinical trials requires considerable manual curation performed mainly by human "molecular tumor boards" (MTBs). The purpose of this study was to determine the utility of cognitive computing as performed by Watson for Genomics (WfG) compared with a human MTB. MATERIALS AND METHODS: One thousand eighteen patient cases that previously underwent targeted exon sequencing at the University of North Carolina (UNC) and subsequent analysis by the UNCseq informatics pipeline and the UNC MTB between November 7, 2011, and May 12, 2015, were analyzed with WfG, a cognitive computing technology for genomic analysis. RESULTS: Using a WfG-curated actionable gene list, we identified additional genomic events of potential significance (not discovered by traditional MTB curation) in 323 (32%) patients. The majority of these additional genomic events were considered actionable based upon their ability to qualify patients for biomarker-selected clinical trials. Indeed, the opening of a relevant clinical trial within 1 month prior to WfG analysis provided the rationale for identification of a new actionable event in nearly a quarter of the 323 patients. This automated analysis took <3 minutes per case. CONCLUSION: These results demonstrate that the interpretation and actionability of somatic NGS results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing could potentially improve patient care by providing a rapid, comprehensive approach for data analysis and consideration of up-to-date availability of clinical trials. 
IMPLICATIONS FOR PRACTICE: The results of this study demonstrate that the interpretation and actionability of somatic next-generation sequencing results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis in the delivery of precision medicine. Patients and physicians who are considering enrollment in clinical trials may benefit from the support of such tools applied to genomic data.


Subjects
Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Neoplasms/drug therapy , Biomarkers, Tumor , Case-Control Studies , Combined Modality Therapy , Follow-Up Studies , Gene Expression Regulation, Neoplastic , High-Throughput Nucleotide Sequencing , Humans , Lymphatic Metastasis , Neoplasm Invasiveness , Neoplasm Recurrence, Local/drug therapy , Neoplasm Recurrence, Local/pathology , Neoplasms/pathology , Prognosis , Retrospective Studies , Survival Rate
20.
Bioinformatics ; 32(12): i37-i43, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307640

ABSTRACT

Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least-squares error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. AVAILABILITY AND IMPLEMENTATION: The programs we used are either public or were obtained directly from the cited authors, such as the MALSAR package (http://www.public.asu.edu/~jye02/Software/MALSAR/). The Avocado data set has not been published yet and is available upon request. CONTACT: dhe@us.ibm.com.
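The linear-model baseline for the shared-genotype-matrix case can be sketched in Python as follows. This is an assumption-laden illustration, not the authors' method and not the MALSAR API: it fits a plain ridge regression with one weight column per trait, W = (X'X + lam*I)^-1 X'Y, using hypothetical function names and synthetic data. With this penalty the traits are still fitted independently, column by column; the multitask algorithms the abstract refers to add a coupling between traits that is not shown here.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n list of lists; b is an n x t list of lists (one column per trait).
    """
    n = len(a)
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        # Bring the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ridge(x, y, lam):
    """Return the markers x traits weight matrix W = (X'X + lam*I)^-1 X'Y."""
    n_markers = len(x[0])
    n_traits = len(y[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0)
            for j in range(n_markers)] for i in range(n_markers)]
    xty = [[sum(xr[i] * yr[t] for xr, yr in zip(x, y)) for t in range(n_traits)]
           for i in range(n_markers)]
    return solve(xtx, xty)


def predict(x, w):
    """Predict every trait for every sample as the matrix product X @ W."""
    return [[sum(xi * w[i][t] for i, xi in enumerate(row)) for t in range(len(w[0]))]
            for row in x]


# Synthetic data: 5 samples x 3 biallelic markers coded 0/1/2, and 2 traits.
genotypes = [[0, 1, 2], [1, 0, 2], [2, 1, 0], [1, 2, 1], [0, 0, 1]]
true_w = [[1.0, 0.5], [-2.0, 0.0], [0.5, 1.5]]  # hypothetical marker effects
traits = predict(genotypes, true_w)              # noise-free trait values

w = fit_ridge(genotypes, traits, lam=1e-8)
print([[round(v, 4) for v in row] for row in w])
```

On noise-free data with a very small penalty, the fitted weights should agree with `true_w` to within numerical tolerance; with real data, `lam` would normally be chosen by cross-validation.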


Subjects
Genotype , Machine Learning , Models, Genetic , Phenotype , Algorithms , Animals , Humans , Linear Models , Plants , Polymorphism, Single Nucleotide , Quantitative Trait Loci