Results 1 - 20 of 34
1.
Nat Biotechnol ; 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37679542

ABSTRACT

Exploiting sequence-structure-function relationships in biotechnology requires improved methods for aligning proteins that have low sequence similarity to previously annotated proteins. We develop two deep learning methods to address this gap, TM-Vec and DeepBLAST. TM-Vec allows searching for structure-structure similarities in large sequence databases. It is trained to accurately predict TM-scores as a metric of structural similarity directly from sequence pairs without the need for intermediate computation or solution of structures. Once structurally similar proteins have been identified, DeepBLAST can structurally align proteins using only sequence information by identifying structurally homologous regions between proteins. It outperforms traditional sequence alignment methods and performs similarly to structure-based alignment methods. We show the merits of TM-Vec and DeepBLAST on a variety of datasets, including better identification of remotely homologous proteins compared with state-of-the-art sequence alignment and structure prediction methods.

2.
Bioinformatics ; 38(9): 2519-2528, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35188184

ABSTRACT

MOTIVATION: Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. Methods for inferring and reconstructing networks from genomics data have evolved rapidly over the last decade in response to advances in sequencing technology and machine learning. The scale of data collection has increased dramatically; the largest genome-wide gene expression datasets have grown from thousands of measurements to millions of single cells, and new technologies are on the horizon to increase to tens of millions of cells and above. RESULTS: In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator is able to integrate the largest single-cell datasets and learn cell-type-specific gene regulatory networks. Compared to other network inference methods, the Inferelator learns new and informative Saccharomyces cerevisiae networks from single-cell gene expression data, measured by recovery of a known gold standard. We demonstrate its scaling capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression dataset with paired single-cell chromatin accessibility data. AVAILABILITY AND IMPLEMENTATION: The Inferelator software is available on GitHub (https://github.com/flatironinstitute/inferelator) under the MIT license and has been released as Python packages with associated documentation (https://inferelator.readthedocs.io/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Regulatory Networks, Software, Animals, Mice, Genomics, Genome, Chromatin
3.
Genome Res ; 31(2): 337-347, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33361113

ABSTRACT

Understanding the changes in diverse molecular pathways underlying the development of breast tumors is critical for improving diagnosis, treatment, and drug development. Here, we used RNA-profiling of canine mammary tumors (CMTs) coupled with a robust analysis framework to model molecular changes in human breast cancer. Our study leveraged a key advantage of the canine model, the frequent presence of multiple naturally occurring tumors at diagnosis, thus providing samples spanning normal tissue and benign and malignant tumors from each patient. We showed that human breast cancer signals, at both the expression and mutation level, are evident in CMTs. Profiling multiple tumors per patient, enabled by the CMT model, allowed us to resolve statistically robust transcription patterns and biological pathways specific to malignant tumors versus those arising in benign tumors or shared with normal tissues. We showed that multiple histological samples per patient are necessary to effectively capture these progression-related signatures, and that carcinoma-specific signatures are predictive of survival for human breast cancer patients. To catalyze and support similar analyses and use of the CMT model by other biomedical researchers, we provide FREYA, a robust data processing pipeline and statistical analysis framework.

4.
Nat Commun ; 11(1): 747, 2020 02 06.
Article in English | MEDLINE | ID: mdl-32029740

ABSTRACT

ATAC-seq has become a leading technology for probing the chromatin landscape of single and aggregated cells. Distilling functional regions from ATAC-seq presents diverse analysis challenges. Methods commonly used to analyze chromatin accessibility datasets are adapted from algorithms designed to process different experimental technologies, disregarding the statistical and biological differences intrinsic to the ATAC-seq technology. Here, we present a Bayesian statistical approach that uses latent space models to better model accessible regions, termed ChromA. ChromA annotates the chromatin landscape by integrating information from replicates, producing a consensus de-noised annotation of chromatin accessibility. ChromA can analyze single-cell ATAC-seq data, correcting many biases generated by the sparse sampling inherent in single-cell technologies. We validate ChromA on multiple technologies and biological systems, including mouse and human immune cells, establishing ChromA as a top-performing general platform for mapping the chromatin landscape in different cellular populations from diverse experimental designs.


Subjects
Chromatin/genetics, Genomics/methods, Genetic Models, Algorithms, Animals, Bayes Theorem, Chromatin Immunoprecipitation Sequencing, Gene Library, Humans, Markov Chains, Mice, Molecular Sequence Annotation, Single-Cell Analysis
5.
Immunity ; 51(1): 185-197.e6, 2019 07 16.
Article in English | MEDLINE | ID: mdl-31278058

ABSTRACT

Innate lymphoid cells (ILCs) promote tissue homeostasis and immune defense but also contribute to inflammatory diseases. ILCs exhibit phenotypic and functional plasticity in response to environmental stimuli, yet the transcriptional regulatory networks (TRNs) that control ILC function are largely unknown. Here, we integrate gene expression and chromatin accessibility data to infer regulatory interactions between transcription factors (TFs) and genes within intestinal type 1, 2, and 3 ILC subsets. We predicted the "core" TFs driving ILC identities, organized TFs into cooperative modules controlling distinct gene programs, and validated roles for c-MAF and BCL6 as regulators affecting type 1 and type 3 ILC lineages. The ILC network revealed alternative-lineage-gene repression, a mechanism that may contribute to reported plasticity between ILC subsets. By connecting TFs to genes, the TRNs suggest means to selectively regulate ILC effector functions, while our network approach is broadly applicable to identifying regulators in other in vivo cell populations.


Subjects
Intestines/physiology, Lymphocyte Subsets/physiology, Lymphocytes/physiology, Animals, Cell Differentiation, Cell Lineage, Cell Plasticity, Chromatin Assembly and Disassembly, Epigenetic Repression, Gene Regulatory Networks, Innate Immunity, Immunomodulation, Mice, Inbred C57BL Mice, Transgenic Mice, Proto-Oncogene Proteins c-bcl-6/genetics, Proto-Oncogene Proteins c-maf/genetics, Transcriptome
6.
Clin Cancer Res ; 24(8): 1872-1880, 2018 04 15.
Article in English | MEDLINE | ID: mdl-29330207

ABSTRACT

Purpose: Decisions to continue or suspend therapy with immune checkpoint inhibitors are commonly guided by tumor dynamics seen on serial imaging. However, immunotherapy responses are uniquely challenging to interpret because tumors often shrink slowly or can appear transiently enlarged due to inflammation. We hypothesized that monitoring tumor cell death in real time by quantifying changes in circulating tumor DNA (ctDNA) levels could enable early assessment of immunotherapy efficacy. Experimental Design: We compared longitudinal changes in ctDNA levels with changes in radiographic tumor size and with survival outcomes in 28 patients with metastatic non-small cell lung cancer (NSCLC) receiving immune checkpoint inhibitor therapy. CtDNA was quantified by determining the allele fraction of cancer-associated somatic mutations in plasma using a multigene next-generation sequencing assay. We defined a ctDNA response as a >50% decrease in mutant allele fraction from baseline, with a second confirmatory measurement. Results: Strong agreement was observed between ctDNA response and radiographic response (Cohen's kappa, 0.753). Median time to initial response among patients who achieved responses in both categories was 24.5 days by ctDNA versus 72.5 days by imaging. Time on treatment was significantly longer for ctDNA responders versus nonresponders (median, 205.5 vs. 69 days; P < 0.001). A ctDNA response was associated with superior progression-free survival [hazard ratio (HR), 0.29; 95% CI, 0.09-0.89; P = 0.03], and superior overall survival (HR, 0.17; 95% CI, 0.05-0.62; P = 0.007). Conclusions: A drop in ctDNA level is an early marker of therapeutic efficacy and predicts prolonged survival in patients treated with immune checkpoint inhibitors for NSCLC. Clin Cancer Res; 24(8); 1872-80. ©2018 AACR.


Subjects
Tumor Biomarkers, Circulating Tumor DNA, Lung Neoplasms/genetics, Lung Neoplasms/therapy, Immunological Antineoplastic Agents/therapeutic use, B7-H1 Antigen/antagonists & inhibitors, Disease Progression, Humans, Immunotherapy, Lung Neoplasms/diagnosis, Lung Neoplasms/immunology, Mutation, Prognosis, Programmed Cell Death 1 Receptor/antagonists & inhibitors, Survival Analysis, Time Factors, X-Ray Computed Tomography, Treatment Outcome
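
Note on entry 6: the abstract defines a ctDNA response as a >50% decrease in mutant allele fraction from baseline with a second confirmatory measurement, and reports agreement with imaging as Cohen's kappa. The sketch below is one possible reading of those two ideas, not the authors' code. It assumes that "confirmatory" means two consecutive follow-up measurements both exceed the threshold; the function names, the `min_drop` parameter, and all input values in the usage lines are hypothetical.

```python
from typing import Hashable, Sequence


def is_ctdna_response(baseline_af: float,
                      followup_afs: Sequence[float],
                      min_drop: float = 0.5) -> bool:
    """Return True if two consecutive follow-ups both drop by more than min_drop.

    Assumption: the 'second confirmatory measurement' is the very next
    measurement after the first one that crosses the threshold.
    """
    if baseline_af <= 0:
        raise ValueError("baseline allele fraction must be positive")
    # Fractional decrease from baseline for each follow-up measurement.
    crossed = [(baseline_af - af) / baseline_af > min_drop for af in followup_afs]
    return any(first and second for first, second in zip(crossed, crossed[1:]))


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters: (p_observed - p_expected) / (1 - p_expected)."""
    a, b = list(labels_a), list(labels_b)
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label lists must be non-empty and of equal length")
    observed = sum(x == y for x, y in zip(a, b)) / n
    categories = set(a) | set(b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical allele fractions, for illustration only.
print(is_ctdna_response(0.10, [0.04, 0.03]))       # both follow-ups cross the threshold
print(is_ctdna_response(0.10, [0.04, 0.08]))       # drop is not confirmed
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))    # ctDNA vs. imaging calls
```

A stricter or looser confirmation rule (for example, any later measurement rather than the next one) would change only the `zip` line.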
7.
Nucleic Acids Res ; 45(8): 4315-4329, 2017 05 05.
Article in English | MEDLINE | ID: mdl-28334916

ABSTRACT

Differential binding of transcription factors (TFs) at cis-regulatory loci drives the differentiation and function of diverse cellular lineages. Understanding the regulatory interactions that underlie cell fate decisions requires characterizing TF binding sites (TFBS) across multiple cell types and conditions. Techniques such as ChIP-seq can reveal genome-wide patterns of TF binding, but typically require laborious and costly experiments for each TF-cell-type (TFCT) condition of interest. Chromosomal accessibility assays can connect accessible chromatin in one cell type to many TFs through sequence motif mapping. Such methods, however, rarely take into account that the genomic context preferred by each factor differs from TF to TF, and from cell type to cell type. To address the differences in TF behaviors, we developed Mocap, a method that integrates chromatin accessibility, motif scores, TF footprints, CpG/GC content, evolutionary conservation and other factors in an ensemble of TFCT-specific classifiers. We show that integration of genomic features, such as CpG islands, improves TFBS prediction for some TFCTs. Further, we describe a method for mapping new TFCTs, for which no ChIP-seq data exist, onto our ensemble of classifiers and show that our cross-sample TFBS prediction method outperforms several previously described methods.


Subjects
Chromatin/metabolism, Ectoderm/metabolism, Endoderm/metabolism, Mesoderm/metabolism, Software, Transcription Factors/metabolism, Genetic Transcription, Base Composition, Binding Sites, Cell Line, Chromatin/chemistry, Chromatin Immunoprecipitation, CpG Islands, Genetic Databases, Ectoderm/cytology, Endoderm/cytology, Humans, Mesoderm/cytology, Nucleotide Motifs, Organ Specificity, Protein Binding, Transcription Factors/genetics
8.
Nat Immunol ; 18(4): 412-421, 2017 04.
Article in English | MEDLINE | ID: mdl-28166218

ABSTRACT

Type 1 regulatory T cells (Tr1 cells) are induced by interleukin-27 (IL-27) and have critical roles in the control of autoimmunity and resolution of inflammation. We found that the transcription factors IRF1 and BATF were induced early on after treatment with IL-27 and were required for the differentiation and function of Tr1 cells in vitro and in vivo. Epigenetic and transcriptional analyses revealed that both transcription factors influenced chromatin accessibility and expression of the genes required for Tr1 cell function. IRF1 and BATF deficiencies uniquely altered the chromatin landscape, suggesting that these factors serve a pioneering function during Tr1 cell differentiation.


Subjects
Basic-Leucine Zipper Transcription Factors/metabolism, Cell Differentiation/immunology, Chromatin/metabolism, Interferon Regulatory Factor-1/metabolism, Regulatory T-Lymphocytes/immunology, Regulatory T-Lymphocytes/metabolism, Animals, Autoimmune Diseases/genetics, Autoimmune Diseases/immunology, Autoimmune Diseases/metabolism, Autoimmunity, Basic-Leucine Zipper Transcription Factors/genetics, Cell Differentiation/genetics, Chromatin/genetics, Cluster Analysis, Cytokines/metabolism, Cytokines/pharmacology, Gene Expression Profiling, Gene Expression Regulation, Gene Regulatory Networks, Interferon Regulatory Factor-1/genetics, Mice, Knockout Mice, Genetic Promoter Regions, T-Lymphocyte Subsets/drug effects, T-Lymphocyte Subsets/immunology, T-Lymphocyte Subsets/metabolism, Regulatory T-Lymphocytes/cytology, Regulatory T-Lymphocytes/drug effects, Transcription Factors/metabolism, Transcriptome
9.
Genomics Proteomics Bioinformatics ; 13(1): 25-35, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25712262

ABSTRACT

We report a significantly enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of high-throughput mass spectrometry-based proteomics research, ranging from a single laboratory, to groups of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography-tandem mass spectrometry (LC-MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED's database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results.


Subjects
Liquid Chromatography/methods, Computational Biology/methods, Protein Databases, Peptide Fragments/analysis, Proteome/analysis, Proteomics/methods, Tandem Mass Spectrometry/methods, Humans
10.
DNA Repair (Amst) ; 26: 44-53, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25547252

ABSTRACT

Efficient DNA double-strand break (DSB) repair is a critical determinant of cell survival in response to DNA damaging agents, and it plays a key role in the maintenance of genomic integrity. Homologous recombination (HR) and non-homologous end-joining (NHEJ) represent the two major pathways by which DSBs are repaired in mammalian cells. We now understand that HR and NHEJ repair are composed of multiple sub-pathways, some of which still remain poorly understood. As such, there is great interest in the development of novel assays to interrogate these key pathways, which could lead to the development of novel therapeutics, and a better understanding of how DSBs are repaired. Furthermore, assays which can measure repair specifically at endogenous chromosomal loci are of particular interest, because of an emerging understanding that chromatin interactions heavily influence DSB repair pathway choice. Here, we present the design and validation of a novel, next-generation sequencing-based approach to study DSB repair at chromosomal loci in cells. We demonstrate that NHEJ repair "fingerprints" can be identified using our assay, which are dependent on the status of key DSB repair proteins. In addition, we have validated that our system can be used to detect dynamic shifts in DSB repair activity in response to specific perturbations. This approach represents a unique alternative to many currently available DSB repair assays, which typically rely on the expression of reporter genes as an indirect read-out for repair. As such, we believe this tool will be useful for DNA repair researchers to study NHEJ repair in a high-throughput and sensitive manner, with the capacity to detect subtle changes in DSB repair patterns that was not possible previously.


Subjects
Double-Stranded DNA Breaks, DNA End-Joining Repair, DNA Mutational Analysis/methods, High-Throughput Nucleotide Sequencing/methods, Animals, Chromatin/metabolism, DNA/metabolism, DNA-Binding Proteins/metabolism, Genetic Loci, Humans, INDEL Mutation, Mammals, Recombinational DNA Repair
11.
Cell Rep ; 9(1): 16-23, 2014 Oct 09.
Article in English | MEDLINE | ID: mdl-25284784

ABSTRACT

Whole-exome sequencing (WES) studies have demonstrated the contribution of de novo loss-of-function single-nucleotide variants (SNVs) to autism spectrum disorder (ASD). However, challenges in the reliable detection of de novo insertions and deletions (indels) have limited inclusion of these variants in prior analyses. By applying a robust indel detection method to WES data from 787 ASD families (2,963 individuals), we demonstrate that de novo frameshift indels contribute to ASD risk (OR = 1.6; 95% CI = 1.0-2.7; p = 0.03), are more common in female probands (p = 0.02), are enriched among genes encoding FMRP targets (p = 6 × 10⁻⁹), and arise predominantly on the paternal chromosome (p < 0.001). On the basis of mutation rates in probands versus unaffected siblings, we conclude that de novo frameshift indels contribute to risk in approximately 3% of individuals with ASD. Finally, by observing clustering of mutations in unrelated probands, we uncover two ASD-associated genes: KMT2E (MLL5), a chromatin regulator, and RIMS1, a regulator of synaptic vesicle release.


Subjects
Pervasive Child Development Disorders/genetics, Frameshift Mutation, Sequence Deletion, Child, Pervasive Child Development Disorders/blood, Pervasive Child Development Disorders/diagnosis, DNA/blood, DNA/genetics, DNA-Binding Proteins/genetics, Female, Fragile X Mental Retardation Protein/genetics, GTP-Binding Proteins/genetics, Humans, Male, Nerve Tissue Proteins/genetics, Pedigree, Phenotype, Sex Factors
12.
Eukaryot Cell ; 13(1): 77-86, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24186950

ABSTRACT

Parasitic protozoa of the flagellate order Kinetoplastida represent one of the deepest branches of the eukaryotic tree. Among this group of organisms, the mechanism of RNA interference (RNAi) has been investigated in Trypanosoma brucei and to a lesser degree in Leishmania (Viannia) spp. The pathway is triggered by long double-stranded RNA (dsRNA) and in T. brucei requires a set of five core genes, including a single Argonaute (AGO) protein, T. brucei AGO1 (TbAGO1). The five genes are conserved in Leishmania (Viannia) spp. but are absent in other major kinetoplastid species, such as Trypanosoma cruzi and Leishmania major. In T. brucei small interfering RNAs (siRNAs) are methylated at the 3' end, whereas Leishmania (Viannia) sp. siRNAs are not. Here we report that T. brucei HEN1, an ortholog of the metazoan HEN1 2'-O-methyltransferases, is required for methylation of siRNAs. Loss of TbHEN1 causes a reduction in the length of siRNAs. The shorter siRNAs in hen1(-/-) parasites are single stranded and associated with TbAGO1, and a subset carry a nontemplated uridine at the 3' end. These findings support a model wherein TbHEN1 methylates siRNA 3' ends after they are loaded into TbAGO1 and this methylation protects siRNAs from uridylation and 3' trimming. Moreover, expression of TbHEN1 in Leishmania (Viannia) panamensis did not result in siRNA 3' end methylation, further emphasizing mechanistic differences in the trypanosome and Leishmania RNAi mechanisms.


Subjects
Methyltransferases/metabolism, Protozoan Proteins/metabolism, Post-Transcriptional RNA Processing, Protozoan RNA/metabolism, Small Interfering RNA/metabolism, Trypanosoma brucei brucei/metabolism, Amino Acid Sequence, Leishmania/genetics, Leishmania/metabolism, Methyltransferases/chemistry, Methyltransferases/genetics, Molecular Sequence Data, Mutation, Protozoan Proteins/chemistry, Protozoan Proteins/genetics, Trypanosoma brucei brucei/enzymology
13.
Nature ; 498(7453): 220-3, 2013 Jun 13.
Article in English | MEDLINE | ID: mdl-23665959

ABSTRACT

Congenital heart disease (CHD) is the most frequent birth defect, affecting 0.8% of live births. Many cases occur sporadically and impair reproductive fitness, suggesting a role for de novo mutations. Here we compare the incidence of de novo mutations in 362 severe CHD cases and 264 controls by analysing exome sequencing of parent-offspring trios. CHD cases show a significant excess of protein-altering de novo mutations in genes expressed in the developing heart, with an odds ratio of 7.5 for damaging (premature termination, frameshift, splice site) mutations. Similar odds ratios are seen across the main classes of severe CHD. We find a marked excess of de novo mutations in genes involved in the production, removal or reading of histone 3 lysine 4 (H3K4) methylation, or ubiquitination of H2BK120, which is required for H3K4 methylation. There are also two de novo mutations in SMAD2, which regulates H3K27 methylation in the embryonic left-right organizer. The combination of both activating (H3K4 methylation) and inactivating (H3K27 methylation) chromatin marks characterizes 'poised' promoters and enhancers, which regulate expression of key developmental genes. These findings implicate de novo point mutations in several hundreds of genes that collectively contribute to approximately 10% of severe CHD.


Subjects
Heart Diseases/congenital, Heart Diseases/genetics, Histones/metabolism, Adult, Case-Control Studies, Child, Chromatin/chemistry, Chromatin/metabolism, DNA Mutational Analysis, Genetic Enhancer Elements/genetics, Exome/genetics, Female, Developmental Genes/genetics, Heart Diseases/metabolism, Histones/chemistry, Humans, Lysine/chemistry, Lysine/metabolism, Male, Methylation, Mutation, Odds Ratio, Genetic Promoter Regions/genetics
14.
Mol Microbiol ; 87(3): 580-93, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23217017

ABSTRACT

Among trypanosomatid protozoa, the mechanism of RNA interference (RNAi) has been investigated in Trypanosoma brucei and to a lesser extent in Leishmania braziliensis. Although these two parasitic organisms belong to the same family, they are evolutionarily distantly related, raising questions about the conservation of the RNAi pathway. Here we carried out an in-depth analysis of small interfering RNAs (siRNAs) associated with L. braziliensis Argonaute1 (LbrAGO1). In contrast to T. brucei, Leishmania siRNAs are sensitive to 3' end oxidation, indicating the absence of blocking groups, and the Leishmania genome does not code for a HEN1 RNA 2'-O-methyltransferase, which modifies small RNA 3' ends. Consistent with this observation, ~20% of siRNA 3' ends carry non-templated uridines. Thus siRNA biogenesis, and most likely their metabolism, is different in these organisms. Similarly to T. brucei, putative mobile elements and repeats constitute the major Leishmania siRNA-producing loci and AGO1 ablation leads to accumulation of long transcripts derived from putative mobile elements. However, contrary to T. brucei, no siRNAs were detected from other genomic regions with the potential to form double-stranded RNA, namely sites of convergent transcription and inverted repeats. Thus, our results indicate that organism-specific diversification has occurred in the RNAi pathway during evolution of the trypanosomatid lineage.


Subjects
Genetic Variation, Leishmania braziliensis/genetics, Small Interfering RNA/genetics, Argonaute Proteins/genetics, Gene Expression Regulation, Small Interfering RNA/chemistry, Trypanosoma brucei brucei/genetics
15.
Cancer Res ; 72(14): 3492-8, 2012 Jul 15.
Article in English | MEDLINE | ID: mdl-22581825

ABSTRACT

Detection of cell-free tumor DNA in the blood has offered promise as a cancer biomarker, but practical clinical implementations have been impeded by the lack of a sensitive and accurate method for quantitation that is also simple, inexpensive, and readily scalable. Here we present an approach that uses next-generation sequencing to quantify the small fraction of DNA molecules that contain tumor-specific mutations within a background of normal DNA in plasma. Using layers of sequence redundancy designed to distinguish true mutations from sequencer misreads and PCR misincorporations, we achieved a detection sensitivity of approximately 1 variant in 5,000 molecules. In addition, the attachment of modular barcode tags to the DNA fragments to be sequenced facilitated the simultaneous analysis of more than 100 patient samples. As proof-of-principle, we showed the successful use of this method to follow treatment-associated changes in circulating tumor DNA levels in patients with non-small cell lung cancer. Our findings suggest that the deep sequencing approach described here may be applied to the development of a practical diagnostic test that measures tumor-derived DNA levels in blood.


Subjects
Neoplasm DNA/blood, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis/methods, Non-Small-Cell Lung Carcinoma/blood, Non-Small-Cell Lung Carcinoma/genetics, Tumor Cell Line, Female, Humans, Lung Neoplasms/blood, Lung Neoplasms/genetics, Male, Mutation, Polymerase Chain Reaction
16.
Nature ; 485(7397): 237-41, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22495306

ABSTRACT

Multiple studies have confirmed the contribution of rare de novo copy number variations to the risk for autism spectrum disorders. But whereas de novo single nucleotide variants have been identified in affected individuals, their contribution to risk has yet to be clarified. Specifically, the frequency and distribution of these mutations have not been well characterized in matched unaffected controls, and such data are vital to the interpretation of de novo coding mutations observed in probands. Here we show, using whole-exome sequencing of 928 individuals, including 200 phenotypically discordant sibling pairs, that highly disruptive (nonsense and splice-site) de novo mutations in brain-expressed genes are associated with autism spectrum disorders and carry large effects. On the basis of mutation rates in unaffected individuals, we demonstrate that multiple independent de novo single nucleotide variants in the same gene among unrelated probands reliably identify risk alleles, providing a clear path forward for gene discovery. Among a total of 279 identified de novo coding mutations, there is a single instance in probands, and none in siblings, in which two independent nonsense variants disrupt the same gene, SCN2A (sodium channel, voltage-gated, type II, α subunit), a result that is highly unlikely by chance.


Subjects
Autistic Disorder/genetics, Exome/genetics, Exons/genetics, Genetic Predisposition to Disease/genetics, Mutation/genetics, Nerve Tissue Proteins/genetics, Sodium Channels/genetics, Alleles, Nonsense Codon/genetics, Genetic Heterogeneity, Humans, NAV1.2 Voltage-Gated Sodium Channel, RNA Splice Sites/genetics, Siblings
17.
Hum Hered ; 72(2): 85-97, 2011.
Article in English | MEDLINE | ID: mdl-21934324

ABSTRACT

BACKGROUND: Genetic association studies, thus far, have focused on the analysis of individual main effects of SNP markers. Nonetheless, there is a clear need for modeling epistasis or gene-gene interactions to better understand the biologic basis of existing associations. Tree-based methods have been widely studied as tools for building prediction models based on complex variable interactions. An understanding of the power of such methods for the discovery of genetic associations in the presence of complex interactions is of great importance. Here, we systematically evaluate the power of three leading algorithms: random forests (RF), Monte Carlo logic regression (MCLR), and multifactor dimensionality reduction (MDR). METHODS: We use the algorithm-specific variable importance measures (VIMs) as statistics and employ permutation-based resampling to generate the null distribution and associated p values. The power of the three is assessed via simulation studies. Additionally, in a data analysis, we evaluate the associations between individual SNPs in pro-inflammatory and immunoregulatory genes and the risk of non-Hodgkin lymphoma. RESULTS: The power of RF is highest in all simulation models, that of MCLR is similar to RF in half, and that of MDR is consistently the lowest. CONCLUSIONS: Our study indicates that the power of RF VIMs is most reliable. However, in addition to tuning parameters, the power of RF is notably influenced by the type of variable (continuous vs. categorical) and the chosen VIM.


Subjects
Data Mining/methods, Genetic Epistasis, Genetic Association Studies, Algorithms, Computer Simulation, Genetic Loci, Human Genome, Haplotypes, Humans, Non-Hodgkin Lymphoma/genetics, Monte Carlo Method, Single Nucleotide Polymorphism
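
Note on entry 17: the METHODS section uses a variable importance measure (VIM) as the test statistic and builds its null distribution by permutation-based resampling. The sketch below shows the general shape of such a test under stated assumptions. It is not the study's implementation: the importance function here is a deliberately simple stand-in (absolute difference in mean genotype between cases and controls) rather than a random forest, MCLR, or MDR importance, and all names and data are hypothetical.

```python
import random
from typing import Callable, Sequence


def mean_difference(genotypes: Sequence[float], labels: Sequence[int]) -> float:
    """Stand-in importance: |mean genotype in cases - mean genotype in controls|."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(genotypes: Sequence[float],
                        labels: Sequence[int],
                        importance: Callable[[Sequence[float], Sequence[int]], float],
                        n_perm: int = 1000,
                        seed: int = 0) -> float:
    """Estimate P(importance under shuffled labels >= observed importance)."""
    rng = random.Random(seed)
    observed = importance(genotypes, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Shuffling labels breaks any genotype-phenotype association
        # while keeping the numbers of cases and controls fixed.
        rng.shuffle(shuffled)
        if importance(genotypes, shuffled) >= observed:
            at_least_as_extreme += 1
    # The +1 correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical minor-allele counts for 8 cases followed by 8 controls.
snp = [2, 2, 2, 2, 2, 1, 2, 2, 0, 0, 0, 0, 1, 0, 0, 0]
status = [1] * 8 + [0] * 8
print(permutation_p_value(snp, status, mean_difference))
```

Any importance measure with the same call signature can be passed in place of `mean_difference`; the permutation loop itself does not change.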
18.
Nucleic Acids Res ; 38(20): 6997-7007, 2010 Nov.
Artigo em Inglês | MEDLINE | ID: mdl-20615899

RESUMO

Duplicated pseudogenes in the human genome are disabled copies of functioning parent genes. They result from block duplication events occurring throughout evolutionary history. Relatively recent duplications (with sequence similarity≥90% and length≥1 kb) are termed segmental duplications (SDs); here, we analyze the interrelationship of SDs and pseudogenes. We present a decision-tree approach to classify pseudogenes based on their (and their parents') characteristics in relation to SDs. The classification identifies 140 novel pseudogenes and makes possible improved annotation for the 3172 pseudogenes located in SDs. In particular, it reveals that many pseudogenes in SDs likely did not arise directly from parent genes, but are the result of a multi-step process. In these cases, the initial duplication or retrotransposition of a parent gene gives rise to a 'parent pseudogene', followed by further duplication creating duplicated-duplicated or duplicated-processed pseudogenes, respectively. Moreover, we can precisely identify these parent pseudogenes by overlap with ancestral SD loci. Finally, a comparison of nucleotide substitutions per site in a pseudogene with its surrounding SD region allows us to estimate the time difference between duplication and disablement events, and this suggests that most duplicated pseudogenes in SDs were likely disabled around the time of the original duplication.


Subjects
Genome, Human , Pseudogenes , Segmental Duplications, Genomic , Evolution, Molecular , Gene Duplication , Genetic Loci , Humans
19.
BMC Genomics ; 10: 480, 2009 Oct 16.
Article in English | MEDLINE | ID: mdl-19835609

ABSTRACT

BACKGROUND: Pseudogenes provide a record of the molecular evolution of genes. As glycolysis is such a highly conserved and fundamental metabolic pathway, the pseudogenes of glycolytic enzymes comprise a standardized genomic measuring stick and an ideal platform for studying molecular evolution. One of the glycolytic enzymes, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), has already been noted to have one of the largest numbers of associated pseudogenes, among all proteins. RESULTS: We assembled the first comprehensive catalog of the processed and duplicated pseudogenes of glycolytic enzymes in many vertebrate model-organism genomes, including human, chimpanzee, mouse, rat, chicken, zebrafish, pufferfish, fruitfly, and worm (available at http://pseudogene.org/glycolysis/). We found that glycolytic pseudogenes are predominantly processed, i.e. retrotransposed from the mRNA of their parent genes. Although each glycolytic enzyme plays a unique role, GAPDH has by far the most pseudogenes, perhaps reflecting its large number of non-glycolytic functions or its possession of a particularly retrotranspositionally active sub-sequence. Furthermore, the number of GAPDH pseudogenes varies significantly among the genomes we studied: none in zebrafish, pufferfish, fruitfly, and worm, 1 in chicken, 50 in chimpanzee, 62 in human, 331 in mouse, and 364 in rat. Next, we developed a simple method of identifying conserved syntenic blocks (consistently applicable to the wide range of organisms in the study) by using orthologous genes as anchors delimiting a conserved block between a pair of genomes. This approach showed that few glycolytic pseudogenes are shared between primate and rodent lineages. Finally, by estimating pseudogene ages using Kimura's two-parameter model of nucleotide substitution, we found evidence for bursts of retrotranspositional activity approximately 42, 36, and 26 million years ago in the human, mouse, and rat lineages, respectively. 
CONCLUSION: Overall, we performed a consistent analysis of one group of pseudogenes across multiple genomes, finding evidence that most of them were created within the last 50 million years, subsequent to the divergence of rodent and primate lineages.
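
The abstract dates pseudogenes with Kimura's two-parameter model. As a rough sketch of that distance (not the authors' pipeline), the function below computes K = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q) from a pairwise alignment, where P and Q are the proportions of transitions and transversions. The function name and the handling of gaps are assumptions; converting K into an age additionally requires an assumed substitution rate, which is not shown.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing anything other than A, C, G or T (gaps, N) are skipped;
    that is a simplifying assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


if __name__ == "__main__":
    # Hypothetical 10-site alignment with a single A->G transition.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

Separating transitions from transversions matters because transitions typically occur more often, so a single uncorrected mismatch proportion would tend to underestimate divergence for older pseudogenes.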


Subjects
Evolution, Molecular , Glyceraldehyde-3-Phosphate Dehydrogenases/genetics , Pseudogenes , Retroelements , Vertebrates/genetics , Animals , Comparative Genomic Hybridization , DNA Mutational Analysis , Genome , Synteny
20.
Genome Biol ; 10(2): R23, 2009 Feb 23.
Article in English | MEDLINE | ID: mdl-19236709

ABSTRACT

Personal-genomics endeavors, such as the 1000 Genomes project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer; http://sv.gersteinlab.org/pemer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring-strategy and showed its relative insensitivity to base-calling errors.


Subjects
Computational Biology/methods , Genomic Structural Variation , Models, Genetic , Base Sequence , Computer Simulation , Genome , Genomics/methods , Internet , Software