Results 1 - 20 of 43
1.
Cell ; 167(5): 1369-1384.e19, 2016 11 17.
Article in English | MEDLINE | ID: mdl-27863249

ABSTRACT

Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases.


Subjects
Blood Cells/cytology , Disease/genetics , Genetic Promoter Regions , Cell Lineage , Cell Separation , Chromatin , Genetic Enhancer Elements , Epigenomics , Genetic Predisposition to Disease , Genome-Wide Association Study , Hematopoiesis , Humans , Single Nucleotide Polymorphism , Quantitative Trait Loci
2.
Nature ; 583(7818): 693-698, 2020 07.
Article in English | MEDLINE | ID: mdl-32728248

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.


Subjects
Genetic Databases , Genome/genetics , Genomics , Molecular Sequence Annotation , Animals , Binding Sites , Chromatin/genetics , Chromatin/metabolism , DNA Methylation , Genetic Databases/standards , Genetic Databases/trends , Gene Expression Regulation/genetics , Human Genome/genetics , Genomics/standards , Genomics/trends , Histones/metabolism , Humans , Mice , Molecular Sequence Annotation/standards , Quality Control , Nucleic Acid Regulatory Sequences/genetics , Transcription Factors/metabolism
3.
Gut ; 73(9): 1464-1477, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-38857990

ABSTRACT

OBJECTIVE: Epigenetic mechanisms, including DNA methylation (DNAm), have been proposed to play a key role in Crohn's disease (CD) pathogenesis. However, the specific cell types and pathways affected as well as their potential impact on disease phenotype and outcome remain unknown. We set out to investigate the role of intestinal epithelial DNAm in CD pathogenesis. DESIGN: We generated 312 intestinal epithelial organoids (IEOs) from mucosal biopsies of 168 patients with CD (n=72), UC (n=23) and healthy controls (n=73). We performed genome-wide molecular profiling including DNAm, bulk as well as single-cell RNA sequencing. Organoids were subjected to gene editing and the functional consequences of DNAm changes evaluated using an organoid-lymphocyte coculture and a nucleotide-binding oligomerisation domain, leucine-rich repeat and CARD domain containing 5 (NLRC5) dextran sulphate sodium (DSS) colitis knock-out mouse model. RESULTS: We identified highly stable, CD-associated loss of DNAm at major histocompatibility complex (MHC) class 1 loci including NLRC5 and cognate gene upregulation. Single-cell RNA sequencing of primary mucosal tissue and IEOs confirmed the role of NLRC5 as transcriptional transactivator in the intestinal epithelium. Increased mucosal MHC-I and NLRC5 expression in adult and paediatric patients with CD was validated in additional cohorts and the functional role of MHC-I highlighted by demonstrating a relative protection from DSS-mediated mucosal inflammation in NLRC5-deficient mice. MHC-I DNAm in IEOs showed a significant correlation with CD disease phenotype and outcomes. Application of machine learning approaches enabled the development of a disease prognostic epigenetic molecular signature. CONCLUSIONS: Our study has identified epigenetically regulated intestinal epithelial MHC-I as a novel mechanism in CD pathogenesis.


Subjects
Crohn Disease , DNA Methylation , Genetic Epigenesis , Intestinal Mucosa , Organoids , Humans , Crohn Disease/genetics , Crohn Disease/pathology , Crohn Disease/metabolism , Organoids/metabolism , Organoids/pathology , Intestinal Mucosa/metabolism , Intestinal Mucosa/pathology , Mice , Animals , Female , Male , Knockout Mice , Biological Specimen Banks , Adult , Histocompatibility Antigens Class I/genetics , Histocompatibility Antigens Class I/metabolism , Animal Disease Models , Intracellular Signaling Peptides and Proteins/genetics , Intracellular Signaling Peptides and Proteins/metabolism
4.
Nucleic Acids Res ; 50(D1): D765-D770, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34634797

ABSTRACT

The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser; SARS-CoV-2 is the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.


Subjects
COVID-19/virology , Genetic Databases , SARS-CoV-2/genetics , Web Browser , Coronaviridae/genetics , Genetic Variation , Viral Genome , Humans , Molecular Sequence Annotation
5.
Annu Rev Genomics Hum Genet ; 21: 55-79, 2020 08 31.
Article in English | MEDLINE | ID: mdl-32421357

ABSTRACT

Our understanding of the human genome has continuously expanded since its draft publication in 2001. Over the years, novel assays have allowed us to progressively overlay layers of knowledge above the raw sequence of A's, T's, G's, and C's. The reference human genome sequence is now a complex knowledge base maintained under the shared stewardship of multiple specialist communities. Its complexity stems from the fact that it is simultaneously a template for transcription, a record of evolution, a vehicle for genetics, and a functional molecule. In short, the human genome serves as a frame of reference at the intersection of a diversity of scientific fields. In recent years, the progressive fall in sequencing costs has given increasing importance to the quality of the human reference genome, as hundreds of thousands of individuals are being sequenced yearly, often for clinical applications. Also, novel sequencing-based assays shed light on novel functions of the genome, especially with respect to gene expression regulation. Keeping the human genome annotation up to date and accurate is therefore an ongoing partnership between reference annotation projects and the greater community worldwide.


Subjects
Human Genome , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/standards , Humans
7.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subjects
COVID-19/prevention & control , Computational Biology/methods , Genetic Databases , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , Long Noncoding RNA/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Genetic Transcription/genetics
8.
Gastroenterology ; 160(1): 232-244.e7, 2021 01.
Article in English | MEDLINE | ID: mdl-32814113

ABSTRACT

BACKGROUND & AIMS: Gene expression patterns of CD8+ T cells have been reported to correlate with clinical outcomes of adults with inflammatory bowel diseases (IBD). We aimed to validate these findings in independent patient cohorts. METHODS: We obtained peripheral blood samples from 112 children with a new diagnosis of IBD (71 with Crohn's disease and 41 with ulcerative colitis) and 19 children without IBD (controls) and recorded medical information on disease activity and outcomes. CD8+ T cells were isolated from blood samples by magnetic bead sorting at the point of diagnosis and during the course of disease. Genome-wide transcription (n = 192) and DNA methylation (n = 66) profiles were generated using Affymetrix and Illumina arrays, respectively. Publicly available transcriptomes and DNA methylomes of CD8+ T cells from 3 adult patient cohorts with and without IBD were included in data analyses. RESULTS: Previously reported CD8+ T-cell prognostic expression and exhaustion signatures were only found in the original adult IBD patient cohort. These signatures could not be detected in either a pediatric or a second adult IBD cohort. In contrast, an association between CD8+ T-cell gene expression with age and sex was detected across all 3 cohorts. CD8+ gene transcription was clearly associated with IBD in the 2 cohorts that included non-IBD controls. Lastly, DNA methylation profiles of CD8+ T cells from children with Crohn's disease correlated with age but not with disease outcome. CONCLUSIONS: We were unable to validate previously reported findings of an association between CD8+ T-cell gene transcription and disease outcome in IBD. Our findings reveal the challenges of developing prognostic biomarkers for patients with IBD and the importance of their validation in large, independent cohorts before clinical application.


Subjects
CD8-Positive T-Lymphocytes/physiology , Inflammatory Bowel Diseases/diagnosis , Inflammatory Bowel Diseases/etiology , Adolescent , Adult , Age Factors , Case-Control Studies , Child , Preschool Child , DNA Methylation , Female , Humans , Male , Predictive Value of Tests , Prognosis , Genetic Transcription , Young Adult
9.
Bioinformatics ; 36(9): 2936-2937, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31930349

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. RESULTS: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. AVAILABILITY AND IMPLEMENTATION: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.


Subjects
Genome-Wide Association Study , Single Nucleotide Polymorphism , Phenotype , Quantitative Trait Loci/genetics , Software
10.
Nucleic Acids Res ; 47(D1): D766-D773, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.


Subjects
Genetic Databases , Human Genome/genetics , Genomics , Pseudogenes/genetics , Animals , Computational Biology , Humans , Internet , Mice , Molecular Sequence Annotation , Software
11.
Bioinformatics ; 35(24): 5318-5320, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31368484

ABSTRACT

MOTIVATION: Compared to traditional haploid reference genomes, graph genomes are an efficient and compact data structure for storing multiple genomic sequences, for storing polymorphisms or for mapping sequencing reads with greater sensitivity. Further, graphs are well-studied computer science objects that can be efficiently analyzed. However, their adoption in genomic research is slow, in part because of the cognitive difficulty in interpreting graphs. RESULTS: We present an intuitive graphical representation for graph genomes that re-uses well-honed techniques developed to display public transport networks, and demonstrate it as a web tool. AVAILABILITY AND IMPLEMENTATION: Code: https://github.com/vgteam/sequenceTubeMap. DEMONSTRATION: https://vgteam.github.io/sequenceTubeMap/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
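
The data structure this abstract describes can be illustrated with a minimal sketch. It assumes a toy representation in which nodes hold sequence fragments and each genome is a named walk over node ids; the class name `SequenceGraph` and its methods are hypothetical and are not taken from the vg toolkit or sequenceTubeMap.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy graph genome (illustrative only): nodes hold sequence, paths are walks."""

    nodes: dict[int, str] = field(default_factory=dict)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_node(self, node_id: int, sequence: str) -> None:
        self.nodes[node_id] = sequence

    def add_path(self, name: str, node_ids: list[int]) -> None:
        # Refuse walks that mention nodes the graph does not contain.
        missing = [n for n in node_ids if n not in self.nodes]
        if missing:
            raise KeyError(f"unknown node ids: {missing}")
        self.paths[name] = list(node_ids)

    def spell(self, name: str) -> str:
        """Reconstruct the linear sequence of one stored genome."""
        return "".join(self.nodes[n] for n in self.paths[name])

    def edges(self) -> set[tuple[int, int]]:
        """Edges implied by consecutive nodes in any stored path."""
        return {(a, b) for walk in self.paths.values() for a, b in zip(walk, walk[1:])}


# Two haplotypes differing by one substitution share nodes 1 and 4.
graph = SequenceGraph()
graph.add_node(1, "GATT")
graph.add_node(2, "A")
graph.add_node(3, "C")
graph.add_node(4, "AGT")
graph.add_path("ref", [1, 2, 4])
graph.add_path("alt", [1, 3, 4])
```

Here the graph stores 9 bases in total, whereas the two 8-base haplotypes stored separately would take 16. The two paths through shared nodes are the kind of object a tube-map style display would draw as parallel lines.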


Subjects
Algorithms , Genome , Software , Genomics , DNA Sequence Analysis
12.
Nucleic Acids Res ; 46(D1): D754-D761, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29155950

ABSTRACT

The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.


Subjects
Genetic Databases , Datasets as Topic , Genome , Information Dissemination , Animals , Epigenomics , Human Genome , Genome-Wide Association Study , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Vertebrates/genetics , Web Browser
13.
Nucleic Acids Res ; 45(D1): D635-D642, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899575

ABSTRACT

Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.


Subjects
Computational Biology/methods , Genetic Databases , Genomics/methods , Search Engine , Software , Web Browser , Animals , Data Mining , Molecular Evolution , Gene Expression Regulation , Genetic Variation , Human Genome , Humans , Molecular Sequence Annotation , Species Specificity , Vertebrates
14.
Nucleic Acids Res ; 44(D1): D710-6, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26687719

ABSTRACT

The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.


Subjects
Genetic Databases , Genomics , Molecular Sequence Annotation , Animals , Genes , Genetic Variation , Humans , Internet , Mice , Proteins/genetics , Rats , Nucleic Acid Regulatory Sequences , Software
15.
Nucleic Acids Res ; 43(Database issue): D662-9, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25352552

ABSTRACT

Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.
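
Because the REST server mentioned in this abstract is language-agnostic, a query can be sketched with nothing but the Python standard library. The example assumes the `lookup/symbol/:species/:symbol` endpoint and its `expand` parameter as described in the current public Ensembl REST documentation, which may differ from the release this abstract covers; it also assumes the `https` scheme rather than the `http` URL printed in the abstract. The helper names `build_lookup_url` and `lookup_symbol` are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: https form of the server URL given in the abstract.
REST_SERVER = "https://rest.ensembl.org"


def build_lookup_url(species: str, symbol: str, expand: bool = False) -> str:
    """Build the URL for a gene-symbol lookup (no network access)."""
    path = f"/lookup/symbol/{quote(species)}/{quote(symbol)}"
    query = urlencode({"expand": int(expand)})
    return f"{REST_SERVER}{path}?{query}"


def lookup_symbol(species: str, symbol: str, timeout: float = 30.0) -> dict:
    """Fetch the JSON record for a gene symbol; requires network access."""
    request = Request(
        build_lookup_url(species, symbol),
        headers={"Content-Type": "application/json"},  # ask for a JSON response
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `lookup_symbol("homo_sapiens", "BRCA2")` would contact the live server, so the sketch keeps URL construction separate from the request; that makes the first function easy to check offline.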


Subjects
Nucleic Acid Databases , Genomics , Animals , Genetic Epigenesis , Genetic Variation , Human Genome , Humans , Internet , Mice , Molecular Sequence Annotation , Nucleic Acid Regulatory Sequences , Software
16.
BMC Bioinformatics ; 17(1): 400, 2016 Sep 29.
Article in English | MEDLINE | ID: mdl-27687569

ABSTRACT

BACKGROUND: The study of genomic variation has provided key insights into the functional role of mutations. Predominantly, studies have focused on single nucleotide variants (SNV), which are relatively easy to detect and can be described with rich mathematical models. However, it has been observed that genomes are highly plastic, and that whole regions can be moved, removed or duplicated in bulk. These structural variants (SV) have been shown to have significant impact on phenotype, but their study has been held back by the combinatorial complexity of the underlying models. RESULTS: We describe here a general model of structural variation that encompasses both balanced rearrangements and arbitrary copy-number variants (CNV). CONCLUSIONS: In this model, we show that the space of possible evolutionary histories that explain the structural differences between any two genomes can be sampled ergodically.

17.
Brief Bioinform ; 14(5): 548-55, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23793381

ABSTRACT

Next-generation sequencing (NGS) is increasingly being adopted as the backbone of biomedical research. With the commercialization of various affordable desktop sequencers, NGS will be reached by increasing numbers of cellular and molecular biologists, necessitating community consensus on bioinformatics protocols to tackle the exponential increase in quantity of sequence data. The current resources for NGS informatics are extremely fragmented. Finding a centralized synthesis is difficult. A multitude of tools exist for NGS data analysis; however, none of these satisfies all possible uses and needs. This gap in functionality could be filled by integrating different methods in customized pipelines, an approach helped by the open-source nature of many NGS programmes. Drawing from community spirit and with the use of the Wikipedia framework, we have initiated a collaborative NGS resource: The NGS WikiBook. We have collected a sufficient amount of text to incentivize a broader community to contribute to it. Users can search, browse, edit and create new content, so as to facilitate self-learning and feedback to the community. The overall structure and style for this dynamic material is designed for the bench biologists and non-bioinformaticians. The flexibility of online material allows the readers to ignore details in a first read, yet have immediate access to the information they need. Each chapter comes with practical exercises so readers may familiarize themselves with each step. The NGS WikiBook aims to create a collective laboratory book and protocol that explains the key concepts and describes best practices in this fast-evolving field.


Subjects
Computational Biology/education , Computer-Assisted Instruction/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Cooperative Behavior , Internet , Teaching
18.
Bioinformatics ; 30(7): 1008-9, 2014 Apr 01.
Article in English | MEDLINE | ID: mdl-24363377

ABSTRACT

MOTIVATION: Using high-throughput sequencing, researchers are now generating hundreds of whole-genome assays to measure various features such as transcription factor binding, histone marks, DNA methylation or RNA transcription. Displaying so much data generally leads to a confusing accumulation of plots. We describe here a multithreaded library that computes statistics on large numbers of datasets (Wiggle, BigWig, Bed, BigBed and BAM), generating statistical summaries within minutes with limited memory requirements, whether on the whole genome or on selected regions. AVAILABILITY AND IMPLEMENTATION: The code is freely available under Apache 2.0 license at www.github.com/Ensembl/Wiggletools
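
The kind of summary this abstract describes, collapsing many signal tracks into one, can be sketched as a position-wise mean. This is not the WiggleTools implementation: it is a single-threaded, in-memory illustration that assumes each track is a list of non-overlapping half-open `(start, end, value)` intervals on one chromosome and that uncovered positions count as zero. The function name `mean_track` is hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int, float]  # half-open [start, end) with a signal value


def mean_track(tracks: List[List[Interval]]) -> List[Interval]:
    """Position-wise mean across tracks; uncovered positions count as 0."""
    if not tracks:
        return []
    # Each interval adds its value at its start and removes it at its end.
    events: List[Tuple[int, float, int]] = []
    for track in tracks:
        for start, end, value in track:
            events.append((start, value, 1))
            events.append((end, -value, -1))
    events.sort(key=lambda event: event[0])

    result: List[Interval] = []
    total = 0.0   # sum of values covering the current segment
    active = 0    # number of intervals covering the current segment
    previous: Optional[int] = None
    for position, delta, count in events:
        # Emit the segment that just ended, if any track covered it.
        if previous is not None and position > previous and active > 0:
            result.append((previous, position, total / len(tracks)))
        total += delta
        active += count
        previous = position
    return result


track_a = [(0, 10, 2.0)]
track_b = [(5, 15, 4.0)]
summary = mean_track([track_a, track_b])
```

The sweep over sorted breakpoints visits each interval boundary once, so the cost depends on the number of intervals rather than on genome length. Adjacent output segments with equal values are left unmerged for simplicity.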


Subjects
Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Genomic Library , Internet , Software
19.
BMC Bioinformatics ; 15: 206, 2014 Jun 19.
Article in English | MEDLINE | ID: mdl-24946830

ABSTRACT

BACKGROUND: Parsimony and maximum likelihood methods of phylogenetic tree estimation and parsimony methods for genome rearrangements are central to the study of genome evolution yet to date they have largely been pursued in isolation. RESULTS: We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), which constrains in its combinatorial structure the number of operations required. We finally demonstrate that for a given history graph G, a finite set of AVGs describe all parsimonious interpretations of G, and this set can be explored with a few sampling moves. CONCLUSION: This theoretical study describes a model in which the inference of genome rearrangements and phylogeny can be unified under parsimony.


Subjects
Molecular Evolution , Genome , Algorithms , Likelihood Functions , Genetic Models
20.
Genome Res ; 21(9): 1512-28, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21665927

ABSTRACT

Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms in the new "Cactus" alignment program. We test Cactus using the Evolver genome evolution simulator, a comprehensive new tool for simulation, and show using these and existing simulations that Cactus significantly outperforms all of its peers. Finally, we make an empirical assessment of Cactus's ability to properly align genes and find interesting cases of intra-gene duplication within the primates.


Subjects
Algorithms , Genomics , Sequence Alignment , Software , Animals , Computer Simulation , Humans , Mice , Primates , DNA Sequence Analysis