Search | BVS Aleitamento Materno

1.

A draft human pangenome reference.

Liao, Wen-Wei; Asri, Mobin; Ebler, Jana; Doerr, Daniel; Haukness, Marina; Hickey, Glenn; Lu, Shuangjia; Lucas, Julian K; Monlong, Jean; Abel, Haley J; Buonaiuto, Silvia; Chang, Xian H; Cheng, Haoyu; Chu, Justin; Colonna, Vincenza; Eizenga, Jordan M; Feng, Xiaowen; Fischer, Christian; Fulton, Robert S; Garg, Shilpa; Groza, Cristian; Guarracino, Andrea; Harvey, William T; Heumos, Simon; Howe, Kerstin; Jain, Miten; Lu, Tsung-Yu; Markello, Charles; Martin, Fergal J; Mitchell, Matthew W; Munson, Katherine M; Mwaniki, Moses Njagi; Novak, Adam M; Olsen, Hugh E; Pesout, Trevor; Porubsky, David; Prins, Pjotr; Sibbesen, Jonas A; Sirén, Jouni; Tomlinson, Chad; Villani, Flavia; Vollger, Mitchell R; Antonacci-Fulton, Lucinda L; Baid, Gunjan; Baker, Carl A; Belyaeva, Anastasiya; Billis, Konstantinos; Carroll, Andrew; Chang, Pi-Chuan; Cody, Sarah.

Nature ; 617(7960): 312-324, 2023 05.

Article in English | MEDLINE | ID: mdl-37165242

ABSTRACT

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

Subjects

Human Genome, Genomics, Humans, Diploidy, Human Genome/genetics, Haplotypes/genetics, DNA Sequence Analysis, Genomics/standards, Reference Standards, Cohort Studies, Alleles, Genetic Variation

2.

Active and repressed biosynthetic gene clusters have spatially distinct chromosome states.

Nützmann, Hans-Wilhelm; Doerr, Daniel; Ramírez-Colmenero, América; Sotelo-Fonseca, Jesús Emiliano; Wegel, Eva; Di Stefano, Marco; Wingett, Steven W; Fraser, Peter; Hurst, Laurence; Fernandez-Valverde, Selene L; Osbourn, Anne.

Proc Natl Acad Sci U S A ; 117(24): 13800-13809, 2020 06 16.

Article in English | MEDLINE | ID: mdl-32493747

ABSTRACT

While colocalization within a bacterial operon enables coexpression of the constituent genes, the mechanistic logic of clustering of nonhomologous monocistronic genes in eukaryotes is not immediately obvious. Biosynthetic gene clusters that encode pathways for specialized metabolites are an exception to the classical eukaryote rule of random gene location and provide paradigmatic exemplars with which to understand eukaryotic cluster dynamics and regulation. Here, using 3C, Hi-C, and Capture Hi-C (CHi-C) organ-specific chromosome conformation capture techniques along with high-resolution microscopy, we investigate how chromosome topology relates to transcriptional activity of clustered biosynthetic pathway genes in Arabidopsis thaliana. Our analyses reveal that biosynthetic gene clusters are embedded in local hot spots of 3D contacts that segregate cluster regions from the surrounding chromosome environment. The spatial conformation of these cluster-associated domains differs between transcriptionally active and silenced clusters. We further show that silenced clusters associate with heterochromatic chromosomal domains toward the periphery of the nucleus, while transcriptionally active clusters relocate away from the nuclear periphery. Examination of chromosome structure at unrelated clusters in maize, rice, and tomato indicates that integration of clustered pathway genes into distinct topological domains is a common feature in plant genomes. Our results shed light on the potential mechanisms that constrain coexpression within clusters of nonhomologous eukaryotic genes and suggest that gene clustering in the one-dimensional chromosome is accompanied by compartmentalization of the 3D chromosome.

Subjects

Arabidopsis/genetics, Plant Chromosomes/genetics, Multigene Family, Plant Proteins/genetics, Solanum lycopersicum/genetics, Zea mays/genetics, Arabidopsis/metabolism, Plant Chromosomes/metabolism, Plant Genome, Solanum lycopersicum/metabolism, Oryza/genetics, Oryza/metabolism, Plant Proteins/metabolism, Zea mays/metabolism

3.

Horizontal Gene Transfer Phylogenetics: A Random Walk Approach.

Sevillya, Gur; Doerr, Daniel; Lerner, Yael; Stoye, Jens; Steel, Mike; Snir, Sagi.

Mol Biol Evol ; 37(5): 1470-1479, 2020 05 01.

Article in English | MEDLINE | ID: mdl-31845962

ABSTRACT

The dramatic decrease in time and cost for generating genetic sequence data has opened up vast opportunities in molecular systematics, one of which is the ability to decipher the evolutionary history of strains of a species. Under this fine systematic resolution, the standard markers are too crude to provide a phylogenetic signal. Nevertheless, among prokaryotes, genome dynamics in the form of horizontal gene transfer (HGT) between organisms and gene loss seem to provide far richer information by affecting both gene order and gene content. The "synteny index" (SI) between a pair of genomes combines these latter two factors, allowing comparison of genomes with unequal gene content, together with order considerations of their common genes. Although this approach is useful for classifying close relatives, no rigorous statistical modeling for it has been suggested. Such modeling is valuable, as it allows observed measures to be transformed into estimates of time periods during evolution, yielding the "additivity" of the measure. To the best of our knowledge, there is no other additivity proof for other gene order/content measures under HGT. Here, we provide a first statistical model and analysis for the SI measure. We model the "gene neighborhood" as a "birth-death-immigration" process affected by the HGT activity over the genome, and analytically relate the HGT rate and time to the expected SI. This model is asymptotic and thus provides accurate results, assuming infinite size genomes. Therefore, we also developed a heuristic model following an "exponential decay" function, accounting for biologically realistic values, which performed well in simulations. Applying this model to 1,133 prokaryotes partitioned to 39 clusters by the rank of genus yields that the average number of genome dynamics events per gene in the phylogenetic depth of genus is around half with significant variability between genera. This result extends and confirms similar results obtained for individual genera in different manners.
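
Illustrative note: the abstract above describes the synteny index only informally. The Python sketch below shows one common way such an index can be computed, and it rests on assumptions that are not stated in the abstract: genomes are circular lists of unique gene identifiers, the neighborhood of a gene is the k genes on each side, and the index is the average fraction of shared neighbors over the genes common to both genomes. The function names and the parameter k are hypothetical.

```python
def neighborhood(genome, idx, k):
    """Genes within k positions of genome[idx] on a circular genome."""
    n = len(genome)
    return {genome[(idx + off) % n] for off in range(-k, k + 1) if off != 0}


def synteny_index(genome_a, genome_b, k):
    """Average fraction of shared k-neighbors over genes common to both genomes.

    Assumption: each gene occurs at most once per genome and both
    genomes are circular with more than 2*k genes.
    """
    pos_a = {gene: i for i, gene in enumerate(genome_a)}
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    common = pos_a.keys() & pos_b.keys()
    if not common:
        return 0.0
    total = 0.0
    for gene in common:
        shared = (neighborhood(genome_a, pos_a[gene], k)
                  & neighborhood(genome_b, pos_b[gene], k))
        total += len(shared) / (2 * k)
    return total / len(common)
```

Under these assumptions, two identical genomes score 1.0, and replacing or relocating genes lowers the score only for the genes whose neighborhoods are affected.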

Subjects

Horizontal Gene Transfer, Genetic Techniques, Genetic Models, Synteny, Microbial Genome, Phylogeny

4.

Analysis of local genome rearrangement improves resolution of ancestral genomic maps in plants.

Rubert, Diego P; Martinez, Fábio V; Stoye, Jens; Doerr, Daniel.

BMC Genomics ; 21(Suppl 2): 273, 2020 Apr 16.

Article in English | MEDLINE | ID: mdl-32299356

ABSTRACT

BACKGROUND: Computationally inferred ancestral genomes play an important role in many areas of genome research. We present an improved workflow for the reconstruction from highly diverged genomes such as those of plants. RESULTS: Our work relies on an established workflow in the reconstruction of ancestral plants, but improves several steps of this process. Instead of using gene annotations for inferring the genome content of the ancestral sequence, we identify genomic markers through a process called genome segmentation. This enables us to reconstruct the ancestral genome from hundreds of thousands of markers rather than the tens of thousands of annotated genes. We also introduce the concept of local genome rearrangement, through which we refine syntenic blocks before they are used in the reconstruction of contiguous ancestral regions. With the enhanced workflow at hand, we reconstruct the ancestral genome of eudicots, a major sub-clade of flowering plants, using whole genome sequences of five modern plants. CONCLUSIONS: Our reconstructed genome is highly detailed, yet its layout agrees well with that reported in Badouin et al. (2017). Using local genome rearrangement, not only the marker-based, but also the gene-based reconstruction of the eudicot ancestor exhibited increased genome content, evidencing the power of this novel concept.

Subjects

Chromosome Mapping/methods, Genomics/methods, Magnoliopsida/genetics, Computer Simulation, Molecular Evolution, Gene Order, Plant Genome, Genetic Models, Phylogeny, Synteny/genetics

5.

GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.

Schulz, Tizian; Stoye, Jens; Doerr, Daniel.

BMC Genomics ; 19(Suppl 5): 308, 2018 May 08.

Article in English | MEDLINE | ID: mdl-29745835

ABSTRACT

BACKGROUND: Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. RESULTS: We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. CONCLUSIONS: By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.

Subjects

Algorithms, Chromosomes/chemistry, Computer Graphics, High-Throughput Nucleotide Sequencing/methods, Multigene Family, DNA Sequence Analysis/methods, Animals, Cluster Analysis, Genomics, Humans, Mice

6.

Identifying gene clusters by discovering common intervals in indeterminate strings.

Doerr, Daniel; Stoye, Jens; Böcker, Sebastian; Jahn, Katharina.

BMC Genomics ; 15 Suppl 6: S2, 2014.

Article in English | MEDLINE | ID: mdl-25571793

ABSTRACT

BACKGROUND: Comparative analyses of chromosomal gene orders are successfully used to predict gene clusters in bacterial and fungal genomes. Present models for detecting sets of co-localized genes in chromosomal sequences require prior knowledge of gene family assignments of genes in the dataset of interest. These families are often computationally predicted on the basis of sequence similarity or higher order features of gene products. Errors introduced in this process amplify in subsequent gene order analyses and thus may deteriorate gene cluster prediction. RESULTS: In this work, we present a new dynamic model and efficient computational approaches for gene cluster prediction suitable in scenarios ranging from traditional gene family-based gene cluster prediction, via multiple conflicting gene family annotations, to gene family-free analysis, in which gene clusters are predicted solely on the basis of a pairwise similarity measure of the genes of different genomes. We evaluate our gene family-free model against a gene family-based model on a dataset of 93 bacterial genomes. CONCLUSIONS: Our model is able to detect gene clusters that would be also detected with well-established gene family-based approaches. Moreover, we show that it is able to detect conserved regions which are missed by gene family-based methods due to wrong or deficient gene family assignments.
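
Illustrative note: the abstract above does not define common intervals. As background only, the Python sketch below enumerates common intervals in the classical setting of two permutations over the same gene set, where a common interval is a set of at least two genes that is contiguous in both orders. This is not the indeterminate-string model of the paper; the function name and the quadratic-time scan are assumptions chosen for brevity.

```python
def common_intervals(perm_a, perm_b):
    """Return all gene sets (size >= 2) that are contiguous in both permutations.

    Assumption: perm_a and perm_b contain the same genes, each exactly once.
    """
    pos_b = {gene: i for i, gene in enumerate(perm_b)}
    found = []
    n = len(perm_a)
    for i in range(n):
        lo = hi = pos_b[perm_a[i]]
        for j in range(i + 1, n):
            p = pos_b[perm_a[j]]
            lo, hi = min(lo, p), max(hi, p)
            # perm_a[i..j] is contiguous in perm_b exactly when its
            # positions there span j - i + 1 consecutive slots.
            if hi - lo == j - i:
                found.append(frozenset(perm_a[i:j + 1]))
    return found
```

For example, `common_intervals([1, 2, 3, 4], [2, 1, 4, 3])` reports the gene sets {1, 2}, {3, 4}, and {1, 2, 3, 4}.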

Subjects

Genetic Models, Multigene Family, Algorithms, Datasets as Topic, Bacterial Genome

7.

AGO, a Framework for the Reconstruction of Ancestral Syntenies and Gene Orders.

Cribbie, Evan P; Doerr, Daniel; Chauve, Cedric.

Methods Mol Biol ; 2802: 247-265, 2024.

Article in English | MEDLINE | ID: mdl-38819563

ABSTRACT

Reconstructing ancestral gene orders from the genome data of extant species is an important problem in comparative and evolutionary genomics. In a phylogenomics setting that accounts for gene family evolution through gene duplication and gene loss, the reconstruction of ancestral gene orders involves several steps, including multiple sequence alignment, the inference of reconciled gene trees, and the inference of ancestral syntenies and gene adjacencies. For each of the steps of such a process, several methods can be used and implemented using a growing corpus of, often parameterized, tools; in practice, interfacing such tools into an ancestral gene order reconstruction pipeline is far from trivial. This chapter introduces AGO, a Python-based framework aimed at creating ancestral gene order reconstruction pipelines allowing to interface and parameterize different bioinformatics tools. The authors illustrate the features of AGO by reconstructing ancestral gene orders for the X chromosome of three ancestral Anopheles species using three different pipelines. AGO is freely available at https://github.com/cchauve/AGO-pipeline .

Subjects

Molecular Evolution, Gene Order, Genomics, Phylogeny, Software, Animals, Genomics/methods, Computational Biology/methods, Synteny/genetics, Anopheles/genetics, X Chromosome/genetics, Sequence Alignment/methods

8.

Panacus: fast and exact pangenome growth and core size estimation.

Parmigiani, Luca; Garrison, Erik; Stoye, Jens; Marschall, Tobias; Doerr, Daniel.

bioRxiv ; 2024 Jun 12.

Article in English | MEDLINE | ID: mdl-38915671

ABSTRACT

Motivation: Using a single linear reference genome poses a limitation to exploring the full genomic diversity of a species. The release of a draft human pangenome underscores the increasing relevance of pangenomics to overcome these limitations. Pangenomes are commonly represented as graphs, which can represent billions of base pairs of sequence. Presently, there is a lack of scalable software able to perform key tasks on pangenomes, such as quantifying universally shared sequence across genomes (the core genome) and measuring the extent of genomic variability as a function of sample size (pangenome growth). Results: We introduce Panacus (pangenome-abacus), a tool designed to rapidly perform these tasks and visualize the results in interactive plots. Panacus can process GFA files, the accepted standard for pangenome graphs, and is able to analyze a human pangenome graph with 110 million nodes in less than one hour. Availability: Panacus is implemented in Rust and is published as Open Source software under the MIT license. The source code and documentation are available at https://github.com/marschall-lab/panacus. Panacus can be installed via Bioconda at https://bioconda.github.io/recipes/panacus/README.html.
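
Illustrative note: the quantities named in the abstract above, pangenome growth and core size, can be computed exactly from a coverage histogram without sampling genome orderings. The Python sketch below shows the combinatorial idea under stated assumptions; it is not taken from the Panacus source code, and the function names and the input layout (a dictionary mapping each genome name to the set of items it contains, such as graph nodes) are hypothetical.

```python
from collections import Counter
from math import comb


def coverage_histogram(genomes):
    """hist[c] = number of items present in exactly c of the genomes."""
    per_item = Counter(item for items in genomes.values() for item in set(items))
    return Counter(per_item.values())


def expected_growth(hist, n):
    """Expected number of items seen in a random subset of m genomes, m = 1..n.

    An item covered by c genomes is missed only if all m chosen genomes
    come from the n - c genomes that lack it.
    """
    return [
        sum(h * (1 - comb(n - c, m) / comb(n, m)) for c, h in hist.items())
        for m in range(1, n + 1)
    ]


def expected_core(hist, n):
    """Expected number of items shared by all genomes in a random subset of size m."""
    return [
        sum(h * comb(c, m) / comb(n, m) for c, h in hist.items())
        for m in range(1, n + 1)
    ]
```

A small usage example: for `genomes = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 4, 5}}`, calling `expected_growth(coverage_histogram(genomes), 3)` gives the average union size for one, two, and three genomes, and `expected_core` gives the matching average intersection sizes.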

9.

Family-Free Genome Comparison.

Braga, Marilia D V; Doerr, Daniel; Rubert, Diego P; Stoye, Jens.

Methods Mol Biol ; 2802: 57-72, 2024.

Article in English | MEDLINE | ID: mdl-38819556

ABSTRACT

The comparison of large-scale genome structures across distinct species offers valuable insights into the species' phylogeny, genome organization, and gene associations. In this chapter, we review the family-free genome comparison tool FFGC that, relying on built-in interfaces with a sequence comparison tool (either BLAST+ or DIAMOND) and with an ILP solver (either CPLEX or Gurobi), provides several methods for analyses that do not require prior classification of genes across the studied genomes. Taking annotated genome sequences as input, FFGC is a complete workflow for genome comparison allowing not only the computation of measures of similarity and dissimilarity but also the inference of gene families, simultaneously based on sequence similarities and large-scale genomic features.

Subjects

Genomics, Phylogeny, Software, Genomics/methods, Genome, Computational Biology/methods, Humans

10.

Training an automated circulating tumor cell classifier when the true classification is uncertain.

Nanou, Afroditi; Stoecklein, Nikolas H; Doerr, Daniel; Driemel, Christiane; Terstappen, Leon W M M; Coumans, Frank A W.

PNAS Nexus ; 3(2): pgae048, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38371418

ABSTRACT

Circulating tumor cell (CTC) and tumor-derived extracellular vesicle (tdEV) loads are prognostic factors of survival in patients with carcinoma. The current method of CTC enumeration relies on operator review and, unfortunately, has moderate interoperator agreement (Fleiss' kappa 0.60) due to difficulties in classifying CTC-like events. We compared operator review, ACCEPT automated image processing, and refined the output of a deep-learning algorithm to identify CTC and tdEV for the prediction of survival in patients with metastatic and nonmetastatic cancers. Operator review is only defined for CTC. Refinement was performed using automatic contrast maximization of events detected in cancer and in benign samples (CM-CTC). We used 418 samples from benign diseases, 6,293 from nonmetastatic breast, 2,408 from metastatic breast, and 698 from metastatic prostate cancer to train, test, optimize, and evaluate CTC and tdEV enumeration. For CTC identification, the CM-CTC performed best on metastatic/nonmetastatic breast cancer, respectively, with a hazard ratio (HR) for overall survival of 2.6/2.1 vs. 2.4/1.4 for operator CTC and 1.2/0.8 for ACCEPT-CTC. For tdEV identification, CM-tdEV performed best with an HR of 1.6/2.9 vs. 1.5/1.0 with ACCEPT-tdEV. In conclusion, contrast maximization is effective even though it does not utilize domain knowledge.

11.

Constructing founder sets under allelic and non-allelic homologous recombination.

Bonnet, Konstantinn; Marschall, Tobias; Doerr, Daniel.

Algorithms Mol Biol ; 18(1): 15, 2023 Sep 29.

Article in English | MEDLINE | ID: mdl-37775806

ABSTRACT

Homologous recombination between the maternal and paternal copies of a chromosome is a key mechanism for human inheritance and shapes population genetic properties of our species. However, a similar mechanism can also act between different copies of the same sequence, then called non-allelic homologous recombination (NAHR). This process can result in genomic rearrangements (including deletion, duplication, and inversion) and underlies many genomic disorders. Despite its importance for genome evolution and disease, there is a lack of computational models to study genomic loci prone to NAHR. In this work, we propose such a computational model, providing a unified framework for both (allelic) homologous recombination and NAHR. Our model represents a set of genomes as a graph, where haplotypes correspond to walks through this graph. We formulate two founder set problems under our recombination model, provide flow-based algorithms for their solution, describe exact methods to characterize the number of recombinations, and demonstrate scalability to problem instances arising in practice.

12.

Gene family assignment-free comparative genomics.

Doerr, Daniel; Thévenin, Annelyse; Stoye, Jens.

BMC Bioinformatics ; 13 Suppl 19: S3, 2012.

Article in English | MEDLINE | ID: mdl-23281826

ABSTRACT

BACKGROUND: The comparison of relative gene orders between two genomes offers deep insights into functional correlations of genes and the evolutionary relationships between the corresponding organisms. Methods for gene order analyses often require prior knowledge of homologies between all genes of the genomic dataset. Since such information is hard to obtain, it is common to predict homologous groups based on sequence similarity. These hypothetical groups of homologous genes are called gene families. RESULTS: This manuscript promotes a new branch of gene order studies in which prior assignment of gene families is not required. As a case study, we present a new similarity measure between pairs of genomes that is related to the breakpoint distance. We propose an exact and a heuristic algorithm for its computation. We evaluate our methods on a dataset comprising 12 γ-proteobacteria from the literature. CONCLUSIONS: In evaluating our algorithms, we show that the exact algorithm is suitable for computations on small genomes. Moreover, the results of our heuristic are close to those of the exact algorithm. In general, we demonstrate that gene order studies can be improved by direct, gene family assignment-free comparisons.
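
Illustrative note: the abstract above relates its new measure to the breakpoint distance without defining it. For orientation, the Python sketch below computes the classical breakpoint distance in its simplest form, under assumptions that are not from the paper: genomes are linear, unsigned orders of the same unique genes, and chromosome ends are ignored. The family-free measure of the paper generalizes this idea and is not reproduced here; the function names are hypothetical.

```python
def adjacencies(genome):
    """Unordered pairs of genes that are neighbors in a linear gene order."""
    return {frozenset(pair) for pair in zip(genome, genome[1:])}


def conserved_adjacencies(genome_a, genome_b):
    """Number of gene adjacencies present in both genomes."""
    return len(adjacencies(genome_a) & adjacencies(genome_b))


def breakpoint_distance(genome_a, genome_b):
    """Adjacencies of genome_a that are not conserved in genome_b.

    Assumption: both genomes contain the same genes, each exactly once.
    """
    if set(genome_a) != set(genome_b) or len(genome_a) != len(genome_b):
        raise ValueError("genomes must contain the same unique genes")
    return (len(genome_a) - 1) - conserved_adjacencies(genome_a, genome_b)
```

For example, the orders `[1, 2, 3, 4, 5]` and `[1, 2, 4, 3, 5]` share the adjacencies 1-2 and 3-4, so two of the four adjacencies are broken.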

Subjects

Gene Order, Bacterial Genome/genetics, Genomics/methods, Multigene Family, DNA Sequence Analysis/methods, Algorithms, Gammaproteobacteria/genetics

13.

Small parsimony for natural genomes in the DCJ-indel model.

Doerr, Daniel; Chauve, Cedric.

J Bioinform Comput Biol ; 19(6): 2140009, 2021 12.

Article in English | MEDLINE | ID: mdl-34806948

ABSTRACT

The Small Parsimony Problem (SPP) aims at finding the gene orders at internal nodes of a given phylogenetic tree such that the overall genome rearrangement distance along the tree branches is minimized. This problem is intractable in most genome rearrangement models, especially when gene duplication and loss are considered. In this work, we describe an Integer Linear Program algorithm to solve the SPP for natural genomes, i.e. genomes that contain conserved, unique, and duplicated markers. The evolutionary model that we consider is the DCJ-indel model that includes the Double-Cut and Join rearrangement operation and the insertion and deletion of genome segments. We evaluate our algorithm on simulated data and show that it is able to reconstruct very efficiently and accurately ancestral gene orders in a very comprehensive evolutionary model.

Subjects

Genome, Genetic Models, Algorithms, Biological Evolution, Molecular Evolution, Gene Rearrangement, Phylogeny

14.

Computing the Rearrangement Distance of Natural Genomes.

Bohnenkämper, Leonard; Braga, Marília D V; Doerr, Daniel; Stoye, Jens.

J Comput Biol ; 28(4): 410-431, 2021 04.

Article in English | MEDLINE | ID: mdl-33393848

ABSTRACT

The computation of genomic distances has been a very active field of computational comparative genomics over the past 25 years. Substantial results include the polynomial-time computability of the inversion distance by Hannenhalli and Pevzner in 1995 and the introduction of the double cut and join distance by Yancopoulos et al. in 2005. Both results, however, rely on the assumption that the genomes under comparison contain the same set of unique markers (syntenic genomic regions, sometimes also referred to as genes). In 2015, Shao et al. relax this condition by allowing for duplicate markers in the analysis. This generalized version of the genomic distance problem is NP-hard, and they give an integer linear programming (ILP) solution that is efficient enough to be applied to real-world datasets. A restriction of their approach is that it can be applied only to balanced genomes that have equal numbers of duplicates of any marker. Therefore, it still needs a delicate preprocessing of the input data in which excessive copies of unbalanced markers have to be removed. In this article, we present an algorithm solving the genomic distance problem for natural genomes, in which any marker may occur an arbitrary number of times. Our method is based on a new graph data structure, the multi-relational diagram, that allows an elegant extension of the ILP by Shao et al. to count runs of markers that are under- or over-represented in one genome with respect to the other and need to be inserted or deleted, respectively. With this extension, previous restrictions on the genome configurations are lifted, for the first time enabling an uncompromising rearrangement analysis. Any marker sequence can directly be used for the distance calculation. The evaluation of our approach shows that it can be used to analyze genomes with up to a few 10,000 markers, which we demonstrate on simulated and real data.

Subjects

Computational Biology, Gene Rearrangement/genetics, Genome/genetics, Genomics, Algorithms, Genetic Models, Linear Programming

15.

The potential of family-free rearrangements towards gene orthology inference.

Rubert, Diego P; Doerr, Daniel; Braga, Marília D V.

J Bioinform Comput Biol ; 19(6): 2140014, 2021 12.

Article in English | MEDLINE | ID: mdl-34775922

ABSTRACT

Recently, we proposed an efficient ILP formulation [Rubert DP, Martinez FV, Braga MDV, Natural family-free genomic distance, Algorithms Mol Biol 16:4, 2021] for exactly computing the rearrangement distance of two genomes in a family-free setting. In such a setting, neither prior classification of genes into families, nor further restrictions on the genomes are imposed. Given two genomes, the mentioned ILP computes an optimal matching of the genes taking into account simultaneously local mutations, given by gene similarities, and large-scale genome rearrangements. Here, we explore the potential of using this ILP for inferring groups of orthologs across several species. More precisely, given a set of genomes, our method first computes all pairwise optimal gene matchings, which are then integrated into gene families in the second step. Our approach is implemented into a pipeline incorporating the pre-computation of gene similarities. It can be downloaded from gitlab.ub.uni-bielefeld.de/gi/FFGC. We obtained promising results with experiments on both simulated and real data.

Subjects

Genome, Genetic Models, Algorithms, Gene Rearrangement, Genomics, Humans

16.

Correction: Constructing founder sets under allelic and non-allelic homologous recombination.

Bonnet, Konstantinn; Marschall, Tobias; Doerr, Daniel.

Algorithms Mol Biol ; 18(1): 20, 2023 Dec 06.

Article in English | MEDLINE | ID: mdl-38057863

17.

Sequence-Based Synteny Analysis of Multiple Large Genomes.

Doerr, Daniel; Moret, Bernard M E.

Methods Mol Biol ; 1704: 317-329, 2018.

Article in English | MEDLINE | ID: mdl-29277871

ABSTRACT

Current methods for synteny analysis provide only limited support to study large genomes at the sequence level. In this chapter, we describe a pipeline based on existing tools that, applied in a suitable fashion, enables synteny analysis of large genomic datasets. We give a hands-on description of each step of the pipeline using four avian genomes for data. We also provide integration scripts that simplify the conversion and setup of data between the different tools in the pipeline.

Subjects

Birds/genetics, Genome, Software, Synteny, Algorithms, Animals, Birds/classification, Computational Biology, Genetic Markers, Genomics/methods, DNA Sequence Analysis

18.

Family-Free Genome Comparison.

Doerr, Daniel; Feijão, Pedro; Stoye, Jens.

Methods Mol Biol ; 1704: 331-342, 2018.

Article in English | MEDLINE | ID: mdl-29277872

ABSTRACT

The comparison of genome structures across distinct species offers valuable insights into the species' phylogeny, genome organization, and gene associations. In this chapter, we review the family-free genome comparison tool FFGC which provides several methods for gene order analyses that do not require prior knowledge of evolutionary relationships between the genes across the studied genomes. Moreover, the tool features a complete workflow for genome comparison, requiring nothing but annotated genome sequences as input.

Subjects

Molecular Evolution, Gene Order, Genome, Software, Computational Biology, Genetic Models, Molecular Sequence Annotation, Phylogeny

19.

Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes.

Luhmann, Nina; Doerr, Daniel; Chauve, Cedric.

Microb Genom ; 3(9): e000123, 2017 09.

Article in English | MEDLINE | ID: mdl-29114402

ABSTRACT

Yersinia pestis is the causative agent of the bubonic plague, a disease responsible for several dramatic historical pandemics. Progress in ancient DNA (aDNA) sequencing rendered possible the sequencing of whole genomes of important human pathogens, including the ancient Y. pestis strains responsible for outbreaks of the bubonic plague in London in the 14th century and in Marseille in the 18th century, among others. However, aDNA sequencing data are still characterized by short reads and non-uniform coverage, so assembling ancient pathogen genomes remains challenging and often prevents a detailed study of genome rearrangements. It has recently been shown that comparative scaffolding approaches can improve the assembly of ancient Y. pestis genomes at a chromosome level. In the present work, we address the last step of genome assembly, the gap-filling stage. We describe an optimization-based method AGapEs (ancestral gap estimation) to fill in inter-contig gaps using a combination of a template obtained from related extant genomes and aDNA reads. We show how this approach can be used to refine comparative scaffolding by selecting contig adjacencies supported by a mix of unassembled aDNA reads and comparative signal. We applied our method to two Y. pestis data sets from the London and Marseille outbreaks, for which we obtained highly improved genome assemblies for both genomes, comprised of, respectively, five and six scaffolds with 95% of the assemblies supported by ancient reads. We analysed the genome evolution between both ancient genomes in terms of genome rearrangements, and observed a high level of synteny conservation between these strains.

Subjects

Contig Mapping/methods, Ancient DNA, Bacterial Genome, Plague/microbiology, Yersinia pestis/genetics, Bacterial DNA, Molecular Evolution, France/epidemiology, 18th Century History, Medieval History, Humans, London/epidemiology, Pandemics/history, Phylogeny, Plague/epidemiology, Plague/history

20.

The gene family-free median of three.

Doerr, Daniel; Balaban, Metin; Feijão, Pedro; Chauve, Cedric.

Algorithms Mol Biol ; 12: 14, 2017.

Article in English | MEDLINE | ID: mdl-28559921

ABSTRACT

BACKGROUND: The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes. METHODS: We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We study its computational complexity and we describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of [Formula: see text] and present an ILP for its solution. However, for this problem, the computation of exact solutions remains intractable for sufficiently large instances. We then proceed to describe a heuristic method, FFAdj-AM, which performs well in practice. RESULTS: The developed methods compute accurate positional orthologs for genomes comparable in size of bacterial genomes on simulated data and genomic data acquired from the OMA orthology database. In particular, FFAdj-AM performs equally or better when compared to the well-established gene family prediction tool MultiMSOAR. CONCLUSIONS: We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher confidence positional orthologs.
