Results 1 - 11 of 11
1.
Proc Natl Acad Sci U S A ; 118(45)2021 11 09.
Article in English | MEDLINE | ID: mdl-34725164

ABSTRACT

Microchromosomes, once considered unimportant shreds of the chicken genome, are gene-rich elements with a high GC content and few transposable elements. Their origin has been debated for decades. We used cytological and whole-genome sequence comparisons, and chromosome conformation capture, to trace their origin and fate in genomes of reptiles, birds, and mammals. We find that microchromosomes as well as macrochromosomes are highly conserved across birds and share synteny with single small chromosomes of the chordate amphioxus, attesting to their origin as elements of an ancient animal genome. Turtles and squamates (snakes and lizards) share different subsets of ancestral microchromosomes, having independently lost microchromosomes by fusion with other microchromosomes or macrochromosomes. Patterns of fusions were quite different in different lineages. Cytological observations show that microchromosomes in all lineages are spatially separated into a central compartment at interphase and during mitosis and meiosis. This reflects higher interaction between microchromosomes than with macrochromosomes, as observed by chromosome conformation capture, and suggests some functional coherence. In highly rearranged genomes fused microchromosomes retain most ancestral characteristics, but these may erode over evolutionary time; surprisingly, de novo microchromosomes have rapidly adopted high interaction. Some chromosomes of early-branching monotreme mammals align to several bird microchromosomes, suggesting multiple microchromosome fusions in a mammalian ancestor. Subsequently, multiple rearrangements fueled the extraordinary karyotypic diversity of therian mammals. Thus, microchromosomes, far from being aberrant genetic elements, represent fundamental building blocks of amniote chromosomes, and it is mammals, rather than reptiles and birds, that are atypical.


Subject(s)
Biological Evolution , Chordata/genetics , Chromosomes, Mammalian , Genome , Animals , Base Sequence , Conserved Sequence
2.
BMC Bioinformatics ; 21(1): 48, 2020 Feb 06.
Article in English | MEDLINE | ID: mdl-32028880

ABSTRACT

BACKGROUND: The evolutionary history of genes serves as a cornerstone of contemporary biology. Most conserved sequences in mammalian genomes don't code for proteins, yielding a need to infer evolutionary history of sequences irrespective of what kind of functional element they may encode. Thus, sequence-, as opposed to gene-, centric modes of inferring paths of sequence evolution are increasingly relevant. Customarily, homologous sequences derived from the same direct ancestor, whose ancestral position in two genomes is usually conserved, are termed "primary" (or "positional") orthologs. Methods based solely on similarity don't reliably distinguish primary orthologs from other homologs; for this, genomic context is often essential. Context-dependent identification of orthologs traditionally relies on genomic context over length scales characteristic of conserved gene order or whole-genome sequence alignment, and can be computationally intensive. RESULTS: We demonstrate that short-range sequence context (as short as a single "maximal" match) distinguishes primary orthologs from other homologs across whole genomes. On mammalian whole genomes not preprocessed by repeat-masker, potential orthologs are extracted by genome intersection as "non-nested maximal matches": maximal matches that are not nested into other maximal matches. It emerges that on both nucleotide and gene scales, non-nested maximal matches recapitulate primary or positional orthologs with high precision and high recall, while the corresponding computation consumes less than one thirtieth of the computation time required by commonly applied whole-genome alignment methods. In regions of genomes that would be masked by repeat-masker, non-nested maximal matches recover orthologs that are inaccessible to Lastz net alignment, for which repeat-masking is a prerequisite. mmRBHs, reciprocal best hits of genes containing non-nested maximal matches, yield novel putative orthologs, e.g. around 1000 pairs of genes for human-chimpanzee. CONCLUSIONS: We describe an intersection-based method that requires neither repeat-masking nor alignment to infer evolutionary history of sequences based on short-range genomic sequence context. Ortholog identification based on non-nested maximal matches is parameter-free, and less computationally intensive than many alignment-based methods. It is especially suitable for genome-wide identification of orthologs, and may be applicable to unassembled genomes. We are agnostic as to the reasons for its effectiveness, which may reflect local variation of mean mutation rate.


Subject(s)
Evolution, Molecular , Genomics/methods , Animals , Genome , Humans , Mammals/genetics , Sequence Homology
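The "non-nested" filtering step described in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each maximal match is given as a half-open (start, end) interval on one genome and that "nested" means fully contained in another match's interval; the abstract does not specify the authors' data structures, and the function name `non_nested` is hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a match is a half-open (start, end) interval
# on a single genome. The published method may represent matches differently.
Match = Tuple[int, int]


def non_nested(matches: List[Match]) -> List[Match]:
    """Return the intervals that are not contained in any other interval."""
    # Drop exact duplicates, then sort by start ascending and end descending,
    # so that any interval containing another always comes before it.
    ordered = sorted(set(matches), key=lambda m: (m[0], -m[1]))
    kept: List[Match] = []
    max_end = -1  # furthest end position seen so far
    for start, end in ordered:
        # An interval is nested exactly when an earlier one reaches at least
        # as far to the right (it started no later, by the sort order).
        if end > max_end:
            kept.append((start, end))
            max_end = end
    return kept


example = [(0, 10), (2, 5), (8, 15), (9, 12), (0, 10)]
filtered = non_nested(example)
```

Here (2, 5) is dropped because it lies inside (0, 10), and (9, 12) because it lies inside (8, 15); overlapping but non-contained intervals such as (0, 10) and (8, 15) are both kept. Sorting makes the filter run in O(n log n) time.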
3.
Int J Mol Sci ; 20(11)2019 Jun 02.
Article in English | MEDLINE | ID: mdl-31159510

ABSTRACT

Acidovorax citrulli (A. citrulli) strains cause bacterial fruit blotch (BFB) in cucurbit crops and affect melon significantly. Numerous strains of the bacterium have been isolated from melon hosts globally. Strains that are aggressively virulent towards melon and diagnostic markers for detecting such strains are yet to be identified. Using a cross-inoculation assay, we demonstrated that two Korean strains of A. citrulli, NIHHS15-280 and KACC18782, are highly virulent towards melon but avirulent/mildly virulent to the other cucurbit crops. The whole genomes of three A. citrulli strains isolated from melon and three from watermelon were aligned, allowing the design of three primer sets (AcM13, AcM380, and AcM797) that are specific to melon host strains, from three pathogenesis-related genes. These primers successfully detected the target strain NIHHS15-280 in polymerase chain reaction (PCR) assays from a very low concentration of bacterial gDNA. They were also effective in detecting the target strains from artificially infected leaf, fruit, and seed washing suspensions, without requiring the extraction of bacterial DNA. This is the first report of PCR-based markers that offer reliable, sensitive, and rapid detection of strains of A. citrulli causing BFB in melon. These markers may also be useful in early disease detection in the field samples, in seed health tests, and for international quarantine purposes.


Subject(s)
Comamonadaceae/isolation & purification , Cucurbitaceae/microbiology , Plant Diseases/microbiology , Comamonadaceae/genetics , Crops, Agricultural/microbiology , DNA, Bacterial/analysis , DNA, Bacterial/genetics , Fruit/microbiology , Genome, Bacterial , Polymerase Chain Reaction
4.
BMC Genomics ; 19(1): 47, 2018 01 15.
Article in English | MEDLINE | ID: mdl-29334898

ABSTRACT

BACKGROUND: The increasing application of next generation sequencing technologies has led to the availability of thousands of reference genomes, often providing multiple genomes for the same or closely related species. The current approach to represent a species or a population with a single reference sequence and a set of variations cannot represent their full diversity and introduces bias towards the chosen reference. There is a need for the representation of multiple sequences in a composite way that is compatible with existing data sources for annotation and suitable for established sequence analysis methods. At the same time, this representation needs to be easily accessible and extendable to account for the constant change of available genomes. RESULTS: We introduce seq-seq-pan, a framework that provides methods for adding or removing new genomes from a set of aligned genomes and uses these to construct a whole genome alignment. Throughout the sequential workflow the alignment is optimized for generating a representative linear presentation of the aligned set of genomes, that enables its usage for annotation and in downstream analyses. CONCLUSIONS: By providing dynamic updates and optimized processing, our approach enables the usage of whole genome alignment in the field of pan-genomics. In addition, the sequential workflow can be used as a fast alternative to existing whole genome aligners for aligning closely related genomes. seq-seq-pan is freely available at https://gitlab.com/rki_bioinformatics.


Subject(s)
Genomics/methods , Sequence Alignment/methods , Phylogeny , Software
5.
BMC Bioinformatics ; 18(1): 338, 2017 Jul 12.
Article in English | MEDLINE | ID: mdl-28701187

ABSTRACT

BACKGROUND: Comparing sets of sequences is a situation frequently encountered in bioinformatics, examples being comparing an assembly to a reference genome, or two genomes to each other. The purpose of the comparison is usually to find where the two sets differ, e.g. to find where a subsequence is repeated or deleted, or where insertions have been introduced. Such comparisons can be done using whole-genome alignments. Several tools for making such alignments exist, but none of them 1) provides detailed information about the types and locations of all differences between the two sets of sequences, 2) enables visualisation of alignment results at different levels of detail, and 3) carefully takes genomic repeats into consideration. RESULTS: We here present NucDiff, a tool aimed at locating and categorizing differences between two sets of closely related DNA sequences. NucDiff is able to deal with very fragmented genomes, repeated sequences, and various local differences and structural rearrangements. NucDiff determines differences by a rigorous analysis of alignment results obtained by the NUCmer, delta-filter and show-snps programs in the MUMmer sequence alignment package. All differences found are categorized according to a carefully defined classification scheme covering all possible differences between two sequences. Information about the differences is made available as GFF3 files, thus enabling visualisation using genome browsers as well as usage of the results as a component in an analysis pipeline. NucDiff was tested with varying parameters for the alignment step and compared with existing alternatives, called QUAST and dnadiff. CONCLUSIONS: We have developed a whole genome alignment difference classification scheme together with the program NucDiff for finding such differences. The proposed classification scheme is comprehensive and can be used by other tools. NucDiff performs comparably to QUAST and dnadiff but gives much more detailed results that can easily be visualized. NucDiff is freely available on https://github.com/uio-cels/NucDiff under the MPL license.


Subject(s)
DNA/chemistry , User-Computer Interface , Base Sequence , Genomics , Internet , Sequence Alignment
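Because the abstract above states that differences are reported as GFF3 files, a downstream pipeline step might simply tally the records by feature type. The sketch below relies only on the standard GFF3 layout (nine tab-separated columns, with the feature type in the third); the sample records and the type names in them are invented for illustration and are not NucDiff's actual labels.

```python
from collections import Counter


def count_feature_types(gff3_text: str) -> Counter:
    """Count GFF3 records by the value of their third (type) column."""
    counts: Counter = Counter()
    for line in gff3_text.splitlines():
        # Skip blank lines, directives ("##...") and comments ("#...").
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        # A well-formed GFF3 record has exactly nine tab-separated columns.
        if len(columns) != 9:
            continue
        counts[columns[2]] += 1
    return counts


# Invented sample data: sequence IDs, sources and type names are hypothetical.
sample = "\n".join([
    "##gff-version 3",
    "contig1\texample\tinsertion\t100\t105\t.\t+\t.\tID=d1",
    "contig1\texample\tdeletion\t200\t220\t.\t+\t.\tID=d2",
    "contig2\texample\tinsertion\t50\t51\t.\t+\t.\tID=d3",
])
type_counts = count_feature_types(sample)
```

Records with a column count other than nine are skipped silently here; a stricter pipeline might prefer to raise an error instead.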
6.
BMC Genomics ; 18(1): 332, 2017 04 27.
Article in English | MEDLINE | ID: mdl-28449639

ABSTRACT

BACKGROUND: The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. RESULTS: CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through the CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CONCLUSIONS: CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.


Subject(s)
Cloud Computing , Genomics/methods , Software , Automation , Genome, Microbial/genetics , Sequence Alignment , Sequence Analysis
7.
Genome Biol ; 24(1): 223, 2023 10 05.
Article in English | MEDLINE | ID: mdl-37798615

ABSTRACT

Crop pangenomes made from individual cultivar assemblies promise easy access to conserved genes, but genome content variability and inconsistent identifiers hamper their exploration. To address this, we define pangenes, which summarize a species' coding potential and link back to original annotations. The protocol get_pangenes performs whole genome alignments (WGA) to call syntenic gene models based on coordinate overlaps. A benchmark with small and large plant genomes shows that pangenes recapitulate phylogeny-based orthologies and produce complete soft-core gene sets. Moreover, WGAs support lift-over and help confirm gene presence-absence variation. Source code and documentation: https://github.com/Ensembl/plant-scripts .


Subject(s)
Genome, Plant , Software
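The idea of calling matching gene models "based on coordinate overlaps", mentioned in the abstract above, can be sketched as a simple interval computation. This is only an illustration under stated assumptions: the half-open coordinates, the normalisation by the shorter interval, and the function name `overlap_fraction` are all hypothetical, and the abstract does not describe the criteria get_pangenes actually applies.

```python
def overlap_fraction(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Overlap of two half-open intervals as a fraction of the shorter one."""
    # Length of the shared region; zero when the intervals do not intersect.
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    shorter = min(a_end - a_start, b_end - b_start)
    # Guard against empty intervals to avoid dividing by zero.
    return overlap / shorter if shorter > 0 else 0.0


# Hypothetical coordinates: a gene model at 100-200 in one annotation and
# the region it maps to, 150-300, after a whole genome alignment.
partial = overlap_fraction(100, 200, 150, 300)
disjoint = overlap_fraction(0, 10, 20, 30)
```

Dividing by the shorter interval means a short gene model fully covered by a longer one scores 1.0; a reciprocal criterion (requiring a minimum fraction of both intervals) would be stricter.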
8.
Data Brief ; 39: 107586, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34849384

ABSTRACT

The Escherichia coli species exhibits high genomic diversification arising from evolution, mobile genetic elements and recombination. An environmental E. coli isolate, 'JHI_5025', from a crop trial appeared to be clonally related to the historical reference isolate E. coli K-12 strain 'MG1655', warranting further genomic analysis. Their genomes share an average nucleotide identity of 99.74% and whole genome alignment showed little rearrangement of the JHI_5025 sequence compared to the reference. Five genomic islands not in the reference aligned to other sequences in the Enterobacteriaceae. Isolate JHI_5025 contained E. coli K-12 F plasmid sequence and at least one complete prophage sequence. The genome and comparison dataset support the use of E. coli JHI_5025 as a representative contemporary genetic mimic of a well-known and much used workhorse strain.

9.
Methods Mol Biol ; 1910: 121-147, 2019.
Article in English | MEDLINE | ID: mdl-31278663

ABSTRACT

Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make most effective use of our rapidly growing databases of whole genomes.


Subject(s)
Computational Biology , Genome , Genomics , Sequence Alignment , Algorithms , Computational Biology/methods , Databases, Genetic , Evolution, Molecular , Genome-Wide Association Study , Genomics/methods , Sequence Alignment/methods
10.
Microbiome ; 6(1): 15, 2018 01 18.
Article in English | MEDLINE | ID: mdl-29347966

ABSTRACT

BACKGROUND: Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible with various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. RESULTS: We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. CONCLUSIONS: reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses.


Subject(s)
Databases, Genetic , Gastrointestinal Microbiome , Metagenomics/methods , Skin/microbiology , Access to Information , Algorithms , Computational Biology/methods , Humans , Phylogeny , Sequence Alignment , Sequence Analysis, DNA
11.
Methods Mol Biol ; 1704: 55-78, 2018.
Article in English | MEDLINE | ID: mdl-29277863

ABSTRACT

Bacteria and archaea, collectively known as prokaryotes, have in general genomes that are much smaller than those of eukaryotes. As a result, thousands of these genomes have been sequenced. In prokaryotes, gene architecture lacks the intron-exon structure of eukaryotic genes (with an occasional exception). These two facts mean that there is an abundance of data for prokaryotic genomes, and that they are easier to study than the more complex eukaryotic genomes. In this chapter, we provide an overview of genome comparison tools that have been developed primarily (sometimes exclusively) for prokaryotic genomes. We cover methods that use only the DNA sequences, methods that use only the gene content, and methods that use both data types.


Subject(s)
Algorithms , Genome, Archaeal , Genome, Bacterial , Genomics/methods , Computational Biology , Evolution, Molecular , Genes, Archaeal , Genes, Bacterial , Phylogeny , Sequence Alignment , Sequence Analysis, DNA , Software