Results 1-20 of 53
1.
Proc Natl Acad Sci U S A ; 119(1)2022 01 04.
Article in English | MEDLINE | ID: mdl-34934012

ABSTRACT

Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication. Here, we introduce Anchored Wavefront alignment (AnchorWave), which performs whole-genome duplication-informed collinear anchor identification between genomes and performs base pair-resolved global alignment for collinear blocks using a two-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multikilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed low power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome as position matches or indels than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor-binding sites at a rate of 1.05- to 74.85-fold higher than other tools with significantly lower false-positive alignments. AnchorWave complements available genome alignment tools by showing obvious improvement when applied to genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication variation.
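To illustrate the two-piece affine gap cost, here is a minimal Python sketch. It is not AnchorWave's code, which also identifies collinear anchors and aligns the blocks between them. It is a Gotoh-style global alignment score in which a gap of length L scores the better of two affine pieces, o1 + e1*L and o2 + e2*L. The function name and all scoring values are illustrative assumptions, not AnchorWave's defaults.

```python
NEG_INF = float("-inf")


def two_piece_affine_global(a, b, match=2, mismatch=-4,
                            gaps=((-4, -2), (-24, -1))):
    """Global alignment score of a vs b (higher is better).

    Each (open, extend) pair in `gaps` is one affine piece; a gap of length L
    scores max(open + extend * L) over the pieces, so long gaps fall back to
    the piece with the cheaper per-base extension.
    """
    n, m = len(a), len(b)
    k = len(gaps)
    # H: best score of a[:i] vs b[:j]; E[g] / F[g]: best score ending in a gap
    # of piece g that consumes b (insertion) or a (deletion), respectively.
    H = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    E = [[[NEG_INF] * (m + 1) for _ in range(n + 1)] for _ in range(k)]
    F = [[[NEG_INF] * (m + 1) for _ in range(n + 1)] for _ in range(k)]
    H[0][0] = 0
    for j in range(1, m + 1):  # leading gap in a
        for g, (o, e) in enumerate(gaps):
            E[g][0][j] = o + e * j
        H[0][j] = max(E[g][0][j] for g in range(k))
    for i in range(1, n + 1):  # leading gap in b
        for g, (o, e) in enumerate(gaps):
            F[g][i][0] = o + e * i
        H[i][0] = max(F[g][i][0] for g in range(k))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            for g, (o, e) in enumerate(gaps):
                # either extend an open gap of piece g or open a new one from H
                E[g][i][j] = max(E[g][i][j - 1] + e, H[i][j - 1] + o + e)
                F[g][i][j] = max(F[g][i - 1][j] + e, H[i - 1][j] + o + e)
                best = max(best, E[g][i][j], F[g][i][j])
            H[i][j] = best
    return H[n][m]


# A 100-bp insertion (e.g. a TE presence/absence variant) between two flanks.
print(two_piece_affine_global("ACGT", "AC" + "T" * 100 + "GT"))  # -116
print(two_piece_affine_global("ACGT", "AC" + "T" * 100 + "GT",
                              gaps=((-4, -2),)))  # -196, single affine piece
```

With only the first piece, the 100-bp gap costs 4 + 2*100 = 204. The second piece caps it at 24 + 100 = 124, which is why a two-piece scheme is more tolerant of multikilobase indels.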


Subject(s)
Genome, Plant , Polymorphism, Genetic , Sequence Alignment , Software , Zea mays/genetics
2.
BMC Genomics ; 25(1): 515, 2024 May 25.
Article in English | MEDLINE | ID: mdl-38796435

ABSTRACT

BACKGROUND: The short-read whole-genome sequencing (WGS) approach has been widely applied to investigate genomic variation in natural populations of many plant species. With rapid advances in long-read sequencing and genome assembly technologies, high-quality genome sequences are now available for groups of varieties in many plant species. These genome sequences are expected to help researchers comprehensively investigate the types of genomic variants that WGS misses. However, multiple genome alignment (MGA) tools designed by the human genome research community might be unsuitable for plant genomes. RESULTS: To fill this gap, we developed the AnchorWave-Cactus Multiple Genome Alignment (ACMGA) pipeline, which improves the alignment of repeat elements and can identify long (> 50 bp) deletions or insertions (INDELs). We conducted MGA using ACMGA and Cactus for 8 Arabidopsis (Arabidopsis thaliana) and 26 maize (Zea mays) de novo assembled genome sequences and compared the results with previously published short-read variant calls. MGA identified more single nucleotide variants (SNVs) and long INDELs than the previously published WGS variant calls. Additionally, ACMGA detected significantly more SNVs and long INDELs than Cactus, both in repetitive regions and across the whole genome. The ACMGA results were also more similar than the Cactus results to the previously published variants called from short reads. Both MGA pipelines identified numerous multi-allelic variants that were missed by the WGS variant calling pipeline. CONCLUSIONS: Aligning de novo assembled genome sequences can identify more SNVs and INDELs than mapping short reads. ACMGA combines the advantages of AnchorWave and Cactus and offers a practical solution for plant MGA by integrating global alignment, a 2-piece affine gap cost strategy, and the progressive MGA algorithm.


Subject(s)
Arabidopsis , Genome, Plant , Zea mays , Arabidopsis/genetics , Zea mays/genetics , Sequence Alignment , INDEL Mutation , Genomics/methods , Polymorphism, Single Nucleotide , Whole Genome Sequencing/methods , Software
3.
J Hered ; 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39316562

ABSTRACT

The African hunting dog (Lycaon pictus, 2n=78) once ranged over most sub-Saharan ecosystems except deserts and rainforests. However, as a result of ongoing population declines, it now survives only in small, fragmented populations. Furthermore, the future of the species remains uncertain owing to both anthropogenic pressure and interactions with domestic dogs, making its preservation a conservation priority. On the tree of life, the hunting dog is basal to Canis and Cuon and forms a crown group with them, making it a useful species for comparative genomic studies. Here, we present a diploid chromosome-level assembly of an African hunting dog. Assembled according to VGP guidelines from a combination of PacBio HiFi reads and Hi-C data, it is phased at the level of individual chromosomes. The maternal (pseudo)haplotype (mat) of our assembly has a length of 2.38 Gbp, and 99.36% of the sequence is contained in 39 chromosomal scaffolds. The rest is included in only 36 short unplaced scaffolds. At the contig level, mat consists of only 166 contigs with an N50 of 39 Mbp. BUSCO analysis showed 95.4% completeness based on Carnivora conserved genes (carnivora_odb10). Compared with other available genomes from subtribe Canina, the assembly is of excellent quality, typically ranking between 1st and 3rd depending on the metric used, and it is a significant improvement on previously published genomes for the species. We hope this assembly will play an important role in future conservation efforts and comparative studies of canid genomes.

4.
Proc Natl Acad Sci U S A ; 118(45)2021 11 09.
Article in English | MEDLINE | ID: mdl-34725164

ABSTRACT

Microchromosomes, once considered unimportant shreds of the chicken genome, are gene-rich elements with a high GC content and few transposable elements. Their origin has been debated for decades. We used cytological and whole-genome sequence comparisons, and chromosome conformation capture, to trace their origin and fate in genomes of reptiles, birds, and mammals. We find that microchromosomes as well as macrochromosomes are highly conserved across birds and share synteny with single small chromosomes of the chordate amphioxus, attesting to their origin as elements of an ancient animal genome. Turtles and squamates (snakes and lizards) share different subsets of ancestral microchromosomes, having independently lost microchromosomes by fusion with other microchromosomes or macrochromosomes. Fusion patterns differed markedly among lineages. Cytological observations show that microchromosomes in all lineages are spatially separated into a central compartment at interphase and during mitosis and meiosis. This reflects higher interaction between microchromosomes than with macrochromosomes, as observed by chromosome conformation capture, and suggests some functional coherence. In highly rearranged genomes, fused microchromosomes retain most ancestral characteristics, but these may erode over evolutionary time; surprisingly, de novo microchromosomes have rapidly adopted high interaction. Some chromosomes of early-branching monotreme mammals align to several bird microchromosomes, suggesting multiple microchromosome fusions in a mammalian ancestor. Subsequently, multiple rearrangements fueled the extraordinary karyotypic diversity of therian mammals. Thus, microchromosomes, far from being aberrant genetic elements, represent fundamental building blocks of amniote chromosomes, and it is mammals, rather than reptiles and birds, that are atypical.


Subject(s)
Biological Evolution , Chordata/genetics , Chromosomes, Mammalian , Genome , Animals , Base Sequence , Conserved Sequence
5.
J Med Virol ; 95(1): e28395, 2023 01.
Article in English | MEDLINE | ID: mdl-36504122

ABSTRACT

Rapid and accurate diagnosis of infections is fundamental to disease containment. Several monkeypox virus (MPV) real-time diagnostic assays have been recommended by the CDC; however, the specificity of the primers and probes in these assays for the ongoing MPV outbreak has not been investigated. We analyzed the primer and probe sequences in the CDC-recommended MPV generic real-time PCR assay by aligning them against 1730 MPV complete genomes reported worldwide in 2022. Sequence mismatches were found in 99.08% and 97.46% of genomes for the MPV generic forward and reverse primers, respectively. Mismatch-corrected primers were synthesized and compared with the generic assay for MPV detection. The results showed that the two primer-template mismatches caused an ~11-fold underestimation of the initial template DNA in the reaction and a 4-fold increase in the 95% limit of detection (LOD). We further evaluated the specificity of seven other real-time PCR assays used for MPV and orthopoxvirus (OPV) detection and identified two assays with the highest matching scores (>99.6%) to the global 2022 MPV genome database. Genetic variation in the primer-probe regions across MPV genomes could indicate the temporal and spatial pattern of monkeypox emergence. Our results show that the current MPV generic real-time assay may not be optimal for accurate MPV detection, and that a mismatch-corrected assay with full complementarity between the primers and current MPV genomes could provide more sensitive and accurate detection.
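The mismatch screen described above can be sketched in Python, assuming the primer-binding sites have already been extracted from the genome alignment in the primer's orientation (a reverse primer would first need reverse-complementing). Degenerate primer bases are matched using the standard IUPAC nucleotide codes. The function names and toy sequences are hypothetical and not taken from the study.

```python
# IUPAC nucleotide codes and the bases each one accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer, site):
    """Count positions where the genome base is not accepted by the primer base."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(1 for p, s in zip(primer.upper(), site.upper())
               if s not in IUPAC[p])


def percent_sites_with_mismatch(primer, sites):
    """Percentage of genomes whose binding site has at least one mismatch."""
    if not sites:
        raise ValueError("no binding sites given")
    with_mismatch = sum(1 for s in sites if primer_mismatches(primer, s) > 0)
    return 100.0 * with_mismatch / len(sites)


sites = ["ACGTTGCA", "ACGTTGCA", "ACGATGCA", "ACGTTGCC"]  # toy data
print(percent_sites_with_mismatch("ACGTTGCA", sites))  # 50.0
print(primer_mismatches("ACGWTGCA", "ACGATGCA"))  # 0, since W accepts A or T
```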


Subject(s)
Monkeypox virus , Mpox (monkeypox) , Humans , Monkeypox virus/genetics , Real-Time Polymerase Chain Reaction/methods , DNA Primers/genetics , Mpox (monkeypox)/diagnosis , Mpox (monkeypox)/epidemiology , Disease Outbreaks , Sensitivity and Specificity
6.
BMC Bioinformatics ; 23(1): 225, 2022 Jun 10.
Article in English | MEDLINE | ID: mdl-35689182

ABSTRACT

BACKGROUND: Seed finding is an important initial phase of arguably most homology search and alignment methods, such as those required for genome alignment. The seed-finding step is crucial for curbing runtime, because potential alignments are restricted to, and anchored at, the sequence position pairs that constitute the seeds. To identify seeds, it is good practice to use sets of spaced seed patterns, a method that compares two sequences locally and requires exact matches only at certain positions. RESULTS: We introduce a new method for filtering alignment seeds that we call geometric hashing. Geometric hashing achieves high specificity by combining non-local information from different seeds using a simple hash function that requires only a small, constant amount of additional time per spaced seed. Geometric hashing was tested on the task of finding homologous positions in the coding regions of human and mouse genome sequences. In this test, the number of false positives decreased about a million-fold relative to sets of spaced seeds, while a very high sensitivity was maintained. CONCLUSIONS: An additional geometric hashing filtering phase could improve the runtime, accuracy, or both of programs for various homology-search-and-align tasks.
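The spaced-seed baseline mentioned in the BACKGROUND can be sketched in Python. In a pattern such as "11011", a "1" marks a position that must match exactly and a "0" marks a position where a mismatch is tolerated. This sketch does not implement geometric hashing itself, whose hash function is not described in the abstract. The function names and the example pattern are illustrative assumptions.

```python
from collections import defaultdict


def spaced_key(seq, start, pattern):
    """Characters of seq at the '1' (care) positions of the pattern window."""
    return "".join(seq[start + k] for k, c in enumerate(pattern) if c == "1")


def spaced_seed_hits(a, b, pattern="11011"):
    """All (i, j) where windows a[i:] and b[j:] agree on every care position."""
    w = len(pattern)
    index = defaultdict(list)  # spaced key -> start positions in a
    for i in range(len(a) - w + 1):
        index[spaced_key(a, i, pattern)].append(i)
    hits = []
    for j in range(len(b) - w + 1):
        for i in index.get(spaced_key(b, j, pattern), []):
            hits.append((i, j))
    return hits


a, b = "ACGTACGT", "ACTTACGT"
print(spaced_seed_hits(a, b, "11011"))  # [(0, 0), (3, 3)]
print(spaced_seed_hits(a, b, "11111"))  # [(3, 3)]
```

The spaced pattern recovers the seed at (0, 0) despite the G/T mismatch at offset 2, which a contiguous 5-mer misses. Every extra hit of this kind is also a potential false positive, and that is the problem a downstream filter addresses.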


Subject(s)
Algorithms , Genome , Animals , Mice , Sequence Alignment
7.
Genomics ; 113(5): 3174-3184, 2021 09.
Article in English | MEDLINE | ID: mdl-34293476

ABSTRACT

As mutations in SARS-CoV-2 virus accumulate rapidly, novel primers that amplify this virus sensitively and specifically are in demand. We have developed a webserver named CoVrimer by which users can search for and align existing or newly designed conserved/degenerate primer pair sequences against the viral genome and assess the mutation load of both primers and amplicons. CoVrimer uses mutation data obtained from an online platform established by NGDC-CNCB (12 May 2021) to identify genomic regions, either conserved or with low levels of mutations, from which potential primer pairs are designed and provided to the user for filtering based on generalized and SARS-CoV-2 specific parameters. Alignments of primers and probes can be visualized with respect to the reference genome, indicating variant details and the level of conservation. Consequently, CoVrimer is likely to help researchers with the challenges posed by viral evolution and is freely available at http://konulabapps.bilkent.edu.tr:3838/CoVrimer/.


Subject(s)
DNA Primers/chemistry , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , Software , Conserved Sequence , DNA Primers/genetics , Genome, Viral , Mutation
8.
BMC Genomics ; 22(1): 11, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407096

ABSTRACT

BACKGROUND: The genus Ehrlichia consists of tick-borne obligately intracellular bacteria that can cause deadly diseases of medical and agricultural importance. Ehrlichia sp. HF, isolated from Ixodes ovatus ticks in Japan [also referred to as I. ovatus Ehrlichia (IOE) agent], causes acute fatal infection in laboratory mice that resembles acute fatal human monocytic ehrlichiosis caused by Ehrlichia chaffeensis. As there is no small laboratory animal model to study fatal human ehrlichiosis, Ehrlichia sp. HF provides a needed disease model. However, the inability to culture Ehrlichia sp. HF and the lack of genomic information have been a barrier to advancing this animal model. In addition, Ehrlichia sp. HF has several designations in the literature because it lacks a taxonomically recognized name. RESULTS: We stably cultured Ehrlichia sp. HF in canine histiocytic leukemia DH82 cells from HF strain-infected mice and determined its complete genome sequence. Ehrlichia sp. HF has a single double-stranded circular chromosome of 1,148,904 bp, which encodes 866 proteins with metabolic potential similar to that of E. chaffeensis. Ehrlichia sp. HF encodes homologs of all virulence factors identified in E. chaffeensis, including 23 paralogs of P28/OMP-1 family outer membrane proteins, type IV secretion system apparatus and effector proteins, two-component systems, ankyrin-repeat proteins, and tandem repeat proteins. Ehrlichia sp. HF is a novel species in the genus Ehrlichia, as demonstrated through whole genome comparisons with six representative Ehrlichia species, subspecies, and strains, using average nucleotide identity, digital DNA-DNA hybridization, and core genome alignment sequence identity. CONCLUSIONS: The genome of Ehrlichia sp. HF encodes all known virulence factors found in E. chaffeensis, substantiating it as a model Ehrlichia species to study fatal human ehrlichiosis. Comparisons between Ehrlichia sp. HF and E. chaffeensis will enable identification of in vivo virulence factors that are related to host specificity, disease severity, and host inflammatory responses. We propose the name Ehrlichia japonica sp. nov. (type strain HF) for Ehrlichia sp. HF, to denote the geographic region where this bacterium was initially isolated.


Subject(s)
Ehrlichia chaffeensis , Ehrlichiosis , Ixodes , Animals , Dogs , Ehrlichia chaffeensis/genetics , Ehrlichiosis/veterinary , Genome, Bacterial , Japan , Mice
9.
BMC Bioinformatics ; 21(1): 48, 2020 Feb 06.
Article in English | MEDLINE | ID: mdl-32028880

ABSTRACT

BACKGROUND: The evolutionary history of genes serves as a cornerstone of contemporary biology. Most conserved sequences in mammalian genomes do not code for proteins, creating a need to infer the evolutionary history of sequences irrespective of the kind of functional element they may encode. Thus, sequence-centric, as opposed to gene-centric, modes of inferring paths of sequence evolution are increasingly relevant. Customarily, homologous sequences derived from the same direct ancestor, whose ancestral position in two genomes is usually conserved, are termed "primary" (or "positional") orthologs. Methods based solely on similarity do not reliably distinguish primary orthologs from other homologs; for this, genomic context is often essential. Context-dependent identification of orthologs traditionally relies on genomic context over length scales characteristic of conserved gene order or whole-genome sequence alignment, and can be computationally intensive. RESULTS: We demonstrate that short-range sequence context (as short as a single "maximal" match) distinguishes primary orthologs from other homologs across whole genomes. On mammalian whole genomes not preprocessed by repeat-masker, potential orthologs are extracted by genome intersection as "non-nested maximal matches": maximal matches that are not nested into other maximal matches. On both nucleotide and gene scales, non-nested maximal matches recapitulate primary or positional orthologs with high precision and high recall, while the corresponding computation consumes less than one thirtieth of the computation time required by commonly applied whole-genome alignment methods. In genomic regions that would be masked by repeat-masker, non-nested maximal matches recover orthologs that are inaccessible to Lastz net alignment, for which repeat-masking is a prerequisite. mmRBHs, reciprocal best hits of genes containing non-nested maximal matches, yield novel putative orthologs, e.g. around 1000 pairs of genes for human-chimpanzee. CONCLUSIONS: We describe an intersection-based method that requires neither repeat-masking nor alignment to infer the evolutionary history of sequences based on short-range genomic sequence context. Ortholog identification based on non-nested maximal matches is parameter-free and less computationally intensive than many alignment-based methods. It is especially suitable for genome-wide identification of orthologs and may be applicable to unassembled genomes. We are agnostic as to the reasons for its effectiveness, which may reflect local variation of mean mutation rate.
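The notion of a "maximal" match can be made concrete with a brute-force Python sketch. A match is maximal when it cannot be extended one base further to either side. This sketch only enumerates maximal exact matches. It does not implement the paper's non-nesting filter or genome intersection, whose exact definitions are given in the full paper. The function name and `min_len` threshold are illustrative assumptions.

```python
def maximal_exact_matches(a, b, min_len=3):
    """Brute-force maximal exact matches (i, j, length) between a and b.

    A match is left-maximal if it cannot be extended to the left and
    right-maximal if it cannot be extended to the right. Runtime is roughly
    O(len(a) * len(b) * L), so this suits toy inputs only; genome-scale tools
    typically rely on suffix-based index structures instead.
    """
    mems = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue  # extendable to the left, so not left-maximal
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1  # extend right until a mismatch or sequence end
            if length >= min_len:
                mems.append((i, j, length))
    return mems


print(maximal_exact_matches("GATTACA", "TTACAGG"))  # [(2, 0, 5)]
```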


Subject(s)
Evolution, Molecular , Genomics/methods , Animals , Genome , Humans , Mammals/genetics , Sequence Homology
10.
BMC Genomics ; 21(Suppl 2): 274, 2020 Apr 16.
Article in English | MEDLINE | ID: mdl-32299360

ABSTRACT

BACKGROUND: The term pan-genome was proposed to denote collections of genomic sequences jointly analyzed or used as a reference. The constant growth of genomic data drives the development of data structures and algorithms for investigating pan-genomes efficiently. RESULTS: This work provides a tool for discovering and visualizing the relationships between the sequences that constitute a pan-genome. A new structure to represent such relationships, called an affinity tree, is proposed. Each node of this tree is assigned a subset of genomes, together with their homogeneity level and averaged consensus sequence. Moreover, the subsets assigned to sibling nodes form a partition of the genomes assigned to their parent. CONCLUSIONS: The functionality of the affinity tree is demonstrated on simulated data and on the Ebola virus pan-genome. Furthermore, two software packages are provided: PangTreeBuild constructs the affinity tree, while PangTreeVis visualizes the result.
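The structural invariant stated above (sibling subsets partition their parent's genomes) can be checked with a short Python sketch. The `AffinityNode` class and its field names are hypothetical stand-ins based only on the abstract's description. They are not the data model or API of PangTreeBuild.

```python
from dataclasses import dataclass, field


@dataclass
class AffinityNode:
    """Hypothetical affinity-tree node (field names are illustrative)."""
    genomes: frozenset
    homogeneity: float = 0.0
    consensus: str = ""
    children: list = field(default_factory=list)


def children_partition_parent(node):
    """Check recursively that each node's children partition its genomes."""
    if not node.children:
        return True
    seen = set()
    for child in node.children:
        # blocks of a partition must be non-empty and pairwise disjoint
        if not child.genomes or seen & child.genomes:
            return False
        seen |= child.genomes
    if seen != node.genomes:  # together the blocks must cover the parent exactly
        return False
    return all(children_partition_parent(c) for c in node.children)


leaf_a = AffinityNode(frozenset({"g1", "g2"}))
leaf_b = AffinityNode(frozenset({"g3"}))
root = AffinityNode(frozenset({"g1", "g2", "g3"}), children=[leaf_a, leaf_b])
print(children_partition_parent(root))  # True
```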


Subject(s)
Ebolavirus/genetics , Genomics/methods , Algorithms , Computational Biology , Computer Simulation , Databases, Genetic , Models, Genetic , Phylogeny , Sequence Alignment , Software
11.
Int J Mol Sci ; 20(11)2019 Jun 02.
Article in English | MEDLINE | ID: mdl-31159510

ABSTRACT

Acidovorax citrulli (A. citrulli) strains cause bacterial fruit blotch (BFB) in cucurbit crops and significantly affect melon. Numerous strains of the bacterium have been isolated from melon hosts globally. Strains that are aggressively virulent towards melon, and diagnostic markers for detecting such strains, have yet to be identified. Using a cross-inoculation assay, we demonstrated that two Korean strains of A. citrulli, NIHHS15-280 and KACC18782, are highly virulent towards melon but avirulent or mildly virulent to other cucurbit crops. The whole genomes of three A. citrulli strains isolated from melon and three from watermelon were aligned, allowing the design of three primer sets (AcM13, AcM380, and AcM797), derived from three pathogenesis-related genes, that are specific to melon host strains. These primers successfully detected the target strain NIHHS15-280 in polymerase chain reaction (PCR) assays at very low concentrations of bacterial gDNA. They were also effective in detecting the target strains in washing suspensions of artificially infected leaves, fruit, and seeds, without requiring extraction of bacterial DNA. This is the first report of PCR-based markers that offer reliable, sensitive, and rapid detection of A. citrulli strains causing BFB in melon. These markers may also be useful for early disease detection in field samples, in seed health tests, and for international quarantine purposes.


Subject(s)
Comamonadaceae/isolation & purification , Cucurbitaceae/microbiology , Plant Diseases/microbiology , Comamonadaceae/genetics , Crops, Agricultural/microbiology , DNA, Bacterial/analysis , DNA, Bacterial/genetics , Fruit/microbiology , Genome, Bacterial , Polymerase Chain Reaction
12.
BMC Genomics ; 19(1): 47, 2018 01 15.
Article in English | MEDLINE | ID: mdl-29334898

ABSTRACT

BACKGROUND: The increasing application of next-generation sequencing technologies has led to the availability of thousands of reference genomes, often providing multiple genomes for the same or closely related species. The current approach of representing a species or a population with a single reference sequence and a set of variations cannot capture their full diversity and introduces bias towards the chosen reference. There is a need to represent multiple sequences in a composite way that is compatible with existing data sources for annotation and suitable for established sequence analysis methods. At the same time, this representation needs to be easily accessible and extendable to account for the constant change in available genomes. RESULTS: We introduce seq-seq-pan, a framework that provides methods for adding genomes to or removing genomes from a set of aligned genomes and uses these to construct a whole genome alignment. Throughout the sequential workflow, the alignment is optimized for generating a representative linear representation of the aligned set of genomes, which enables its use for annotation and in downstream analyses. CONCLUSIONS: By providing dynamic updates and optimized processing, our approach enables the use of whole genome alignment in the field of pan-genomics. In addition, the sequential workflow can be used as a fast alternative to existing whole genome aligners for aligning closely related genomes. seq-seq-pan is freely available at https://gitlab.com/rki_bioinformatics.


Subject(s)
Genomics/methods , Sequence Alignment/methods , Phylogeny , Software
13.
Mol Genet Genomics ; 293(6): 1507-1522, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30099586

ABSTRACT

Aflatoxins are toxic secondary metabolites produced by members of the genus Aspergillus, most notably A. flavus. Non-aflatoxigenic strains of A. flavus are commonly used for biocontrol of the aflatoxigenic strains to reduce aflatoxins in corn, cotton, peanuts and tree nuts. However, genomic differences between aflatoxigenic strains and non-aflatoxigenic strains have not been reported in detail, though such differences may further elucidate the evolutionary histories of certain biocontrol strains and help guide development of other useful strains. We recently reported the genome and transcriptome sequencing of A. flavus WRRL 1519, a strain isolated from almond that does not produce aflatoxins or cyclopiazonic acid due to deletions in the biosynthetic gene clusters. Continued bioinformatics analyses focused on comparing strain WRRL 1519 to the aflatoxigenic strain NRRL 3357. The genome assembly of strain WRRL 1519 was improved by anchoring 84 of the 127 scaffolds to the putative nuclear chromosomes of strain NRRL 3357. The five largest areas of extrachromosomal mismatches observed between WRRL 1519 and NRRL 3357 were not similar to any of the mismatches that were observed with pairwise comparisons of NRRL 3357 to other non-aflatoxigenic strains NRRL 21882, NRRL 30797 or NRRL 18543. Comparisons of predicted secondary metabolite gene clusters uncovered two other biosynthetic gene clusters in which strain WRRL 1519 had large deletions compared to the homologous clusters in NRRL 3357. Additionally, there was a marked overrepresentation of repetitive sequences in WRRL 1519 compared to other inspected A. flavus strains. This is the first report of detection of a large number of putative retrotransposons in any A. flavus strain, initially suggesting that retrotransposons may contribute to the natural occurrence of genetic variation and biocontrol strains. However, the transposons may not be significantly associated with the chromosomal differences. 
Future experimentation and continued bioinformatics analyses will potentially illuminate causes of the differences and may reveal whether transposon activity in A. flavus can lead to random natural occurrences of non-aflatoxigenic strains.


Subject(s)
Aspergillus flavus/genetics , Biological Control Agents , Chromosomes, Fungal/genetics , DNA Transposable Elements/genetics , Genetic Variation , Chromosome Mapping , DNA Copy Number Variations , Evolution, Molecular , Gene Dosage , Species Specificity
14.
BMC Bioinformatics ; 18(1): 338, 2017 Jul 12.
Article in English | MEDLINE | ID: mdl-28701187

ABSTRACT

BACKGROUND: Comparing sets of sequences is a situation frequently encountered in bioinformatics, examples being comparing an assembly to a reference genome, or two genomes to each other. The purpose of the comparison is usually to find where the two sets differ, e.g. to find where a subsequence is repeated or deleted, or where insertions have been introduced. Such comparisons can be done using whole-genome alignments. Several tools for making such alignments exist, but none of them 1) provides detailed information about the types and locations of all differences between the two sets of sequences, 2) enables visualisation of alignment results at different levels of detail, and 3) carefully takes genomic repeats into consideration. RESULTS: We here present NucDiff, a tool aimed at locating and categorizing differences between two sets of closely related DNA sequences. NucDiff is able to deal with very fragmented genomes, repeated sequences, and various local differences and structural rearrangements. NucDiff determines differences by a rigorous analysis of alignment results obtained by the NUCmer, delta-filter and show-snps programs in the MUMmer sequence alignment package. All differences found are categorized according to a carefully defined classification scheme covering all possible differences between two sequences. Information about the differences is made available as GFF3 files, thus enabling visualisation using genome browsers as well as usage of the results as a component in an analysis pipeline. NucDiff was tested with varying parameters for the alignment step and compared with the existing alternatives QUAST and dnadiff. CONCLUSIONS: We have developed a whole genome alignment difference classification scheme together with the program NucDiff for finding such differences. The proposed classification scheme is comprehensive and can be used by other tools. NucDiff performs comparably to QUAST and dnadiff but gives much more detailed results that can easily be visualized. NucDiff is freely available at https://github.com/uio-cels/NucDiff under the MPL license.
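As a toy illustration of the kind of local difference categorization described above, the Python sketch below walks one gapped pairwise alignment and reports substitutions, insertions and deletions. It is not NucDiff's classification scheme, which covers many more categories (including structural rearrangements), and it does not parse MUMmer output. The function name, tuple layout and position convention are assumptions chosen for illustration.

```python
def classify_differences(ref_aln, qry_aln):
    """List substitutions, insertions and deletions from a gapped alignment.

    Positions are 1-based on the ungapped reference; an insertion is reported
    at the reference base it follows (0 = before the first base).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have equal length")
    if any(r == "-" and q == "-" for r, q in zip(ref_aln, qry_aln)):
        raise ValueError("all-gap column")
    diffs, ref_pos, k, n = [], 0, 0, len(ref_aln)
    while k < n:
        r, q = ref_aln[k], qry_aln[k]
        if r != "-" and q != "-":  # aligned column
            ref_pos += 1
            if r != q:
                diffs.append(("substitution", ref_pos, r, q))
            k += 1
        elif r == "-":  # run of query bases absent from the reference
            start = k
            while k < n and ref_aln[k] == "-":
                k += 1
            diffs.append(("insertion", ref_pos, "", qry_aln[start:k]))
        else:  # run of reference bases absent from the query
            start = k
            while k < n and qry_aln[k] == "-":
                k += 1
            diffs.append(("deletion", ref_pos + 1, ref_aln[start:k], ""))
            ref_pos += k - start
    return diffs


for d in classify_differences("ACGT-ACGT", "ACTTGAC--"):
    print(d)
# ('substitution', 3, 'G', 'T')
# ('insertion', 4, '', 'G')
# ('deletion', 7, 'GT', '')
```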


Subject(s)
DNA/chemistry , User-Computer Interface , Base Sequence , Genomics , Internet , Sequence Alignment
15.
BMC Genomics ; 18(1): 332, 2017 04 27.
Article in English | MEDLINE | ID: mdl-28449639

ABSTRACT

BACKGROUND: The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. RESULTS: CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data upload and download, pipeline configuration and monitoring, and access to Sybil are managed through the CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CONCLUSIONS: CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.


Subject(s)
Cloud Computing , Genomics/methods , Software , Automation , Genome, Microbial/genetics , Sequence Alignment , Sequence Analysis
16.
J Integr Plant Biol ; 57(11): 980-91, 2015 Nov.
Article in English | MEDLINE | ID: mdl-25809845

ABSTRACT

DNA markers play important roles in plant breeding and genetics. The insertion/deletion (InDel) marker is a co-dominant DNA marker that is widely used because of its low cost and high precision. However, the canonical way of searching for InDel markers is time-consuming and labor-intensive. We developed an end-to-end computational solution (InDel Markers Development Platform, IMDP) to identify genome-wide InDel markers in a graphical pipeline environment. IMDP consists of an assembled genome sequence alignment pipeline (AGA-pipe) and a next-generation re-sequencing data mapping pipeline (NGS-pipe). With AGA-pipe, we identified 12,944 markers between the genomes of rice cultivars Nipponbare and 93-11. Using NGS-pipe, we identified 34,794 InDels from re-sequencing data of rice cultivars Wu-Yun-Geng7 and Guang-Lu-Ai4. Combining AGA-pipe and NGS-pipe, we developed 205,659 InDels in eight japonica and nine indica cultivars, of which 2,681 showed a subgroup-specific pattern. Polymerase chain reaction (PCR) analysis of subgroup-specific markers indicated that the precision reached 90% (86 of 95). Finally, to make them available to the public, we have integrated the InDel/marker information into a website (Rice InDel Marker Database, RIMD, http://202.120.45.71/). The application of IMDP in rice will make the development of genome-wide InDel markers more efficient; in addition, it can be used in other species with reference genome sequences and NGS data.


Subject(s)
Genomics/methods , INDEL Mutation , Oryza/genetics , Genetic Markers
17.
bioRxiv ; 2024 May 13.
Article in English | MEDLINE | ID: mdl-38798446

ABSTRACT

Investigating collinearity between chromosomes is often used in comparative genomics to help identify gene orthologs, pinpoint genes that might have been overlooked as part of annotation processes and/or perform various evolutionary inferences. Collinear segments, also known as syntenic blocks, can be inferred from sequence alignments and/or from the identification of genes arrayed in the same order and relative orientations between investigated genomes. To help perform these analyses and assess their outcomes, we built a simple pipeline called SYNY (for synteny) that implements the two distinct approaches and produces different visualizations. The SYNY pipeline was built with ease of use in mind and runs on modest hardware. The pipeline is written in Perl and Python and is available on GitHub (https://github.com/PombertLab/SYNY) under the permissive MIT license.

18.
Sci Total Environ ; 912: 169229, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38072259

ABSTRACT

The anthranilic diamide insecticide chlorantraniliprole (CAP) has been extensively applied to control lepidopteran pests. However, its overuse has led to the development of resistance and the accumulation of residues in the environment. Four P450s (CYP6CV5, CYP9A68, CYP321F3, and CYP324A12) had previously been found to be constitutively overexpressed in a CAP-resistant SSB strain. Further elucidating the molecular mechanisms underlying P450-mediated CAP resistance is imperative for mitigating its environmental contamination. Here, we heterologously expressed these four P450s in insect cells and evaluated their ability to metabolize CAP. Western blotting and reduced CO difference spectra showed that the four P450 proteins were successfully expressed in Sf9 cells as active, functional enzymes. The recombinant proteins CYP6CV5, CYP9A68, CYP321F3, and CYP324A12 preferentially metabolized the fluorescent P450 model probe substrates EC, BFC, EFC, and EC (enzyme activities of 0.54, 0.67, 0.57, and 0.46 pmol/min/pmol P450, respectively). In vitro metabolism assays revealed distinct CAP metabolic rates (0.97, 0.86, 0.75, and 0.55 pmol/min/pmol P450) and efficiencies (0.45, 0.37, 0.30, and 0.17) for the four recombinant P450 enzymes, reflecting differences in their catalytic activities. Furthermore, molecular docking confirmed the differences in metabolic efficiency among these P450s and revealed the hydroxylation reactions responsible for N-demethylation and methylphenyl hydroxylation during CAP metabolism. Our findings provide the first insights into the mechanisms of P450-mediated metabolic resistance to CAP at the protein level in SSB, demonstrate significant differences among multiple P450s in their capacity to degrade the insecticide, and facilitate the evaluation and mitigation of toxic risks associated with CAP application in the environment.


Subject(s)
Insecticides , Lepidoptera , Animals , Cytochrome P-450 Enzyme System/metabolism , ortho-Aminobenzoates
19.
ArXiv ; 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39314498

ABSTRACT

Summary: With the rapid development of long-read sequencing technologies, the era of individual complete genomes is approaching. We have developed wgatools, a cross-platform, ultrafast toolkit that supports a range of whole genome alignment (WGA) formats and offers practical tools for the conversion, processing, statistical evaluation, and visualization of alignments, thereby facilitating population-level genome analysis and advancing functional and evolutionary genomics.

Availability and Implementation: wgatools supports diverse formats and can process, filter, and statistically evaluate alignments, perform alignment-based variant calling, and visualize alignments both locally and genome-wide. Built in Rust for efficiency and memory safety, it delivers fast performance and can handle large datasets consisting of hundreds of genomes. wgatools is free software released under the MIT open-source license, and its source code is available at https://github.com/wjwei-handsome/wgatools.
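To give a concrete sense of the filtering and statistics steps, here is a minimal Python sketch over PAF, a widely used tab-separated pairwise alignment format whose first 12 columns are fixed (query name, length, start, end, strand, target name, length, start, end, residue matches, alignment block length, mapping quality). It is an illustration only: wgatools itself is written in Rust, and the function names, the `min_mapq` filter, and the reported statistics here are assumptions that do not reproduce its commands or output.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PafRecord:
    qname: str
    qlen: int
    qstart: int     # 0-based, inclusive
    qend: int       # 0-based, exclusive
    strand: str     # "+" or "-"
    tname: str
    tlen: int
    tstart: int
    tend: int
    matches: int    # number of matching bases in the alignment
    block_len: int  # alignment block length, including gaps
    mapq: int


def parse_paf_line(line: str) -> PafRecord:
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("a PAF line needs at least 12 tab-separated columns")
    # Columns 13+ are optional SAM-style tags (e.g. cg:Z: CIGAR); ignored here.
    return PafRecord(
        f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
        f[5], int(f[6]), int(f[7]), int(f[8]),
        int(f[9]), int(f[10]), int(f[11]),
    )


def summarize(lines, min_mapq: int = 0) -> dict:
    """Filter records by mapping quality and compute simple alignment statistics."""
    records = [parse_paf_line(l) for l in lines if l.strip()]
    kept = [r for r in records if r.mapq >= min_mapq]
    matches = sum(r.matches for r in kept)
    block = sum(r.block_len for r in kept)
    return {
        "alignments": len(kept),
        # Summed target spans; overlapping alignments are counted more than once.
        "target_span_bp": sum(r.tend - r.tstart for r in kept),
        "identity": matches / block if block else 0.0,
    }


paf = [
    "q1\t1000\t0\t500\t+\tt1\t2000\t100\t600\t450\t500\t60\tNM:i:50",
    "q2\t800\t100\t300\t-\tt1\t2000\t900\t1100\t150\t200\t5",
]
print(summarize(paf))
print(summarize(paf, min_mapq=10))  # {'alignments': 1, 'target_span_bp': 500, 'identity': 0.9}
```

Identity is computed as matching bases divided by alignment block length, pooled over the kept records; computing genome coverage properly would require merging overlapping target intervals first.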

20.
Methods Mol Biol ; 2802: 165-187, 2024.
Article in English | MEDLINE | ID: mdl-38819560

ABSTRACT

Newly sequenced genomes are being added to the tree of life at an unprecedented pace. A large proportion of these new genomes are phylogenetically close to previously sequenced and annotated genomes. In other cases, whole clades of closely related species or strains need to be annotated simultaneously. Subsequent studies often focus on the differences between the closely related species or strains, even though most gene structures are shared. We here review methods for comparative structural genome annotation. The reviewed methods include classical approaches, such as aligning protein sequences or protein profiles against the genome, and comparative gene prediction methods that exploit a genome alignment to annotate either a single target genome or all input genomes simultaneously. We discuss how the methods depend on the phylogenetic placement of the genomes, give advice on the choice of methods, and examine the consistency between gene structure annotations in an example. Furthermore, we provide practical advice on genome annotation in general.
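To make the idea of exploiting a genome alignment concrete, the sketch below shows its simplest form: projecting annotated exon coordinates from a source genome onto a target genome through ungapped alignment blocks. This is a deliberately simplified coordinate-projection illustration, not one of the reviewed comparative gene prediction methods, which model gene structure in the target rather than copying coordinates. The block representation and the function names are hypothetical, and only forward-strand alignments are handled.

```python
from __future__ import annotations

from bisect import bisect_right

# Ungapped aligned segment, forward strand only: (source_start, target_start, length).
Block = tuple[int, int, int]


def project_position(pos: int, blocks: list[Block]) -> int | None:
    """Map a 0-based source coordinate to the target; None if it is unaligned.

    blocks must be sorted by source_start and must not overlap.
    """
    starts = [b[0] for b in blocks]
    k = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if k < 0:
        return None
    src, dst, length = blocks[k]
    return dst + (pos - src) if pos < src + length else None


def project_exons(exons: list[tuple[int, int]], blocks: list[Block]) -> list[tuple[int, int]] | None:
    """Transfer half-open exon intervals; give up on the whole gene if any boundary is unaligned.

    Indels inside an exon are not checked here, so a projected exon may change length
    (and reading frame); a real comparative gene finder would re-evaluate the gene structure.
    """
    projected = []
    for start, end in exons:
        a = project_position(start, blocks)
        b = project_position(end - 1, blocks)
        if a is None or b is None or b < a:
            return None
        projected.append((a, b + 1))
    return projected


blocks = [(0, 100, 50), (60, 150, 40)]  # source 50..59 has no aligned counterpart
print(project_exons([(5, 20), (65, 80)], blocks))  # [(105, 120), (155, 170)]
print(project_exons([(45, 55)], blocks))  # None: exon end falls in the unaligned gap
```

Such naive projection works best when source and target are phylogenetically close, which mirrors the review's point that the appropriate method depends on the phylogenetic placement of the genomes.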


Subject(s)
Genomics , Molecular Sequence Annotation , Phylogeny , Molecular Sequence Annotation/methods , Genomics/methods , Computational Biology/methods , Genome/genetics , Sequence Alignment/methods , Software