Results 1 - 13 of 13
1.
Genome Res ; 34(5): 769-777, 2024 06 25.
Article in English | MEDLINE | ID: mdl-38866550

ABSTRACT

Gene prediction has remained an active area of bioinformatics research for a long time. Still, gene prediction in large eukaryotic genomes presents a challenge that must be addressed by new algorithms. The amount and significance of the evidence available from transcriptomes and proteomes vary across genomes, between genes, and even along a single gene. User-friendly and accurate annotation pipelines that can cope with such data heterogeneity are needed. The previously developed annotation pipelines BRAKER1 and BRAKER2 use RNA-seq or protein data, respectively, but not both. A further significant performance improvement integrating all three data types was made by the recently released GeneMark-ETP. We here present the BRAKER3 pipeline that builds on GeneMark-ETP and AUGUSTUS, and further improves accuracy using the TSEBRA combiner. BRAKER3 annotates protein-coding genes in eukaryotic genomes using both short-read RNA-seq and a large protein database, along with statistical models learned iteratively and specifically for the target genome. We benchmarked the new pipeline on genomes of 11 species under an assumed level of relatedness of the target species proteome to available proteomes. BRAKER3 outperforms BRAKER1 and BRAKER2. The transcript-level F1-score is increased by about 20 percentage points on average, and the difference is most pronounced for species with large and complex genomes. BRAKER3 also outperforms other existing tools, MAKER2, Funannotate, and FINDER. The code of BRAKER3 is available on GitHub and as a ready-to-run Docker container for execution with Docker or Singularity. Overall, BRAKER3 is an accurate, easy-to-use tool for eukaryotic genome annotation.


Subject(s)
Molecular Sequence Annotation , Software , Molecular Sequence Annotation/methods , Humans , RNA-Seq/methods , Algorithms , Animals , Genome , Computational Biology/methods , Genomics/methods , Transcriptome
2.
bioRxiv ; 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-37398387

ABSTRACT

Gene prediction has remained an active area of bioinformatics research for a long time. Still, gene prediction in large eukaryotic genomes presents a challenge that must be addressed by new algorithms. The amount and significance of the evidence available from transcriptomes and proteomes vary across genomes, between genes and even along a single gene. User-friendly and accurate annotation pipelines that can cope with such data heterogeneity are needed. The previously developed annotation pipelines BRAKER1 and BRAKER2 use RNA-seq or protein data, respectively, but not both. A further significant performance improvement was made by the recently released GeneMark-ETP integrating all three data types. We here present the BRAKER3 pipeline that builds on GeneMark-ETP and AUGUSTUS and further improves accuracy using the TSEBRA combiner. BRAKER3 annotates protein-coding genes in eukaryotic genomes using both short-read RNA-seq and a large protein database, along with statistical models learned iteratively and specifically for the target genome. We benchmarked the new pipeline on genomes of 11 species under an assumed level of relatedness of the target species proteome to available proteomes. BRAKER3 outperformed BRAKER1 and BRAKER2. The transcript-level F1-score was increased by ~20 percentage points on average, and the difference was most pronounced for species with large and complex genomes. BRAKER3 also outperformed other existing tools, MAKER2, Funannotate and FINDER. The code of BRAKER3 is available on GitHub and as a ready-to-run Docker container for execution with Docker or Singularity. Overall, BRAKER3 is an accurate, easy-to-use tool for eukaryotic genome annotation.

3.
J Hered ; 115(1): 86-93, 2024 Feb 03.
Article in English | MEDLINE | ID: mdl-37738158

ABSTRACT

Wildlife diseases, such as the sea star wasting (SSW) epizootic that broke out in the mid-2010s, appear to be associated with acute and/or chronic abiotic environmental change; dissociating the effects of different drivers can be difficult. The sunflower sea star, Pycnopodia helianthoides, was the species most severely impacted during the SSW outbreak, which overlapped with periods of anomalous atmospheric and oceanographic conditions, and there is not yet a consensus on the cause(s). Genomic data may reveal underlying molecular signatures that implicate a subset of factors and, thus, clarify past events while also setting the scene for effective restoration efforts. To advance this goal, we used Pacific Biosciences HiFi long sequencing reads and Dovetail Omni-C proximity reads to generate a highly contiguous genome assembly that was then annotated using RNA-seq-informed gene prediction. The genome assembly is 484 Mb long, with contig N50 of 1.9 Mb, scaffold N50 of 21.8 Mb, BUSCO completeness score of 96.1%, and 22 major scaffolds consistent with prior evidence that sea star genomes comprise 22 autosomes. These statistics generally fall between those of other recently assembled chromosome-scale assemblies for two species in the distantly related asteroid genus Pisaster. These novel genomic resources for P. helianthoides will underwrite population genomic, comparative genomic, and phylogenomic analyses, as well as their integration across scales, of SSW and environmental stressors.


Subject(s)
Helianthus , Animals , Starfish/genetics , Genome , Genomics , Chromosomes
4.
Front Plant Sci ; 14: 1284478, 2023.
Article in English | MEDLINE | ID: mdl-38107002

ABSTRACT

Sour cherry (Prunus cerasus L.) is an important allotetraploid cherry species that evolved in the Caspian Sea and Black Sea regions from a hybridization of the tetraploid ground cherry (Prunus fruticosa Pall.) and an unreduced pollen of the diploid sweet cherry (P. avium L.) ancestor. Details of when and where the evolution of this species occurred are unclear, as well as the effect of hybridization on the genome structure. To gain insight, the genome of the sour cherry cultivar 'Schattenmorelle' was sequenced using Illumina NovaSeq™ and Oxford Nanopore long-read technologies, resulting in a ~629-Mbp pseudomolecule reference genome. The genome could be separated into two subgenomes, with subgenome PceS_a originating from P. avium and subgenome PceS_f originating from P. fruticosa. The genome also showed size reduction compared to ancestral species and traces of homoeologous sequence exchanges throughout. Comparative analysis confirmed that the genome of sour cherry is segmental allotetraploid and evolved very recently in the past.

5.
BMC Bioinformatics ; 24(1): 327, 2023 Aug 31.
Article in English | MEDLINE | ID: mdl-37653395

ABSTRACT

BACKGROUND: The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data is available for some genomes. RESULTS: Various gene annotation tools have been developed but each has its limitations. Here, we introduce GALBA, a fully automated pipeline that utilizes miniprot, a rapid protein-to-genome aligner, in combination with AUGUSTUS to predict genes with high accuracy. Accuracy results indicate that GALBA is particularly strong in the annotation of large vertebrate genomes. We also present use cases in insects, vertebrates, and a land plant. GALBA is fully open source and available as a docker image for easy execution with Singularity in high-performance computing environments. CONCLUSIONS: Our pipeline addresses the critical need for accurate gene annotation in newly sequenced genomes, and we believe that GALBA will greatly facilitate genome annotation for diverse organisms.


Subject(s)
Eukaryota , Eukaryotic Cells , Animals , Molecular Sequence Annotation , Transcriptome
6.
bioRxiv ; 2023 Apr 10.
Article in English | MEDLINE | ID: mdl-37090650

ABSTRACT

The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data is available for some genomes. Various gene annotation tools have been developed but each has its limitations. Here, we introduce GALBA, a fully automated pipeline that utilizes miniprot, a rapid protein-to-genome aligner, in combination with AUGUSTUS to predict genes with high accuracy. Accuracy results indicate that GALBA is particularly strong in the annotation of large vertebrate genomes. We also present use cases in insects, vertebrates, and a previously unannotated land plant. GALBA is fully open source and available as a docker image for easy execution with Singularity in high-performance computing environments. Our pipeline addresses the critical need for accurate gene annotation in newly sequenced genomes, and we believe that GALBA will greatly facilitate genome annotation for diverse organisms.

7.
Genomics ; 113(6): 4173-4183, 2021 11.
Article in English | MEDLINE | ID: mdl-34774678

ABSTRACT

Cherries are stone fruits and belong to the economically important plant family of Rosaceae with worldwide cultivation of different species. The ground cherry, Prunus fruticosa Pall., is an ancestor of cultivated sour cherry, an important tetraploid cherry species. Here, we present a long read chromosome-level draft genome assembly and related plastid sequences using the Oxford Nanopore Technology PromethION platform and R10.3 pore type. We generated a final consensus genome sequence of 366 Mb comprising eight chromosomes. The N50 scaffold was ~44 Mb with the longest chromosome being 66.5 Mb. The chloroplast and mitochondrial genomes were 158,217 bp and 383,281 bp long, which is in accordance with previously published plastid sequences. This is the first report of the genome of ground cherry (P. fruticosa) sequenced by long read technology only. The datasets obtained from this study provide a foundation for future breeding, molecular and evolutionary analysis in Prunus studies.


Subject(s)
Physalis , Prunus , Chromosomes , Physalis/genetics , Plant Breeding , Prunus/genetics , Tetraploidy
8.
BMC Bioinformatics ; 22(1): 566, 2021 Nov 25.
Article in English | MEDLINE | ID: mdl-34823473

ABSTRACT

BACKGROUND: BRAKER is a suite of automatic pipelines, BRAKER1 and BRAKER2, for the accurate annotation of protein-coding genes in eukaryotic genomes. Each pipeline trains statistical models of protein-coding genes based on provided evidence and then predicts protein-coding genes in genomic sequences using both the extrinsic evidence and statistical models. For training and prediction, BRAKER1 and BRAKER2 incorporate complementary extrinsic evidence: BRAKER1 uses only RNA-seq data while BRAKER2 uses only a database of cross-species proteins. The BRAKER suite has so far not been able to reliably exceed the accuracy of BRAKER1 and BRAKER2 when incorporating both types of evidence simultaneously. Currently, for a novel genome project where both RNA-seq and protein data are available, the best option is to run both pipelines independently and to pick the output that is likely better. Therefore, one or the other type of extrinsic evidence would remain unexploited. RESULTS: We present TSEBRA, software that selects gene predictions (transcripts) from the sets generated by BRAKER1 and BRAKER2. TSEBRA uses a set of rules to compare scores of overlapping transcripts based on their support by RNA-seq and homologous protein evidence. We show in computational experiments on genomes of 11 species that TSEBRA achieves higher accuracy than either BRAKER1 or BRAKER2 running alone and that TSEBRA compares favorably with the combiner tool EVidenceModeler. CONCLUSION: TSEBRA is an easy-to-use and fast software tool. It can be used in concert with the BRAKER pipeline to generate a gene prediction set supported by both RNA-seq and homologous protein evidence.


Subject(s)
Genome , Software , Genomics , RNA-Seq , Sequence Analysis, RNA
9.
Carbohydr Polym ; 246: 116533, 2020 Oct 15.
Article in English | MEDLINE | ID: mdl-32747232

ABSTRACT

The heterogeneous sulfoethylation of cellulose, xylan, α-1,3-glucan, glucomannan, pullulan, curdlan, galactoglucomannan, and agarose was studied using sodium vinylsulfonate (NaVS) as reagent in the presence of sodium hydroxide and iso-propanol (i-PrOH) as slurry medium. The influence of the concentration of polymer, water, and NaOH (solid or aqueous solution) on the degree of substitution (DS) was investigated. The sulfoethylation rendered the polysaccharides studied water-soluble. Sulfoethylation of heteropolysaccharides yielded products with higher DS compared to the conversion of homopolysaccharides. Structure characterization was carried out by means of 13C-NMR spectroscopy.


Subject(s)
Cellulose/chemistry , Glucans/chemistry , Mannans/chemistry , Sepharose/chemistry , Xylans/chemistry , beta-Glucans/chemistry , 2-Propanol/chemistry , Carbon-13 Magnetic Resonance Spectroscopy/methods , Dimethyl Sulfoxide/chemistry , Sodium Hydroxide/chemistry , Solubility , Water/chemistry
10.
Carbohydr Polym ; 207: 782-790, 2019 Mar 01.
Article in English | MEDLINE | ID: mdl-30600065

ABSTRACT

Novel non-charged and ionic xylan carbamate (XC) derivatives were synthesized in a modular approach from xylan phenyl carbonates (XPC) as reactive intermediates. XPC with varying degrees of substitution (DS) from 0.5 to 1.9 were converted with different non-ionic primary and secondary amines in different molar ratios to obtain the corresponding XC with high conversion rates of up to 100%. In a similar way, ionic amines were employed for the aminolysis of XPC to obtain charged XC. The XC were characterized by NMR and infrared spectroscopy. XPC proved to be highly versatile building blocks for the preparation of ionic xylan derivatives. The type and amount of charged groups could be tuned efficiently. Moreover, high DS values of up to 1.4 for cationic and 1.8 for anionic XC derivatives could be achieved, which is higher than reported previously for comparable ionic xylan derivatives that were prepared by "conventional" esterification and etherification reactions.

11.
Carbohydr Polym ; 193: 45-53, 2018 Aug 01.
Article in English | MEDLINE | ID: mdl-29773396

ABSTRACT

Xylan phenyl carbonate (XPC) derivatives were prepared and characterized comprehensively. By conversion of xylan with phenyl chloroformate either in dipolar aprotic solvents with LiCl or in an ionic liquid, XPC with degrees of substitution (DS) of up to 2.0, i.e., fully functionalized derivatives, could be obtained. The synthesis was studied with respect to the influence of different reaction parameters. It was found that the reaction medium as well as the type of starting xylan strongly affected the efficiency of the derivatization. The derivatives obtained were characterized by FT-IR and NMR spectroscopy. Surprisingly, it was found that C-3 is the most reactive position in this particular reaction while substitution in position C-2 only occurred if the neighboring position C-3 already carried a phenyl carbonate group. XPC were found to form spherical nanoparticles (NP) of well-defined shape with diameters around 158 nm. These materials possess unique potential as activated NP for advanced applications.

13.
Chem Commun (Camb) ; 46(19): 3387-9, 2010 May 21.
Article in English | MEDLINE | ID: mdl-20358095

ABSTRACT

A seven-component reaction was accomplished by utilizing the different chemoselectivities of the Ugi-Mumm and the Ugi-Smiles reaction. The sequential multicomponent reactions led to highly diverse peptide- and glycopeptide-like structures.


Subject(s)
Aldehydes/chemistry , Amines/chemistry , Formaldehyde/chemistry , Nitriles/chemistry , Peptides/chemical synthesis , Molecular Structure , Peptides/chemistry , Stereoisomerism