Search | VHL Regional Portal

1.

Rapid and sensitive detection of genome contamination at scale with FCS-GX.

Astashyn, Alexander; Tvedte, Eric S; Sweeney, Deacon; Sapojnikov, Victor; Bouk, Nathan; Joukov, Victor; Mozes, Eyal; Strope, Pooja K; Sylla, Pape M; Wagner, Lukas; Bidwell, Shelby L; Brown, Larissa C; Clark, Karen; Davis, Emily W; Smith-White, Brian; Hlavina, Wratko; Pruitt, Kim D; Schneider, Valerie A; Murphy, Terence D.

Genome Biol ; 25(1): 60, 2024 Feb 26.

Article in English | MEDLINE | ID: mdl-38409096

ABSTRACT

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 min. Testing FCS-GX on artificially fragmented genomes demonstrates high sensitivity and specificity for diverse contaminant species. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, comprising 0.16% of total bases, with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/ or https://doi.org/10.5281/zenodo.10651084 .

Subject(s)

Databases, Nucleic Acid , Genome , Software
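
As a rough check on the figures quoted in the abstract above (36.8 Gbp of contamination, stated to be 0.16% of total bases), the implied size of the screened data can be estimated. This is only a back-of-the-envelope sketch: the function name is hypothetical, and both inputs are the rounded values from the abstract, so the result is approximate.

```python
def implied_total_gbp(contamination_gbp: float, contamination_percent: float) -> float:
    """Estimate total screened bases (in Gbp) from a contamination amount and its share."""
    # Convert the percentage to a fraction, then divide the amount by that fraction.
    return contamination_gbp / (contamination_percent / 100.0)


# Rounded values quoted in the abstract: 36.8 Gbp, 0.16% of total bases.
total_gbp = implied_total_gbp(36.8, 0.16)
print(f"Implied total: about {total_gbp / 1000:.0f} Tbp")
```

Under these rounded inputs the estimate comes out at roughly 23 Tbp across the screened assemblies.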

2.

RNA viruses, M satellites, chromosomal killer genes, and killer/nonkiller phenotypes in the 100-genomes S. cerevisiae strains.

Vijayraghavan, Sriram; Kozmin, Stanislav G; Strope, Pooja K; Skelly, Daniel A; Magwene, Paul M; Dietrich, Fred S; McCusker, John H.

G3 (Bethesda) ; 13(10), 2023 09 30.

Article in English | MEDLINE | ID: mdl-37497616

ABSTRACT

We characterized previously identified RNA viruses (L-A, L-BC, 20S, and 23S), L-A-dependent M satellites (M1, M2, M28, and Mlus), and M satellite-dependent killer phenotypes in the Saccharomyces cerevisiae 100-genomes genetic resource population. L-BC was present in all strains, albeit in 2 distinct levels, L-BChi and L-BClo; the L-BC level is associated with the L-BC genotype. L-BChi, L-A, 20S, 23S, M1, M2, and Mlus (M28 was absent) were in fewer strains than the similarly inherited 2µ plasmid. Novel L-A-dependent phenotypes were identified. Ten M+ strains exhibited M satellite-dependent killing (K+) of at least 1 of the naturally M0 and cured M0 derivatives of the 100-genomes strains; in these M0 strains, sensitivities to K1+, K2+, and K28+ strains varied. Finally, to complement our M satellite-encoded killer toxin analysis, we assembled the chromosomal KHS1 and KHR1 killer genes and used naturally M0 and cured M0 derivatives of the 100-genomes strains to assess and characterize the chromosomal killer phenotypes.

Subject(s)

RNA Viruses , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , RNA, Viral/genetics , RNA, Double-Stranded , RNA Viruses/genetics , Phenotype

3.

Rapid and sensitive detection of genome contamination at scale with FCS-GX.

Astashyn, Alexander; Tvedte, Eric S; Sweeney, Deacon; Sapojnikov, Victor; Bouk, Nathan; Joukov, Victor; Mozes, Eyal; Strope, Pooja K; Sylla, Pape M; Wagner, Lukas; Bidwell, Shelby L; Clark, Karen; Davis, Emily W; Smith-White, Brian; Hlavina, Wratko; Pruitt, Kim D; Schneider, Valerie A; Murphy, Terence D.

bioRxiv ; 2023 06 06.

Article in English | MEDLINE | ID: mdl-37292984

ABSTRACT

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 minutes. Testing FCS-GX on artificially fragmented genomes demonstrates sensitivity >95% for diverse contaminant species and specificity >99.93%. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination (0.16% of total bases), with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/.

4.

Mitochondrial Genome Variation Affects Multiple Respiration and Nonrespiration Phenotypes in Saccharomyces cerevisiae.

Vijayraghavan, Sriram; Kozmin, Stanislav G; Strope, Pooja K; Skelly, Daniel A; Lin, Zhenguo; Kennell, John; Magwene, Paul M; Dietrich, Fred S; McCusker, John H.

Genetics ; 211(2): 773-786, 2019 02.

Article in English | MEDLINE | ID: mdl-30498022

ABSTRACT

Mitochondrial genome variation and its effects on phenotypes have been widely analyzed in higher eukaryotes but less so in the model eukaryote Saccharomyces cerevisiae. Here, we describe mitochondrial genome variation in 96 diverse S. cerevisiae strains and assess associations between mitochondrial genotype and phenotypes as well as nuclear-mitochondrial epistasis. We associate sensitivity to the ATP synthase inhibitor oligomycin with SNPs in the mitochondrially encoded ATP6 gene. We describe the use of iso-nuclear F1 pairs, the mitochondrial genome equivalent of reciprocal hemizygosity analysis, to identify and analyze mitochondrial genotype-dependent phenotypes. Using iso-nuclear F1 pairs, we analyze the oligomycin phenotype-ATP6 association and find extensive nuclear-mitochondrial epistasis. Similarly, in iso-nuclear F1 pairs, we identify many additional mitochondrial genotype-dependent respiration phenotypes, for which there was no association in the 96 strains, and again find extensive nuclear-mitochondrial epistasis that likely contributes to the lack of association in the 96 strains. Finally, in iso-nuclear F1 pairs, we identify novel mitochondrial genotype-dependent nonrespiration phenotypes: resistance to cycloheximide, ketoconazole, and copper. We discuss potential mechanisms and the implications of mitochondrial genotype and of nuclear-mitochondrial epistasis effects on respiratory and nonrespiratory quantitative traits.

Subject(s)

Genome, Mitochondrial , Phenotype , Polymorphism, Genetic , Saccharomyces cerevisiae/genetics , Antifungal Agents/toxicity , Cell Respiration/genetics , Copper/toxicity , Cycloheximide/toxicity , Drug Resistance, Fungal/genetics , Epistasis, Genetic , Ketoconazole/toxicity , Mitochondrial Proton-Translocating ATPases/genetics , Polymorphism, Single Nucleotide , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae Proteins/genetics

5.

Improving taxonomic accuracy for fungi in public sequence databases: applying 'one name one species' in well-defined genera with Trichoderma/Hypocrea as a test case.

Robbertse, Barbara; Strope, Pooja K; Chaverri, Priscila; Gazis, Romina; Ciufo, Stacy; Domrachev, Michael; Schoch, Conrad L.

Database (Oxford) ; 2017, 2017 01 01.

Article in English | MEDLINE | ID: mdl-29220466

ABSTRACT

The ITS (nuclear ribosomal internal transcribed spacer) RefSeq database at the National Center for Biotechnology Information (NCBI) is dedicated to the clear association between name, specimen and sequence data. This database is focused on sequences obtained from type material stored in public collections. While the initial ITS sequence curation effort together with numerous fungal taxonomy experts attempted to cover as many orders as possible, we extended our latest focus to the family and genus ranks. We focused on Trichoderma for several reasons, mainly because the asexual and sexual synonyms were well documented, and a list of names and type material was recently proposed and published. In this case study the recent taxonomic information was applied to do a complete taxonomic audit for the genus Trichoderma in the NCBI Taxonomy database. A name status report is available here: https://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi. As a result, the ITS RefSeq Targeted Loci database at NCBI has been augmented with more sequences from type and verified material from Trichoderma species. Additionally, to aid in the cross referencing of data from single loci and genomes we have collected a list of quality records of the RPB2 gene obtained from type material in GenBank that could help validate future submissions. During the process of curation misidentified genomes were discovered, and sequence records from type material were found hidden under previous classifications. Source metadata curation, although more cumbersome, proved to be useful as confirmation of the type material designation. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353

Subject(s)

Databases, Nucleic Acid , Fungal Proteins/genetics , Trichoderma/classification , Trichoderma/genetics

6.

2µ plasmid in Saccharomyces species and in Saccharomyces cerevisiae.

Strope, Pooja K; Kozmin, Stanislav G; Skelly, Daniel A; Magwene, Paul M; Dietrich, Fred S; McCusker, John H.

FEMS Yeast Res ; 15(8), 2015 Dec.

Article in English | MEDLINE | ID: mdl-26463005

ABSTRACT

We determined that extrachromosomal 2µ plasmid was present in 67 of the Saccharomyces cerevisiae 100-genome strains; in addition to variation in the size and copy number of 2µ, we identified three distinct classes of 2µ. We identified 2µ presence/absence and class associations with populations, clinical origin and nuclear genotypes. We also screened genome sequences of S. paradoxus, S. kudriavzevii, S. uvarum, S. eubayanus, S. mikatae, S. arboricolus and S. bayanus strains for both integrated and extrachromosomal 2µ. Similar to S. cerevisiae, we found no integrated 2µ sequences in any S. paradoxus strains. However, we identified part of 2µ integrated into the genomes of some S. uvarum, S. kudriavzevii, S. mikatae and S. bayanus strains, which were distinct from each other and from all extrachromosomal 2µ. We identified extrachromosomal 2µ in one S. paradoxus, one S. eubayanus, two S. bayanus and 13 S. uvarum strains. The extrachromosomal 2µ in S. paradoxus, S. eubayanus and S. cerevisiae were distinct from each other. In contrast, the extrachromosomal 2µ in S. bayanus and S. uvarum strains were identical with each other and with one of the three classes of S. cerevisiae 2µ, consistent with interspecific transfer.

Subject(s)

Interspersed Repetitive Sequences , Plasmids , Saccharomyces/genetics , Genetic Variation , Saccharomyces/classification

7.

The 100-genomes strains, an S. cerevisiae resource that illuminates its natural phenotypic and genotypic variation and emergence as an opportunistic pathogen.

Strope, Pooja K; Skelly, Daniel A; Kozmin, Stanislav G; Mahadevan, Gayathri; Stone, Eric A; Magwene, Paul M; Dietrich, Fred S; McCusker, John H.

Genome Res ; 25(5): 762-74, 2015 May.

Article in English | MEDLINE | ID: mdl-25840857

ABSTRACT

Saccharomyces cerevisiae, a well-established model for species as diverse as humans and pathogenic fungi, is more recently a model for population and quantitative genetics. S. cerevisiae is found in multiple environments, one of which is the human body, as an opportunistic pathogen. To aid in the understanding of the S. cerevisiae population and quantitative genetics, as well as its emergence as an opportunistic pathogen, we sequenced, de novo assembled, and extensively manually edited and annotated the genomes of 93 S. cerevisiae strains from multiple geographic and environmental origins, including many clinical origin strains. These 93 S. cerevisiae strains, the genomes of which are near-reference quality, together with seven previously sequenced strains, constitute a novel genetic resource, the "100-genomes" strains. Our sequencing coverage, high-quality assemblies, and annotation provide unprecedented opportunities for detailed interrogation of complex genomic loci, examples of which we demonstrate. We found most phenotypic variation to be quantitative and identified population, genotype, and phenotype associations. Importantly, we identified clinical origin associations. For example, we found that an introgressed PDR5 was present exclusively in clinical origin mosaic group strains; that the mosaic group was significantly enriched for clinical origin strains; and that clinical origin strains were much more copper resistant, suggesting that copper resistance contributes to fitness in the human host. The 100-genomes strains are a novel, multipurpose resource to advance the study of S. cerevisiae population genetics, quantitative genetics, and the emergence of an opportunistic pathogen.

Subject(s)

Contig Mapping/methods , Genome, Fungal , Genotype , Phenotype , Polymorphism, Genetic , Saccharomyces cerevisiae/genetics , Sequence Alignment/methods , Phylogeny , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae/pathogenicity , Virulence/genetics

8.

Structures of naturally evolved CUP1 tandem arrays in yeast indicate that these arrays are generated by unequal nonhomologous recombination.

Zhao, Ying; Strope, Pooja K; Kozmin, Stanislav G; McCusker, John H; Dietrich, Fred S; Kokoska, Robert J; Petes, Thomas D.

G3 (Bethesda) ; 4(11): 2259-69, 2014 Sep 17.

Article in English | MEDLINE | ID: mdl-25236733

ABSTRACT

An important issue in genome evolution is the mechanism by which tandem duplications are generated from single-copy genes. In the yeast Saccharomyces cerevisiae, most strains contain tandemly duplicated copies of CUP1, a gene that encodes a copper-binding metallothionein. By screening 101 natural isolates of S. cerevisiae, we identified five different types of CUP1-containing repeats, as well as strains that only had one copy of CUP1. A comparison of the DNA sequences of these strains indicates that the CUP1 tandem arrays were generated by unequal nonhomologous recombination events from strains that had one CUP1 gene.

Subject(s)

Gene Duplication , Homologous Recombination , Metallothionein/genetics , Saccharomyces cerevisiae/genetics , Evolution, Molecular

9.

Molecular evolution of urea amidolyase and urea carboxylase in fungi.

Strope, Pooja K; Nickerson, Kenneth W; Harris, Steven D; Moriyama, Etsuko N.

BMC Evol Biol ; 11: 80, 2011 Mar 29.

Article in English | MEDLINE | ID: mdl-21447149

ABSTRACT

BACKGROUND: Urea amidolyase breaks down urea into ammonia and carbon dioxide in a two-step process, while another enzyme, urease, does this in a one-step process. Urea amidolyase has been found only in some fungal species among eukaryotes. It contains two major domains: the amidase and urea carboxylase domains. A shorter form of urea amidolyase is known as urea carboxylase and has no amidase domain. Eukaryotic urea carboxylase has been found only in several fungal species and green algae. In order to elucidate the evolutionary origin of urea amidolyase and urea carboxylase, we studied the distribution of urea amidolyase, urea carboxylase, as well as other proteins including urease, across kingdoms. RESULTS: Among the 64 fungal species we examined, only those in two Ascomycota classes (Sordariomycetes and Saccharomycetes) had the urea amidolyase sequences. Urea carboxylase was found in many but not all of the species in the phylum Basidiomycota and in the subphylum Pezizomycotina (phylum Ascomycota). It was completely absent from the class Saccharomycetes (phylum Ascomycota; subphylum Saccharomycotina). Four Sordariomycetes species we examined had both the urea carboxylase and the urea amidolyase sequences. Phylogenetic analysis showed that these two enzymes appeared to have gone through independent evolution since their bacterial origin. The amidase domain and the urea carboxylase domain sequences from fungal urea amidolyases clustered strongly together with the amidase and urea carboxylase sequences, respectively, from a small number of beta- and gammaproteobacteria. On the other hand, fungal urea carboxylase proteins clustered together with another copy of urea carboxylases distributed broadly among bacteria. The urease proteins were found in all the fungal species examined except for those of the subphylum Saccharomycotina.
CONCLUSIONS: We conclude that the urea amidolyase genes currently found only in fungi are the results of a horizontal gene transfer event from beta-, gamma-, or related species of proteobacteria. The event took place before the divergence of the subphyla Pezizomycotina and Saccharomycotina but after the divergence of the subphylum Taphrinomycotina. Urea carboxylase genes currently found in fungi and other limited organisms were also likely derived from another ancestral gene in bacteria. Our study presented another important example showing plastic and opportunistic genome evolution in bacteria and fungi and their evolutionary interplay.

Subject(s)

Carbon-Nitrogen Ligases/genetics , Evolution, Molecular , Fungi/enzymology , Fungi/genetics , Bacteria/enzymology , Bacteria/genetics , Carbon-Nitrogen Ligases/chemistry , Fungi/metabolism , Gene Transfer, Horizontal , Phylogeny , Protein Structure, Tertiary , Sequence Homology, Amino Acid

10.

Simple alignment-free methods for protein classification: a case study from G-protein-coupled receptors.

Strope, Pooja K; Moriyama, Etsuko N.

Genomics ; 89(5): 602-12, 2007 May.

Article in English | MEDLINE | ID: mdl-17336495

ABSTRACT

Computational methods of predicting protein functions rely on detecting similarities among proteins. However, sufficient sequence information is not always available for some protein families. For example, proteins of interest may be new members of a divergent protein family. The performance of protein classification methods could vary in such challenging situations. Using the G-protein-coupled receptor superfamily as an example, we investigated the performance of several protein classifiers. Alignment-free classifiers based on support vector machines using simple amino acid compositions were effective in remote-similarity detection even from short fragmented sequences. Although it is computationally expensive, a support vector machine classifier using local pairwise alignment scores showed very good balanced performance. More commonly used profile hidden Markov models were generally highly specific and well suited to classifying well-established protein family members. It is suggested that different types of protein classifiers should be applied to gain the optimal mining power.

Subject(s)

Amino Acids/chemistry , Classification/methods , Receptors, G-Protein-Coupled/classification , Algorithms , Amino Acids/analysis , Animals , Drosophila melanogaster/genetics , Expressed Sequence Tags/chemistry , Markov Chains , Models, Chemical , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/metabolism
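
The abstract above mentions alignment-free classifiers built on simple amino acid compositions. A minimal sketch of that feature representation is given below; it only illustrates the composition vector, not the authors' pipeline, and the function name and handling of non-standard residues are assumptions. Such a 20-element vector is what would then be passed to a classifier such as a support vector machine.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in a protein sequence.

    Characters outside the 20 standard residues (for example X or gaps)
    are ignored; this choice is an assumption of the sketch.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard input: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Works on short fragments too, since no alignment is required.
features = aa_composition("MKTAYIAKQR")
print(len(features))
```

Because the vector has a fixed length regardless of sequence length, fragments and full-length proteins can be compared directly, which is the property the abstract highlights for short fragmented sequences.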

11.

Mining the Arabidopsis thaliana genome for highly-divergent seven transmembrane receptors.

Moriyama, Etsuko N; Strope, Pooja K; Opiyo, Stephen O; Chen, Zhongying; Jones, Alan M.

Genome Biol ; 7(10): R96, 2006.

Article in English | MEDLINE | ID: mdl-17064408

ABSTRACT

To identify divergent seven-transmembrane receptor (7TMR) candidates from the Arabidopsis thaliana genome, multiple protein classification methods were combined, including both alignment-based and alignment-free classifiers. This resolved problems in optimally training individual classifiers using limited and divergent samples, and increased stringency for candidate proteins. We identified 394 proteins as 7TMR candidates and highlighted 54 with corresponding expression patterns for further investigation.

Subject(s)

Arabidopsis/genetics , Genetic Variation , Genome, Plant , Receptors, Cell Surface/genetics , Arabidopsis Proteins/genetics , Databases, Protein , Gene Expression Profiling , Genetic Vectors , Markov Chains
