Results 1 - 14 of 14
1.
Genome Biol ; 25(1): 60, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38409096

ABSTRACT

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 min. Testing FCS-GX on artificially fragmented genomes demonstrates high sensitivity and specificity for diverse contaminant species. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, comprising 0.16% of total bases, with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/ or https://doi.org/10.5281/zenodo.10651084 .


Subject(s)
Databases, Nucleic Acid , Genome , Software
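As a quick consistency check on the figures quoted in the FCS-GX abstract above, 36.8 Gbp of contamination at 0.16% of total bases implies roughly 23 Tbp of screened sequence. The sketch below only rearranges those two numbers; the function and variable names are illustrative and are not part of FCS-GX.

```python
def implied_total_gbp(contam_gbp: float, contam_fraction: float) -> float:
    """Total bases (in Gbp) implied by a contaminant amount and its share of the total."""
    return contam_gbp / contam_fraction


# Figures quoted in the abstract: 36.8 Gbp of contamination, 0.16% of total bases.
total_gbp = implied_total_gbp(36.8, 0.0016)
print(f"Implied total screened: {total_gbp / 1000:.1f} Tbp")
```

Because both inputs are rounded in the abstract, the result is an order-of-magnitude estimate rather than an exact database size.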
2.
bioRxiv ; 2023 Jun 06.
Article in English | MEDLINE | ID: mdl-37292984

ABSTRACT

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 minutes. Testing FCS-GX on artificially fragmented genomes demonstrates sensitivity >95% for diverse contaminant species and specificity >99.93%. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination (0.16% of total bases), with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/.

3.
PLoS Biol ; 17(6): e3000294, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31158217

ABSTRACT

A morphospecies is defined as a taxonomic species based wholly on morphology, but often morphospecies consist of clusters of cryptic species that can be identified genetically or molecularly. The nature of the evolutionary novelty that accompanies speciation in a morphospecies is an intriguing question. Morphospecies are particularly common among ciliates, a group of unicellular eukaryotes that separates 2 kinds of nuclei-the silenced germline nucleus (micronucleus [MIC]) and the actively expressed somatic nucleus (macronucleus [MAC])-within a common cytoplasm. Because of their very similar morphologies, members of the Tetrahymena genus are considered a morphospecies. We explored the hidden genomic evolution within this genus by performing a comprehensive comparative analysis of the somatic genomes of 10 species and the germline genomes of 2 species of Tetrahymena. These species show high genetic divergence; phylogenomic analysis suggests that the genus originated about 300 million years ago (Mya). Seven universal protein domains are preferentially included among the species-specific (i.e., the youngest) Tetrahymena genes. In particular, leucine-rich repeat (LRR) genes make the largest contribution to the high level of genome divergence of the 10 species. LRR genes can be sorted into 3 different age groups. Parallel evolutionary trajectories have independently occurred among LRR genes in the different Tetrahymena species. Thousands of young LRR genes contain tandem arrays of exactly 90-bp exons. The introns separating these exons show a unique, extreme phase 2 bias, suggesting a clonal origin and successive expansions of 90-bp-exon LRR genes. Identifying LRR gene age groups allowed us to document a Tetrahymena intron length cycle. The youngest 90-bp exon LRR genes in T. thermophila are concentrated in pericentromeric and subtelomeric regions of the 5 micronuclear chromosomes, suggesting that these regions act as genome innovation centers. 
Copies of a Tetrahymena Long interspersed element (LINE)-like retrotransposon are very frequently found physically adjacent to 90-bp exon/intron repeat units of the youngest LRR genes. We propose that Tetrahymena species have used a massive exon-shuffling mechanism, involving unequal crossing over possibly in concert with retrotransposition, to create the unique 90-bp exon array LRR genes.


Subject(s)
Genomics/methods , Species Specificity , Tetrahymena/genetics , Biological Evolution , Evolution, Molecular , Exons , Genome, Protozoan , Introns , Leucine-Rich Repeat Proteins , Phylogeny , Proteins/genetics , Tetrahymena/metabolism
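The link drawn in the abstract above between exactly 90-bp exons and a uniform intron phase can be illustrated with the standard definition of intron phase (cumulative coding length before the intron, modulo 3). Because 90 is a multiple of 3, every intron in a tandem array of 90-bp exons inherits the phase of the first one. This is a minimal sketch of that arithmetic, not the authors' analysis; the 92-bp first exon is a hypothetical value chosen so that the first intron is phase 2.

```python
from itertools import accumulate


def intron_phases(exon_lengths):
    """Phase of the intron after each exon except the last: cumulative coding length mod 3."""
    cumulative = list(accumulate(exon_lengths))
    return [length % 3 for length in cumulative[:-1]]


# Hypothetical gene: a 92-bp first coding exon followed by five 90-bp exons.
phases = intron_phases([92, 90, 90, 90, 90, 90])
print(phases)
```

Any exon length that is not a multiple of 3 would shift the phase of all downstream introns, which is why a uniform phase is compatible with repeated duplication of a single exon/intron unit.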
4.
Elife ; 5, 2016 Nov 28.
Article in English | MEDLINE | ID: mdl-27892853

ABSTRACT

The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome has shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena's germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum.


Subject(s)
Gene Rearrangement , Genome, Protozoan , Tetrahymena thermophila/genetics , Sequence Analysis, DNA
5.
Plant Cell Physiol ; 56(1): e1, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25432968

ABSTRACT

Medicago truncatula, a close relative of alfalfa (Medicago sativa), is a model legume used for studying symbiotic nitrogen fixation, mycorrhizal interactions and legume genomics. J. Craig Venter Institute (JCVI; formerly TIGR) has been involved in M. truncatula genome sequencing and annotation since 2002 and has maintained a web-based resource providing data to the community for this entire period. The website (http://www.MedicagoGenome.org) has seen major updates in the past year, where it currently hosts the latest version of the genome (Mt4.0), associated data and legacy project information, presented to users via a rich set of open-source tools. A JBrowse-based genome browser interface exposes tracks for visualization. Mutant gene symbols originally assembled and curated by the Frugoli lab are now hosted at JCVI and tie into our community annotation interface, Medicago EuCAP (to be integrated soon with our implementation of WebApollo). Literature pertinent to M. truncatula is indexed and made searchable via the Textpresso search engine. The site also implements MedicMine, an instance of InterMine that offers interconnectivity with other plant 'mines' such as ThaleMine and PhytoMine, and other model organism databases (MODs). In addition to these new features, we continue to provide keyword- and locus identifier-based searches served via a Chado-backed Tripal Instance, a BLAST search interface and bulk downloads of data sets from the iPlant Data Store (iDS). Finally, we maintain an E-mail helpdesk, facilitated by a JIRA issue tracking system, where we receive and respond to questions about the website and requests for specific data sets from the community.


Subject(s)
Computational Biology , Databases, Genetic , Genome, Plant/genetics , Medicago truncatula/genetics , User-Computer Interface , Information Storage and Retrieval , Internet
6.
Genome Biol ; 15(6): R77, 2014 Jun 10.
Article in English | MEDLINE | ID: mdl-24916971

ABSTRACT

BACKGROUND: Brassica oleracea is a valuable vegetable species that has contributed to human health and nutrition for hundreds of years and comprises multiple distinct cultivar groups with diverse morphological and phytochemical attributes. In addition to this phenotypic wealth, B. oleracea offers unique insights into polyploid evolution, as it results from multiple ancestral polyploidy events and a final Brassiceae-specific triplication event. Further, B. oleracea represents one of the diploid genomes that formed the economically important allopolyploid oilseed, Brassica napus. A deeper understanding of B. oleracea genome architecture provides a foundation for crop improvement strategies throughout the Brassica genus. RESULTS: We generate an assembly representing 75% of the predicted B. oleracea genome using a hybrid Illumina/Roche 454 approach. Two dense genetic maps are generated to anchor almost 92% of the assembled scaffolds to nine pseudo-chromosomes. Over 50,000 genes are annotated and 40% of the genome is predicted to be repetitive, thus contributing to the increased genome size of B. oleracea compared to its close relative B. rapa. A snapshot of both the leaf transcriptome and methylome allows comparisons to be made across the triplicated sub-genomes, which resulted from the most recent Brassiceae-specific polyploidy event. CONCLUSIONS: Differential expression of the triplicated syntelogs and cytosine methylation levels across the sub-genomes suggest residual marks of the genome dominance that led to the current genome architecture. Although cytosine methylation does not correlate with individual gene dominance, the independent methylation patterns of triplicated copies suggest epigenetic mechanisms play a role in the functional diversification of duplicate genes.


Subject(s)
Brassica/genetics , Genome, Plant , Transcriptome , Aneuploidy , Brassica/metabolism , Chromosome Mapping , DNA Methylation , Epigenesis, Genetic , Evolution, Molecular , Gene Expression Regulation, Plant , Molecular Sequence Annotation , Molecular Sequence Data , Sequence Analysis, DNA
7.
Nature ; 455(7214): 757-63, 2008 Oct 09.
Article in English | MEDLINE | ID: mdl-18843361

ABSTRACT

The human malaria parasite Plasmodium vivax is responsible for 25-40% of the approximately 515 million annual cases of malaria worldwide. Although seldom fatal, the parasite elicits severe and incapacitating clinical symptoms and often causes relapses months after a primary infection has cleared. Despite its importance as a major human pathogen, P. vivax is little studied because it cannot be propagated continuously in the laboratory except in non-human primates. We sequenced the genome of P. vivax to shed light on its distinctive biological features, and as a means to drive development of new drugs and vaccines. Here we describe the synteny and isochore structure of P. vivax chromosomes, and show that the parasite resembles other malaria parasites in gene content and metabolic potential, but possesses novel gene families and potential alternative invasion pathways not recognized previously. Completion of the P. vivax genome provides the scientific community with a valuable resource that can be used to advance investigation into this neglected species.


Subject(s)
Genome, Protozoan/genetics , Genomics , Malaria, Vivax/parasitology , Plasmodium vivax/genetics , Amino Acid Motifs , Animals , Artemisinins/metabolism , Artemisinins/pharmacology , Atovaquone/metabolism , Atovaquone/pharmacology , Cell Nucleus/genetics , Chromosomes/genetics , Conserved Sequence/genetics , Erythrocytes/parasitology , Evolution, Molecular , Haplorhini/parasitology , Humans , Isochores/genetics , Ligands , Malaria, Vivax/metabolism , Multigene Family , Plasmodium vivax/drug effects , Plasmodium vivax/pathogenicity , Plasmodium vivax/physiology , Sequence Analysis, DNA , Species Specificity , Synteny/genetics
8.
PLoS Pathog ; 3(10): 1401-13, 2007 Oct 19.
Article in English | MEDLINE | ID: mdl-17953480

ABSTRACT

Babesia bovis is an apicomplexan tick-transmitted pathogen of cattle imposing a global risk and severe constraints to livestock health and economic development. The complete genome sequence was undertaken to facilitate vaccine antigen discovery, and to allow for comparative analysis with the related apicomplexan hemoprotozoa Theileria parva and Plasmodium falciparum. At 8.2 Mbp, the B. bovis genome is similar in size to that of Theileria spp. Structural features of the B. bovis and T. parva genomes are remarkably similar, and extensive synteny is present despite several chromosomal rearrangements. In contrast, B. bovis and P. falciparum, which have similar clinical and pathological features, have major differences in genome size, chromosome number, and gene complement. Chromosomal synteny with P. falciparum is limited to microregions. The B. bovis genome sequence has allowed wide scale analyses of the polymorphic variant erythrocyte surface antigen protein (ves1 gene) family that, similar to the P. falciparum var genes, is postulated to play a role in cytoadhesion, sequestration, and immune evasion. The approximately 150 ves1 genes are found in clusters that are distributed throughout each chromosome, with an increased concentration adjacent to a physical gap on chromosome 1 that contains multiple ves1-like sequences. ves1 clusters are frequently linked to a novel family of variant genes termed smorfs that may themselves contribute to immune evasion, may play a role in variant erythrocyte surface antigen protein biology, or both. Initial expression analysis of ves1 and smorf genes indicates coincident transcription of multiple variants. B. bovis displays a limited metabolic potential, with numerous missing pathways, including two pathways previously described for the P. falciparum apicoplast. This reduced metabolic potential is reflected in the B. bovis apicoplast, which appears to have fewer nuclear genes targeted to it than other apicoplast containing organisms. 
Finally, comparative analyses have identified several novel vaccine candidates including a positional homolog of p67 and SPAG-1, Theileria sporozoite antigens targeted for vaccine development. The genome sequence provides a greater understanding of B. bovis metabolism and potential avenues for drug therapies and vaccine development.


Subject(s)
Babesia bovis/genetics , DNA, Protozoan/analysis , Genes, Protozoan , Plasmodium falciparum/genetics , Theileria parva/genetics , Animals , Antigens, Protozoan/immunology , Babesia bovis/immunology , Babesia bovis/metabolism , Babesiosis/parasitology , Base Sequence , Carrier Proteins/genetics , Carrier Proteins/immunology , Carrier Proteins/metabolism , Chromosomes , DNA, Complementary/analysis , Evolution, Molecular , Genomic Library , Molecular Sequence Data , Plasmodium falciparum/immunology , Plasmodium falciparum/metabolism , Protozoan Proteins/genetics , Protozoan Proteins/immunology , Protozoan Proteins/metabolism , Sequence Analysis, DNA , Species Specificity , Synteny , Theileria parva/immunology , Theileria parva/metabolism
9.
Science ; 315(5809): 207-12, 2007 Jan 12.
Article in English | MEDLINE | ID: mdl-17218520

ABSTRACT

We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the approximately 160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunction with the shaping of metabolic pathways that likely transpired through lateral gene transfer from bacteria, and amplification of specific gene families implicated in pathogenesis and phagocytosis of host proteins may exemplify adaptations of the parasite during its transition to a urogenital environment. The genome sequence predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria.


Subject(s)
Genome, Protozoan , Sequence Analysis, DNA , Trichomonas vaginalis/genetics , Animals , Biological Transport/genetics , DNA Transposable Elements , DNA, Protozoan/genetics , Gene Transfer, Horizontal , Genes, Protozoan , Humans , Hydrogen/metabolism , Metabolic Networks and Pathways/genetics , Molecular Sequence Data , Multigene Family , Organelles/metabolism , Oxidative Stress/genetics , Peptide Hydrolases/genetics , Peptide Hydrolases/metabolism , Protozoan Proteins/genetics , Protozoan Proteins/physiology , RNA Processing, Post-Transcriptional , Repetitive Sequences, Nucleic Acid , Sexually Transmitted Diseases/parasitology , Trichomonas Infections/parasitology , Trichomonas Infections/transmission , Trichomonas vaginalis/cytology , Trichomonas vaginalis/metabolism , Trichomonas vaginalis/pathogenicity
10.
Eur J Immunol ; 35(6): 1859-68, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15864779

ABSTRACT

Duffy antigen is the receptor used by Plasmodium vivax to invade erythrocytes. Consequently, individuals lacking Duffy antigen [Fy(-)] do not develop blood-stage infections. We hypothesized that naturally exposed Fy(-) humans may develop immune responses mainly to pre-erythrocytic stages and could be used to study acquired immunity to P. vivax and to identify liver-stage antigens. We report here that antibody and IFN-gamma responses to known sporozoite antigens were significantly induced by natural exposure in Fy(-) humans, whereas responses to blood-stage antigens were significantly induced in Fy(+) humans. IFN-gamma responses to sporozoite antigens were lower in Fy(+) than in Fy(-) humans, indicating that in Fy(+) humans blood-stage infections may have suppressed T cell responses to pre-erythrocytic stages. We evaluated the immune responses to 18 novel P. vivax homologs of P. falciparum sporozoite proteins identified from the P. vivax genome sequence. Eight proteins recalled IFN-gamma responses in P. vivax-exposed but not in unexposed individuals. Of these, 3 antigens elicited IFN-gamma responses in Fy(-) but not in Fy(+) individuals. These results suggest that differential immune responses observed in naturally exposed Fy(-) and Fy(+) individuals can be exploited to identify P. vivax stage-specific antigens.


Subject(s)
Antigens, Protozoan/immunology , Duffy Blood-Group System/analysis , Erythrocytes/parasitology , Liver/parasitology , Plasmodium vivax/immunology , Adult , Animals , Cross Reactions , Female , Genome, Protozoan , Humans , Interferon-gamma/biosynthesis , Male , Middle Aged , Plasmodium vivax/genetics
11.
Science ; 307(5706): 82-6, 2005 Jan 07.
Article in English | MEDLINE | ID: mdl-15637271

ABSTRACT

Plasmodium berghei and Plasmodium chabaudi are widely used model malaria species. Comparison of their genomes, integrated with proteomic and microarray data, with the genomes of Plasmodium falciparum and Plasmodium yoelii revealed a conserved core of 4500 Plasmodium genes in the central regions of the 14 chromosomes and highlighted genes evolving rapidly because of stage-specific selective pressures. Four strategies for gene expression are apparent during the parasites' life cycle: (i) housekeeping; (ii) host-related; (iii) strategy-specific related to invasion, asexual replication, and sexual development; and (iv) stage-specific. We observed posttranscriptional gene silencing through translational repression of messenger RNA during sexual development, and a 47-base 3' untranslated region motif is implicated in this process.


Subject(s)
Genome, Protozoan , Life Cycle Stages , Plasmodium/growth & development , Plasmodium/genetics , Proteome/analysis , 3' Untranslated Regions , Animals , Anopheles/parasitology , Computational Biology , Evolution, Molecular , Gene Expression Profiling , Gene Silencing , Genes, Protozoan , Malaria/parasitology , Oligonucleotide Array Sequence Analysis , Plasmodium/metabolism , Plasmodium berghei/genetics , Plasmodium berghei/growth & development , Plasmodium berghei/metabolism , Plasmodium chabaudi/genetics , Plasmodium chabaudi/growth & development , Plasmodium chabaudi/metabolism , Plasmodium falciparum/genetics , Plasmodium falciparum/growth & development , Plasmodium falciparum/metabolism , Plasmodium yoelii/genetics , Plasmodium yoelii/growth & development , Plasmodium yoelii/metabolism , Proteomics , Protozoan Proteins/analysis , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Protozoan/genetics , RNA, Protozoan/metabolism , Selection, Genetic , Transcription, Genetic
12.
PLoS Pathog ; 1(4): e44, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16389297

ABSTRACT

Whole-genome comparisons are highly informative regarding genome evolution and can reveal the conservation of genome organization and gene content, gene regulatory elements, and presence of species-specific genes. Initial comparative genome analyses of the human malaria parasite Plasmodium falciparum and rodent malaria parasites (RMPs) revealed a core set of 4,500 Plasmodium orthologs located in the highly syntenic central regions of the chromosomes that sharply defined the boundaries of the variable subtelomeric regions. We used composite RMP contigs, based on partial DNA sequences of three RMPs, to generate a whole-genome synteny map of P. falciparum and the RMPs. The core regions of the 14 chromosomes of P. falciparum and the RMPs are organized in 36 synteny blocks, representing groups of genes that have been stably inherited since these malaria species diverged, but whose relative organization has altered as a result of a predicted minimum of 15 recombination events. P. falciparum-specific genes and gene families are found in the variable subtelomeric regions (575 genes), at synteny breakpoints (42 genes), and as intrasyntenic indels (126 genes). Of the 168 non-subtelomeric P. falciparum genes, including two newly discovered gene families, 68% are predicted to be exported to the surface of the blood stage parasite or infected erythrocyte. Chromosomal rearrangements are implicated in the generation and dispersal of P. falciparum-specific gene families, including one encoding receptor-associated protein kinases. The data show that both synteny breakpoints and intrasyntenic indels can be foci for species-specific genes with a predicted role in host-parasite interactions and suggest that, besides rearrangements in the subtelomeric regions, chromosomal rearrangements may also be involved in the generation of species-specific gene families. 
A majority of these genes are expressed in blood stages, suggesting that the vertebrate host exerts a greater selective pressure than the mosquito vector, resulting in the acquisition of diversity.


Subject(s)
Genes, Protozoan , Genome , Plasmodium falciparum/genetics , Plasmodium/genetics , Animals , Base Sequence , Chromosome Mapping , Conserved Sequence , Humans , Malaria, Falciparum/parasitology , Molecular Sequence Data , Plasmodium/classification , Plasmodium falciparum/pathogenicity , Species Specificity
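The gene counts in the abstract above are internally consistent: the 42 genes at synteny breakpoints and the 126 intrasyntenic indel genes add up to the 168 non-subtelomeric genes. The sketch below tallies them; the overall total and the count behind the 68% figure are derived here under the assumption that the three categories do not overlap, and neither number is stated in the abstract.

```python
# Counts of P. falciparum-specific genes by location, as quoted in the abstract.
counts = {
    "subtelomeric": 575,
    "synteny_breakpoints": 42,
    "intrasyntenic_indels": 126,
}

# The abstract's "168 non-subtelomeric" genes are the last two categories combined.
non_subtelomeric = counts["synteny_breakpoints"] + counts["intrasyntenic_indels"]

# Derived values (assumption: the categories are disjoint).
total_specific = sum(counts.values())
predicted_exported = round(0.68 * non_subtelomeric)

print(non_subtelomeric, total_specific, predicted_exported)
```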
13.
Mol Biol Evol ; 22(1): 126-34, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15371525

ABSTRACT

Mariner transposable elements encoding a D,D34D motif-bearing transposase are characterized by their pervasiveness among, and exclusivity to, animal phyla. To date, several hundred sequences have been obtained from taxa ranging from cnidarians to humans, only two of which are known to be functional. Related transposons have been identified in plants and fungi, but their absence among protists is noticeable. Here, we identify and characterize Tvmar1, the first representative of the mariner family to be found in a species of protist, the human parasite Trichomonas vaginalis. This is the first D,D34D element to be found outside the animal kingdom, and its inclusion in the mariner family is supported by both structural and phylogenetic analyses. Remarkably, Tvmar1 has all the hallmarks of a functional element and has recently expanded to several hundred copies in the genome of T. vaginalis. Our results show that a new potentially active mariner has been found that belongs to a distinct mariner lineage and has successfully invaded a nonanimal, single-celled organism. The considerable genetic distance between Tvmar1 and other mariners may have valuable implications for the design of new, high-efficiency vectors to be used in transfection studies in protists.


Subject(s)
DNA Transposable Elements , DNA-Binding Proteins/genetics , Evolution, Molecular , Selection, Genetic , Transposases/genetics , Trichomonas vaginalis/genetics , Amino Acid Sequence , Animals , Base Sequence , Genome , Molecular Sequence Data , Phylogeny , Sequence Homology, Amino Acid
14.
Nature ; 419(6906): 512-9, 2002 Oct 03.
Article in English | MEDLINE | ID: mdl-12368865

ABSTRACT

Species of malaria parasite that infect rodents have long been used as models for malaria disease research. Here we report the whole-genome shotgun sequence of one species, Plasmodium yoelii yoelii, and comparative studies with the genome of the human malaria parasite Plasmodium falciparum clone 3D7. A synteny map of 2,212 P. y. yoelii contiguous DNA sequences (contigs) aligned to 14 P. falciparum chromosomes reveals marked conservation of gene synteny within the body of each chromosome. Of about 5,300 P. falciparum genes, more than 3,300 P. y. yoelii orthologues of predominantly metabolic function were identified. Over 800 copies of a variant antigen gene located in subtelomeric regions were found. This is the first genome sequence of a model eukaryotic parasite, and it provides insight into the use of such systems in the modelling of Plasmodium biology and disease.


Subject(s)
Genome, Protozoan , Plasmodium yoelii/genetics , Animals , DNA, Protozoan , Disease Models, Animal , Humans , Malaria/parasitology , Multigene Family , Plasmodium falciparum/genetics , Recombination, Genetic , Rodentia , Sequence Alignment , Sequence Analysis, DNA , Species Specificity , Synteny , Telomere