ABSTRACT
BACKGROUND: Spliceosome-mediated alternative splicing (AS) of pre-mRNA regulates protein functional diversity at the post-transcriptional level and has been widely demonstrated to be a key contributor to functional diversity in plants. Identification and analysis of AS genes in cereal crop plants are critical for crop improvement and for understanding regulatory mechanisms. RESULTS: We carried out comparative analyses of the functional landscapes of AS in four cereal plants, using consensus assemblies of expressed sequence tags and available mRNA sequences. We identified 8,734 AS genes in Oryza sativa subspecies (ssp.) japonica, 2,657 in O. sativa ssp. indica, 3,971 in Sorghum bicolor, and 10,687 in Zea mays. Among the identified AS events, intron retention is the dominant type, accounting for 23.5% of events in S. bicolor and up to 55.8% in O. sativa ssp. indica. We identified a total of 887 AS genes that were conserved among Z. mays, S. bicolor, and O. sativa ssp. japonica, and 248 AS genes that were conserved among all four studied species or subspecies. Furthermore, we identified 53 AS genes conserved with Brachypodium distachyon. Gene Ontology classification of the AS genes assigned them to many biological processes with diverse molecular functions. CONCLUSIONS: AS is common in cereal plants. The AS genes identified in four cereal crops in this work provide a foundation for further study of the roles of AS in regulating cereal plant growth and development. The data can be accessed at the Plant Alternative Splicing Database (http://proteomics.ysu.edu/altsplice/).
Subjects
Alternative Splicing; Edible Grain/genetics; Gene Expression Regulation, Plant; Genes, Plant; Genome-Wide Association Study; Chromosome Mapping; Computational Biology/methods; Datasets as Topic; Evolution, Molecular; Exons; Expressed Sequence Tags; Genome, Plant; Introns; Molecular Sequence Annotation; RNA Isoforms
ABSTRACT
Sex determination in papaya is controlled by a recently evolved XY chromosome pair, with two slightly different Y chromosomes controlling the development of males (Y) and hermaphrodites (Y(h)). To study the events of early sex chromosome evolution, we sequenced the hermaphrodite-specific region of the Y(h) chromosome (HSY) and its X counterpart, yielding an 8.1-megabase (Mb) HSY pseudomolecule, and a 3.5-Mb sequence for the corresponding X region. The HSY is larger than the X region, mostly due to retrotransposon insertions. The papaya HSY differs from the X region by two large-scale inversions, the first of which likely caused the recombination suppression between the X and Y(h) chromosomes, followed by numerous additional chromosomal rearrangements. Altogether, including the X and/or HSY regions, 124 transcription units were annotated, including 50 functional pairs present in both the X and HSY. Ten HSY genes had functional homologs elsewhere in the papaya autosomal regions, suggesting movement of genes onto the HSY, whereas the X region had none. Sequence divergence between 70 transcripts shared by the X and HSY revealed two evolutionary strata in the X chromosome, corresponding to the two inversions on the HSY, the older of which evolved about 7.0 million years ago. Gene content differences between the HSY and X are greatest in the older stratum, whereas the gene content and order of the collinear regions are identical. Our findings support theoretical models of early sex chromosome evolution.
Subjects
Carica/genetics; Sex Chromosomes; Chromosome Duplication; Chromosome Inversion; Chromosome Mapping; Chromosomes, Artificial, Bacterial; Chromosomes, Plant; Evolution, Molecular; Models, Genetic; Molecular Sequence Data; Repetitive Sequences, Nucleic Acid; Retroelements; Sequence Analysis, DNA
ABSTRACT
A draft sequence of the genome of Brachypodium distachyon, the emerging grass model, was recently released. This represents a unique opportunity to determine its functional diversity compared to the genomes of other model species. Using homology mapping of assembled expressed sequence tags with chromosome scale pseudomolecules, we identified 128 alternative splicing events in B. distachyon. Our study identified that retention of introns is the major type of alternative splicing events (53%) in this plant and highlights the prevalence of splicing site recognition for definition of introns in plants. We have analyzed the compositional profiles of exon-intron junctions by base-pairing nucleotides with U1 snRNA which serves as a model for describing the possibility of sequence conservation. The alternative splicing isoforms identified in this study are novel and represent one of the potentially biologically significant means by which B. distachyon controls the function of its genes. Our observations serve as a basis to understand alternative splicing events of cereal crops with more complex genomes, like wheat or barley.
Subjects
Alternative Splicing/genetics; Brachypodium/genetics; Genome, Plant/genetics; Genomics/methods; Models, Biological; RNA, Small Nuclear/genetics
ABSTRACT
The relative rates of nucleotide substitution at synonymous and nonsynonymous sites within protein-coding regions have been widely used to infer the action of natural selection from comparative sequence data. It is known, however, that mutational and repair biases can affect rates of evolution at both synonymous and nonsynonymous sites. More importantly, it is also known that synonymous sites are particularly prone to the effects of nucleotide bias. This means that nucleotide biases may affect the calculated ratio of substitution rates at synonymous and nonsynonymous sites. Using a large data set of animal mitochondrial sequences, we demonstrate that this is, in fact, the case. Highly biased nucleotide sequences are characterized by significantly elevated dN/dS ratios, but only when the nucleotide frequencies are not taken into account. When the analysis is repeated taking the nucleotide frequencies at each codon position into account, such elevated ratios disappear. These results suggest that the recently reported differences in dN/dS ratios between vertebrate and invertebrate mitochondrial sequences could be explained by variations in mitochondrial nucleotide frequencies rather than the effects of positive Darwinian selection.
Subjects
DNA, Mitochondrial/genetics; Evolution, Molecular; Nucleotides/genetics; Selection, Genetic; Animals; Base Sequence
ABSTRACT
Tomato (Solanum lycopersicum) is an important vegetable and fruit crop. Its genome has been completely sequenced, and a large number of expressed sequence tags (ESTs) and short reads generated by RNA sequencing (RNA-seq) are available. Mapping transcripts, including mRNA sequences, ESTs, and RNA-seq reads, to the genome allows identification of pre-mRNA alternative splicing (AS), a post-transcriptional process that generates two or more RNA isoforms from one pre-mRNA transcript. We comprehensively analyzed the AS landscape in tomato by integrating genome mapping information for all available mRNAs and ESTs with mapping information for RNA-seq reads collected from 27 published projects. A total of 369,911 AS events were identified from 34,419 genomic loci involving 161,913 transcripts. Among the basic AS events, intron retention is the most prevalent type (18.9%), followed by alternative acceptor site (12.9%) and alternative donor site (7.3%), with exon skipping the least frequent (6.0%). Complex AS types comprising two or more basic events accounted for 54.9% of total AS events. Of 35,768 annotated protein-coding gene models, 23,233 were found to have pre-mRNAs generating AS isoform transcripts; the estimated AS rate in tomato is thus 65.0%. The list of identified AS genes with their corresponding transcript isoforms serves as a catalog for further detailed examination of gene functions in tomato biology. The post-transcriptional information is also expected to be useful in improving the predicted gene models in tomato. The sequence and annotation information can be accessed at the Plant Alternative Splicing Database (http://proteomics.ysu.edu/altsplice).
ABSTRACT
Variations in GC content between genomes have been extensively documented. Genomes with comparable GC contents can, however, still differ in the apportionment of the G and C nucleotides between the two DNA strands. This asymmetric strand bias is known as GC skew. Here, we have investigated the impact of differences in nucleotide skew on the amino acid composition of the encoded proteins. We compared orthologous genes between animal mitochondrial genomes that show large differences in GC and AT skews. Specifically, we compared the mitochondrial genomes of mammals, which are characterized by a negative GC skew and a positive AT skew, to those of flatworms, which show the opposite skews for both GC and AT base pairs. We found that the mammalian proteins are highly enriched in amino acids encoded by CA-rich codons (as predicted by their negative GC and positive AT skews), whereas their flatworm orthologs were enriched in amino acids encoded by GT-rich codons (also as predicted from their skews). We found that these differences in mitochondrial strand asymmetry (measured as GC and AT skews) can have very large, predictable effects on the composition of the encoded proteins.
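The skews discussed in this abstract are conventionally computed on a single strand as (G - C)/(G + C) and (A - T)/(A + T). The abstract does not state the formula, so the following Python sketch assumes that conventional definition; the function names are hypothetical and chosen only for illustration.

```python
def nucleotide_skew(seq: str, first: str, second: str) -> float:
    """Return (first - second) / (first + second) for one strand.

    Assumes the conventional skew definition; this formula is not
    quoted from the abstract.
    """
    seq = seq.upper()
    n_first = seq.count(first)
    n_second = seq.count(second)
    if n_first + n_second == 0:
        raise ValueError(f"sequence contains no {first} or {second}")
    return (n_first - n_second) / (n_first + n_second)


def gc_skew(seq: str) -> float:
    """GC skew: negative when C outnumbers G on this strand."""
    return nucleotide_skew(seq, "G", "C")


def at_skew(seq: str) -> float:
    """AT skew: positive when A outnumbers T on this strand."""
    return nucleotide_skew(seq, "A", "T")
```

Under this definition, a strand with a negative GC skew and a positive AT skew is relatively rich in C and A, which matches the abstract's description of the mammalian genomes as favouring CA-rich codons.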
Subjects
DNA/chemistry; Mitochondrial Proteins/chemistry; Amino Acids/analysis; Animals; Base Composition; Codon/genetics; DNA/genetics; Genome, Mitochondrial; Mammals/genetics; Mitochondrial Proteins/genetics; Platyhelminths/genetics; Species Specificity
ABSTRACT
TargetIdentifier is a web server that identifies full-length cDNA sequences from expressed sequence tag (EST)-derived contig and singleton data. To accomplish this, TargetIdentifier uses BLASTX alignments as a guide to locate protein-coding regions and potential start and stop codons. This information is then used to determine whether the EST-derived sequences include their translation start codons. The algorithm also uses the BLASTX output to assign putative functions to the query sequences. The server is available at https://fungalgenome.concordia.ca/tools/TargetIdentifier.html.
Subjects
DNA, Complementary/chemistry; Expressed Sequence Tags/chemistry; Sequence Analysis, DNA/methods; Software; Algorithms; Aspergillus niger/genetics; Codon; Databases, Genetic; Humans; Internet; Sequence Alignment; User-Computer Interface
ABSTRACT
OrfPredictor is a web server designed for identifying protein-coding regions in expressed sequence tag (EST)-derived sequences. For query sequences with a hit in BLASTX, the program predicts the coding regions based on the translation reading frames identified in the BLASTX alignments; otherwise, it predicts the most probable coding region based on the intrinsic signals of the query sequences. The output is the predicted peptide sequences in FASTA format, with a definition line that includes the query ID, the translation reading frame, and the nucleotide positions where the coding region begins and ends. OrfPredictor facilitates the annotation of EST-derived sequences, particularly for large-scale EST projects. OrfPredictor is available at https://fungalgenome.concordia.ca/tools/OrfPredictor.html.
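For readers unfamiliar with frame-based ORF prediction, the sketch below shows one simple stand-in for the "no BLASTX hit" case: a six-frame scan for the longest ATG-to-stop open reading frame. This is an assumption made for illustration and is not OrfPredictor's actual scoring of intrinsic signals, which the abstract does not specify; it also ignores partial ORFs that lack a start or stop codon, which are common in ESTs. All names are hypothetical.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_orf(seq: str) -> Optional[Tuple[int, int, int, str]]:
    """Return (frame, start, end, orf) for the longest ATG..stop ORF.

    frame is +1..+3 (forward strand) or -1..-3 (reverse strand);
    start and end are 0-based, half-open positions on the scanned
    strand. Returns None when no complete ORF is found.
    """
    best = None
    for strand, s in ((1, seq.upper()), (-1, reverse_complement(seq))):
        for offset in range(3):
            start = None
            # Walk codon by codon in this reading frame.
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if best is None or end - start > len(best[3]):
                        best = (strand * (offset + 1), start, end, s[start:end])
                    start = None
    return best
```

A call such as `longest_orf("CCATGAAATAGGG")` reports the frame and coordinates in the same spirit as the definition line described in the abstract, although the real server's output format may differ.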
Subjects
Expressed Sequence Tags/chemistry; Open Reading Frames; Sequence Analysis, DNA/methods; Software; Algorithms; Amino Acid Sequence; Codon; Databases, Genetic; Internet; User-Computer Interface
ABSTRACT
BACKGROUND: Aspergillus niger, a saprophyte commonly found on decaying vegetation, is widely used and studied for industrial purposes. Despite its place as one of the most important organisms for commercial applications, the lack of available information about its genetic makeup limits research with this filamentous fungus. RESULTS: We present here the analysis of 12,820 expressed sequence tags (ESTs) generated from A. niger cultured under seven different growth conditions. These ESTs identify about 5,108 genes, of which 44.5% code for proteins sharing similarity (E ≤ 1e-5) with GenBank entries of known function, 38% code for proteins that share similarity only with GenBank entries of unknown function, and 17.5% encode proteins that do not have a GenBank homolog. Using the Gene Ontology hierarchy, we present a first classification of the A. niger proteins encoded by these genes and compare its protein repertoire with those of other well-studied fungal species. We have established a searchable web-based database that includes the EST and derived contig sequences and their annotation. Details about this project and access to the annotated A. niger database are available. CONCLUSION: This EST collection and its annotation provide a significant resource for fundamental and applied research with A. niger. The gene set identified in this manuscript will be highly useful in the annotation of the genome sequence of A. niger, and the genes described, especially those encoding hydrolytic enzymes, will provide a valuable resource for researchers interested in enzyme properties and applications.
Subjects
Aspergillus niger/genetics; Computational Biology; Expressed Sequence Tags; Genes, Fungal/genetics; Databases, Nucleic Acid; Fungal Proteins/genetics; Fungal Proteins/metabolism; Gene Expression Profiling; Gene Expression Regulation, Fungal
ABSTRACT
The subcellular location of a protein is a key factor in determining the molecular function of the protein in an organism. MetazSecKB is a secretome and subcellular proteome knowledgebase specifically designed for metazoans, i.e. humans and other animals. The protein sequence data, consisting of over 4 million entries with 121 species having a complete proteome, were retrieved from UniProtKB. Protein subcellular locations, including secreted and 15 other subcellular locations, were assigned based on either curated experimental evidence or prediction using seven computational tools. The protein or subcellular proteome data can be searched and downloaded using several different types of identifiers, gene name or keyword(s), and species. BLAST search and community annotation of subcellular locations are also supported. Our primary analysis revealed that proteome sizes, secretome sizes and other subcellular proteome sizes vary tremendously among animal species. The proportions of secretomes vary from 3 to 22% (average 8%) in metazoan species. The proportions of other major subcellular proteomes ranged approximately 21-43% (average 31%) in cytoplasm, 20-37% (average 30%) in nucleus, 3-19% (average 12%) as plasma membrane proteins and 3-9% (average 6%) in mitochondria. We also compared the protein families in secretomes of different primates. The Gene Ontology and protein family domain analysis of human secreted proteins revealed that these proteins play important roles in regulation of human structure development, signal transduction, immune systems and many other biological processes. Database URL: http://proteomics.ysu.edu/secretomes/animal/index.php.
Subjects
Databases, Protein; Gene Ontology; Proteome; Sequence Analysis, Protein; Software; Animals; Humans; Proteome/chemistry; Proteome/genetics; Proteome/metabolism; Sequence Analysis, Protein/instrumentation; Sequence Analysis, Protein/methods
ABSTRACT
Expressed sequence tags (ESTs) are a rich resource for identifying alternatively spliced (AS) genes. The ASFinder web server is designed to identify AS isoforms from EST-derived sequences. Two approaches are implemented in ASFinder. If no genomic sequences are provided, the server performs a local BLASTN search to identify AS isoforms from ESTs that have both ends aligned but an internal segment unaligned. Otherwise, ASFinder uses SIM4 to map ESTs to the genome; overlapping ESTs that map to the same genomic locus and have variable internal exon/intron boundaries are then identified as AS isoforms. The tool is available at http://proteomics.ysu.edu/tools/ASFinder.html.
Subjects
Alternative Splicing; Expressed Sequence Tags; RNA, Messenger/genetics; Software; Arabidopsis/genetics; Aspergillus niger/genetics; Base Sequence; Genome, Fungal; Genome, Plant; Humans; Protein Isoforms
ABSTRACT
Recently, Brachypodium distachyon has emerged as a model plant for studying monocot grasses and cereal crops. Using assembled expressed transcript sequences and subsequent mapping to the corresponding genome, we identified 1219 alternative splicing (AS) events spanning 2021 putatively assembled transcripts generated from 941 genes. Approximately 6.3% of expressed genes are alternatively spliced in B. distachyon. We observed that the majority of the identified AS events were retained introns (55.5%), followed by alternative acceptor sites (16.7%). We also observed low percentages of exon skipping (5.0%) and alternative donor site events (8.8%). The 'complex event', which consists of a combination of two or more basic splicing events, accounted for ~14.0%. Comparative AS transcript analysis revealed 163 homologous pairs between B. distachyon and Oryza sativa and 39 between B. distachyon and Arabidopsis thaliana. In all, we found 16 AS transcripts to be conserved in all 3 species. AS events and the annotation of the related putative assembled transcripts can be systematically browsed at the Plant Alternative Splicing Database (http://proteomics.ysu.edu/altsplice/plant/).
Subjects
Alternative Splicing; Brachypodium/genetics; Genome, Plant; Evolution, Molecular; Exons; Expressed Sequence Tags; Genes, Plant; Genomics; Introns
ABSTRACT
BACKGROUND: Sacred lotus is a basal eudicot with agricultural, medicinal, cultural and religious importance. It was domesticated in Asia about 7,000 years ago, and cultivated for its rhizomes and seeds as a food crop. It is particularly noted for its 1,300-year seed longevity and exceptional water repellency, known as the lotus effect. The latter property is due to the nanoscopic closely packed protuberances of its self-cleaning leaf surface, which have been adapted for the manufacture of a self-cleaning industrial paint, Lotusan. RESULTS: The genome of the China Antique variety of the sacred lotus was sequenced with Illumina and 454 technologies, at respective depths of 101× and 5.2×. The final assembly has a contig N50 of 38.8 kbp and a scaffold N50 of 3.4 Mbp, and covers 86.5% of the estimated 929 Mbp total genome size. The genome notably lacks the paleo-triplication observed in other eudicots, but reveals a lineage-specific duplication. The genome has evidence of slow evolution, with a 30% slower nucleotide mutation rate than observed in grape. Comparisons of the available sequenced genomes suggest a minimum gene set for vascular plants of 4,223 genes. Strikingly, the sacred lotus has 16 COG2132 multi-copper oxidase family proteins with root-specific expression; these are involved in root meristem phosphate starvation, reflecting adaptation to limited nutrient availability in an aquatic environment. CONCLUSIONS: The slow nucleotide substitution rate makes the sacred lotus a better resource than the current standard, grape, for reconstructing the pan-eudicot genome, and should therefore accelerate comparative analysis between eudicots and monocots.
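The contig and scaffold N50 values quoted above are assembly contiguity statistics. As a minimal Python sketch, assuming the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length), N50 can be computed as follows. The function name and the toy lengths in the usage note are illustrative only and are not taken from the lotus assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the
    length at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")
```

For toy lengths `[2, 3, 4, 5, 6]` the total is 20; the two longest sequences (6 and 5) sum to 11, which passes the halfway mark of 10, so the N50 is 5.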
Subjects
Genome, Plant; Nelumbo/genetics; Adaptation, Biological; Amino Acid Substitution; Evolution, Molecular; Molecular Sequence Data; Mutation Rate; Nelumbo/classification; Nelumbo/physiology; Phylogeny; Vitis/genetics
ABSTRACT
The Fungal Secretome KnowledgeBase (FunSecKB) provides a resource of secreted fungal proteins, i.e. secretomes, identified from all available fungal protein data in the NCBI RefSeq database. The secreted proteins were identified using a well evaluated computational protocol which includes SignalP, WolfPsort and Phobius for signal peptide or subcellular location prediction, TMHMM for identifying membrane proteins, and PS-Scan for identifying endoplasmic reticulum (ER) target proteins. The entries were mapped to the UniProt database and any annotations of subcellular locations that were either manually curated or computationally predicted were included in FunSecKB. Using a web-based user interface, the database is searchable, browsable and downloadable by using NCBI's RefSeq accession or gi number, UniProt accession number, keyword or by species. A BLAST utility was integrated to allow users to query the database by sequence similarity. A user submission tool was implemented to support community annotation of subcellular locations of fungal proteins. With the complete fungal data from RefSeq and associated web-based tools, FunSecKB will be a valuable resource for exploring the potential applications of fungal secreted proteins. Database URL: http://proteomics.ysu.edu/secretomes/fungi.php.
Subjects
Database Management Systems; Databases, Protein; Fungal Proteins; Proteomics/methods; Internet; Proteome
ABSTRACT
The availability of complete genome sequences for 12 Drosophila species provides an unprecedented resource for large-scale studies of genome evolution. In this study, we looked for correlated shifts in the patterns of genome and proteome evolution within the genus Drosophila. Specifically, we asked if the nucleotide composition of the Drosophila willistoni genome, which is significantly less GC rich than the other 11 sequenced Drosophila genomes, is reflected in an altered pattern of amino acid substitutions in the encoded proteins. Our results show that this is indeed the case: there are large and highly significant asymmetries in the patterns of amino acid substitution between D. willistoni and Drosophila melanogaster, and they are in the direction predicted by the nucleotide biases. The implication of this result, combined with previous studies on long-term proteome evolution, is that substitutional biases at the DNA level can be a major factor in determining both the long-term and the short-term directions of proteome evolution.
ABSTRACT
By comparing mtDNA sequences between different orders of mammals, we show that both longevity and generation time are significantly correlated with the nucleotide content of the mtDNA. Specifically, there is a positive correlation between generation time and mt GC content. This correlation is repeated, at a finer evolutionary scale, within the primates. Moreover, a comparison of human and chimpanzee mtDNAs shows that the effect has been very pronounced during the short evolutionary period since the divergence of these two species, with human mtDNA showing a GC-biased pattern of substitution at the variable sites. In addition to these DNA sequence patterns, comparisons between the human and the chimp mt protein sequences also revealed a surprisingly high substitution rate for threonine residues, resulting in a reduction of threonine in the human mt proteome. These patterns of both DNA and protein evolution can be explained by a balance between AT-biased mutational pressure and age-related purifying selection.
Subjects
Aging/genetics; DNA, Mitochondrial/genetics; Evolution, Molecular; Selection, Genetic; Amino Acids/genetics; Animals; Base Composition/genetics; Humans; Nucleotides/genetics; Pan troglodytes/genetics; Time Factors
ABSTRACT
DNA barcoding shows enormous promise for the rapid identification of organisms at the species level. There has been much recent debate, however, about the need for longer barcode sequences, especially when these sequences are used to construct molecular phylogenies. Here, we have analysed a set of fungal mitochondrial sequences of various lengths, and we have monitored the effect of reducing sequence length on the utility of the data for both species identification and phylogenetic reconstruction. Our results demonstrate that reducing sequence length has a profound effect on the accuracy of resulting phylogenetic trees, but surprisingly short sequences still yield accurate species identifications. We conclude that the standard short barcode sequences (approximately 600 bp) are not suitable for inferring accurate phylogenetic relationships, but they are sufficient for species identification among the fungi.
ABSTRACT
DNA barcodes have achieved prominence as a tool for species-level identifications. Consequently, there is a rapidly growing database of these short sequences from a wide variety of taxa. In this study, we have analyzed the correlation between the nucleotide content of the short DNA barcode sequences and the genomes from which they are derived. Our results show that such short sequences can yield important, and surprisingly accurate, information about the composition of the entire genome. In other words, for unsequenced genomes, the DNA barcodes can provide a quick preview of the whole genome composition.
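The kind of comparison described here can be sketched in Python as two small helpers: one computing the GC fraction of a sequence, and one computing a Pearson correlation between paired values (for example, barcode GC fraction versus whole-genome GC fraction across taxa). This is an assumption about how such an analysis might be set up, since the abstract does not name the statistic used; the function names are hypothetical, and no data from the study are reproduced.

```python
from math import sqrt
from typing import Sequence


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / counted


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)
```

In use, one would compute `gc_content` for each barcode and each corresponding genome, then pass the two lists to `pearson`; a coefficient close to 1 would indicate that the barcode composition tracks the genome composition.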