Abstract
Activation-induced cytidine deaminase (AID) is the essential enzyme for imprinting immunological memory through class switch recombination (CSR) and somatic hypermutation (SHM) of the immunoglobulin (Ig) gene. AID-dependent reduction of Topoisomerase 1 (Top1) promotes the DNA cleavage that occurs upon Ig gene diversification, but the mechanism behind AID-induced Top1 reduction remains unclear. Here, we clarified the contribution of the microRNA-Ago2 complex to the AID-dependent Top1 decrease. Ago2 binds to the Top1 3'UTR at two regions of AID-dependent Ago2-binding sites (5'- and 3'dABs). Top1 3'UTR knockout (3'UTRKO) in B lymphoma cells leads to decreased DNA break efficiency in the IgH gene, accompanied by a reduction in CSR and SHM frequencies. Furthermore, AID-dependent Top1 protein reduction and Ago2 binding to Top1 mRNA are down-regulated in 3'UTRKO cells. Top1 mRNA in the highly translated fractions of the sucrose gradient is decreased in an AID-dependent and Top1 3'UTR-mediated manner, resulting in a decrease in Top1 protein synthesis. Both AID and Ago2 localize in the mRNA-binding protein fractions, and they interact with each other. Furthermore, we found candidate miRNAs that possibly bind to the 5'- and 3'dABs in Top1 mRNA. Among them, miR-92a-3p knockdown induces the phenotypes of 3'UTRKO cells in wild-type cells, whereas it has no impact on 3'UTRKO cells. Taken together, the Ago2-miR-92a-3p complex is recruited to the Top1 3'UTR in an AID-dependent manner and posttranscriptionally reduces Top1 protein synthesis. This reduction increases non-B DNA structures, enhances DNA cleavage by Top1 in the Ig gene, and contributes to immunological memory formation.
Subjects
MicroRNAs , MicroRNAs/genetics , 3' Untranslated Regions , DNA Cleavage , Cytidine Deaminase/genetics , Immunoglobulin Class Switching , Antibodies/genetics , Somatic Hypermutation, Immunoglobulin
Abstract
BACKGROUND: Calf mortality generally occurs in calves prior to weaning and is a serious problem in cattle breeding. Several causative variants of monogenic Mendelian disorders in calf mortality have been identified, whereas genetic factors affecting the susceptibility of calves to death are not well known. To identify variants associated with calf mortality in Japanese Black cattle, we evaluated calf mortality as a categorical trait with a threshold model and performed a genome-wide copy number variation (CNV) association study on calf mortality. RESULTS: We identified a 44-kb deleted-type CNV ranging from 103,317,687 to 103,361,802 bp on chromosome 5, which was associated with the mortality of 1-180-day-old calves. The CNV harbored C1RL, a pseudogene, and a lncRNA localized in the C1R and C1S gene cluster, which is a component of the classical complement activation pathway for immune complexes for infectious pathogens. The average complement activity in CNVR_221 homozygotes at postnatal day 7 was significantly lower than that of wild-type animals and heterozygotes. The frequency of the risk allele in dead calves suffering from diarrhea and pneumonia and in healthy cows was 0.35 and 0.28, respectively (odds ratio = 2.2, P = 0.016), suggesting that CNVR_221 was associated with the mortality of Japanese Black calves suffering from an infectious disease. CONCLUSIONS: This study identified a deleted-type CNV associated with the mortality of 1-180-day-old calves. The complement activity in CNVR_221 homozygotes was significantly lower than that in heterozygotes and wild-type animals. The frequency of the risk allele was higher in dead calves suffering from an infectious disease than in healthy cows. These results suggest that the presence of CNVR_221 in calves may contribute to a reduction in complement activity, which in turn leads to susceptibility to infections. Thus, the risk allele could serve as a useful marker to reduce the mortality of infected Japanese Black calves.
Subjects
Cattle Diseases , DNA Copy Number Variations , Alleles , Animals , Cattle , Cattle Diseases/genetics , Female , Homozygote , Japan , Weaning
Abstract
DBTSS (http://dbtss.hgc.jp/) was originally constructed in 2002 as a collection of uniquely determined transcriptional start sites (TSSs) in humans and some other species. Since then, it has been regularly updated, and in recent updates epigenetic information has also been incorporated because such information is useful for characterizing the biological relevance of these TSSs/downstream genes. In the newest release, Release 9, we further integrated public and original single nucleotide variation (SNV) data into our database. For our original data, we generated SNV data from genomic analyses of various cancer types, including 97 lung adenocarcinomas and 57 lung small cell carcinomas from Japanese patients as well as 26 cell lines of lung cancer origin. In addition, we obtained publicly available SNV data from other cancer types and germline variations from a total of 11,322 individuals. With these updates, users can examine the association between sequence variation patterns in clinical lung cancers and the corresponding TSS-seq, RNA-seq, ChIP-seq and BS-seq data. Consequently, DBTSS is no longer a mere storage site for TSS information but has evolved into an integrative platform for a variety of genome activity data.
Subjects
Databases, Nucleic Acid , Epigenesis, Genetic , Gene Expression Profiling , Genetic Variation , Transcription Initiation Site , Genomics , Humans , Internet , Mutation Rate , Neoplasms/genetics
Abstract
The previous release of our Full-parasites database (http://fullmal.hgc.jp/) brought enhanced functionality, expanded full-length cDNA content, and new RNA-Seq datasets from several important apicomplexan parasites. The 2015 update marks a major shift in the database's content, with a focus on the diverse transcriptomes of the apicomplexan parasites. The content of the database was substantially enriched with transcriptome information for new apicomplexan parasites. The latest version covers a total of 17 species, with the addition of our newly generated RNA-Seq data comprising a total of 909,150,388 tags. Moreover, we have generated and included two novel and unique datasets, which represent the diverse nature of transcriptomes in individual parasites in vivo and in vitro. One is the data collected from 116 Indonesian patients infected with Plasmodium falciparum. The other is a series of transcriptome data collected from a total of 38 single cells of P. falciparum cultured in vitro. We believe that with these recent advances our database becomes an even better resource and a unique platform for the analysis of apicomplexan parasites and their interaction with their hosts. To adequately reflect the recent modifications and the current content, we have changed the database name to DB-AT (DataBase of Apicomplexa Transcriptomes).
Subjects
Apicomplexa/genetics , Databases, Genetic , Gene Expression Profiling , Humans , Internet , Malaria, Falciparum/parasitology , Plasmodium falciparum/genetics , Sequence Analysis, RNA
Abstract
To identify and characterize transcript structures ranging from transcriptional start sites (TSSs) to poly(A)-addition sites (PASs), we constructed and analyzed human TSS/PAS mate pair full-length cDNA libraries from 14 tissue types and four cell lines. The collected information enabled us to define TSS cluster (TSC) and PAS cluster (PAC) relationships for a total of 8530/9400 RefSeq genes, as well as 4251/5618 of their putative alternative promoters/terminators and 4619/4605 intervening transcripts, respectively. Analyses of the putative alternative TSCs and alternative PACs revealed that their selection appeared to be mostly independent, with rare exceptions. In those exceptional cases, pairs of transcript units rarely overlapped one another and were occasionally separated by Rad21/CTCF. We also identified a total of 172 similar cases in which TSCs and PACs spanned adjacent but distinct genes. In these cases, different transcripts may utilize different functional units of a particular gene or of adjacent genes. This approach was also useful for identifying fusion gene transcripts in cancerous cells. Furthermore, we could construct cDNA libraries in which 3'-end mate pairs were distributed randomly over the transcripts. These libraries were useful for assembling the internal structure of previously uncharacterized alternative promoter products, as well as intervening transcripts.
Subjects
3' Untranslated Regions , Gene Library , Transcription Initiation Site , Cell Line , Chromatin/chemistry , DNA, Complementary , HEK293 Cells , HeLa Cells , High-Throughput Nucleotide Sequencing , Humans , MCF-7 Cells , Poly A , RNA, Messenger/chemistry
Abstract
Here we conducted an integrative multi-omics analysis to understand how cancers harbor various types of aberrations at the genomic, epigenomic and transcriptional levels. To elucidate the biological relevance of the aberrations and their mutual relations, we performed whole-genome sequencing, RNA-Seq, bisulfite sequencing and ChIP-Seq of 26 lung adenocarcinoma cell lines. The collected multi-omics data allowed us to associate an average of 536 coding mutations and 13,573 mutations in promoter or enhancer regions with aberrant transcriptional regulation. We detected 385 splice site mutations and 552 chromosomal rearrangements, representative cases of which were validated to cause aberrant transcripts. Averages of 61, 217, 3687 and 3112 mutations were located in regulatory regions that showed differential DNA methylation, H3K4me3, H3K4me1 and H3K27ac marks, respectively. We detected distinct patterns of aberrations in transcriptional regulation depending on the gene. We found that irregular histone marks were characteristic of EGFR and CDKN1A, while a large genomic deletion and DNA hypermethylation were most frequent for CDKN2A. We also used the multi-omics data to classify the cell lines according to their hallmarks of carcinogenesis. Our datasets should provide a valuable foundation for biological interpretation of interlaced genomic and epigenomic aberrations.
Subjects
Adenocarcinoma/genetics , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Lung Neoplasms/genetics , Transcriptome , Adenocarcinoma of Lung , Cell Line, Tumor , Chromatin Immunoprecipitation , DNA Methylation , Gene Expression Profiling , Genome, Human , Genomics , Histones/metabolism , Humans , Mutation , RNA Polymerase II/metabolism , Sequence Analysis, DNA , Sequence Analysis, RNA
Abstract
BACKGROUND: Babesia bovis is an apicomplexan parasite that causes babesiosis in infected cattle. Genomes of pathogens contain promising information that can facilitate the development of methods for controlling infections. Although the genome of B. bovis is publicly available, annotated gene models are not highly reliable prior to experimental validation. Therefore, we validated a preproposed gene model of B. bovis and extended the associated annotations on the basis of experimentally obtained full-length expressed sequence tags (ESTs). RESULTS: From in vitro cultured merozoites, 12,286 clones harboring full-length cDNAs were sequenced from both ends using the Sanger method, and 6,787 full-length cDNAs were assembled. These were then clustered, and a nonredundant referential data set of 2,115 full-length cDNA sequences was constructed. The comparison of the preproposed gene model with our data set identified 310 identical genes, 342 almost identical genes, 1,054 genes with potential structural inconsistencies, and 409 novel genes. The median length of 5' untranslated regions (UTRs) was 152 nt. Subsequently, we identified 4,086 transcription start sites (TSSs) and 2,023 transcriptionally active regions (TARs) by examining 5' ESTs. We identified ATGGGG and CCCCAT sites as consensus motifs in TARs that were distributed around -50 bp from TSSs. In addition, we found ACACA, TGTGT, and TATAT sites, which were distributed periodically around TSSs in cycles of approximately 150 bp. Moreover, related periodical distributions were not observed in mammalian promoter regions. CONCLUSIONS: The observations in this study indicate the utility of integrated bioinformatics and experimental data for improving genome annotations. In particular, full-length cDNAs with one-base resolution for TSSs enabled the identification of consensus motifs in promoter sequences and demonstrated clear distributions of identified motifs. These observations allowed the illustration of a model promoter composition, which supports the differences in transcriptional regulation frameworks between apicomplexan parasites and mammals.
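As a rough illustration of how a motif's position relative to a TSS might be tabulated, the following minimal sketch locates a motif in an upstream sequence and reports its offset. The function names and the toy sequence are assumptions for illustration and are not taken from the study; the only fact used from the abstract is that ATGGGG and CCCCAT (reverse complements of each other) were found around -50 bp.

```python
from typing import List


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_offsets(upstream: str, motif: str) -> List[int]:
    """Offsets of every motif start in `upstream`.

    Convention (an assumption of this sketch): the last base of
    `upstream` sits at position -1 relative to the TSS.
    """
    hits = []
    start = upstream.find(motif)
    while start != -1:
        hits.append(start - len(upstream))
        start = upstream.find(motif, start + 1)  # allow overlapping hits
    return hits


# Hypothetical 60-nt upstream region with one ATGGGG starting at -50.
toy_upstream = "A" * 10 + "ATGGGG" + "A" * 44
offsets = motif_offsets(toy_upstream, "ATGGGG")
```

Collecting such offsets over many promoters and plotting their histogram is one way a positional enrichment or a periodic distribution could be made visible.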
Subjects
Babesia bovis/genetics , Promoter Regions, Genetic , Base Sequence , Consensus Sequence , Contig Mapping , DNA, Complementary/genetics , DNA, Protozoan/genetics , Expressed Sequence Tags , Genes, Protozoan , Models, Genetic , Molecular Sequence Annotation , Molecular Sequence Data , Nucleosomes/genetics , Sequence Analysis, DNA , Transcription Initiation Site
Abstract
Full-Parasites (http://fullmal.hgc.jp/) is a transcriptome database of apicomplexa parasites, which include Plasmodium and Toxoplasma species. The latest version of Full-Parasites contains a total of 105,786 EST sequences from 12 parasites, of which 5925 full-length cDNAs have been completely sequenced. Full-Parasites also contains more than 30 million transcription start sites (TSS) for Plasmodium falciparum (Pf) and Toxoplasma gondii (Tg), which were identified using our novel oligo-capping-based protocol. Various types of cDNA data resources were interconnected with our original database functionalities. Specifically, in this update, we have included two unique RNA-Seq data sets consisting of 730 million mapped RNA-Seq tags. One is a dataset of 16 time-lapse experiments of cultured bradyzoite differentiation for Tg. The other dataset includes 31 clinical samples of Pf. Parasite RNA was extracted together with host human RNA, and the extracted mixed RNA was used for RNA sequencing, with the expectation that gene expression information from the host and parasite would be simultaneously represented. By providing the largest unique full-length cDNA and dynamic transcriptome data, Full-Parasites is useful for understanding host-parasite interactions and will help to eventually elucidate how monophyletic organisms have evolved to become parasites by adopting complex life cycles.
Subjects
Apicomplexa/genetics , DNA, Complementary/chemistry , Databases, Nucleic Acid , RNA, Protozoan/chemistry , Expressed Sequence Tags , Gene Expression Profiling , Host-Parasite Interactions , Humans , Plasmodium falciparum/genetics , Sequence Analysis, RNA , Toxoplasma/genetics , Transcription Initiation Site
Abstract
A genetic analysis of Japanese Black cattle using short reads and guided by the reference genome from Western breeds would miss the structural variation and/or other unique characteristics of Japanese Black cattle. To overcome this difficulty, a de novo genome assembly independent from the reference genome is required. This chapter describes the technical developments, with respect to both experimental and bioinformatics procedures, including the use of short and long reads, required for de novo genome assembly of Japanese Black cattle.
Subjects
Computational Biology , High-Throughput Nucleotide Sequencing , Animals , Cattle/genetics , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods , Genome , Sequence Analysis, DNA/methods
Abstract
DataBase of Transcription Start Sites (DBTSS) is a database which contains precise positional information for transcription start sites (TSSs) of eukaryotic mRNAs. In this update, we included 330 million new tags generated by massively sequencing the 5'-end of oligo-cap selected cDNAs in humans and mice. The tags were collected from normal fetal or adult human tissues, including brain, thymus, liver, kidney and heart, from 6 human cell lines in 21 diverse growth conditions as well as from mouse NIH3T3 cell line: altogether 31 different cell types or culture conditions are represented. This unprecedented increase in depth of data now allows DBTSS to faithfully represent the dynamically changing landscape of TSSs in different cell types and conditions, during development and in the course of evolution. Differential usage of alternative 5'-ends across cell types and conditions can be viewed in a series of new interfaces. Promoter sequence information is now displayed in a comparative genomics viewer where evolutionary turnover of the TSSs can be evaluated. DBTSS can be accessed at http://dbtss.hgc.jp/.
Subjects
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Transcription Initiation Site , Algorithms , Animals , Computational Biology/trends , Databases, Protein , Gene Expression Profiling , Genomics , Humans , Information Storage and Retrieval/methods , Internet , Mice , NIH 3T3 Cells , User-Computer Interface
Abstract
Omic analyses of economically important animals, including Japanese Black cattle, are currently underway worldwide. In particular, tissue and developmental stage-specific transcriptome characterization is essential for understanding the molecular mechanisms underlying the phenotypic expression of genetic disorders and economic traits. Here, we conducted a comprehensive analysis of 124 transcriptomes across 31 major tissues from fetuses, juvenile calves, and adult Japanese Black cattle using short-read sequencing. We found that genes exhibiting high tissue-specific expression tended to increase after 60 days from fertilization and significantly reflected tissue-relevant biology. Based on gene expression variation and inflection points during development, we categorized gene expression patterns as stable, increased, decreased, temporary, or complex in each tissue. We also analysed the expression profiles of causative genes (e.g. SLC12A1, ANXA10, and MYH6) for genetic disorders in cattle, revealing disease-relevant expression patterns. In addition, to directly analyse the structure of full-length transcripts without transcript reconstruction, we performed RNA sequencing analysis of 22 tissues using long-read sequencing and identified 232 novel non-RefSeq isoforms. Collectively, our comprehensive transcriptomic analysis can serve as an important resource for the biological and functional interpretation of gene expression and enable the mechanistic interpretation of genetic disorders and economic traits in Japanese Black cattle.
Subjects
Gene Expression Profiling , Transcriptome , Animals , Cattle/genetics , Phenotype , Protein Isoforms
Abstract
Full-Malaria/Parasites is a database for transcriptome studies of apicomplexa and other parasites, which is based on our original full-length cDNA sequences and physical cDNA clone resources. In this update, the database has been expanded to contain shotgun sequencing of the entire sequences of 14,818 non-redundant full-length cDNA clones from six apicomplexa parasites and 6.8 million transcription start sites (TSS), both of which were produced by novel protocols using the oligo-capping method and the Illumina GA sequencer. The former should be the ultimate data for exact annotation of the expressed genes, while the latter should be useful for ultra-deep expression analysis. Furthermore, we have launched Full-Arthropods, a full-length cDNA database for arthropods of medical importance. Full-Arthropods contains 50,343 one-pass sequences, 10,399 shotgun complete sequences and 22.4 million TSS tags from anopheles mosquitoes that transmit malaria, tsetse flies that transmit trypanosomiasis and dust mites that cause allergic dermatitis and bronchial asthma. By providing the largest integrated full-length cDNA data resources for the apicomplexa parasites as well as their vectors, Full-Malaria/Parasites and Full-Arthropods should help combat parasitic diseases. Full-Malaria/Parasites and Full-Arthropods are accessible from http://fullmal.hgc.jp/.
Subjects
Apicomplexa/genetics , Arthropod Vectors/genetics , Arthropods/genetics , DNA, Complementary/chemistry , Databases, Nucleic Acid , Parasites/genetics , Animals , Anopheles/genetics , Plasmodium/genetics , Sequence Analysis, DNA , Toxoplasma/genetics , Transcription Initiation Site , Tsetse Flies/genetics
Abstract
Combining our full-length cDNA method and massively parallel sequencing technology, we developed a simple method to collect precise positional information of transcriptional start sites (TSSs) together with digital information on gene-expression levels in a high-throughput manner. We applied this method to observe gene-expression changes in a colon cancer cell line cultured in normoxic and hypoxic conditions. We generated more than 100 million 36-base TSS-tag sequences and revealed comprehensive features of hypoxia-responsive alterations in the transcriptional landscape of the human genome. The features include the presence of inducible 'hot regions' in 54 genomic regions, 220 novel hypoxia-inducible promoters that may drive non-protein-coding transcripts, 191 hypoxia-responsive alternative promoters and detailed views of 120 novel as well as known hypoxia-responsive genes. We further analyzed the hypoxic response of different cells using an additional 60 million TSS-tags and found that the degree of the gene-expression changes was different among cell lines, possibly reflecting cellular robustness against hypoxia. The novel dynamic picture of the human gene transcriptome will deepen our understanding of the transcriptional program of the human genome as well as bring new insights into the biology of cancer cells in hypoxia.
Subjects
Gene Expression Regulation, Neoplastic , Transcription Initiation Site , Basic Helix-Loop-Helix Transcription Factors/metabolism , Cell Hypoxia , Cell Line , Cell Line, Tumor , Colonic Neoplasms/genetics , Gene Library , Gene Regulatory Networks , Genome, Human , Humans , Hypoxia-Inducible Factor 1, alpha Subunit/metabolism , Promoter Regions, Genetic , Sequence Analysis, DNA , Transcription, Genetic
Abstract
Intensive use of a few elite sires has increased the risk of the manifestation of deleterious recessive traits in cattle. Substantial genotyping data gathered using single-nucleotide polymorphism (SNP) arrays have identified haplotypes with homozygous deficiency, which may compromise survival. We developed Japanese Black cattle haplotypes (JBHs) using SNP array data (4843 individuals) and identified deleterious recessive haplotypes using exome sequencing of 517 sires. We identified seven JBHs with homozygous deficiency. JBH_10 and JBH_17 were associated with the resumption of estrus after artificial insemination, indicating that these haplotypes carried deleterious mutations affecting embryonic survival. The exome data of 517 Japanese Black sires revealed that AC_000165.1:g.85341291C>G of IARS in JBH_8_2, AC_000174.1:g.74743512G>T of CDC45 in JBH_17, and a copy number variation region (CNVR_27) of CLDN16 in JBH_1_1 and JBH_1_2 were the candidate mutations. A novel variant AC_000174.1:g.74743512G>T of CDC45 in JBH_17 was located in a splicing donor site at a distance of 5 bp, affecting pre-mRNA splicing. Mating between heterozygotes of JBH_17 indicated that homozygotes carrying the risk allele died around the blastocyst stage. Analysis of the frequency of the CDC45 risk allele revealed that its carriers were widespread throughout the tested Japanese Black cattle population. Our approach can effectively manage the inheritance of recessive risk alleles in a breeding population.
Subjects
Alleles , Genes, Recessive , Haplotypes , Mutation , Animals , Biomarkers , Breeding , Cattle , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , DNA Copy Number Variations , Embryonic Development , Homozygote , Polymorphism, Single Nucleotide , RNA Splicing , Exome Sequencing
Abstract
DBTSS is a database of transcriptional start sites, based on our unique collection of precise, experimentally determined 5'-end sequences of full-length cDNAs. Since its first release in 2002, several major updates have been made. In this update, we expanded the human transcriptional start site dataset by 19 million uniquely mapped, and RefSeq-associated, 5'-end sequences, which were generated by a newly introduced Solexa sequencer. Moreover, in order to provide means for interpreting those massive TSS data, we implemented two new analytical tools: one for connecting expression information with predicted transcription factor binding sites; the other for examining evolutionary conservation or species-specificity of promoters and transcripts, which can be browsed by our own comparative genome viewer. With the expanded dataset and the enhanced functionalities, DBTSS provides a unique platform that enables in-depth transcriptome analyses. DBTSS is accessible at http://dbtss.hgc.jp/.
Subjects
Databases, Nucleic Acid , Transcription Initiation Site , Animals , Binding Sites , Evolution, Molecular , Gene Expression , Humans , Internet , Promoter Regions, Genetic , Sequence Analysis, DNA , Software , Species Specificity , Transcription Factors/metabolism
Abstract
Although significant knowledge has accumulated on transcriptional regulation in eukaryotes, knowledge of their translational regulation remains limited. Thus, we performed a comprehensive detection of the terminal oligo-pyrimidine (TOP) tract, one of the well-characterized cis-regulatory motifs for translational control, located immediately downstream of the transcriptional start sites of mRNAs. Utilizing our precise 5'-end information from full-length cDNAs, we screened 1645 candidate TOP genes by position-specific matrix search. These included not only 75 out of 78 ribosomal protein genes but also eight previously identified non-ribosomal-protein TOP genes. We further experimentally validated the translational activities of 83 TOP candidate genes. Clear translational regulation in response to stimulation with 12-O-tetradecanoyl-1-phorbol-13-acetate was observed for at least 41 of them, indicating that there should be a few hundred human genes that are subject to regulation at the translational level via TOPs. Our result suggests that TOP genes encode not only the formerly characterized ribosomal proteins and translation-related proteins but also a wider variety of proteins, such as lysosome-related proteins and metabolism-related proteins, playing pivotal roles in gene expression control in the majority of cellular mRNAs.
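To make the idea of a position-specific matrix search concrete, here is a minimal sketch of log-odds scoring of a 5'-end sequence against a matrix built from example sites. The training sites, pseudocount, uniform background, and function names are all hypothetical; the abstract does not describe the actual matrix or thresholds used.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def build_pssm(sites: List[str], pseudocount: float = 1.0,
               background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds matrix from equal-length example sites (uniform background assumed)."""
    total = len(sites) + pseudocount * len(BASES)
    pssm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pssm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pssm


def score(seq: str, pssm: List[Dict[str, float]]) -> float:
    """Sum of per-position log-odds for the first len(pssm) bases of seq."""
    return sum(col[base] for base, col in zip(seq, pssm))


# Hypothetical pyrimidine-rich 5'-end examples, for illustration only.
toy_sites = ["CTTTCC", "CTCTTT", "CCTTTC", "CTTCCT"]
pssm = build_pssm(toy_sites)
top_like = score("CTTTCT", pssm)   # pyrimidine tract starting with C
non_top = score("AGGAGA", pssm)    # purine-rich 5' end
```

In a screen of this kind, the first bases after each TSS would be scored and genes above a chosen cutoff kept as candidates; the cutoff itself is a design choice not specified here.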
Subjects
Gene Expression Regulation , Protein Biosynthesis , RNA 5' Terminal Oligopyrimidine Sequence , Animals , Gene Expression Profiling , Genome, Human , HL-60 Cells , Humans , Mice , RNA, Messenger/chemistry , Ribosomal Proteins/genetics , Transcription Initiation Site
Abstract
BACKGROUND: Apicomplexan parasites are causative agents of various diseases including malaria and have been targets of extensive genomic sequencing. We generated 5'-EST collections for six apicomplexa parasites using our full-length oligo-capping cDNA library method. To improve upon the current genome annotations, as well as to validate the importance of physical cDNA clone resources, we generated a large-scale collection of full-length cDNAs for several apicomplexa parasites. RESULTS: In this study, we used a total of 61,056 5'-end single-pass cDNA sequences from Plasmodium falciparum, P. vivax, P. yoelii, P. berghei, Cryptosporidium parvum, and Toxoplasma gondii. We compared these partially sequenced cDNAs with the currently annotated gene models and observed significant inconsistencies between the two datasets. In particular, we found that on average 14% of the exons in the current gene models were not supported by any cDNA evidence, and that 16% of the current gene models may contain at least one mis-annotation and should be re-evaluated. We also identified a large number of previously unidentified transcripts. For 732 cDNAs in T. gondii, the entire sequences were determined in order to evaluate the annotated gene models at the complete full-length transcript level. We found that 41% of the T. gondii gene models contained at least one inconsistency. We also identified and confirmed by RT-PCR 140 previously unidentified transcripts found in the intergenic regions of the current gene annotations. We show that the majority of these discrepancies are due to questionable predictions of one or two extra exons in the upstream or downstream regions of the genes. CONCLUSION: Our data indicate that the current gene models are likely still incomplete and have much room for improvement. Our unique full-length cDNA information is especially useful for further refinement of the annotations for the genomes of apicomplexa parasites.
Subjects
Apicomplexa/genetics , DNA, Complementary/genetics , Expressed Sequence Tags , Genome, Protozoan , 5' Untranslated Regions , Animals , Chromosome Mapping , Cluster Analysis , DNA, Protozoan/genetics , Gene Library , Models, Genetic , Sequence Analysis, DNA , Transcription Initiation Site
Abstract
Comparasite is a database for comparative studies of transcriptomes of parasites. In this database, each entry is defined by the full-length cDNAs from various apicomplexan parasites. It integrates seven individual databases, Full-Parasites, consisting of numerous full-length cDNA clones that we have produced and sequenced: 12,484 cDNA sequences from Plasmodium falciparum, 11,262 from Plasmodium yoelii, 9633 from Plasmodium vivax, 1518 from Plasmodium berghei, 7400 from Toxoplasma gondii, 5921 from Cryptosporidium parvum and 10,966 from the tapeworm Echinococcus multilocularis. Putative counterpart gene groups are clustered, and comparative analysis of any combination of six apicomplexa species is implemented, such as interspecies comparisons regarding protein motifs (InterPro), predicted subcellular localization signals (PSORT), transmembrane regions (SOSUI) or upstream promoter elements. By specifying keywords and other search conditions, Comparasite retrieves putative counterpart gene groups containing a given feature in common or in a species-specific manner. By enabling multi-faceted comparative analyses of genes of apicomplexa protozoa, monophyletic organisms that have diversified to parasitize various hosts by adopting complex life cycles, Comparasite should help elucidate the mechanism behind parasitism. Our full-length cDNA databases and Comparasite are accessible from http://fullmal.ims.u-tokyo.ac.jp.
Subjects
Apicomplexa/genetics , DNA, Complementary/chemistry , Databases, Nucleic Acid , Genes, Protozoan , Transcription, Genetic , Animals , Apicomplexa/metabolism , Internet , Promoter Regions, Genetic , Systems Integration , Transcription Initiation Site , User-Computer Interface
Abstract
The liver, a major organ for drug metabolism, is physiologically similar between monkeys and humans. However, the paucity of identified genes has hampered a deep understanding of drug metabolism in monkeys. To provide such a genetic resource, 28,655 expressed sequence tags (ESTs) were generated from a cynomolgus monkey liver full-length enriched cDNA library, which contained 23 unique ESTs homologous to human drug-metabolizing enzymes. Our comparative genomics approach identified nine lineage-specific candidate ESTs, including three drug-metabolizing enzymes, which could be important for understanding the physiological differences between monkeys and humans.
Subjects
Aryl Hydrocarbon Hydroxylases/genetics , Expressed Sequence Tags , Liver/metabolism , Steroid Hydroxylases/genetics , Amino Acid Sequence , Animals , Aryl Hydrocarbon Hydroxylases/chemistry , DNA Primers , Liver/enzymology , Macaca fascicularis , Molecular Sequence Data , Sequence Homology, Amino Acid , Steroid Hydroxylases/chemistry
Abstract
DBTSS was first constructed in 2002 based on precise, experimentally determined 5' end clones. Several major updates and additions have been made since the last report. First, the number of human clones has drastically increased, going from 190,964 to 1,359,000. Second, information about potential alternative promoters is presented because the number of 5' end clones is now sufficient to determine several promoters for one gene. Specifically, we defined putative promoter groups by clustering transcription start sites (TSSs) separated by <500 bases. A total of 8308 human genes and 4276 mouse genes were found to have putative multiple promoters. Third, DBTSS provides detailed sequence comparisons of user-specified TSSs. Finally, we have added TSS information for zebrafish, malaria and schyzon (a red alga model organism). DBTSS is accessible at http://dbtss.hgc.jp.
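The grouping of TSSs separated by fewer than 500 bases can be sketched as a simple single-linkage pass over sorted positions. This is an illustrative assumption about how such clustering might be done, not the DBTSS implementation; the function name is hypothetical and all positions are assumed to lie on the same chromosome and strand.

```python
from typing import List


def cluster_tss(positions: List[int], max_gap: int = 500) -> List[List[int]]:
    """Group TSS positions; a new group starts when the gap to the
    previous (sorted) TSS is max_gap bases or more."""
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] < max_gap:
            clusters[-1].append(pos)   # within <max_gap of the last TSS
        else:
            clusters.append([pos])     # start a new putative promoter group
    return clusters


# Toy coordinates, for illustration only.
groups = cluster_tss([100, 350, 700, 1500, 1900, 5000])
# groups == [[100, 350, 700], [1500, 1900], [5000]]
```

Under this sketch, a gene whose TSSs fall into more than one group would be counted as having putative multiple promoters.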