Results 1 - 16 of 16
1.
PLoS Genet ; 10(2): e1004077, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24516395

ABSTRACT

Although two related species may have extremely similar phenotypes, the genetic networks underpinning this conserved biology may have diverged substantially since they last shared a common ancestor. This is termed Developmental System Drift (DSD) and reflects the plasticity of genetic networks. One consequence of DSD is that some orthologous genes will have evolved different in vivo functions in two such phenotypically similar, related species and will therefore have different loss of function phenotypes. Here we report an RNAi screen in C. elegans and C. briggsae to identify such cases. We screened 1333 genes in both species and identified 91 orthologues that have different RNAi phenotypes. Intriguingly, we find that recently evolved genes of unknown function have the fastest evolving in vivo functions and, in several cases, we identify the molecular events driving these changes. We thus find that DSD has a major impact on the evolution of gene function and we anticipate that the C. briggsae RNAi library reported here will drive future studies on comparative functional genomics screens in these nematodes.


Subjects
Caenorhabditis elegans/genetics , Molecular Evolution , Gene Regulatory Networks , RNA Interference , Animals , Caenorhabditis elegans/growth & development , Caenorhabditis elegans Proteins/biosynthesis , Caenorhabditis elegans Proteins/genetics , Developmental Gene Expression Regulation , Phenotype , Amino Acid Sequence Homology , Species Specificity
2.
BMC Bioinformatics ; 14: 16, 2013 Jan 16.
Article in English | MEDLINE | ID: mdl-23324024

ABSTRACT

BACKGROUND: The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this 'names problem' has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science. RESULTS: The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets. CONCLUSIONS: We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface at http://tnrs.iplantcollaborative.org/ and as a RESTful web service and application programming interface. 
Source code is available at https://github.com/iPlantCollaborativeOpenSource/TNRS/.
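
The fuzzy-matching step described in this abstract can be illustrated with a minimal sketch. It does not use the TNRS code or its web service; it assumes a small hypothetical reference list and uses the similarity ratio from Python's standard `difflib` module in place of whatever matching algorithm the TNRS actually applies. The names `REFERENCE_NAMES`, `resolve_name` and the `cutoff` parameter are illustrative assumptions.

```python
import difflib

# Hypothetical reference list of accepted names (illustrative only, not TNRS data).
REFERENCE_NAMES = ["Quercus alba", "Quercus rubra", "Acer saccharum", "Pinus strobus"]


def resolve_name(raw_name, reference=REFERENCE_NAMES, cutoff=0.8):
    """Return (best_match, score) for a possibly misspelled name.

    Returns (None, 0.0) when no reference name reaches the similarity cutoff.
    """
    # Normalize whitespace and case before comparing.
    cleaned = " ".join(raw_name.split()).lower()
    best, best_score = None, 0.0
    for candidate in reference:
        score = difflib.SequenceMatcher(None, cleaned, candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score < cutoff:
        return None, 0.0
    return best, best_score


if __name__ == "__main__":
    for name in ["Quercus albba", "acer  sacharum", "Zea mays"]:
        print(name, "->", resolve_name(name))
```

A cutoff is needed so that names absent from the reference list are reported as unmatched instead of being forced onto the nearest entry; the value 0.8 here is arbitrary.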


Subjects
Plants/classification , Software , Algorithms , Classification/methods , Factual Databases , Internet , Names , User-Computer Interface
3.
PLoS Genet ; 5(6): e1000537, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19557190

ABSTRACT

A crucial step in the development of muscle cells in all metazoan animals is the assembly and anchorage of the sarcomere, the essential repeat unit responsible for muscle contraction. In Caenorhabditis elegans, many of the critical proteins involved in this process have been uncovered through mutational screens focusing on uncoordinated movement and embryonic arrest phenotypes. We propose that additional sarcomeric proteins exist for which there is a less severe, or entirely different, mutant phenotype produced in their absence. We have used Serial Analysis of Gene Expression (SAGE) to generate a comprehensive profile of late embryonic muscle gene expression. We generated two replicate long SAGE libraries for sorted embryonic muscle cells, identifying 7,974 protein-coding genes. A refined list of 3,577 genes expressed in muscle cells was compiled from the overlap between our SAGE data and available microarray data. Using the genes in our refined list, we have performed two separate RNA interference (RNAi) screens to identify novel genes that play a role in sarcomere assembly and/or maintenance in either embryonic or adult muscle. To identify muscle defects in embryos, we screened specifically for the Pat embryonic arrest phenotype. To visualize muscle defects in adult animals, we fed dsRNA to worms producing a GFP-tagged myosin protein, thus allowing us to analyze their myofilament organization under gene knockdown conditions using fluorescence microscopy. By eliminating or severely reducing the expression of 3,300 genes using RNAi, we identified 122 genes necessary for proper myofilament organization, 108 of which are genes without a previously characterized role in muscle. Many of the genes affecting sarcomere integrity have human homologs for which little or nothing is known.


Subjects
Actin Cytoskeleton/chemistry , Caenorhabditis elegans/genetics , Gene Expression Profiling/methods , Muscle Development , Actin Cytoskeleton/genetics , Actin Cytoskeleton/metabolism , Animals , Caenorhabditis elegans/chemistry , Caenorhabditis elegans/embryology , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , Developmental Gene Expression Regulation , Muscles/chemistry , Muscles/embryology , Muscles/metabolism , Sarcomeres/genetics , Sarcomeres/metabolism
4.
BMC Plant Biol ; 9: 101, 2009 Jul 31.
Article in English | MEDLINE | ID: mdl-19646253

ABSTRACT

BACKGROUND: Functional genomics tools provide researchers with the ability to apply high-throughput techniques to determine the function and interaction of a diverse range of genes. Mutagenized plant populations are one such resource that facilitates gene characterisation. They allow complex physiological responses to be correlated with the expression of single genes in planta, through either reverse genetics where target genes are mutagenized to assay the effect, or through forward genetics where populations of mutant lines are screened to identify those whose phenotype diverges from wild type for a particular trait. One limitation of these types of populations is the prevalence of gene redundancy within plant genomes, which can mask the effect of individual genes. Activation or enhancer populations, which not only provide knock-out but also dominant activation mutations, can facilitate the study of such genes. RESULTS: We have developed a population of almost 50,000 activation-tagged A. thaliana lines that have been archived as individual lines to the T3 generation. The population is an excellent tool for both reverse and forward genetic screens and has been used successfully to identify a number of novel mutants. Insertion site sequences have been generated and mapped for 15,507 lines to enable further application of the population, while providing a clear distribution of T-DNA insertions across the genome. The population is being screened for a number of biochemical and developmental phenotypes; provisional data identifying novel alleles and genes controlling steps in proanthocyanidin biosynthesis and trichome development are presented. CONCLUSION: This publicly available population provides an additional tool for plant researchers to assist with determining gene function for the many as yet uncharacterised genes annotated within the Arabidopsis genome sequence (http://aafc-aac.usask.ca/FST).
The presence of enhancer elements on the inserted T-DNA molecule allows both knock-out and dominant activation phenotypes to be identified for traits of interest.


Subjects
Arabidopsis/genetics , Plant Genome , Genomics/methods , Insertional Mutagenesis , DNA Mutational Analysis , Bacterial DNA/genetics , Plant DNA/genetics , Plant Genes
5.
BMC Bioinformatics ; 9: 549, 2008 Dec 19.
Article in English | MEDLINE | ID: mdl-19099578

ABSTRACT

BACKGROUND: While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. RESULTS: The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders. CONCLUSION: This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.
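
The gene-level accuracy figures quoted in this abstract can be illustrated with a short sketch. It assumes the convention common in gene-prediction benchmarks, in which sensitivity is the fraction of reference genes predicted exactly and specificity is the fraction of predicted genes that exactly match a reference gene; the abstract itself does not spell out its definitions, so this is an assumption. The function name and the toy coordinates are hypothetical, and this is not the nGASP evaluation code.

```python
def gene_level_accuracy(predicted, reference):
    """Compute gene-level (sensitivity, specificity) for two sets of gene models.

    Each gene model is a sequence of (start, end) exon coordinates. A predicted
    gene counts as correct only if all of its exons match a reference gene exactly.
    Assumed definitions: sensitivity = TP / reference genes,
    specificity = TP / predicted genes.
    """
    pred_set = {tuple(tuple(exon) for exon in gene) for gene in predicted}
    ref_set = {tuple(tuple(exon) for exon in gene) for gene in reference}
    true_pos = len(pred_set & ref_set)
    sensitivity = true_pos / len(ref_set) if ref_set else 0.0
    specificity = true_pos / len(pred_set) if pred_set else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data: four reference genes, five predictions, three exact matches.
    reference = [
        [(100, 200), (300, 400)],
        [(1000, 1500)],
        [(2000, 2100), (2200, 2300), (2400, 2500)],
        [(5000, 5200)],
    ]
    predicted = [
        [(100, 200), (300, 400)],      # exact match
        [(1000, 1500)],                # exact match
        [(2000, 2100), (2200, 2300)],  # missing the last exon, so incorrect
        [(5000, 5200)],                # exact match
        [(7000, 7100)],                # no corresponding reference gene
    ]
    print(gene_level_accuracy(predicted, reference))
```

Under these definitions a gene-finder can combine high sensitivity with low specificity when it predicts many gene models that are not in the reference set.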


Subjects
Caenorhabditis/genetics , Helminth Genome , Animals , Caenorhabditis elegans/classification , Caenorhabditis elegans/genetics , Computational Biology/methods , DNA/genetics , Genetic Databases , Helminth Genes , Genomics
6.
Curr Biol ; 15(10): 935-41, 2005 May 24.
Article in English | MEDLINE | ID: mdl-15916950

ABSTRACT

Cilia and flagella play important roles in many physiological processes, including cell and fluid movement, sensory perception, and development. The biogenesis and maintenance of cilia depend on intraflagellar transport (IFT), a motility process that operates bidirectionally along the ciliary axoneme. Disruption in IFT and cilia function causes several human disorders, including polycystic kidneys, retinal dystrophy, neurosensory impairment, and Bardet-Biedl syndrome (BBS). To uncover new ciliary components, including IFT proteins, we compared C. elegans ciliated neuronal and nonciliated cells through serial analysis of gene expression (SAGE) and screened for genes potentially regulated by the ciliogenic transcription factor, DAF-19. Using these complementary approaches, we identified numerous candidate ciliary genes and confirmed the ciliated-cell-specific expression of 14 novel genes. One of these, C27H5.7a, encodes a ciliary protein that undergoes IFT. As with other IFT proteins, its ciliary localization and transport are disrupted by mutations in IFT and bbs genes. Furthermore, we demonstrate that the ciliary structural defect of C. elegans dyf-13(mn396) mutants is caused by a mutation in C27H5.7a. Together, our findings help define a ciliary transcriptome and suggest that DYF-13, an evolutionarily conserved protein, is a novel core IFT component required for cilia function.


Subjects
Caenorhabditis elegans/genetics , Cilia/genetics , Gene Expression Profiling , Neurons/metabolism , Animals , Base Sequence , Caenorhabditis elegans Proteins/metabolism , Cilia/metabolism , Computational Biology , Genomics/methods , Green Fluorescent Proteins , Mutation/genetics , Protein Transport/physiology , DNA Sequence Analysis , Transcription Factors/metabolism
7.
Exp Gerontol ; 42(8): 825-39, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17543485

ABSTRACT

We used Serial Analysis of Gene Expression (SAGE) to compare the global transcription profiles of long-lived mutant daf-2 adults and dauer larvae, aiming to identify aging-related genes based on similarity of expression patterns. Genes that are expressed similarly in both long-lived types potentially define a common life-extending program. Comparison of eight SAGE libraries yielded a set of 120 genes, the expression of which was significantly different in long-lived worms vs. normal adults. The gene annotations indicate a strong link between oxidative stress and life span, further supporting the hypothesis that metabolic activity is a major determinant in longevity. The SAGE data show changes in mRNA levels for electron transport chain components, elevated expression of glyoxylate shunt enzymes and significantly reduced expression for components of the TCA cycle in longer-lived nematodes. We propose a model for enhanced longevity through a cytochrome c oxidase-mediated reduction in reactive oxygen species commonly held to be a major contributor to aging.


Subjects
Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans/genetics , Helminth Genes , Longevity/genetics , Insulin Receptor/genetics , Aging/genetics , Animals , Caenorhabditis elegans/growth & development , Caenorhabditis elegans/metabolism , Citric Acid Cycle/genetics , Electron Transport Complex III/genetics , Electron Transport Complex IV/genetics , Gene Expression Profiling , Larva/growth & development , Biological Models , Mutation , Oligonucleotide Array Sequence Analysis , RNA Interference , Helminth RNA/genetics , Helminth RNA/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism , Reactive Oxygen Species/metabolism
8.
Curr Protoc Bioinformatics ; 50: 9.10.1-9.10.10, 2015 Jun 19.
Article in English | MEDLINE | ID: mdl-26087747

ABSTRACT

The Reactome project builds, maintains, and publishes a knowledgebase of biological pathways. The information in the knowledgebase is gathered from the experts in the field, peer reviewed and edited by Reactome editorial staff, and then published to the Reactome Web site, http://www.reactome.org. The Reactome software is open source and builds on top of other open-source or freely available software. Reactome data and code can be freely downloaded in its entirety and the Web site installed locally. This allows for more flexible interrogation of the data and also makes it possible to add one's own information to the knowledgebase.


Subjects
Internet , Knowledge Bases , Signal Transduction , Databases as Topic , Software
9.
Curr Protoc Bioinformatics ; Chapter 1: 1.22.1-1.22.26, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23749752

ABSTRACT

The iPlant Collaborative is an academic consortium whose mission is to develop an informatics and social infrastructure to address the "grand challenges" in plant biology. Its cyberinfrastructure supports the computational needs of the research community and facilitates solving major challenges in plant science. The Discovery Environment provides a powerful and rich graphical interface to the iPlant Collaborative cyberinfrastructure by creating an accessible virtual workbench that enables all levels of expertise, ranging from students to traditional biology researchers and computational experts, to explore, analyze, and share their data. By providing access to iPlant's robust data-management system and high-performance computing resources, the Discovery Environment also creates a unified space in which researchers can access scalable tools. Researchers can use available Applications (Apps) to execute analyses on their data, as well as customize or integrate their own tools to better meet the specific needs of their research. These Apps can also be used in workflows that automate more complicated analyses. This module describes how to use the main features of the Discovery Environment, using bioinformatics workflows for high-throughput sequence data as examples.


Subjects
Information Storage and Retrieval , Plants , Computational Biology , Database Management Systems , Genomics , Internet , Plants/genetics , Sequence Analysis/methods , Software , Workflow
10.
Curr Protoc Bioinformatics ; 43: 9.15.1-9.15.20, 2013 Oct 15.
Article in English | MEDLINE | ID: mdl-26270172

ABSTRACT

Cloud Computing refers to distributed computing platforms that use virtualization software to provide easy access to physical computing infrastructure and data storage, typically administered through a Web interface. Cloud-based computing provides access to powerful servers, with specific software and virtual hardware configurations, while eliminating the initial capital cost of expensive computers and reducing the ongoing operating costs of system administration, maintenance contracts, power consumption, and cooling. This eliminates a significant barrier to entry into bioinformatics and high-performance computing for many researchers. This is especially true of free or modestly priced cloud computing services. The iPlant Collaborative offers a free cloud computing service, Atmosphere, which allows users to easily create and use instances on virtual servers preconfigured for their analytical needs. Atmosphere is a self-service, on-demand platform for scientific computing. This unit demonstrates how to set up, access and use cloud computing in Atmosphere.


Subjects
Cloud Computing , Software , Arabidopsis/genetics , Genome , Internet , RNA Sequence Analysis , User-Computer Interface
11.
Article in English | MEDLINE | ID: mdl-24145117

ABSTRACT

Prior studies of the elasmobranch rectal gland have demonstrated that feeding induces profound and rapid upregulation of the gland's ability to secrete concentrated NaCl solutions and the metabolic capacity to support this highly ATP-consuming process. We undertook the current study to attempt to determine the degree to which upregulation of mRNA transcription was involved in the gland's activation. cDNA libraries were created from mRNA isolated from rectal glands of fasted (7 days post-feeding) and fed (6 h and 22 h post-feeding) spiny dogfish sharks (Squalus acanthias), and the libraries were subjected to suppression subtractive hybridization (SSH) analysis. Quantitative real-time PCR (qPCR) was also used to ascertain the mRNA expression of several genes revealed by the SSH analysis. In total the treatments changed the abundance of 170 transcripts, with 103 upregulated by feeding, and 67 upregulated by fasting. While many of the changes took place in 'expected' Gene Ontology (GO) categories (e.g., metabolism, transport, structural proteins, DNA and RNA turnover, etc.), KEGG analysis revealed a number of categories which identify oxidative stress as a topic of interest for the gland. GO analysis also revealed that branched chain essential amino acids (e.g., valine, leucine, isoleucine) are potential metabolic fuels for the rectal gland. In addition, upregulation of transcripts for many genes in the anticipated GO categories did not agree (i.e., fasting downregulated in feeding treatments) with previously observed increases in their respective proteins/enzyme activities. These results suggest an 'anticipatory' storage of selected mRNAs which presumably supports the rapid translation of proteins upon feeding activation of the gland.


Subjects
Salt Gland/metabolism , Squalus acanthias/genetics , Animals , Fasting/physiology , Food , Ion Transport/genetics , Male , Oxidative Stress/genetics , Messenger RNA/metabolism , Up-Regulation
12.
Database (Oxford) ; 2011: bar023, 2011.
Article in English | MEDLINE | ID: mdl-21856757

ABSTRACT

The model organism Encyclopedia of DNA Elements (modENCODE) project is a National Human Genome Research Institute (NHGRI) initiative designed to characterize the genomes of Drosophila melanogaster and Caenorhabditis elegans. A Data Coordination Center (DCC) was created to collect, store and catalog modENCODE data. An effective DCC must gather, organize and provide all primary, interpreted and analyzed data, and ensure the community is supplied with the knowledge of the experimental conditions, protocols and verification checks used to generate each primary data set. We present here the design principles of the modENCODE DCC, and describe the ramifications of collecting thorough and deep metadata for describing experiments, including the use of a wiki for capturing protocol and reagent information, and the BIR-TAB specification for linking biological samples to experimental results. modENCODE data can be found at http://www.modencode.org.


Subjects
Genetic Databases , Genome , Genomics/methods , Internet , Software , Animals , Caenorhabditis elegans/genetics , DNA/genetics , Drosophila melanogaster/genetics , Humans
13.
Curr Protoc Bioinformatics ; Chapter 9: Unit 9.12, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20836076

ABSTRACT

Genome Browsers are software that allow the user to view genome annotations in the context of a reference sequence, such as a chromosome, contig, scaffold, etc. The Generic Genome Browser (GBrowse) is an open-source genome browser package developed as part of the Generic Model Database Project (see UNIT ; Stein et al., 2002). The increasing number of sequenced genomes has led to a corresponding growth in the field of comparative genomics, which requires methods to view and compare multiple genomes. Using the same software framework as GBrowse, the Generic Synteny Browser (GBrowse_syn) allows the comparison of colinear regions of multiple genomes using the familiar GBrowse-style Web page. Like GBrowse, GBrowse_syn can be configured to display any organism, and is currently the synteny browser used for model organisms such as C. elegans (WormBase; http://www.wormbase.org; see UNIT 1.8) and Arabidopsis (TAIR; http://www.arabidopsis.org; see UNIT 1.1). GBrowse_syn is part of the GBrowse software package and can be downloaded from the Web and run on any Unix-like operating system, such as Linux, Solaris, or MacOS X. GBrowse_syn is still under active development. This unit will cover installation and configuration as part of the current stable version of GBrowse (v. 1.71).


Subjects
Genome , Genomics/methods , Software , Synteny/genetics
14.
Dev Biol ; 302(2): 627-45, 2007 Feb 15.
Article in English | MEDLINE | ID: mdl-17113066

ABSTRACT

A SAGE library was prepared from hand-dissected intestines from adult Caenorhabditis elegans, allowing the identification of >4000 intestinally-expressed genes; this gene inventory provides fundamental information for understanding intestine function, structure and development. Intestinally-expressed genes fall into two broad classes: widely-expressed "housekeeping" genes and genes that are either intestine-specific or significantly intestine-enriched. Within this latter class of genes, we identified a subset of highly-expressed highly-validated genes that are expressed either exclusively or primarily in the intestine. Over half of the encoded proteins are candidates for secretion into the intestinal lumen to hydrolyze the bacterial food (e.g. lysozymes, amoebapores, lipases and especially proteases). The promoters of this subset of intestine-specific/intestine-enriched genes were analyzed computationally, using both a word-counting method (RSAT oligo-analysis) and a method based on Gibbs sampling (MotifSampler). Both methods returned the same over-represented site, namely an extended GATA-related sequence of the general form AHTGATAARR, which agrees with experimentally determined cis-acting control sequences found in intestine genes over the past 20 years. All promoters in the subset contain such a site, compared to <5% for control promoters; moreover, our analysis suggests that the majority (perhaps all) of genes expressed exclusively or primarily in the worm intestine are likely to contain such a site in their promoters. There are three zinc-finger GATA-type factors that are candidates to bind this extended GATA site in the differentiating C. elegans intestine: ELT-2, ELT-4 and ELT-7. All evidence points to ELT-2 being the most important of the three. We show that worms in which both the elt-4 and the elt-7 genes have been deleted from the genome are essentially wildtype, demonstrating that ELT-2 provides all essential GATA-factor functions in the intestine. 
The SAGE analysis also identifies more than a hundred other transcription factors in the adult intestine but few show an RNAi-induced loss-of-function phenotype and none (other than ELT-2) show a phenotype primarily in the intestine. We thus propose a simple model in which the ELT-2 GATA factor directly participates in the transcription of all intestine-specific/intestine-enriched genes, from the early embryo through to the dying adult. Other intestinal transcription factors would thus modulate the action of ELT-2, depending on the worm's nutritional and physiological needs.
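
The extended GATA consensus AHTGATAARR quoted in this abstract is written in IUPAC nucleotide codes (H = A, C or T; R = A or G). A minimal sketch of how a promoter could be scanned for such a site on both strands follows. It is a plain consensus search using Python's `re` module, not the RSAT oligo-analysis or MotifSampler methods used in the study, and the function names and the example sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def revcomp(seq):
    """Reverse-complement an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(promoter, consensus="AHTGATAARR"):
    """Return sorted (position, strand) pairs for every consensus match.

    Positions are 0-based and refer to the leftmost base of the site
    on the forward strand, whichever strand the match lies on.
    """
    # The lookahead lets overlapping sites be reported as well.
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    seq = promoter.upper()
    n, k = len(seq), len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(revcomp(seq))]
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical promoter fragment containing one forward-strand site.
    print(find_sites("CCCACTGATAAGGCCC"))
```

Scanning both strands matters because a binding site can lie in either orientation relative to the gene.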


Subjects
Caenorhabditis elegans Proteins/physiology , Caenorhabditis elegans/genetics , GATA Transcription Factors/physiology , Genetic Models , Genetic Transcription , Animals , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/genetics , GATA Transcription Factors/genetics , Gene Expression Profiling , Intestinal Mucosa/metabolism , Genetic Promoter Regions
15.
Bioinformatics ; 18(11): 1538-9, 2002 Nov.
Article in English | MEDLINE | ID: mdl-12424127

ABSTRACT

AcePrimer is an internet-accessed application based on CGI/Perl programming that designs PCR primers to search for deletion alleles in Caenorhabditis elegans gene knockout experiments and uses electronic PCR to search the entire genomic DNA sequence for potential false priming or multiple PCR amplification targets. Features such as the ability to target specific exons with the 'poison primer' approach and evaluation of primers with electronic PCR provide a flexible, web-based approach to design effective primers whilst minimizing the need for empirical optimization of PCR experiments.
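
The idea behind electronic PCR, checking in silico whether a primer pair would amplify more than one target, can be sketched as follows. This is not AcePrimer's Perl implementation: it assumes exact primer matches only, considers only the orientation in which the forward primer lies on the given strand, and uses hypothetical function names and toy sequences. Real tools typically also tolerate mismatches.

```python
def revcomp(seq):
    """Reverse-complement an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(text, sub):
    """Return the start index of every (possibly overlapping) occurrence of sub."""
    hits, i = [], text.find(sub)
    while i != -1:
        hits.append(i)
        i = text.find(sub, i + 1)
    return hits


def electronic_pcr(template, fwd, rev, max_product=3000):
    """Return (start, end, length) for each predicted amplicon.

    Assumptions: primers must match exactly; the forward primer binds the
    given strand and the reverse primer binds the opposite strand downstream.
    """
    template, fwd, rev = template.upper(), fwd.upper(), rev.upper()
    rev_site = revcomp(rev)  # how the reverse primer site appears on the given strand
    products = []
    for f in find_all(template, fwd):
        for r in find_all(template, rev_site):
            end = r + len(rev_site)
            length = end - f
            # The reverse site must lie downstream of the forward primer.
            if r >= f + len(fwd) and length <= max_product:
                products.append((f, end, length))
    return products


if __name__ == "__main__":
    # Toy template: 4 nt flank, forward site, 10 nt spacer, reverse site, 4 nt flank.
    template = "TTTT" + "ACGTACGTAC" + "GGGGGGGGGG" + "TTGCAAGCTA" + "CCCC"
    products = electronic_pcr(template, "ACGTACGTAC", "TAGCTTGCAA")
    print(products)
    if len(products) > 1:
        print("warning: primer pair has multiple predicted amplification targets")
```

More than one returned product indicates a primer pair with multiple potential amplification targets, which is the situation the electronic PCR check is meant to flag before any bench work.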


Subjects
Caenorhabditis elegans/genetics , DNA Primers/genetics , Database Management Systems , Gene Deletion , Polymerase Chain Reaction/methods , DNA Sequence Analysis/methods , Software , Alleles , Animals , Base Sequence , Computer Simulation , Nucleic Acid Databases , Internet , Molecular Sequence Data , Sequence Alignment/methods , User-Computer Interface
16.
Genome Res ; 14(10B): 2083-92, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15489330

ABSTRACT

The Mammalian Gene Collection (MGC) consortium (http://mgc.nci.nih.gov) seeks to establish publicly available collections of full-ORF cDNAs for several organisms of significance to biomedical research, including human. To date over 15,200 human cDNA clones containing full-length open reading frames (ORFs) have been identified via systematic expressed sequence tag (EST) analysis of a diverse set of cDNA libraries; however, further systematic EST analysis is no longer an efficient method for identifying new cDNAs. As part of our involvement in the MGC program, we have developed a scalable method for targeted recovery of cDNA clones to facilitate recovery of genes absent from the MGC collection. First, cDNA is synthesized from various RNAs, followed by polymerase chain reaction (PCR) amplification of transcripts in 96-well plates using gene-specific primer pairs flanking the ORFs. Amplicons are cloned into a sequencing vector, and full-length sequences are obtained. Sequences are processed and assembled using Phred and Phrap, and analyzed using Consed and a number of bioinformatics methods we have developed. Sequences are compared with the Reference Sequence (RefSeq) database, and validation of sequence discrepancies is attempted using other sequence databases including dbEST and dbSNP. Clones with identical sequence to RefSeq or containing only validated changes will become part of the MGC human gene collection. Clones containing novel splice variants or polymorphisms have also been identified. Our approach to clone recovery, applied at large scale, has the potential to recover many and possibly most of the genes absent from the MGC collection.


Subjects
Complementary DNA/chemistry , Human Genome , Open Reading Frames/genetics , DNA Sequence Analysis , Molecular Cloning , Complementary DNA/analysis , Expressed Sequence Tags , Gene Library , Humans , Plasmids , Polymerase Chain Reaction