1.
BMC Genomics ; 7: 48, 2006 Mar 13.
Article in English | MEDLINE | ID: mdl-16533400

ABSTRACT

BACKGROUND: Despite significant efforts from the research community, an extensive portion of the proteins encoded by human genes lack an assigned cellular function. Most metazoan proteins are composed of structural and/or functional domains, of which many appear in multiple proteins. Once a domain is characterized in one protein, the presence of a similar sequence in an uncharacterized protein serves as a basis for inference of function. Thus knowledge of a domain's function, or the protein within which it arises, can facilitate the analysis of an entire set of proteins. DESCRIPTION: From the Pfam domain database, we extracted uncharacterized protein domains represented in proteins from humans, worms, and flies. A data centre was created to facilitate the analysis of the uncharacterized domain-containing proteins. The centre both provides researchers with links to dispersed internet resources containing gene-specific experimental data and enables them to post relevant experimental results or comments. For each human gene in the system, a characterization score is posted, allowing users to track the progress of characterization over time or to identify for study uncharacterized domains in well-characterized genes. As a test of the system, a subset of 39 domains was selected for analysis and the experimental results posted to the NovelFam3000 system. For 25 human protein members of these 39 domain families, detailed sub-cellular localizations were determined. Specific observations are presented based on the analysis of the integrated information provided through the online NovelFam3000 system. CONCLUSION: Consistent experimental results between multiple members of a domain family allow for inferences of the domain's functional role. We unite bioinformatics resources and experimental data in order to accelerate the functional characterization of scarcely annotated domain families.


Subjects
Databases, Protein ; Protein Structure, Tertiary ; Animals ; Caenorhabditis elegans/genetics ; Computational Biology ; Drosophila melanogaster/genetics ; Genomics ; Humans ; Internet ; Proteome/analysis ; Sequence Homology ; Systems Integration ; User-Computer Interface
2.
Nucleic Acids Res ; 30(2): E6, 2002 Jan 15.
Article in English | MEDLINE | ID: mdl-11788732

ABSTRACT

We describe here an efficient strategy for simultaneous genome mapping and sequencing. The approach is based on physically oriented, overlapping restriction fragment libraries called slalom libraries. Slalom libraries combine features of general genomic, jumping and linking libraries. Slalom libraries can be adapted to different applications and two main types of slalom libraries are described in detail. This approach was used to map and sequence (with approximately 46% coverage) two human P1-derived artificial chromosome (PAC) clones, each of approximately 100 kb. This model experiment demonstrates the feasibility of the approach and shows that the efficiency (cost-effectiveness and speed) of existing mapping/sequencing methods could be improved at least 5-10-fold. Furthermore, since the efficiency of contig assembly in the slalom approach is virtually independent of length of sequence reads, even short sequences produced by rapid, high throughput sequencing techniques would suffice to complete a physical map and a sequence scan of a small genome.


Subjects
Gene Library ; Genome ; Genomics/methods ; Physical Chromosome Mapping/methods ; Sequence Analysis, DNA/methods ; Chromosomes, Artificial, Human/genetics ; Chromosomes, Artificial, Human/metabolism ; Cloning, Molecular ; Deoxyribonuclease BamHI/metabolism ; Deoxyribonuclease EcoRI/metabolism ; Deoxyribonucleases, Type II Site-Specific/metabolism ; Genome, Human ; Genomics/economics ; Humans ; Physical Chromosome Mapping/economics ; Repetitive Sequences, Nucleic Acid/genetics ; Restriction Mapping ; Sequence Analysis, DNA/economics ; Time Factors
3.
Nucleic Acids Res ; 30(14): 3163-70, 2002 Jul 15.
Article in English | MEDLINE | ID: mdl-12136098

ABSTRACT

A set of 22 551 unique human NotI flanking sequences (16.2 Mb) was generated. More than 40% of the set had regions with significant similarity to known proteins and expressed sequences. The data demonstrate that regions flanking NotI sites are less likely to form nucleosomes efficiently and resemble promoter regions. The draft human genome sequence contained 55.7% of the NotI flanking sequences, Celera's database contained matches to 57.2% of the clones and all public databases (including non-human and previously sequenced NotI flanks) matched 89.2% of the NotI flanking sequences (identity ≥90% over at least 50 bp, data from December 2001). The data suggest that the shotgun sequencing approach used to generate the draft human genome sequence resulted in a bias against cloning and sequencing of NotI flanks. A rough estimation (based primarily on chromosomes 21 and 22) is that the human genome contains 15 000-20 000 NotI sites, of which 6000-9000 are unmethylated in any particular cell. The results of the study suggest that the existing tools for computational determination of CpG islands fail to identify a significant fraction of functional CpG islands, and unmethylated DNA stretches with a high frequency of CpG dinucleotides can be found even in regions with low CG content.


Subjects
DNA/metabolism ; Deoxyribonucleases, Type II Site-Specific/metabolism ; Sequence Analysis, DNA/methods ; Cell Line, Transformed ; Chromosomes, Human, Pair 21/genetics ; Chromosomes, Human, Pair 22/genetics ; CpG Islands/genetics ; DNA/chemistry ; DNA/genetics ; Databases, Nucleic Acid ; Genes/genetics ; Genome, Human ; Humans ; Molecular Sequence Data ; Repetitive Sequences, Nucleic Acid/genetics
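
Note on record 3: the abstract counts a database hit as a match when identity is at least 90% over at least 50 bp. The Python sketch below shows one way such a filter could be written. The `Alignment` record and the `is_match` helper are hypothetical names introduced here for illustration; the abstract does not say how the authors implemented the criterion or which alignment tool produced the hits.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one database hit (not from the paper)."""
    identity_pct: float  # percent identity over the aligned region
    length_bp: int       # length of the aligned region in base pairs


def is_match(aln: Alignment, min_identity: float = 90.0, min_length: int = 50) -> bool:
    """Apply the threshold quoted in the abstract: identity >= 90% over >= 50 bp."""
    return aln.identity_pct >= min_identity and aln.length_bp >= min_length


def matched_fraction(hits_per_sequence: list[list[Alignment]]) -> float:
    """Fraction of query sequences with at least one qualifying hit."""
    if not hits_per_sequence:
        return 0.0
    matched = sum(1 for hits in hits_per_sequence if any(is_match(h) for h in hits))
    return matched / len(hits_per_sequence)
```

Both thresholds are inclusive, so a hit with exactly 90% identity over exactly 50 bp qualifies.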
4.
J Bioinform Comput Biol ; 3(3): 743-70, 2005 Jun.
Article in English | MEDLINE | ID: mdl-16108092

ABSTRACT

Researchers, hindered by a lack of standard gene and protein-naming conventions, endure long, sometimes fruitless, literature searches. A system that is able to automatically assign gene names to their LocusLink ID (LLID) in previously unseen MEDLINE abstracts is described. The system is based on supervised learning and builds a model for each LLID. The training sets for all LLIDs are extracted automatically from MEDLINE references in the LocusLink and SwissProt databases. A validation was done of the performance for all 20,546 human genes with LLIDs. Of these, 7344 produced good quality models (F-measure >0.7, nearly 60% of which were >0.9) and 13,202 did not, mainly due to insufficient numbers of known document references. A hand validation of MEDLINE documents for a set of 66 genes agreed well with the system's internal accuracy assessment. It is concluded that it is possible to achieve high quality gene disambiguation using scaleable automated techniques.


Subjects
Algorithms ; Genes ; MEDLINE ; Natural Language Processing ; Proteins/classification ; Software ; Terminology as Topic ; Databases, Protein ; Humans ; Vocabulary, Controlled
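
Note on record 4: the abstract rates each per-gene model by its F-measure and treats values above 0.7 as good quality. The Python sketch below assumes the balanced F-measure (F1, the harmonic mean of precision and recall), which the abstract does not state explicitly. The function and variable names are illustrative only. The sketch also checks that the two model counts quoted in the abstract add up to the stated total.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true positives, false positives, false negatives.

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if tp == 0:
        return 0.0  # no correct assignments: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Counts quoted in the abstract.
TOTAL_GENES = 20546
GOOD_MODELS = 7344    # F-measure > 0.7
OTHER_MODELS = 13202  # did not reach the threshold

counts_consistent = GOOD_MODELS + OTHER_MODELS == TOTAL_GENES
good_share = GOOD_MODELS / TOTAL_GENES  # share of genes with a good-quality model
```

Under this reading, a model with 8 correct assignments, 2 spurious ones and 2 missed ones has precision and recall of 0.8 each, and therefore an F-measure of 0.8, which would pass the 0.7 threshold.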
5.
PLoS One ; 3(1): e1440, 2008 Jan 23.
Article in English | MEDLINE | ID: mdl-18213364

ABSTRACT

BACKGROUND: We introduce the Gene Characterization Index, a bioinformatics method for scoring the extent to which a protein-encoding gene is functionally described. Inherently a reflection of human perception, the Gene Characterization Index is applied for assessing the characterization status of individual genes, thus serving the advancement of both genome annotation and applied genomics research by rapid and unbiased identification of groups of uncharacterized genes for diverse applications such as directed functional studies and delineation of novel drug targets. METHODOLOGY/PRINCIPAL FINDINGS: The scoring procedure is based on a global survey of researchers, who assigned characterization scores from 1 (poor) to 10 (extensive) for a sample of genes based on major online resources. By evaluating the survey as training data, we developed a bioinformatics procedure to assign gene characterization scores to all genes in the human genome. We analyzed snapshots of functional genome annotation over a period of 6 years to assess temporal changes reflected by the increase of the average Gene Characterization Index. Applying the Gene Characterization Index to genes within pharmaceutically relevant classes, we confirmed known drug targets as high-scoring genes and revealed potentially interesting novel targets with low characterization indexes. Removing known drug targets and genes linked to sequence-related patent filings from the entirety of indexed genes, we identified sets of low-scoring genes particularly suited for further experimental investigation. CONCLUSIONS/SIGNIFICANCE: The Gene Characterization Index is intended to serve as a tool to the scientific community and granting agencies for focusing resources and efforts on unexplored areas of the genome. The Gene Characterization Index is available from http://cisreg.ca/gci/.


Subjects
Computational Biology ; Genome, Human ; Humans
6.
Article in English | MEDLINE | ID: mdl-16448034

ABSTRACT

Researchers, hindered by a lack of standard gene and protein-naming conventions, endure long, sometimes fruitless, literature searches. A system is described which is able to automatically assign gene names to their LocusLink ID (LLID) in previously unseen MEDLINE abstracts. The system is based on supervised learning and builds a model for each LLID. The training sets for all LLIDs are extracted automatically from MEDLINE references in the LocusLink and SwissProt databases. A validation was done of the performance for all 20,546 human genes with LLIDs. Of these, 7,344 produced good quality models (F-measure > 0.7, nearly 60% of which were > 0.9) and 13,202 did not, mainly due to insufficient numbers of known document references. A hand validation of MEDLINE documents for a set of 66 genes agreed well with the system's internal accuracy assessment. It is concluded that it is possible to achieve high quality gene disambiguation using scaleable automated techniques.


Subjects
Databases, Protein ; Information Storage and Retrieval/methods ; MEDLINE ; Natural Language Processing ; Proteins/classification ; Software ; Terminology as Topic ; Genes ; User-Computer Interface ; Vocabulary, Controlled
7.
Comp Funct Genomics ; 5(8): 584-95, 2004.
Article in English | MEDLINE | ID: mdl-18629180

ABSTRACT

In this paper we aim to create a reference data collection of Northern blot results and demonstrate how such a collection can enable a quantitative comparison of modern expression profiling techniques, a central component of functional genomics studies. Historically, Northern blots were the de facto standard for determining RNA transcript levels. However, driven by the demand for analysis of large sets of genes in parallel, high-throughput methods, such as microarrays, dominate modern profiling efforts. To facilitate assessment of these methods, in comparison to Northern blots, we created a database of published Northern results obtained with a standardized commercial multiple tissue blot (dbMTN). In order to demonstrate the utility of the dbMTN collection for technology comparison, we also generated expression profiles for genes across a set of human tissues, using multiple profiling techniques. No method produced profiles that were strongly correlated with the Northern blot data. The highest correlations to the Northern blot data were determined with microarrays for the subset of genes observed to be specifically expressed in a single tissue in the Northern analyses. The database and expression profiling data are available via the project website (http://www.cisreg.ca). We believe that emphasis on multitechnique validation of expression profiles is justified, as the correlation results between platforms are not encouraging on the whole. Supplementary material for this article can be found at: http://www.interscience.wiley.com/jpages/1531-6912/suppmat.
