1.
Nature ; 508(7497): 469-76, 2014 Apr 24.
Article in English | MEDLINE | ID: mdl-24759409

ABSTRACT

The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development.


Subjects
Disease, Genetic Predisposition to Disease/genetics, Genetic Variation/genetics, Guidelines as Topic, False Positive Reactions, Genes/genetics, Humans, Information Dissemination, Publishing, Reproducibility of Results, Research Design, Translational Biomedical Research/standards
2.
J Mol Biol ; 360(4): 893-906, 2006 Jul 21.
Article in English | MEDLINE | ID: mdl-16784753

ABSTRACT

The geometry of the polypeptide exit tunnel has been determined using the crystal structure of the large ribosomal subunit from Haloarcula marismortui. The tunnel is a component of a much larger, interconnected system of channels accessible to solvent that permeates the subunit and is connected to the exterior at many points. Since water and other small molecules can diffuse into and out of the tunnel along many different trajectories, the large subunit cannot be part of the seal that keeps ions from passing through the ribosome-translocon complex. The structure referred to as the tunnel is the only passage in the solvent channel system that is both large enough to accommodate nascent peptides, and that traverses the particle. For objects of that size, it is effectively an unbranched tube connecting the peptidyl transferase center of the large subunit and the site where nascent peptides emerge. At no point is the tunnel big enough to accommodate folded polypeptides larger than alpha-helices.


Subjects
Haloarcula marismortui/chemistry, Peptides/chemistry, Ribosomes/chemistry, Chaperonins/chemistry, Molecular Models, Solvents, Surface Properties, Water/chemistry
3.
Article in English | MEDLINE | ID: mdl-17381286

ABSTRACT

We have used genomic tiling arrays to identify transcribed regions throughout the human genome. Analysis of the mapping results of RNA isolated from five cell/tissue types, NB4 cells, NB4 cells treated with retinoic acid (RA), NB4 cells treated with 12-O-tetradecanoylphorbol-13 acetate (TPA), neutrophils, and placenta, throughout the ENCODE region reveals a large number of novel transcribed regions. Interestingly, neutrophils exhibit a great deal of novel expression in several intronic regions. Comparison of the hybridization results of NB4 cells treated with different stimuli relative to untreated cells reveals that many new regions are expressed upon cell differentiation. One such region is the Hox locus, which contains a large number of novel regions expressed in a number of cell types. Analysis of the trinucleotide composition of the novel transcribed regions reveals that it is similar to that of known exons. These results suggest that many of the novel transcribed regions may have a functional role.


Subjects
Human Genome, Genetic Transcription, Cell Differentiation/genetics, Exons, Gene Expression Profiling, Humans, Introns, Neutrophils/metabolism, Oligonucleotide Array Sequence Analysis, RNA/genetics, RNA/metabolism
4.
Exp Neurol ; 196(1): 18-29, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16081066

ABSTRACT

A strong relationship between hypoxia and fetal brain damage has been described. Specific susceptibility of the GABAergic neurons to these conditions may be crucial to the damage induced. We have previously shown, in a mouse model, that maternal pretreatment with magnesium sulfate (Mg) partially prevented the behavioral consequences of maternal hypoxia in the adult offspring. Here, we tested the effect of maternal hypoxia and maternal Mg load on the GABAergic system of 8-month-old offspring. The immunoreactivity (IR) of several proteins expressed in GABAergic neurons and inhibitory synapses was analyzed in the following regions of the adult offspring brain: hippocampus, cortical M1, caudate putamen, and lateral globus pallidus. Maternal hypoxia reduced the density of parvalbumin (PV)-IR neurons in the hippocampus. The density of PV-IR and calbindin (CB)-IR neurons was also reduced in the deep and superficial layers of the M1. Maternal pretreatment with Mg had a prophylactic action in the superficial, but not the deep, layers of M1. Also, in offspring from the maternal hypoxia group, the vesicular GABA transporter (VGAT)-IR was enhanced in the hippocampal CA1 and hilus regions. No effect of maternal hypoxia on VGAT-IR was observed in the M1. However, maternal pretreatment with Mg enhanced VGAT-IR and glutamate decarboxylase-IR in the deep layers of the M1. In the globus pallidus, maternal hypoxia enhanced CB-IR, which was prevented by maternal pretreatment with Mg. In conclusion, maternal hypoxia induced a loss of PV-IR and CB-IR neurons; maternal pretreatment with Mg partially protected these neuron populations. An increase in proteins of inhibitory synapses, observed under hypoxic conditions in several brain regions, may be a result of some compensatory mechanism.


Subjects
Brain Injuries/prevention & control, Brain Hypoxia/physiopathology, Magnesium Sulfate/pharmacology, Neuroprotective Agents/pharmacology, Prenatal Exposure Delayed Effects, gamma-Aminobutyric Acid/metabolism, Animals, Brain Injuries/etiology, Brain Injuries/pathology, Calbindins, Female, Glutamate Decarboxylase/drug effects, Glutamate Decarboxylase/metabolism, Hippocampus/metabolism, Hippocampus/pathology, Brain Hypoxia/complications, Immunohistochemistry, Mice, Neurons/drug effects, Neurons/metabolism, Neurons/pathology, Parvalbumins/drug effects, Parvalbumins/metabolism, Pregnancy, S100 Calcium Binding Protein G/drug effects, S100 Calcium Binding Protein G/metabolism, Vesicular Inhibitory Amino Acid Transport Proteins/drug effects, Vesicular Inhibitory Amino Acid Transport Proteins/metabolism, gamma-Aminobutyric Acid/drug effects
5.
J Mol Biol ; 346(2): 477-92, 2005 Feb 18.
Article in English | MEDLINE | ID: mdl-15670598

ABSTRACT

Traditionally, research on biomolecular packing calculations has focused on proteins. Besides proteins, RNA is the other large biomolecule that has tertiary structure interactions and complex packing. No one has yet quantitatively investigated RNA packing or compared it to that of proteins because, until recently, there were no large RNA structures. Here we address this question in detail, using Voronoi volume calculations on a set of high-resolution RNA crystal structures. We do a careful parameterization, taking into account many factors such as atomic radii, crystal packing, structural complexity, solvent, and associated protein, to obtain a self-consistent, universal set of volumes that can be applied to both RNA and protein. We report this set of volumes, which we call the NucProt parameter set. Our measured values are consistent across the many different RNA structures and packing environments. When common atom types are compared between proteins and RNA, nine of 12 types show that RNA has a smaller volume and packs more tightly than protein, suggesting that close-packing may be as important for the folding of RNAs as it is for proteins. Moreover, calculated partial specific volumes show that RNA bases pack more densely than corresponding aromatic residues from proteins. Finally, we find that RNA bases have similar packing volumes to DNA bases, despite the absence of tertiary contacts in DNA. Programs, parameter sets and raw data are available online at.


Subjects
Molecular Models, Proteins/chemistry, RNA/chemistry, X-Ray Crystallography, Nucleic Acid Conformation, Protein Conformation
6.
Prostate Cancer Prostatic Dis ; 6(4): 286-9, 2003.
Article in English | MEDLINE | ID: mdl-14663468

ABSTRACT

Some studies suggest that several tumors, such as those of the colon, breast, and prostate, have a greater incidence in patients with a high-fat diet. However, we wanted to determine the effects of obesity alone, independent of diet, on the progression of prostate tumor growth. Using a genetic model of obese and lean Zucker rats, we wanted to demonstrate any sera differences in the concentration of basic fibroblast growth factor (FGF-2) and vascular endothelial cell growth factor (VEGF), two important factors involved in the growth and progression of prostate cancer. We also wanted to investigate whether there were any differences in immune function between the two sera, which could also account for uninhibited tumor growth, as well as differences in mitogenic stimulation. Female Zucker rat obese and lean sera were analyzed using ELISA for FGF-2, VEGF, and macrophage inflammatory protein-1 alpha (MIP-1a), as a measure of macrophage function. In addition, lean and obese sera were plated on wells growing LNCaP prostate cancer cells to determine differences in mitogenicity. We found a greater concentration of FGF-2 in the sera from obese Zucker rats compared to lean Zucker rats (6.32 ± 0.56 vs 3.48 ± 0.34 pg/ml, respectively; P<0.05). We also demonstrated a greater concentration of VEGF in obese rat sera compared to lean sera (54.4 ± 4.1 vs 38.0 ± 2.9 pg/ml, respectively; P<0.05). We detected a trend in mitogenic stimulation among LNCaP cells along the higher concentrations of the dose-response curve (0.72 ± 0.06 vs 0.51 ± 0.5); however, this was not statistically significant. In addition, we did not find a significant difference in MIP-1a macrophage activity levels between sera. To conclude, we speculate that the greater concentrations of VEGF and FGF-2 in the sera of obese rodents vs lean rodents may account for some of the differences in obesity-related tumor growth seen in the human condition. However, the lack of any sera differences in immune function, as measured by macrophage activity, as well as the absence of significant differences in mitogenic proliferation of LNCaP prostate cancer cells, suggests that other mechanisms may exist to explain differences seen in obesity-related prostate tumor biology.


Subjects
Angiogenesis Inducing Agents/blood, Fibroblast Growth Factor 2/blood, Mitogens/pharmacology, Obesity/blood, Obesity/immunology, Prostatic Neoplasms/complications, Prostatic Neoplasms/pathology, Vascular Endothelial Growth Factor A/blood, Animals, Cell Division/drug effects, Tumor Cell Line, Chemokine CCL4, Female, Humans, Macrophage Inflammatory Proteins/metabolism, Male, Obesity/complications, Prostatic Neoplasms/blood, Prostatic Neoplasms/immunology, Rats, Zucker Rats, Tetrahydroisoquinolines/pharmacology
9.
Nucleic Acids Res ; 30(20): 4574-82, 2002 Oct 15.
Article in English | MEDLINE | ID: mdl-12384605

ABSTRACT

We present a prototype of a new database tool, GeneCensus, which focuses on comparing genomes globally, in terms of the collective properties of many genes, rather than in terms of the attributes of a single gene (e.g. sequence similarity for a particular ortholog). The comparisons are presented in a visual fashion over the web at GeneCensus.org. The system concentrates on two types of comparisons: (i) trees based on the sharing of generalized protein families between genomes, and (ii) whole pathway analysis in terms of activity levels. For the trees, we have developed a module (TreeViewer) that clusters genomes in terms of the folds, superfamilies or orthologs they share (all of which can be considered generalized 'families' or 'protein parts'), and compares the resulting trees side-by-side with those built from sequence similarity of individual genes (e.g. a traditional tree built on ribosomal similarity). We also include comparisons to trees built on whole-genome dinucleotide or codon composition. For pathway comparisons, we have implemented a module (PathwayPainter) that graphically depicts, in selected metabolic pathways, the fluxes or expression levels of the associated enzymes (i.e. generalized 'activities'). One can, consequently, compare organisms (and organism states) in terms of representations of these systemic quantities. Development of this module involved compiling, calculating and standardizing flux and expression information from many different sources. We illustrate pathway analysis for enzymes involved in central metabolism. We are able to show that, to some degree, flux and expression fluctuations have characteristic values in different sections of the central metabolism and that control points in this system (e.g. hexokinase, pyruvate kinase, phosphofructokinase, isocitrate dehydrogenase and citrate synthase) tend to be especially variable in flux and expression. Both the TreeViewer and PathwayPainter modules connect to other information sources related to individual-gene or organism properties (e.g. a single-gene structural annotation viewer).
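
The idea of grouping genomes by the protein parts they share can be illustrated with a small sketch. It assumes each genome is summarized as a set of fold identifiers and uses the Jaccard distance between sets; both the distance measure and the fold labels (`f1`, `f2`, ...) are assumptions made for illustration, not details taken from the abstract or from the TreeViewer module.

```python
from itertools import combinations


def jaccard_distance(parts_a, parts_b):
    """Distance between two genomes based on the parts (e.g. folds) they share.

    0.0 means identical part sets; 1.0 means no parts in common.
    """
    union = parts_a | parts_b
    if not union:
        return 0.0
    return 1.0 - len(parts_a & parts_b) / len(union)


def pairwise_distances(genome_parts):
    """Return {(genome1, genome2): distance} for every unordered genome pair."""
    return {
        (g1, g2): jaccard_distance(genome_parts[g1], genome_parts[g2])
        for g1, g2 in combinations(sorted(genome_parts), 2)
    }


# Hypothetical genomes and fold labels, for illustration only.
genomes = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2"},
    "C": {"f4"},
}
distances = pairwise_distances(genomes)
closest_pair = min(distances, key=distances.get)
```

A distance table like this could be handed to any standard hierarchical clustering routine to produce a tree; which tree-building method the published tool uses is not stated in the abstract.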


Subjects
Genetic Databases, Genome, Genomics/methods, Proteins/genetics, Proteins/metabolism, Animals, Base Composition, Enzymes/genetics, Gene Expression, Internet, Open Reading Frames, Proteins/classification, Sequence Analysis/methods
10.
J Mol Biol ; 314(5): 1053-66, 2001 Dec 14.
Article in English | MEDLINE | ID: mdl-11743722

ABSTRACT

The complexity of biological systems provides for a great diversity of relationships between genes. The current analysis of whole-genome expression data focuses on relationships based on global correlation over a whole time-course, identifying clusters of genes whose expression levels simultaneously rise and fall. There are, of course, other potential relationships between genes, which are missed by such global clustering. These include activation, where one expects a time-delay between related expression profiles, and inhibition, where one expects an inverted relationship. Here, we propose a new method, which we call local clustering, for identifying these time-delayed and inverted relationships. It is related to conventional gene-expression clustering in a fashion analogous to the way local sequence alignment (the Smith-Waterman algorithm) is derived from global alignment (Needleman-Wunsch). An integral part of our method is the use of random score distributions to assess the statistical significance of each cluster. We applied our method to the yeast cell-cycle expression dataset and were able to detect a considerable number of additional biological relationships between genes, beyond those resulting from conventional correlation. We related these new relationships between genes to their similarity in function (as determined from the MIPS scheme) or their having known protein-protein interactions (as determined from the large-scale two-hybrid experiment); we found that genes strongly related by local clustering were considerably more likely than random to have a known interaction or a similar cellular role. This suggests that local clustering may be useful in functional annotation of uncharacterized genes. We examined many of the new relationships in detail. Some of them were already well-documented examples of inhibition or activation, which provide corroboration for our results. For instance, we found an inverted expression profile relationship between genes YME1 and YNT20, where the latter has been experimentally documented as a bypass suppressor of the former. We also found new relationships involving uncharacterized yeast genes and were able to suggest functions for many of them. In particular, we found a time-delayed expression relationship between J0544 (which has not yet been functionally characterized) and four genes associated with the mitochondria. This suggests that J0544 may be involved in the control or activation of mitochondrial genes. We have also looked at other, less extensive datasets than the yeast cell-cycle and found further interesting relationships. Our clustering program and a detailed website of clustering results are available at http://www.bioinfo.mbb.yale.edu/expression/cluster (or http://www.genecensus.org/expression/cluster).
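
The analogy to Smith-Waterman local alignment in the abstract above can be made concrete with a minimal sketch. It assumes that two expression profiles are first normalized to z-scores, that the score for pairing time point i of one gene with time point j of the other is the product of the normalized values, and that running sums along each diagonal are reset to zero when they turn negative. These choices, and the function names, are assumptions for illustration; the abstract does not spell out the scoring scheme, and the significance test against random score distributions is omitted.

```python
from math import sqrt


def zscore(profile):
    """Normalize a profile to zero mean and unit (population) standard deviation."""
    n = len(profile)
    mean = sum(profile) / n
    sd = sqrt(sum((v - mean) ** 2 for v in profile) / n)
    if sd == 0:
        raise ValueError("profile is constant")
    return [(v - mean) / sd for v in profile]


def local_match(x, y):
    """Best local diagonal match between two expression profiles.

    Returns (score, sign, delay):
      sign  = +1 for a correlated match, -1 for an inverted match;
      delay = j - i at the end of the best match (positive: y lags x).
    """
    xs, ys = zscore(x), zscore(y)
    best = (0.0, +1, 0)
    for sign in (+1, -1):
        prev = [0.0] * (len(ys) + 1)
        for i in range(1, len(xs) + 1):
            cur = [0.0] * (len(ys) + 1)
            for j in range(1, len(ys) + 1):
                # Extend the diagonal run, or restart from zero (the "local" step).
                cur[j] = max(prev[j - 1] + sign * xs[i - 1] * ys[j - 1], 0.0)
                if cur[j] > best[0]:
                    best = (cur[j], sign, j - i)
            prev = cur
    return best


# Toy profiles: `delayed` is `base` shifted by two time points,
# `inverted` is `base` with the sign flipped.
base = [0, 0, 1, 3, 1, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 3, 1, 0]
inverted = [-v for v in base]

delayed_result = local_match(base, delayed)
inverted_result = local_match(base, inverted)
```

In this toy case the delayed pair is matched with a positive sign at a delay of two time points, and the inverted pair with a negative sign at zero delay. Real data would additionally need the statistical assessment the abstract describes.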


Subjects
Cell Cycle/genetics, Computational Biology/methods, Gene Expression Profiling, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/cytology, Saccharomyces cerevisiae/genetics, Algorithms, Cluster Analysis, Genetic Databases, Odds Ratio, Protein Binding, Sequence Alignment/methods, Software, Time Factors
11.
Urology ; 58(6): 864-9, 2001 Dec.
Article in English | MEDLINE | ID: mdl-11744447

ABSTRACT

OBJECTIVES: To describe our experience with patients with urologic cancers who also have malignancies of nonurologic origin, before, after, or simultaneously, to review the literature, and to suggest treatment options. METHODS: We reviewed our institutions' tumor registry from 1995 to 2000 to discover how many patients had a urologic malignancy and another nonurologic cancer (antecedent, subsequent, or synchronous). We reviewed Medline from 1966 to 2000 and also questioned several urologists at major centers in the United States concerning this clinical dilemma. RESULTS: We encountered 18 patients during a 6-year period with a urologic cancer and another primary malignancy. Thirteen patients had their second cancer detected during the workup of their primary urologic tumor. Two patients developed a second tumor within 1 year of treatment of the primary urologic tumor. Another patient was referred with two primaries already diagnosed, and another had renal carcinoma detected during her colon cancer workup. We found that multiple tumors, although very rare, are usually detected during the preoperative workup of the primary tumor, usually by physical examination and improved radiologic imaging, or during the follow-up examinations. Most reports suggest that treatment should be performed simultaneously, especially if the lesions are relatively small and require a single incision, and the patient's medical condition allows longer anesthesia exposure. If these prerequisites are not met, most investigators agree that treatment should be directed at the more aggressive lesion first, which may improve the condition and/or survival, and thus, if a second operation is warranted, it will be possible. CONCLUSIONS: Although patients with multiple malignancies are rare, the urologist and/or other specialist should be alerted to this possibility when evaluating patients for the initially presenting symptoms and/or detected tumor, as well as during the follow-up evaluations.


Subjects
Lymphoma/epidemiology, Multiple Primary Neoplasms/epidemiology, Urologic Neoplasms/epidemiology, Aged, Breast Neoplasms/epidemiology, Breast Neoplasms/surgery, Digestive System Neoplasms/epidemiology, Digestive System Neoplasms/surgery, Female, Humans, Kidney Neoplasms/epidemiology, Kidney Neoplasms/surgery, Lung Neoplasms/epidemiology, Lung Neoplasms/surgery, Lymphoma/surgery, Male, Melanoma/epidemiology, Melanoma/surgery, Middle Aged, Multiple Primary Neoplasms/surgery, Prostatic Neoplasms/epidemiology, Prostatic Neoplasms/surgery, Urinary Bladder Neoplasms/epidemiology, Urinary Bladder Neoplasms/surgery, Urologic Neoplasms/surgery
12.
J Mol Biol ; 313(4): 673-81, 2001 Nov 02.
Article in English | MEDLINE | ID: mdl-11697896

ABSTRACT

Global surveys of genomes measure the usage of essential molecular parts, defined here as protein families, superfamilies or folds, in different organisms. Based on surveys of the first 20 completely sequenced genomes, we observe that the occurrence of these parts follows a power-law distribution. That is, the number of distinct parts (F) with a given genomic occurrence (V) decays as $F = aV^{-b}$, with a few parts occurring many times and most occurring infrequently. For a given organism, the distributions of families, superfamilies and folds are nearly identical, and this is reflected in the size of the decay exponent b. Moreover, the exponent varies between different organisms, with those of smaller genomes displaying a steeper decay (i.e. larger b). Clearly, the power law indicates a preference to duplicate genes that encode molecular parts which are already common. Here, we present a minimal but biologically meaningful model that accurately describes the observed power law. Although the model performs equally well for all three protein classes, we focus on the occurrence of folds in preference to families and superfamilies. This is because folds are comparatively insensitive to the effects of point mutations that can cause a family member to diverge beyond detectable similarity. In the model, genomes evolve through two basic operations: (i) duplication of existing genes; (ii) net flow of new genes. The flow term is closely related to the exponent b and can accommodate considerable gene loss; however, we demonstrate that the observed data are reproduced best with a net inflow, i.e. with more gene gain than loss. Moreover, we show that prokaryotes have much higher rates of gene acquisition than eukaryotes, probably reflecting lateral transfer. A further natural outcome from our model is an estimation of the fold composition of the initial genome, which potentially relates to the common ancestor for modern organisms. Supplementary material pertaining to this work is available from www.partslist.org/powerlaw.
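
As a rough illustration of the power-law relation between F and V in the abstract above, the sketch below tabulates F (the number of distinct parts seen V times) from per-part counts and estimates a and b by ordinary least squares on the log-log form, log F = log a - b log V. The log-log regression is a common quick estimate and is an assumption here; the abstract does not say how the exponent was fitted, and the numbers used are synthetic.

```python
from collections import Counter
from math import exp, log


def occurrence_histogram(part_counts):
    """Map each genomic occurrence V to F, the number of distinct parts seen V times."""
    return Counter(part_counts.values())


def fit_power_law(histogram):
    """Least-squares fit of log F = log a - b * log V; returns (a, b)."""
    points = [(log(v), log(f)) for v, f in histogram.items() if v > 0 and f > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two distinct occurrence values")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return exp(intercept), -slope


# Synthetic histogram that follows F = 64 * V**-2 exactly (illustration only).
synthetic = {1: 64, 2: 16, 4: 4, 8: 1}
a_hat, b_hat = fit_power_law(synthetic)

# Hypothetical per-fold counts turned into an occurrence histogram.
toy_histogram = occurrence_histogram({"fold1": 3, "fold2": 1, "fold3": 1})
```

On noisy real counts, regression on binned log-log data can be biased, so a fit like this is best treated as a first look rather than a final estimate of b.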


Subjects
Molecular Evolution, Genome, Multigene Family, Protein Folding, Proteins/chemistry, Proteins/genetics, Animals, Computational Biology, Computer Simulation, Duplicate Genes/genetics, Humans, Genetic Models, Multigene Family/genetics, Proteins/classification, Proteins/metabolism, Proteome
14.
Genome Res ; 11(10): 1632-40, 2001 Oct.
Article in English | MEDLINE | ID: mdl-11591640

ABSTRACT

Annotation transfer is a principal process in genome annotation. It involves "transferring" structural and functional annotation to uncharacterized open reading frames (ORFs) in a newly completed genome from experimentally characterized proteins similar in sequence. To prevent errors in genome annotation, it is important that this process be robust and statistically well-characterized, especially with regard to how it depends on the degree of sequence similarity. Previously, we and others have analyzed annotation transfer in single-domain proteins. Multi-domain proteins, which make up the bulk of the ORFs in eukaryotic genomes, present more complex issues in functional conservation. Here we present a large-scale survey of annotation transfer in these proteins, using SCOP superfamilies to define domain folds and a thesaurus based on SWISS-PROT keywords to define functional categories. Our survey reveals that multi-domain proteins have significantly less functional conservation than single-domain ones, except when they share the exact same combination of domain folds. In particular, we find that for multi-domain proteins, approximate function can be accurately transferred with only 35% certainty for pairs of proteins sharing one structural superfamily. In contrast, this value is 67% for pairs of single-domain proteins sharing the same structural superfamily. On the other hand, if two multi-domain proteins contain the same combination of two structural superfamilies, the probability of their sharing the same function increases to 80%; in the case of complete coverage along the full length of both proteins, this value increases further to >90%. Moreover, we found that only 70 of the current total of 455 structural superfamilies are found in both single and multi-domain proteins, and only 14 of these were associated with the same function in both categories of proteins. We also investigated the degree to which function could be transferred between pairs of multi-domain proteins with respect to the degree of sequence similarity between them, finding that functional divergence at a given amount of sequence similarity is always about two-fold greater for pairs of multi-domain proteins (sharing similarity over a single domain) in comparison to pairs of single-domain ones, though the overall shape of the relationship is quite similar. Further information is available at http://partslist.org/func or http://bioinfo.mbb.yale.edu/partslist/func.


Subjects
Computational Biology/methods, Genomics/methods, Conserved Sequence, Factual Databases, Protein Folding, Protein Secondary Structure/physiology, Protein Tertiary Structure/physiology, Quantitative Structure-Activity Relationship
15.
Bioinformatics ; 17(10): 949-56, 2001 Oct.
Article in English | MEDLINE | ID: mdl-11673240

ABSTRACT

MOTIVATION: Traditionally, for packing calculations people have collected atoms together into a number of distinct 'types'. These, in fact, often represent a heavy atom and its associated hydrogens (i.e. a united atom). Also, atom typing is usually done according to basic chemistry, giving rise to 20-30 protein atom types, such as carbonyl carbons, methyl groups, and hydroxyl groups. No one has yet investigated how similar in packing these chemically derived types are. Here we address this question in detail, using Voronoi volume calculations on a set of high-resolution crystal structures. RESULTS: We perform a rigorous clustering analysis with cross-validation on tens of thousands of atom volumes and attempt to compile them into types based purely on packing. From our analysis, we are able to determine a 'minimal' set of 18 atom types that most efficiently represent the spectrum of packing in proteins. Furthermore, we are able to uncover a number of inconsistencies in traditional chemical typing schemes, where differently typed atoms have almost the same effective size. In particular, we find that tetrahedral carbons with two hydrogens are almost identical in size to many aromatic carbons with a single hydrogen. AVAILABILITY: Programs available from http://geometry.molmovdb.org. CONTACT: JerryTsai@TAMU.edu; neil.voss@yale.edu; Mark.Gerstein@yale.edu SUPPLEMENTARY INFORMATION: Available at http://geometry.molmovdb.org.


Subjects
Proteins/chemistry, Carbon/chemistry, Cluster Analysis, Computational Biology, Protein Databases, Hydrogen/chemistry, Molecular Structure, Software
16.
Methods Inf Med ; 40(4): 346-58, 2001.
Article in English | MEDLINE | ID: mdl-11552348

ABSTRACT

BACKGROUND: The recent flood of data from genome sequences and functional genomics has given rise to a new field, bioinformatics, which combines elements of biology and computer science. OBJECTIVES: Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems. METHODS: Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale. RESULTS AND CONCLUSIONS: Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g. expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational techniques including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches integrating a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the web at http://bioinfo.mbb.yale.edu/what-is-it.


Subjects
Computational Biology, Computational Biology/trends, DNA-Binding Proteins, Drug Design, Gene Expression, Genomics, Humans, Sequence Homology, Terminology as Topic
17.
Genome Res ; 11(9): 1463-8, 2001 Sep.
Article in English | MEDLINE | ID: mdl-11544189

ABSTRACT

With the completion of genome sequences, the current challenge for biology is to determine the functions of all gene products and to understand how they contribute in making an organism viable. For the first time, biological systems can be viewed as being finite, with a limited set of molecular parts. However, the full range of biological processes controlled by these parts is extremely complex. Thus, a key approach in genomic research is to divide the cellular contents into distinct sub-populations, which are often given an "-omic" term. For example, the proteome is the full complement of proteins encoded by the genome, and the secretome is the part of it secreted from the cell. Carrying this further, we suggest the term "translatome" to describe the members of the proteome weighted by their abundance, and the "functome" to describe all the functions carried out by these. Once the individual sub-populations are defined and analyzed, we can then try to reconstruct the full organism by interrelating them, eventually allowing for a full and dynamic view of the cell. All this is, of course, made possible because of the increasing amount of large-scale data resulting from functional genomics experiments. However, there are still many difficulties resulting from the noisiness and complexity of the information. To some degree, these can be overcome through averaging with broad proteomic categories such as those implicit in functional and structural classifications. For illustration, we discuss one example in detail, interrelating transcript and cellular protein populations (transcriptome and translatome). Further information is available at http://bioinfo.mbb.yale.edu/what-is-it.


Subjects
Bacillus subtilis/genetics, Bacterial Genome, Proteome/physiology, Bacillus subtilis/physiology, Computational Biology, Proteome/genetics, Proteome/metabolism
19.
Science ; 293(5537): 2101-5, 2001 Sep 14.
Article in English | MEDLINE | ID: mdl-11474067

ABSTRACT

To facilitate studies of the yeast proteome, we cloned 5800 open reading frames and overexpressed and purified their corresponding proteins. The proteins were printed onto slides at high spatial density to form a yeast proteome microarray and screened for their ability to interact with proteins and phospholipids. We identified many new calmodulin- and phospholipid-interacting proteins; a common potential binding motif was identified for many of the calmodulin-binding proteins. Thus, microarrays of an entire eukaryotic proteome can be prepared and screened for diverse biochemical activities. The microarrays can also be used to screen protein-drug interactions and to detect posttranslational modifications.


Subjects
Fungal Proteins/metabolism , Proteome , Saccharomyces cerevisiae/metabolism , Amino Acid Motifs , Amino Acid Sequence , Calmodulin/metabolism , Calmodulin-Binding Proteins/metabolism , Cell Membrane/metabolism , Cloning, Molecular , Fungal Proteins/chemistry , Fungal Proteins/genetics , Glucose/metabolism , Liposomes/metabolism , Membrane Proteins/metabolism , Molecular Sequence Data , Open Reading Frames , Peptide Library , Phosphatidylcholines/metabolism , Phosphatidylinositols/metabolism , Phospholipids/metabolism , Protein Binding , Recombinant Fusion Proteins/metabolism , Saccharomyces cerevisiae/genetics , Signal Transduction , Streptavidin/metabolism
20.
Nucleic Acids Res ; 29(13): 2884-98, 2001 Jul 01.
Article in English | MEDLINE | ID: mdl-11433035

ABSTRACT

High-throughput structural proteomics is expected to generate considerable amounts of data on the progress of structure determination for many proteins. For each protein this includes information about cloning, expression, purification, biophysical characterization and structure determination via NMR spectroscopy or X-ray crystallography. It will be essential to develop specifications and ontologies for standardizing this information to make it amenable to retrospective analysis. To this end we created the SPINE database and analysis system for the Northeast Structural Genomics Consortium. SPINE, which is available at bioinfo.mbb.yale.edu/nesg or nesg.org, is specifically designed to enable distributed scientific collaboration via the Internet. It was designed not just as an information repository but as an active vehicle to standardize proteomics data in a form that would enable systematic data mining. The system features an intuitive user interface for interactive retrieval and modification of expression construct data, query forms designed to track global project progress and external links to many other resources. Currently the database contains experimental data on 985 constructs, of which 740 are drawn from Methanobacterium thermoautotrophicum, 123 from Saccharomyces cerevisiae, 93 from Caenorhabditis elegans and the remainder from other organisms. We developed a comprehensive set of data mining features for each protein, including several related to experimental progress (e.g. expression level, solubility and crystallization) and 42 based on the underlying protein sequence (e.g. amino acid composition, secondary structure and occurrence of low complexity regions). We demonstrate in detail the application of a particular machine learning approach, decision trees, to the tasks of predicting a protein's solubility and propensity to crystallize based on sequence features. 
We are able to extract a number of key rules from our trees, in particular that soluble proteins tend to have significantly more acidic residues and fewer hydrophobic stretches than insoluble ones. One of the characteristics of proteomics data sets, currently and in the foreseeable future, is their intermediate size (approximately 500-5000 data points). This creates a number of issues in relation to error estimation. Initially we estimate the overall error in our trees based on standard cross-validation. However, this leaves out a significant fraction of the data in model construction and does not give error estimates on individual rules. Therefore, we present alternative methods to estimate the error in particular rules.


Subjects
Computational Biology/methods , Databases as Topic , Proteome/chemistry , Software , Animals , Caenorhabditis elegans/chemistry , Cloning, Molecular , Crystallization , Decision Trees , Gene Expression Profiling , Information Storage and Retrieval , Internet , Methanobacterium/chemistry , Probability , Protein Conformation , Proteome/genetics , Reproducibility of Results , Research Design , Saccharomyces cerevisiae/chemistry , Solubility , User-Computer Interface