2.
Curr Biol ; 11(21): 1706-10, 2001 Oct 30.
Article in English | MEDLINE | ID: mdl-11696330

ABSTRACT

An important quest in modern biology is to identify genes involved in aging. Model organisms such as the nematode Caenorhabditis elegans are particularly useful in this regard. The C. elegans genome has been sequenced [1], and single gene mutations that extend adult life span have been identified [2]. Among these longevity-controlling loci are four apparently unrelated genes that belong to the clk family. In mammals, telomere length and structure can influence cellular, and possibly organismal, aging. Here, we show that clk-2 encodes a regulator of telomere length in C. elegans.


Subject(s)
Aging/genetics , Caenorhabditis elegans Proteins/genetics , Genes, Helminth , Saccharomyces cerevisiae Proteins , Telomere-Binding Proteins , Telomere/genetics , Amino Acid Sequence , Animals , DNA-Binding Proteins/genetics , Molecular Sequence Data , Mutation , RNA, Antisense , RNA, Small Interfering , Radiation Tolerance , Sequence Homology, Amino Acid , X-Rays
3.
Mol Cell Biol ; 20(14): 5196-207, 2000 Jul.
Article in English | MEDLINE | ID: mdl-10866675

ABSTRACT

Telomerase is a ribonucleoprotein reverse transcriptase responsible for the maintenance of one strand of telomere terminal repeats. The key protein subunit of the telomerase complex, known as TERT, possesses reverse transcriptase-like motifs that presumably mediate catalysis. These motifs are located in the C-terminal region of the polypeptide. Hidden Markov model-based sequence analysis revealed in the N-terminal region of all TERTs the presence of four conserved motifs, named GQ, CP, QFP, and T. Point mutation analysis of conserved residues confirmed the functional importance of the GQ motif. In addition, the distinct phenotypes of the GQ mutants suggest that this motif may play at least two distinct functions in telomere maintenance. Deletion analysis indicates that even the most N-terminal nonconserved region of yeast TERT (N region) is required for telomerase function. This N region exhibits a nonspecific nucleic acid binding activity that probably reflects an important physiologic function. Expression studies of various portions of the yeast TERT in Escherichia coli suggest that the N region and the GQ motif together may constitute a stable domain. We propose that all TERTs may have a bipartite organization, with an N-GQ domain connected to the other motifs through a flexible linker.


Subject(s)
RNA , Telomerase/genetics , Telomerase/metabolism , Amino Acid Sequence , Base Sequence , Binding Sites , Conserved Sequence , DNA-Binding Proteins , Endopeptidases/metabolism , Enzyme Stability , Molecular Sequence Data , Mutation , Nucleic Acids/metabolism , Sequence Homology, Amino Acid
4.
Mol Cell Biol ; 21(16): 5591-604, 2001 Aug.
Article in English | MEDLINE | ID: mdl-11463840

ABSTRACT

SATB1 is expressed primarily in thymocytes and orchestrates temporal and spatial expression of a large number of genes in the T-cell lineage. SATB1 binds to the bases of chromatin loop domains in vivo, recognizing a special DNA context with strong base-unpairing propensity. The majority of thymocytes are eliminated by apoptosis due to selection processes in the thymus. We investigated the fate of SATB1 during thymocyte and T-cell apoptosis. Here we show that SATB1 is specifically cleaved by a caspase 6-like protease at amino acid position 254 to produce a 65-kDa major fragment containing both a base-unpairing region (BUR)-binding domain and a homeodomain. We found that this cleavage separates the DNA-binding domains from amino acids 90 to 204, a region which we show to be a dimerization domain. The resulting SATB1 monomer loses its BUR-binding activity, despite containing both its DNA-binding domains, and rapidly dissociates from chromatin in vivo. We found this dimerization region to have sequence similarity to PDZ domains, which have been previously shown to be involved in signaling by conferring protein-protein interactions. SATB1 cleavage during Jurkat T-cell apoptosis induced by an anti-Fas antibody occurs concomitantly with the high-molecular-weight fragmentation of chromatin of ~50-kb fragments. Our results suggest that mechanisms of nuclear degradation early in apoptotic T cells involve efficient removal of SATB1 by disrupting its dimerization and cleavage of genomic DNA into loop domains to ensure rapid and efficient disassembly of higher-order chromatin structure.


Subject(s)
Apoptosis/physiology , Caspases/physiology , Chromatin/physiology , DNA-Binding Proteins/physiology , Matrix Attachment Region Binding Proteins , T-Lymphocytes/pathology , T-Lymphocytes/physiology , Amino Acid Sequence , Caspase 6 , DNA-Binding Proteins/chemistry , Dimerization , Humans , Jurkat Cells , Molecular Sequence Data , Substrate Specificity
5.
Mol Biol Cell ; 11(4): 1357-67, 2000 Apr.
Article in English | MEDLINE | ID: mdl-10749935

ABSTRACT

To identify genes misregulated in the final stages of breast carcinogenesis, we performed differential display to compare the gene expression patterns of the human tumorigenic mammary epithelial cells, HMT-3522-T4-2, with those of their immediate premalignant progenitors, HMT-3522-S2. We identified a novel gene, called anti-zuai-1 (AZU-1), that was abundantly expressed in non- and premalignant cells and tissues but was appreciably reduced in breast tumor cell types and in primary tumors. The AZU-1 gene encodes an acidic 571-amino-acid protein containing at least two structurally distinct domains with potential protein-binding functions: an N-terminal serine and proline-rich domain with a predicted immunoglobulin-like fold and a C-terminal coiled-coil domain. In HMT-3522 cells, the bulk of AZU-1 protein resided in a detergent-extractable cytoplasmic pool and was present at much lower levels in tumorigenic T4-2 cells than in their nonmalignant counterparts. Reversion of the tumorigenic phenotype of T4-2 cells, by means described previously, was accompanied by the up-regulation of AZU-1. In addition, reexpression of AZU-1 in T4-2 cells, using viral vectors, was sufficient to reduce their malignant phenotype substantially, both in culture and in vivo. These results indicate that AZU-1 is a candidate breast tumor suppressor that may exert its effects by promoting correct tissue morphogenesis.


Subject(s)
Biomarkers, Tumor/metabolism , Carrier Proteins/metabolism , Genes, Tumor Suppressor/genetics , Tumor Suppressor Proteins , Amino Acid Sequence , Biomarkers, Tumor/genetics , Biomarkers, Tumor/isolation & purification , Blotting, Northern , Blotting, Western , Breast Neoplasms , Carrier Proteins/genetics , Carrier Proteins/isolation & purification , Epithelial Cells/metabolism , Female , Fluorescent Antibody Technique , Gene Expression Profiling , Humans , Molecular Sequence Data , Precancerous Conditions , Protein Structure, Tertiary , RNA, Neoplasm/analysis , Sequence Alignment , Tumor Cells, Cultured
6.
BMC Bioinformatics ; 7: 250, 2006 May 08.
Article in English | MEDLINE | ID: mdl-16681860

ABSTRACT

BACKGROUND: The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of molecular sequence and profiling data. Here, the potential of such modeling is demonstrated by examining the 5,225 free-text items in the Caenorhabditis Genetic Center (CGC) Bibliography using techniques from statistical information retrieval. Items in the CGC biomedical text corpus were modeled using the Latent Dirichlet Allocation (LDA) model. LDA is a hierarchical Bayesian model which represents a document as a random mixture over latent topics; each topic is characterized by a distribution over words. RESULTS: An LDA model estimated from CGC items had better predictive performance than two standard models (unigram and mixture of unigrams) trained using the same data. To illustrate the practical utility of LDA models of biomedical corpora, a trained CGC LDA model was used for a retrospective study of nematode genes known to be associated with life span modification. Corpus-, document-, and word-level LDA parameters were combined with terms from the Gene Ontology to enhance the explanatory value of the CGC LDA model, and to suggest additional candidates for age-related genes. A novel, pairwise document similarity measure based on the posterior distribution on the topic simplex was formulated and used to search the CGC database for "homologs" of a "query" document discussing the life span-modifying clk-2 gene. Inspection of these document homologs enabled and facilitated the production of hypotheses about the function and role of clk-2. CONCLUSION: Like other graphical models for genetic, genomic and other types of biological data, LDA provides a method for extracting unanticipated insights and generating predictions amenable to subsequent experimental validation.
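
The abstract does not spell out the pairwise document similarity measure, so the following is only a sketch under a stated assumption: each document is summarized by a vector of topic proportions (a point on the topic simplex), and similarity is taken as one minus the Jensen-Shannon divergence (base 2) between two such vectors. The function names and the idea of ranking a corpus against a query vector are illustrative, and the measure used in the paper may well differ.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def topic_similarity(p, q):
    """Similarity in [0, 1] between two topic-proportion vectors.

    Assumption: 1 - Jensen-Shannon divergence (base 2); this is a stand-in,
    not the measure defined in the paper.
    """
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    # Midpoint distribution; strictly positive wherever p or q is positive.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


def rank_homologs(query, corpus):
    """Return (doc_id, similarity) pairs sorted from most to least similar.

    corpus: dict mapping a document id to its topic-proportion vector.
    """
    scored = [(doc_id, topic_similarity(query, theta)) for doc_id, theta in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under this assumption, documents with identical topic mixtures score 1 and documents with no shared topics score 0, so the top of the ranked list plays the role of the "document homologs" of the query.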


Subject(s)
Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans/genetics , Databases, Bibliographic , Information Storage and Retrieval , Longevity/genetics , Models, Statistical , Telomere-Binding Proteins/genetics , Animals , Bayes Theorem , Natural Language Processing , Pattern Recognition, Automated , Terminology as Topic , Vocabulary, Controlled
7.
BMC Bioinformatics ; 7: 147, 2006 Mar 16.
Article in English | MEDLINE | ID: mdl-16542449

ABSTRACT

BACKGROUND: Ensemble attribute profile clustering is a novel, text-based strategy for analyzing a user-defined list of genes and/or proteins. The strategy exploits annotation data present in gene-centered corpora and utilizes ideas from statistical information retrieval to discover and characterize properties shared by subsets of the list. The practical utility of this method is demonstrated by employing it in a retrospective study of two non-overlapping sets of genes defined by a published investigation as markers for normal human breast luminal epithelial cells and myoepithelial cells. RESULTS: Each genetic locus was characterized using a finite set of biological properties and represented as a vector of features indicating attributes associated with the locus (a gene attribute profile). In this study, the vector space models for a pre-defined list of genes were constructed from the Gene Ontology (GO) terms and the Conserved Domain Database (CDD) protein domain terms assigned to the loci by the gene-centered corpus LocusLink. This data set of GO- and CDD-based gene attribute profiles, vectors of binary random variables, was used to estimate multiple finite mixture models and each ensuing model utilized to partition the profiles into clusters. The resultant partitionings were combined using a unanimous voting scheme to produce consensus clusters, sets of profiles that co-occurred consistently in the same cluster. Attributes that were important in defining the genes assigned to a consensus cluster were identified. The clusters and their attributes were inspected to ascertain the GO and CDD terms most associated with subsets of genes and in conjunction with external knowledge such as chromosomal location, used to gain functional insights into human breast biology. The 52 luminal epithelial cell markers and 89 myoepithelial cell markers are disjoint sets of genes. 
Ensemble attribute profile clustering-based analysis indicated that both lists contained groups of genes with the functional properties of membrane receptor biology/signal transduction and nucleic acid binding/transcription. A subset of the luminal markers was associated with metabolic and oxidoreductase activities, whereas a subset of myoepithelial markers was associated with protein hydrolase activity. CONCLUSION: Given a set of genes and/or proteins associated with a phenomenon, process or system of interest, ensemble attribute profile clustering provides a simple method for collating and synthesizing the annotation data pertaining to them that are present in text-based, gene-centered corpora. The results provide information about properties common and unique to subsets of the list and hence insights into the biology of the problem under investigation.
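
The unanimous voting step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each partitioning is available as a mapping from item to cluster label, and the function name `consensus_clusters` is hypothetical. Two profiles end up in the same consensus cluster only if every partitioning placed them in the same cluster.

```python
from collections import defaultdict


def consensus_clusters(partitionings):
    """Group items that share a cluster in every partitioning (unanimous vote).

    partitionings: list of dicts mapping item -> cluster label, all over the
    same items. Labels need not be comparable across partitionings.
    Returns a list of sorted item lists, one per consensus cluster.
    """
    if not partitionings:
        return []
    items = set(partitionings[0])
    if any(set(p) != items for p in partitionings):
        raise ValueError("all partitionings must cover the same items")
    groups = defaultdict(list)
    for item in sorted(items):
        # The tuple of labels across all partitionings identifies the
        # consensus cluster: equal tuples mean the items were never separated.
        signature = tuple(p[item] for p in partitionings)
        groups[signature].append(item)
    return sorted(groups.values())
```

In this sketch an item that is separated from all others in at least one partitioning forms a singleton group; whether singletons count as consensus clusters is a design choice the abstract does not settle.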


Subject(s)
Algorithms , Cluster Analysis , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Sequence Alignment/methods
8.
Nucleic Acids Res ; 29(8): 1772-80, 2001 Apr 15.
Article in English | MEDLINE | ID: mdl-11292850

ABSTRACT

Yeast co-expressing rat APOBEC-1 and a fragment of human apolipoprotein B (apoB) mRNA assembled functional editosomes and deaminated C6666 to U in a mooring sequence-dependent fashion. The occurrence of APOBEC-1-complementing proteins suggested a naturally occurring mRNA editing mechanism in yeast. Previously, a hidden Markov model identified seven yeast genes encoding proteins possessing putative zinc-dependent deaminase motifs. Here, only CDD1, a cytidine deaminase, is shown to have the capacity to carry out C-->U editing on a reporter mRNA. This is only the second report of a cytidine deaminase that can use mRNA as a substrate. CDD1-dependent editing was growth phase regulated and demonstrated mooring sequence-dependent editing activity. Candidate yeast mRNA substrates were identified based on their homology with the mooring sequence-containing tripartite motif at the editing site of apoB mRNA and their ability to be edited by ectopically expressed APOBEC-1. Naturally occurring yeast mRNAs edited to a significant extent by CDD1 were, however, not detected. We propose that CDD1 be designated an orphan C-->U editase until its native RNA substrate, if any, can be identified and that it be added to the CDAR (cytidine deaminase acting on RNA) family of editing enzymes.


Subject(s)
Cytidine Deaminase/metabolism , RNA Editing , Yeasts/enzymology , APOBEC-1 Deaminase , Amino Acid Sequence , Animals , Base Sequence , Blotting, Western , Cytidine Deaminase/analysis , Cytidine Deaminase/chemistry , Cytidine Deaminase/genetics , Fluorescent Antibody Technique , Genetic Complementation Test , Humans , Kinetics , Markov Chains , Molecular Sequence Data , Open Reading Frames/genetics , Protein Structure, Tertiary , RNA Editing/genetics , RNA, Fungal/genetics , RNA, Fungal/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Rats , Recombinant Fusion Proteins/analysis , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/metabolism , Sequence Alignment , Yeasts/genetics
9.
Biochim Biophys Acta ; 870(1): 177-9, 1986 Mar 07.
Article in English | MEDLINE | ID: mdl-3947647

ABSTRACT

Modified quantum mechanical calculations predict the binding enthalpies of saccharide inhibitors of lysozyme to within a few kilocalories of experimental measurements.


Subject(s)
Muramidase/antagonists & inhibitors , Acetylglucosamine/pharmacology , Binding Sites , Thermodynamics
10.
Mech Ageing Dev ; 126(1): 193-208, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15610779

ABSTRACT

The diverse nature of cancer- and aging-related genes presents a challenge for large-scale studies based on molecular sequence and profiling data. An underexplored source of data for modeling and analysis is the textual descriptions and annotations present in curated gene-centered biomedical corpora. Here, 450 genes designated by surveys of the scientific literature as being associated with cancer and aging were analyzed using two complementary approaches. The first, ensemble attribute profile clustering, is a recently formulated, text-based, semi-automated data interpretation strategy that exploits ideas from statistical information retrieval to discover and characterize groups of genes with common structural and functional properties. Groups of genes with shared and unique Gene Ontology terms and protein domains were defined and examined. Human homologs of a group of known Drosophila aging-related genes are candidates for genes that may influence lifespan (hep/MAPK2K7, bsk/MAPK8, puc/LOC285193). These JNK pathway-associated proteins may specify a molecular hub that coordinates and integrates multiple intra- and extracellular processes via space- and time-dependent interactions with proteins in other pathways. The second approach, a qualitative examination of the chromosomal locations of 311 human cancer- and aging-related genes, provides anecdotal evidence for a "phenotype position effect": genes that are proximal in the linear genome often encode proteins involved in the same phenomenon. Comparative genomics was employed to enhance understanding of several genes, including open reading frames, identified as new candidates for genes with roles in aging or cancer. Overall, the results highlight fundamental molecular and mechanistic connections between progenitor/stem cell lineage determination, embryonic morphogenesis, cancer, and aging. 
Despite diversity in the nature of the molecular and cellular processes associated with these phenomena, they seem related to the architectural hub of tissue polarity and a need to generate and control this property in a timely manner.


Subject(s)
Aging/genetics , Algorithms , Databases, Genetic , Genes , Neoplasms/genetics , Proteins/genetics , Computational Biology/methods
11.
J Mol Biol ; 284(1): 71-84, 1998 Nov 20.
Article in English | MEDLINE | ID: mdl-9811543

ABSTRACT

A new method was used to probe the conformation of chromatin in living mammalian cells. The method employs ionizing radiation and is based on the concept that such radiation induces correlated breaks in DNA strands that are in spatial proximity. Human dermal fibroblasts in G0 phase of the cell cycle and Chinese hamster ovary cells in mitosis were irradiated by X-rays or accelerated ions. Following lysis of the cells, DNA fragments induced by correlated breaks were end-labeled and separated according to size on denaturing polyacrylamide gels. A characteristic peak was obtained for a fragment size of 78 bases, which is the size that corresponds to one turn of DNA around the nucleosome. Additional peaks between 175 and 450 bases reflect the relative position of nearest-neighbor nucleosomes. Theoretical calculations that simulate the indirect and direct effect of radiation on DNA demonstrate that the fragment size distributions are closely related to the chromatin structure model used. Comparison of the experimental data with theoretical results supports a zig-zag model of the chromatin fiber rather than a simple helical model. Thus, radiation-induced damage analysis can provide information on chromatin structure in the living cell.


Subject(s)
Chromatin/chemistry , DNA, Single-Stranded/analysis , Models, Biological , Molecular Biology/methods , Animals , CHO Cells/radiation effects , Chromatin/radiation effects , Computer Simulation , Cricetinae , DNA Damage/genetics , DNA Damage/radiation effects , DNA, Single-Stranded/chemistry , DNA, Single-Stranded/metabolism , Fibroblasts/radiation effects , Humans , Mitosis , Models, Molecular , Nucleosomes/chemistry , Nucleosomes/metabolism
12.
J Mol Biol ; 217(1): 133-51, 1991 Jan 05.
Article in English | MEDLINE | ID: mdl-1988675

ABSTRACT

Do antibody combining sites possess general properties that enable them to bind different antigens with varying affinities and to bind novel antigens? Here, we address this question by examining the physical and chemical characteristics most favourable for residues involved in antigen accommodation and binding. Amphipathic amino acids could readily tolerate the change of environment from hydrophilic to hydrophobic that occurs upon antibody-antigen complex formation. Residues that are large and can participate in a wide variety of van der Waals' and electrostatic interactions would permit binding to a range of antigens. Amino acids with flexible side-chains could generate a structurally plastic region, i.e. a binding site possessing the ability to mould itself around the antigen to improve complementarity of the interacting surfaces. Hence, antibodies could bind to an array of novel antigens using a limited set of residues interspersed with more unique residues to which greater binding specificity can be attributed. An individual antibody molecule could thus be cross-reactive and have the capacity to bind structurally similar ligands. The accommodation of variations in antigenic structure by modest combining site flexibility could make an important contribution to immune defence by allowing antibody binding to distinct but closely related pathogens. Tyr and Trp most readily fulfil these catholic physicochemical requirements and thus would be expected to be common in combining sites on theoretical grounds. Experimental support for this comes from three sources, (1) the high frequency of participation by these amino acids in the antigen binding observed in six crystallographically determined antibody-antigen complexes, (2) their frequent occurrence in the putative binding regions of antibodies as determined from structural and sequence data and (3) the potential for movement of their side-chains in known antibody binding sites and model systems. 
The six bound antigens comprise two small different haptens, non-overlapping regions of the same large protein and a 19 amino acid residue peptide. Out of a total of 85 complementarity determining region positions, only 37 locations (plus 3 framework) are directly involved in antigen interaction. Of these, light chain residue 91 is utilized by all the complexes examined, whilst light chain 32, light chain 96 and heavy chain 33 are employed by five out of the six. The binding sites in known antibody-antigen complexes as well as the postulated combining sites in free Fab fragments show similar characteristics with regard to the types of amino acids present. The possible role of other amino acids is also assessed.(ABSTRACT TRUNCATED AT 400 WORDS)


Subject(s)
Antigen-Antibody Complex , Binding Sites, Antibody , Amino Acids/chemistry , Amino Acids/metabolism , Antibody Affinity , Antibody Diversity , Antibody Specificity , Hydrogen Bonding , Immunoglobulin Fab Fragments/chemistry , Immunoglobulin Fab Fragments/metabolism , Models, Molecular , Protein Conformation
13.
J Mol Biol ; 235(5): 1501-31, 1994 Feb 04.
Article in English | MEDLINE | ID: mdl-8107089

ABSTRACT

Hidden Markov Models (HMMs) are applied to the problems of statistical modeling, database searching and multiple sequence alignment of protein families and protein domains. These methods are demonstrated on the globin family, the protein kinase catalytic domain, and the EF-hand calcium binding motif. In each case the parameters of an HMM are estimated from a training set of unaligned sequences. After the HMM is built, it is used to obtain a multiple alignment of all the training sequences. It is also used to search the SWISS-PROT 22 database for other sequences that are members of the given protein family, or contain the given domain. The HMM produces multiple alignments of good quality that agree closely with the alignments produced by programs that incorporate three-dimensional structural information. When employed in discrimination tests (by examining how closely the sequences in a database fit the globin, kinase and EF-hand HMMs), the HMM is able to distinguish members of these families from non-members with a high degree of accuracy. Both the HMM and PROFILESEARCH (a technique used to search for relationships between a protein sequence and multiply aligned sequences) perform better in these tests than PROSITE (a dictionary of sites and patterns in proteins). The HMM appears to have a slight advantage over PROFILESEARCH in terms of lower rates of false negatives and false positives, even though the HMM is trained using only unaligned sequences, whereas PROFILESEARCH requires aligned training sequences. Our results suggest the presence of an EF-hand calcium binding motif in a highly conserved and evolutionary preserved putative intracellular region of 155 residues in the alpha-1 subunit of L-type calcium channels which play an important role in excitation-contraction coupling. 
This region has been suggested to contain the functional domains that are typical or essential for all L-type calcium channels regardless of whether they couple to ryanodine receptors, conduct ions or both.


Subject(s)
Amino Acid Sequence , Globins/chemistry , Markov Chains , Protein Kinases/chemistry , Proteins/chemistry , Algorithms , Animals , Binding Sites , Calcium/metabolism , Globins/metabolism , Humans , Molecular Sequence Data , Sequence Homology, Amino Acid
14.
Physiol Genomics ; 5(2): 99-111, 2001 Mar 08.
Article in English | MEDLINE | ID: mdl-11242594

ABSTRACT

Transcription profiling experiments permit the expression levels of many genes to be measured simultaneously. Given profiling data from two types of samples, genes that most distinguish the samples (marker genes) are good candidates for subsequent in-depth experimental studies and developing decision support systems for diagnosis, prognosis, and monitoring. This work proposes a mixture of feature relevance experts as a method for identifying marker genes and illustrates the idea using published data from samples labeled as acute lymphoblastic and myeloid leukemia (ALL, AML). A feature relevance expert implements an algorithm that calculates how well a gene distinguishes samples, reorders genes according to this relevance measure, and uses a supervised learning method [here, support vector machines (SVMs)] to determine the generalization performances of different nested gene subsets. The mixture of three feature relevance experts examined implement two existing and one novel feature relevance measures. For each expert, a gene subset consisting of the top 50 genes distinguished ALL from AML samples as completely as all 7,070 genes. The 125 genes at the union of the top 50s are plausible markers for a prototype decision support system. Chromosomal aberration and other data support the prediction that the three genes at the intersection of the top 50s, cystatin C, azurocidin, and adipsin, are good targets for investigating the basic biology of ALL/AML. The same data were employed to identify markers that distinguish samples based on their labels of T cell/B cell, peripheral blood/bone marrow, and male/female. Selenoprotein W may discriminate T cells from B cells. Results from analysis of transcription profiling data from tumor/nontumor colon adenocarcinoma samples support the general utility of the aforementioned approach. 
Theoretical issues such as choosing SVM kernels and their parameters, training and evaluating feature relevance experts, and the impact of potentially mislabeled samples on marker identification (feature selection) are discussed.
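
The way several feature relevance experts are combined (top-ranked genes per expert, then their union and intersection) can be sketched as below. This is a hedged illustration only: the abstract does not define the three relevance measures, so `mean_difference_relevance` (absolute difference of class means) is a simple stand-in, the SVM evaluation of nested gene subsets is omitted, and all function names are hypothetical.

```python
from statistics import mean


def mean_difference_relevance(expression, labels, class_a, class_b):
    """Score each gene by |mean(class_a) - mean(class_b)|.

    Assumption: a placeholder relevance measure, not one used in the paper.
    expression: dict mapping gene -> list of values, one per sample.
    labels: list of class labels, aligned with the sample order.
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores[gene] = abs(mean(a) - mean(b))
    return scores


def top_k(scores, k):
    """Return the set of the k genes with the highest relevance score."""
    ranked = sorted(scores, key=lambda gene: scores[gene], reverse=True)
    return set(ranked[:k])


def combine_experts(expert_scores, k):
    """Union and intersection of the top-k gene sets from several experts.

    expert_scores: non-empty list of dicts mapping gene -> relevance score.
    """
    tops = [top_k(scores, k) for scores in expert_scores]
    return set.union(*tops), set.intersection(*tops)
```

With three experts and k = 50, the union corresponds to the pool of plausible markers and the intersection to the genes every expert ranks highly, which is how the abstract arrives at its two gene lists.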


Subject(s)
Biomarkers, Tumor/genetics , Gene Expression Profiling , Leukemia, Myeloid/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Transcription, Genetic/genetics , Acute Disease , Adenocarcinoma/diagnosis , Adenocarcinoma/genetics , Algorithms , B-Lymphocytes/metabolism , Bayes Theorem , Bone Marrow Cells/metabolism , Child , Chromosome Aberrations/genetics , Computational Biology/methods , Data Interpretation, Statistical , Female , Gene Expression Regulation, Neoplastic , Genetic Markers/genetics , Humans , Leukemia, Myeloid/diagnosis , Male , Organ Specificity , Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , RNA, Neoplasm/analysis , RNA, Neoplasm/genetics , Reproducibility of Results , Sensitivity and Specificity , Sex Characteristics , T-Lymphocytes/metabolism
15.
Physiol Genomics ; 4(2): 109-126, 2000 Dec 18.
Article in English | MEDLINE | ID: mdl-11120872

ABSTRACT

A modular framework is proposed for modeling and understanding the relationships between molecular profile data and other domain knowledge using a combination of generative (here, graphical models) and discriminative [Support Vector Machines (SVMs)] methods. As illustration, naive Bayes models, simple graphical models, and SVMs were applied to published transcription profile data for 1,988 genes in 62 colon adenocarcinoma tissue specimens labeled as tumor or nontumor. These unsupervised and supervised learning methods identified three classes or subtypes of specimens, assigned tumor or nontumor labels to new specimens and detected six potentially mislabeled specimens. The probability parameters of the three classes were utilized to develop a novel gene relevance, ranking, and selection method. SVMs trained to discriminate nontumor from tumor specimens using only the 50-200 top-ranked genes had the same or better generalization performance than the full repertoire of 1,988 genes. Approximately 90 marker genes were pinpointed for use in understanding the basic biology of colon adenocarcinoma, defining targets for therapeutic intervention and developing diagnostic tools. These potential markers highlight the importance of tissue biology in the etiology of cancer. Comparative analysis of molecular profile data is proposed as a mechanism for predicting the physiological function of genes in instances when comparative sequence analysis proves uninformative, such as with human and yeast translationally controlled tumour protein. Graphical models and SVMs hold promise as the foundations for developing decision support systems for diagnosis, prognosis, and monitoring as well as inferring biological networks.


Subject(s)
Gene Expression Profiling , Genes/genetics , Bayes Theorem , Humans , Models, Genetic , Neoplasms/genetics
16.
Physiol Genomics ; 4(2): 127-135, 2000 Dec 18.
Article in English | MEDLINE | ID: mdl-11120873

ABSTRACT

A novel suite of analytical techniques and visualization tools are applied to 78 published transcription profiling experiments monitoring 5,687 Saccharomyces cerevisiae genes in studies examining cell cycle, responses to stress, and diauxic shift. A naive Bayes model discovered and characterized 45 classes of gene profile vectors. An enrichment measure quantified the association between these classes and specific external knowledge defined by four sets of categories to which genes can be assigned: 106 protein functions, 5 stages of the cell cycle, 265 transcription factors, and 16 chromosomal locations. Many of the 38 genes in class 42 are known to play roles in copper and iron homeostasis. The 17 uncharacterized open reading frames in this class may be involved in similar homeostatic processes; human homologs of two of them could be associated with as yet undefined disease states arising from aberrant metal ion regulation. The Met4, Met31, and Met32 transcription factors may play a role in coregulating genes involved in copper and iron metabolism. Extensions of the simple graphical model used for clustering to learning more complex models of genetic networks are discussed.
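
The abstract does not say how its enrichment measure is defined. One common way to quantify the association between a cluster of genes and an external category is an upper-tail hypergeometric probability, sketched below purely as an assumption; the function name and parameters are hypothetical and this is not claimed to be the measure used in the paper.

```python
from math import comb


def enrichment_p_value(population, category_size, class_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: total number of genes profiled.
    category_size: genes in the population that belong to the category.
    class_size: genes in the class (cluster) being tested.
    overlap: genes in the class that belong to the category (>= 0).
    """
    total = comb(population, class_size)
    upper = min(category_size, class_size)
    # Sum the probability of observing `overlap` or more category members
    # when `class_size` genes are drawn without replacement.
    tail = sum(
        comb(category_size, i) * comb(population - category_size, class_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

A small value indicates that the class contains more members of the category than random sampling from the full gene set would be expected to produce.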


Asunto(s)
Cobre/metabolismo , Hierro/metabolismo , Saccharomyces cerevisiae/genética , Teorema de Bayes , Perfilación de la Expresión Génica , Regulación Fúngica de la Expresión Génica , Genes Fúngicos/genética , Homeostasis , Modelos Genéticos , Análisis de Secuencia por Matrices de Oligonucleótidos , Saccharomyces cerevisiae/metabolismo
17.
Mech Ageing Dev ; 124(1): 109-14, 2003 Jan.
Article in English | MEDLINE | ID: mdl-12618013

ABSTRACT

Transcript profiling can be used to elucidate the molecular and cellular mechanisms involved in ageing and cancer. A recent study of human gastrointestinal stromal tumours (GISTs) with mutations in the KIT gene (Cancer Res. 61 (2001) 8624) exemplifies a common type of investigation. cDNA microarrays were used to generate measurements for 1987 clones in two types of tissues: 13 KIT mutation-positive GISTs and 6 spindle cell tumours from locations outside the gastrointestinal tract. Statistical problems associated with such two-class, high-dimensional profiling data include simultaneous classification and relevant feature identification, probabilistic clustering and protein sequence family modelling. Here, the GIST data were reexamined using specific solutions to these problems, namely sparse hyperplanes, naive Bayes models and profile hidden Markov models, respectively. The integrated analysis of molecular profiling and sequence data highlighted 6 clones that may be of clinical and experimental interest. The protein encoded by one of these putative biomarkers defined a novel protein family present in diverse eucarya. The family may be involved in chromosome segregation and/or stability. One family member is a potential biomarker identified recently from a retrospective analysis of transcript profiles for sporadic breast cancer samples from patients with poor and good prognosis (Signal Process., in press).


Subject(s)
Gene Expression Profiling/statistics & numerical data , Sequence Analysis, Protein/statistics & numerical data , Amino Acid Sequence , Animals , Bayes Theorem , Carcinoma/genetics , Cluster Analysis , Data Interpretation, Statistical , Gastrointestinal Neoplasms/genetics , Humans , Markov Chains , Models, Statistical , Molecular Sequence Data , Mutation , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Proto-Oncogene Proteins c-kit/genetics , Sequence Homology, Amino Acid , Transcription, Genetic
18.
J Comput Biol ; 7(6): 849-62, 2000.
Article in English | MEDLINE | ID: mdl-11382366

ABSTRACT

This work addresses the issues of data representation and incorporation of domain knowledge into the design of learning systems for reasoning about protein families. Given the limited expressive capacity of a particular method, a mixture of protein annotation and fold recognition experts, each implementing a different underlying representation, should provide a robust method for assigning sequences to families. These ideas are illustrated using two data-driven learning methods that make use of different prior information and employ independent, yet complementary, projections of a family: hidden Markov models (HMMs) based on a multiple sequence alignment and neural networks (NNs) based on global sequence descriptors of proteins. Examination of seven protein families indicates that combining a generative (HMM) and a discriminative (NN) method is better than either method on its own. Biologically, human 4-hydroxyphenylpyruvic acid dioxygenase, involved in tyrosinemia type 3, is predicted to be structurally and functionally related to the glyoxalase I family.


Subject(s)
Models, Molecular , Protein Folding , Proteins/chemistry , Amino Acid Sequence , Markov Chains , Molecular Sequence Data , Protein Structure, Tertiary , Proteins/metabolism , Sequence Alignment/methods
19.
J Comput Biol ; 5(1): 57-72, 1998.
Article in English | MEDLINE | ID: mdl-9541871

ABSTRACT

Deamination reactions are catalyzed by a variety of enzymes including those involved in nucleoside/nucleotide metabolism and cytosine to uracil (C-->U) and adenosine to inosine (A-->I) mRNA editing. The active site of the deaminase (DM) domain in these enzymes contains a conserved histidine (or rarely cysteine), two cysteines and a glutamate proposed to act as a proton shuttle during deamination. Here, a statistical model, a hidden Markov model (HMM), of the DM domain has been created which identifies currently known DM domains and suggests new DM domains in viral, bacterial and eucaryotic proteins. However, no DM domains were identified in the currently predicted proteins from the archaeon Methanococcus jannaschii, and possible causes for, and a potential means to ameliorate, this situation are discussed. In some of the newly identified DM domains, the glutamate is changed to a residue that could not function as a proton shuttle, and in one instance (Mus musculus spermatid protein TENR) the cysteines are also changed to lysine and serine. These may be non-competent DM domains able to bind but not act upon their substrate. Phylogenetic analysis using an HMM-generated alignment of DM domains reveals three branches with clear substructure in each branch. The results suggest DM domains that are candidates for yeast, platyhelminth, plant and mammalian C-->U and A-->I mRNA editing enzymes. Some bacterial and eucaryotic DM domains form distinct branches in the phylogenetic tree, suggesting the existence of common, novel substrates.


Subject(s)
Escherichia coli/enzymology , Nucleoside Deaminases/chemistry , Amino Acid Sequence , Animals , Bacterial Proteins/chemistry , Binding Sites/genetics , Cytidine Deaminase/chemistry , Cytidine Deaminase/genetics , Databases, Factual , Deamination , Fungal Proteins/chemistry , Fungal Proteins/genetics , Markov Chains , Models, Molecular , Molecular Sequence Data , Nucleoside Deaminases/genetics , Phylogeny , Plant Proteins/chemistry , Plant Proteins/genetics , Sequence Alignment
20.
J Comput Biol ; 4(2): 193-214, 1997.
Article in English | MEDLINE | ID: mdl-9228618

ABSTRACT

Inteins, introns spliced at the protein level, and the hedgehog family of proteins involved in eucaryotic development both undergo autocatalytic proteolysis. Here, a specific and sensitive hidden Markov model (HMM) of the protein splicing domain shared by inteins and the hedgehog proteins has been trained and employed for further analysis. The HMM characterizes the common features of this domain, including the position where a site-specific DNA endonuclease domain is inserted in the majority of the inteins. The HMM was used to identify several new putative inteins, such as that in the Methanococcus jannaschii klbA protein, and to generate a multiple sequence alignment of sequences possessing this domain. Phylogenetic analysis suggests that hedgehog proteins evolved from inteins. Secondary and tertiary structure predictions suggest that the domain has a structure similar to a beta-sandwich. Similarities between the serine protease cleavage mechanism and the protein splicing reaction mechanism are discussed. Examination of the locations of inteins indicates that they are not inserted randomly in an extein, but are often inserted at functionally important positions in the host proteins. A specific and sensitive HMM for a domain present in klbA proteins identified several additional bacterial and archaeal family members, and analysis of the site of insertion of the intein suggests residues that may be functionally important. This domain may play a role in formation of surface-associated protein complexes.


Subject(s)
Algorithms , Drosophila Proteins , Insect Proteins/chemistry , Phylogeny , Protein Splicing , Proteins/chemistry , Proteins/physiology , Amino Acid Sequence , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Binding Sites , Databases, Factual , Hedgehog Proteins , Insect Proteins/physiology , Introns , Markov Chains , Models, Molecular , Molecular Sequence Data , Protein Conformation , Sequence Alignment