Results 1 - 20 of 43
1.
Genome Biol Evol ; 6(10): 2721-30, 2014 Sep 25.
Article in English | MEDLINE | ID: mdl-25260584

ABSTRACT

Prototype galectins, endogenously expressed animal lectins with a single carbohydrate recognition domain, are well-known regulators of tissue properties such as growth and adhesion. The earliest discovered and best studied of the prototype galectins is Galectin-1 (Gal-1). In the Gallus gallus (chicken) genome, Gal-1 is represented by two homologs: Gal-1A and Gal-1B, with distinct biochemical properties, tissue expression, and developmental functions. We investigated the origin of the Gal-1A/Gal-1B divergence to gain insight into when their developmental functions originated and how they could have contributed to vertebrate phenotypic evolution. Sequence alignment and phylogenetic tree construction showed that the Gal-1A/Gal-1B divergence can be traced back to the origin of the sauropsid lineage (consisting of extinct and extant reptiles and birds) although lineage-specific duplications also occurred in the amphibian and actinopterygian genomes. Gene synteny analysis showed that sauropsid gal-1b (the gene for Gal-1B) and its frog and actinopterygian gal-1 homologs share a similar chromosomal location, whereas sauropsid gal-1a has translocated to a new position. Surprisingly, we found that chicken Gal-1A, encoded by the translocated gal-1a, was more similar in its tertiary folding pattern than Gal-1B, encoded by the untranslocated gal-1b, to experimentally determined and predicted folds of nonsauropsid Gal-1s. This inference is consistent with our finding of a lower proportion of conserved residues in sauropsid Gal-1Bs, and evidence for positive selection of sauropsid gal-1b, but not gal-1a genes. We propose that the duplication and structural divergence of Gal-1B away from Gal-1A led to specialization in both expression and function in the sauropsid lineage.


Subject(s)
Galectins/chemistry , Vertebrates/classification , Animals , Galectins/genetics , Phylogeny , Protein Structure, Secondary , Vertebrates/genetics
2.
Integr Biol (Camb) ; 3(4): 350-67, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21424025

ABSTRACT

In this Perspective, we propose that communication theory--a field of mathematics concerned with the problems of signal transmission, reception and processing--provides a new quantitative lens for investigating multicellular biology, ancient and modern. What underpins the cohesive organisation and collective behaviour of multicellular ecosystems such as microbial colonies and communities (microbiomes) and multicellular organisms such as plants and animals, whether built of simple tissue layers (sponges) or of complex differentiated cells arranged in tissues and organs (members of the 35 or so phyla of the subkingdom Metazoa)? How do mammalian tissues and organs develop, maintain their architecture, become subverted in disease, and decline with age? How did single-celled organisms coalesce to produce many-celled forms that evolved and diversified into the varied multicellular organisms in existence today? Some answers can be found in the blueprints or recipes encoded in (epi)genomes, yet others lie in the generic physical properties of biological matter such as the ability of cell aggregates to attain a certain complexity in size, shape, and pattern. We suggest that Lasswell's maxim "Who says what to whom in what channel with what effect" provides a foundation for understanding not only the emergence and evolution of multicellularity, but also the assembly and sculpting of multicellular ecosystems and many-celled structures, whether of natural or human-engineered origin. We explore how the abstraction of communication theory as an organising principle for multicellular biology could be realised. We highlight the inherent ability of communication theory to be blind to molecular and/or genetic mechanisms. We describe selected applications that analyse the physics of communication and use energy efficiency as a central tenet. 
Whilst communication theory has contributed and could contribute to understanding a myriad of problems in biology, investigations of multicellular biology could, in turn, lead to advances in communication theory, especially in the still immature field of network information theory.


Subject(s)
Biological Evolution , Cell Communication/physiology , Information Theory , Aging/physiology , Algorithms , Animals , Body Patterning/physiology , Chemotaxis/physiology , Chromosomes/physiology , Dictyosteliida/physiology , Female , Genetic Code/physiology , Genetic Phenomena/physiology , Growth and Development/physiology , Humans , Mammary Glands, Animal/growth & development , Pheromones/metabolism , Polysaccharides/physiology , Quorum Sensing/physiology , Saccharomyces cerevisiae/physiology , Signal Transduction/physiology , Spindle Apparatus/physiology
3.
BMC Bioinformatics ; 7: 250, 2006 May 08.
Article in English | MEDLINE | ID: mdl-16681860

ABSTRACT

BACKGROUND: The statistical modeling of biomedical corpora could yield integrated, coarse-to-fine views of biological phenomena that complement discoveries made from analysis of molecular sequence and profiling data. Here, the potential of such modeling is demonstrated by examining the 5,225 free-text items in the Caenorhabditis Genetic Center (CGC) Bibliography using techniques from statistical information retrieval. Items in the CGC biomedical text corpus were modeled using the Latent Dirichlet Allocation (LDA) model. LDA is a hierarchical Bayesian model which represents a document as a random mixture over latent topics; each topic is characterized by a distribution over words. RESULTS: An LDA model estimated from CGC items had better predictive performance than two standard models (unigram and mixture of unigrams) trained using the same data. To illustrate the practical utility of LDA models of biomedical corpora, a trained CGC LDA model was used for a retrospective study of nematode genes known to be associated with life span modification. Corpus-, document-, and word-level LDA parameters were combined with terms from the Gene Ontology to enhance the explanatory value of the CGC LDA model, and to suggest additional candidates for age-related genes. A novel, pairwise document similarity measure based on the posterior distribution on the topic simplex was formulated and used to search the CGC database for "homologs" of a "query" document discussing the life span-modifying clk-2 gene. Inspection of these document homologs enabled and facilitated the production of hypotheses about the function and role of clk-2. CONCLUSION: Like other graphical models for genetic, genomic and other types of biological data, LDA provides a method for extracting unanticipated insights and generating predictions amenable to subsequent experimental validation.
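
The generative view of LDA summarized in this abstract (a document as a random mixture over latent topics, each topic a distribution over words) can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the toy vocabulary, the two topics and their word probabilities, and the Dirichlet parameter `alpha` are invented and are not taken from the CGC model.

```python
import random

# Hypothetical toy vocabulary and topics (not from the CGC corpus).
VOCAB = ["telomere", "lifespan", "neuron", "synapse"]
TOPICS = [
    [0.45, 0.45, 0.05, 0.05],  # topic 0: ageing-flavoured words
    [0.05, 0.05, 0.45, 0.45],  # topic 1: neurobiology-flavoured words
]

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, alpha, rng):
    """Generate one document: a topic mixture, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        topic = rng.choices(range(len(TOPICS)), weights=theta)[0]
        word = rng.choices(VOCAB, weights=TOPICS[topic])[0]
        words.append(word)
    return theta, words

rng = random.Random(0)
theta, doc = generate_document(10, alpha=[0.5, 0.5], rng=rng)
print(theta, doc)
```

Fitting an LDA model runs this process in reverse: given only the words, it estimates the topics and each document's mixture, which is the quantity a topic-simplex similarity measure would compare.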


Subject(s)
Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans/genetics , Databases, Bibliographic , Information Storage and Retrieval , Longevity/genetics , Models, Statistical , Telomere-Binding Proteins/genetics , Animals , Bayes Theorem , Natural Language Processing , Pattern Recognition, Automated , Terminology as Topic , Vocabulary, Controlled
4.
BMC Bioinformatics ; 7: 147, 2006 Mar 16.
Article in English | MEDLINE | ID: mdl-16542449

ABSTRACT

BACKGROUND: Ensemble attribute profile clustering is a novel, text-based strategy for analyzing a user-defined list of genes and/or proteins. The strategy exploits annotation data present in gene-centered corpora and utilizes ideas from statistical information retrieval to discover and characterize properties shared by subsets of the list. The practical utility of this method is demonstrated by employing it in a retrospective study of two non-overlapping sets of genes defined by a published investigation as markers for normal human breast luminal epithelial cells and myoepithelial cells. RESULTS: Each genetic locus was characterized using a finite set of biological properties and represented as a vector of features indicating attributes associated with the locus (a gene attribute profile). In this study, the vector space models for a pre-defined list of genes were constructed from the Gene Ontology (GO) terms and the Conserved Domain Database (CDD) protein domain terms assigned to the loci by the gene-centered corpus LocusLink. This data set of GO- and CDD-based gene attribute profiles, vectors of binary random variables, was used to estimate multiple finite mixture models and each ensuing model utilized to partition the profiles into clusters. The resultant partitionings were combined using a unanimous voting scheme to produce consensus clusters, sets of profiles that co-occurred consistently in the same cluster. Attributes that were important in defining the genes assigned to a consensus cluster were identified. The clusters and their attributes were inspected to ascertain the GO and CDD terms most associated with subsets of genes and in conjunction with external knowledge such as chromosomal location, used to gain functional insights into human breast biology. The 52 luminal epithelial cell markers and 89 myoepithelial cell markers are disjoint sets of genes. 
Ensemble attribute profile clustering-based analysis indicated that both lists contained groups of genes with the functional properties of membrane receptor biology/signal transduction and nucleic acid binding/transcription. A subset of the luminal markers was associated with metabolic and oxidoreductase activities, whereas a subset of myoepithelial markers was associated with protein hydrolase activity. CONCLUSION: Given a set of genes and/or proteins associated with a phenomenon, process or system of interest, ensemble attribute profile clustering provides a simple method for collating and synthesizing the annotation data pertaining to them that are present in text-based, gene-centered corpora. The results provide information about properties common and unique to subsets of the list and hence insights into the biology of the problem under investigation.
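
The unanimous voting step described in this abstract, in which profiles that land in the same cluster in every partitioning form a consensus cluster, can be sketched as follows. This is a minimal illustration under assumptions: each partitioning is represented as a dictionary from gene to cluster label, the gene names and labels are hypothetical, and the estimation of the finite mixture models themselves is not shown.

```python
from collections import defaultdict

def consensus_clusters(partitionings):
    """Group items that share a cluster in every partitioning (unanimous vote).

    partitionings: list of dicts mapping item -> cluster label; all dicts are
    assumed to cover the same items. Labels need not match across partitionings.
    """
    groups = defaultdict(list)
    for item in partitionings[0]:
        # Two items agree everywhere exactly when their label tuples are equal.
        signature = tuple(p[item] for p in partitionings)
        groups[signature].append(item)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical gene names and cluster labels from three mixture-model runs.
runs = [
    {"geneA": 0, "geneB": 0, "geneC": 1, "geneD": 1},
    {"geneA": 1, "geneB": 1, "geneC": 0, "geneD": 0},
    {"geneA": 0, "geneB": 0, "geneC": 0, "geneD": 1},
]
print(consensus_clusters(runs))
```

In this toy input, geneA and geneB stay together in all three runs, while geneC and geneD are separated by the third run and so end up as singleton consensus clusters.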


Subject(s)
Algorithms , Cluster Analysis , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Sequence Alignment/methods
5.
Mech Ageing Dev ; 126(1): 193-208, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15610779

ABSTRACT

The diverse nature of cancer- and aging-related genes presents a challenge for large-scale studies based on molecular sequence and profiling data. An underexplored source of data for modeling and analysis is the textual descriptions and annotations present in curated gene-centered biomedical corpora. Here, 450 genes designated by surveys of the scientific literature as being associated with cancer and aging were analyzed using two complementary approaches. The first, ensemble attribute profile clustering, is a recently formulated, text-based, semi-automated data interpretation strategy that exploits ideas from statistical information retrieval to discover and characterize groups of genes with common structural and functional properties. Groups of genes with shared and unique Gene Ontology terms and protein domains were defined and examined. Human homologs of a group of known Drosophila aging-related genes are candidates for genes that may influence lifespan (hep/MAPK2K7, bsk/MAPK8, puc/LOC285193). These JNK pathway-associated proteins may specify a molecular hub that coordinates and integrates multiple intra- and extracellular processes via space- and time-dependent interactions with proteins in other pathways. The second approach, a qualitative examination of the chromosomal locations of 311 human cancer- and aging-related genes, provides anecdotal evidence for a "phenotype position effect": genes that are proximal in the linear genome often encode proteins involved in the same phenomenon. Comparative genomics was employed to enhance understanding of several genes, including open reading frames, identified as new candidates for genes with roles in aging or cancer. Overall, the results highlight fundamental molecular and mechanistic connections between progenitor/stem cell lineage determination, embryonic morphogenesis, cancer, and aging. 
Despite diversity in the nature of the molecular and cellular processes associated with these phenomena, they seem related to the architectural hub of tissue polarity and a need to generate and control this property in a timely manner.


Subject(s)
Aging/genetics , Algorithms , Databases, Genetic , Genes , Neoplasms/genetics , Proteins/genetics , Computational Biology/methods
6.
J Comput Biol ; 11(6): 1073-89, 2004.
Article in English | MEDLINE | ID: mdl-15662199

ABSTRACT

Molecular profiling studies can generate abundance measurements for thousands of transcripts, proteins, metabolites, or other species in, for example, normal and tumor tissue samples. Treating such measurements as features and the samples as labeled data points, sparse hyperplanes provide a statistical methodology for classifying data points into one of two categories (classification and prediction) and defining a small subset of discriminatory features (relevant feature identification). However, this and other extant classification methods address only implicitly the issue of observed data being a combination of underlying signals and noise. Recently, robust optimization has emerged as a powerful framework for handling uncertain data explicitly. Here, ideas from this field are exploited to develop robust sparse hyperplanes, i.e., classification and relevant feature identification algorithms that are resilient to variation in the data. Specifically, each data point is associated with an explicit data uncertainty model in the form of an ellipsoid parameterized by a center and covariance matrix. The task of learning a robust sparse hyperplane from such data is formulated as a second order cone program (SOCP). Gaussian and distribution-free data uncertainty models are shown to yield SOCPs that are equivalent to the SOCP based on ellipsoidal uncertainty. The real-world utility of robust sparse hyperplanes is demonstrated via retrospective analysis of breast cancer related transcript profiles. Data-dependent heuristics are used to compute the parameters of each ellipsoidal data uncertainty model. The generalization performance of a specific implementation, designated "robust LIKNON," is better than its nominal counterpart. Finally, the strengths and limitations of robust sparse hyperplanes are discussed.
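
One way a problem of the kind described in this abstract might be written down is sketched below. The notation is assumed rather than taken from the paper: $\bar{x}_i$ and $\Sigma_i$ denote the center and covariance matrix of the ellipsoid for data point $i$, $y_i \in \{-1, +1\}$ is its label, $(w, b)$ defines the hyperplane, $\xi_i$ are slack variables, and $C$ is a trade-off constant. The $\ell_1$ objective is a common device for encouraging sparsity and is also an assumption here, and any scaling of the ellipsoid radius is taken to be absorbed into $\Sigma_i$.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \|w\|_1 + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i\,(w^{\top}\bar{x}_i + b) \;\ge\; 1 - \xi_i + \bigl\|\Sigma_i^{1/2} w\bigr\|_2,
\qquad \xi_i \ge 0, \quad i = 1,\dots,n.
\end{aligned}
```

The norm term comes from requiring the margin constraint to hold for every point in the ellipsoid $\{\bar{x}_i + \Sigma_i^{1/2}u : \|u\|_2 \le 1\}$: the worst case of $y_i\,w^{\top}x$ over that set is $y_i\,w^{\top}\bar{x}_i - \|\Sigma_i^{1/2}w\|_2$. Each constraint is therefore a second order cone constraint, and the $\ell_1$ objective can be expressed with linear constraints, so the whole problem is an SOCP.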


Subject(s)
Computational Biology , Sequence Analysis, DNA/statistics & numerical data , Sequence Analysis, Protein/statistics & numerical data , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Data Interpretation, Statistical , Female , Genes, BRCA1 , Genes, BRCA2 , Humans
7.
Mech Ageing Dev ; 124(1): 109-14, 2003 Jan.
Article in English | MEDLINE | ID: mdl-12618013

ABSTRACT

Transcript profiling can be used to elucidate the molecular and cellular mechanisms involved in ageing and cancer. A recent study of human gastrointestinal stromal tumours (GISTs) with mutations in the KIT gene, Cancer Res. 61 (2001) 8624 exemplifies a common type of investigation. cDNA microarrays were used to generate measurements for 1987 clones in two types of tissues: 13 KIT mutation-positive GISTs and 6 spindle cell tumours from locations outside the gastrointestinal tract. Statistical problems associated with such two-class, high-dimensional profiling data include simultaneous classification and relevant feature identification, probabilistic clustering and protein sequence family modelling. Here, the GIST data were reexamined using specific solutions to these problems, namely sparse hyperplanes, naïve Bayes models and profile hidden Markov models respectively. The integrated analysis of molecular profiling and sequence data highlighted 6 clones that may be of clinical and experimental interest. The protein encoded by one of these putative biomarkers defined a novel protein family present in diverse eucarya. The family may be involved in chromosome segregation and/or stability. One family member is a potential biomarker identified recently from a retrospective analysis of transcript profiles for sporadic breast cancer samples from patients with poor and good prognosis, Signal Process. (in press).


Subject(s)
Gene Expression Profiling/statistics & numerical data , Sequence Analysis, Protein/statistics & numerical data , Amino Acid Sequence , Animals , Bayes Theorem , Carcinoma/genetics , Cluster Analysis , Data Interpretation, Statistical , Gastrointestinal Neoplasms/genetics , Humans , Markov Chains , Models, Statistical , Molecular Sequence Data , Mutation , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Proto-Oncogene Proteins c-kit/genetics , Sequence Homology, Amino Acid , Transcription, Genetic
8.
Radiat Res ; 158(5): 568-80, 2002 Nov.
Article in English | MEDLINE | ID: mdl-12385634

ABSTRACT

We have developed a theoretical model for evaluating radiation-induced chromosomal exchanges by explicitly taking into account interphase (G0/G1) chromosome structure, nuclear organization of chromosomes, the production of double-strand breaks (DSBs), and the subsequent rejoinings in a faithful or unfaithful manner. Each of the 46 chromosomes for human lymphocytes (40 chromosomes for mouse lymphocytes) is modeled as a random polymer inside a spherical volume. The chromosome spheres are packed randomly inside a spherical nucleus with an allowed overlap controlled by a parameter Omega. The rejoining of DSBs is determined by a Monte Carlo procedure using a Gaussian proximity function with an interaction range parameter sigma. Values of Omega and sigma have been found which yield calculated results of interchromosomal aberration frequencies that agree with a wide range of experimental data. Our preferred solution is one with an interaction range of 0.5 µm coupled with a relatively small overlap parameter of 0.675 µm, which more or less confirms previous estimates. We have used our model with these parameter values and with resolution or detectability limits to calculate yields of translocations and dicentrics for human lymphocytes exposed to low-LET radiation that agree with experiments in the dose range 0.09 to 4 Gy. Five different experimental data sets have been compared with the theoretical results. Essentially all of the experimental data fall between theoretical curves corresponding to resolution limits of 1 Mbp and 20 Mbp, which may reflect the fact that different investigators use different limits for sensitivity or detectability. Translocation yields for mouse lymphocytes have also been calculated and are in good agreement with experimental data from 1 cGy to 10 cGy. There is also good agreement with recent data on complex aberrations. 
Our model is expected to be applicable to both low- and high-LET radiation, and we include a sample prediction of the yield of interchromosomal rejoining in the dose range 0.22 Gy to 2 Gy of 1000 MeV/nucleon iron particles. This dose range corresponds to average particle traversals per nucleus ranging from 1.0 to 9.12.
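
The Monte Carlo rejoining step with a Gaussian proximity function can be illustrated with a small sketch. The abstract does not give the exact functional form, so the weight `exp(-d^2 / (2 * sigma^2))` used here is an assumption, as are the free-end coordinates. The sketch only chooses a partner for one free end and ignores the polymer chromosome model, the overlap parameter Omega, and the distinction between faithful and unfaithful rejoining.

```python
import math
import random

def proximity_weight(distance_um, sigma_um):
    """Gaussian proximity weight; this functional form is an assumption."""
    return math.exp(-distance_um ** 2 / (2.0 * sigma_um ** 2))

def choose_partner(end_index, positions, sigma_um, rng):
    """Pick a rejoining partner for one free end, weighted by proximity."""
    candidates = [j for j in range(len(positions)) if j != end_index]
    weights = [
        proximity_weight(math.dist(positions[end_index], positions[j]), sigma_um)
        for j in candidates
    ]
    return rng.choices(candidates, weights=weights)[0]

# Hypothetical free-end coordinates in micrometres (not from the paper).
ends = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (3.0, 0.0, 0.0)]
rng = random.Random(42)
picks = [choose_partner(0, ends, sigma_um=0.5, rng=rng) for _ in range(1000)]
print(picks.count(1), picks.count(2))
```

With an interaction range of 0.5 µm, the end 0.1 µm away is chosen almost every time, while the end 3 µm away is effectively out of reach. This is how a short interaction range limits exchanges to breaks that lie close together in the nucleus.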


Subject(s)
Chromosome Aberrations/radiation effects , Chromosomes, Mammalian/radiation effects , Interphase/radiation effects , Models, Theoretical , Animals , Cell Nucleus/genetics , Cell Nucleus/radiation effects , Chromosome Breakage , DNA Damage/radiation effects , Humans , Lymphocytes/metabolism , Lymphocytes/radiation effects , Mathematics , Mice , Radiation Dosage , Recombination, Genetic/radiation effects
9.
Curr Biol ; 11(21): 1706-10, 2001 Oct 30.
Article in English | MEDLINE | ID: mdl-11696330

ABSTRACT

An important quest in modern biology is to identify genes involved in aging. Model organisms such as the nematode Caenorhabditis elegans are particularly useful in this regard. The C. elegans genome has been sequenced [1], and single gene mutations that extend adult life span have been identified [2]. Among these longevity-controlling loci are four apparently unrelated genes that belong to the clk family. In mammals, telomere length and structure can influence cellular, and possibly organismal, aging. Here, we show that clk-2 encodes a regulator of telomere length in C. elegans.


Subject(s)
Aging/genetics , Caenorhabditis elegans Proteins/genetics , Genes, Helminth , Saccharomyces cerevisiae Proteins , Telomere-Binding Proteins , Telomere/genetics , Amino Acid Sequence , Animals , DNA-Binding Proteins/genetics , Molecular Sequence Data , Mutation , RNA, Antisense , RNA, Small Interfering , Radiation Tolerance , Sequence Homology, Amino Acid , X-Rays
10.
Mol Cell Biol ; 21(16): 5591-604, 2001 Aug.
Article in English | MEDLINE | ID: mdl-11463840

ABSTRACT

SATB1 is expressed primarily in thymocytes and orchestrates temporal and spatial expression of a large number of genes in the T-cell lineage. SATB1 binds to the bases of chromatin loop domains in vivo, recognizing a special DNA context with strong base-unpairing propensity. The majority of thymocytes are eliminated by apoptosis due to selection processes in the thymus. We investigated the fate of SATB1 during thymocyte and T-cell apoptosis. Here we show that SATB1 is specifically cleaved by a caspase 6-like protease at amino acid position 254 to produce a 65-kDa major fragment containing both a base-unpairing region (BUR)-binding domain and a homeodomain. We found that this cleavage separates the DNA-binding domains from amino acids 90 to 204, a region which we show to be a dimerization domain. The resulting SATB1 monomer loses its BUR-binding activity, despite containing both its DNA-binding domains, and rapidly dissociates from chromatin in vivo. We found this dimerization region to have sequence similarity to PDZ domains, which have been previously shown to be involved in signaling by conferring protein-protein interactions. SATB1 cleavage during Jurkat T-cell apoptosis induced by an anti-Fas antibody occurs concomitantly with the high-molecular-weight fragmentation of chromatin of ~50-kb fragments. Our results suggest that mechanisms of nuclear degradation early in apoptotic T cells involve efficient removal of SATB1 by disrupting its dimerization and cleavage of genomic DNA into loop domains to ensure rapid and efficient disassembly of higher-order chromatin structure.


Subject(s)
Apoptosis/physiology , Caspases/physiology , Chromatin/physiology , DNA-Binding Proteins/physiology , Matrix Attachment Region Binding Proteins , T-Lymphocytes/pathology , T-Lymphocytes/physiology , Amino Acid Sequence , Caspase 6 , DNA-Binding Proteins/chemistry , Dimerization , Humans , Jurkat Cells , Molecular Sequence Data , Substrate Specificity
11.
Mol Cell ; 7(6): 1201-11, 2001 Jun.
Article in English | MEDLINE | ID: mdl-11430823

ABSTRACT

The key protein subunit of the telomerase complex, known as TERT, possesses a reverse transcriptase (RT)-like domain that is conserved in enzymes encoded by retroviruses and retroelements. Structural and functional analysis of HIV-1 RT suggests that RT processivity is governed, in part, by the conserved motif C, motif E, and a C-terminal domain. Mutations in analogous regions of the yeast TERT were found to have anticipated effects on telomerase processivity in vitro, suggesting a great deal of mechanistic and structural similarity between TERT and retroviral RTs, and a similarity that goes beyond the homologous domain. A close correlation was uncovered between telomerase processivity and telomere length in vivo, suggesting that enzyme processivity is a limiting factor for telomere maintenance.


Subject(s)
HIV Reverse Transcriptase/metabolism , Telomerase/metabolism , Telomere/enzymology , Catalytic Domain , DNA-Binding Proteins , In Vitro Techniques , Molecular Sequence Data , Mutagenesis , RNA , Sequence Homology, Amino Acid , Yeasts
12.
J Biol Chem ; 276(29): 27591-6, 2001 Jul 20.
Article in English | MEDLINE | ID: mdl-11353770

ABSTRACT

The Drosophila S3 ribosomal protein has important roles in both protein translation and DNA repair. In regards to the latter activity, it has been shown that S3 contains vigorous N-glycosylase activity for the removal of 8-oxoguanine residues in DNA that leaves baseless sites in their places. Drosophila S3 also possesses an apurinic/apyrimidinic (AP) lyase activity in which the enzyme catalyzes a beta-elimination reaction that cleaves phosphodiester bonds 3' and adjacent to an AP lesion in DNA. In certain situations, this is followed by a delta-elimination reaction that ultimately leads to the formation of a single nucleotide gap in DNA bordered by 5'- and 3'-phosphate groups. The human S3 protein, although 80% identical to its Drosophila homolog and shorter by only two amino acids, has only marginal N-glycosylase activity. Its lyase activity only cleaves AP DNA by a beta-elimination reaction, thus further distinguishing itself from the Drosophila S3 protein in lacking a delta-elimination activity. Using a hidden Markov model analysis based on the crystal structures of several DNA repair proteins, the enzymatic differences between Drosophila and human S3 were suggested by the absence of a conserved glutamine residue in human S3 that usually resides at the cleft of the deduced active site pocket of DNA glycosylases. Here we show that the replacement of the Drosophila glutamine by an alanine residue leads to the complete loss of glycosylase activity. Unexpectedly, the delta-elimination reaction at AP sites was also abrogated by a change in the Drosophila glutamine residue. Thus, a single amino acid change converted the Drosophila activity into one that is similar to that possessed by the human S3 protein. 
In support of this were experiments executed in vivo that showed that human S3 and the Drosophila site-directed glutamine-changed S3 performed poorly when compared with Drosophila wild-type S3 and its ability to protect a bacterial mutant from the harmful effects of DNA-damaging agents.


Subject(s)
DNA Repair , Guanine/analogs & derivatives , Guanine/metabolism , Ribosomal Proteins/metabolism , Amino Acid Sequence , Amino Acid Substitution , Animals , Base Sequence , Borohydrides/chemistry , Catalysis , DNA/metabolism , DNA Damage , DNA Primers , Drosophila , Humans , Mutagenesis, Site-Directed , Mutagens/toxicity , Ribosomal Proteins/chemistry , Ribosomal Proteins/genetics , Sequence Homology, Amino Acid
13.
Nucleic Acids Res ; 29(8): 1772-80, 2001 Apr 15.
Article in English | MEDLINE | ID: mdl-11292850

ABSTRACT

Yeast co-expressing rat APOBEC-1 and a fragment of human apolipoprotein B (apoB) mRNA assembled functional editosomes and deaminated C6666 to U in a mooring sequence-dependent fashion. The occurrence of APOBEC-1-complementing proteins suggested a naturally occurring mRNA editing mechanism in yeast. Previously, a hidden Markov model identified seven yeast genes encoding proteins possessing putative zinc-dependent deaminase motifs. Here, only CDD1, a cytidine deaminase, is shown to have the capacity to carry out C→U editing on a reporter mRNA. This is only the second report of a cytidine deaminase that can use mRNA as a substrate. CDD1-dependent editing was growth phase regulated and demonstrated mooring sequence-dependent editing activity. Candidate yeast mRNA substrates were identified based on their homology with the mooring sequence-containing tripartite motif at the editing site of apoB mRNA and their ability to be edited by ectopically expressed APOBEC-1. Naturally occurring yeast mRNAs edited to a significant extent by CDD1 were, however, not detected. We propose that CDD1 be designated an orphan C→U editase until its native RNA substrate, if any, can be identified and that it be added to the CDAR (cytidine deaminase acting on RNA) family of editing enzymes.


Subject(s)
Cytidine Deaminase/metabolism , RNA Editing , Yeasts/enzymology , APOBEC-1 Deaminase , Amino Acid Sequence , Animals , Base Sequence , Blotting, Western , Cytidine Deaminase/analysis , Cytidine Deaminase/chemistry , Cytidine Deaminase/genetics , Fluorescent Antibody Technique , Genetic Complementation Test , Humans , Kinetics , Markov Chains , Molecular Sequence Data , Open Reading Frames/genetics , Protein Structure, Tertiary , RNA Editing/genetics , RNA, Fungal/genetics , RNA, Fungal/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Rats , Recombinant Fusion Proteins/analysis , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/metabolism , Sequence Alignment , Yeasts/genetics
14.
Physiol Genomics ; 5(2): 99-111, 2001 Mar 08.
Article in English | MEDLINE | ID: mdl-11242594

ABSTRACT

Transcription profiling experiments permit the expression levels of many genes to be measured simultaneously. Given profiling data from two types of samples, genes that most distinguish the samples (marker genes) are good candidates for subsequent in-depth experimental studies and developing decision support systems for diagnosis, prognosis, and monitoring. This work proposes a mixture of feature relevance experts as a method for identifying marker genes and illustrates the idea using published data from samples labeled as acute lymphoblastic and myeloid leukemia (ALL, AML). A feature relevance expert implements an algorithm that calculates how well a gene distinguishes samples, reorders genes according to this relevance measure, and uses a supervised learning method [here, support vector machines (SVMs)] to determine the generalization performances of different nested gene subsets. The mixture of three feature relevance experts examined implement two existing and one novel feature relevance measures. For each expert, a gene subset consisting of the top 50 genes distinguished ALL from AML samples as completely as all 7,070 genes. The 125 genes at the union of the top 50s are plausible markers for a prototype decision support system. Chromosomal aberration and other data support the prediction that the three genes at the intersection of the top 50s, cystatin C, azurocidin, and adipsin, are good targets for investigating the basic biology of ALL/AML. The same data were employed to identify markers that distinguish samples based on their labels of T cell/B cell, peripheral blood/bone marrow, and male/female. Selenoprotein W may discriminate T cells from B cells. Results from analysis of transcription profiling data from tumor/nontumor colon adenocarcinoma samples support the general utility of the aforementioned approach. 
Theoretical issues such as choosing SVM kernels and their parameters, training and evaluating feature relevance experts, and the impact of potentially mislabeled samples on marker identification (feature selection) are discussed.
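
The ranking and combination steps described in this abstract (score each gene by how well it separates two classes, keep the top-ranked genes per expert, then take the union and intersection across experts) can be sketched as follows. The two relevance measures below are assumptions chosen for illustration and are not necessarily among the three measures used in the paper. The expression values are invented, and the SVM evaluation of nested gene subsets is omitted.

```python
from statistics import mean, pstdev

def signal_to_noise(a, b):
    """|mean difference| scaled by the summed spreads (assumed measure)."""
    spread = pstdev(a) + pstdev(b)
    return abs(mean(a) - mean(b)) / spread if spread else 0.0

def mean_difference(a, b):
    """Plain absolute difference of class means (assumed measure)."""
    return abs(mean(a) - mean(b))

def top_k(expression, labels, measure, k):
    """Rank genes by a relevance measure and keep the k best."""
    scores = {}
    for gene, values in expression.items():
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = measure(class0, class1)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical expression values for four samples (two per class).
labels = [0, 0, 1, 1]
expression = {
    "g1": [1.0, 1.2, 5.0, 5.2],  # large, clean separation
    "g2": [2.0, 2.1, 2.6, 2.7],  # small but very consistent shift
    "g3": [0.0, 4.0, 3.0, 7.0],  # large shift, very noisy
    "g4": [3.0, 3.1, 3.0, 3.1],  # no separation
}
tops = [set(top_k(expression, labels, m, k=2))
        for m in (signal_to_noise, mean_difference)]
print(sorted(set.union(*tops)), sorted(set.intersection(*tops)))
```

In this toy example the two measures disagree about the second-best gene, so the union holds three genes while the intersection holds only `g1`. The same logic, at a larger scale, yields a broad candidate list from the union and a short high-agreement list from the intersection.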


Subject(s)
Biomarkers, Tumor/genetics , Gene Expression Profiling , Leukemia, Myeloid/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Transcription, Genetic/genetics , Acute Disease , Adenocarcinoma/diagnosis , Adenocarcinoma/genetics , Algorithms , B-Lymphocytes/metabolism , Bayes Theorem , Bone Marrow Cells/metabolism , Child , Chromosome Aberrations/genetics , Computational Biology/methods , Data Interpretation, Statistical , Female , Gene Expression Regulation, Neoplastic , Genetic Markers/genetics , Humans , Leukemia, Myeloid/diagnosis , Male , Organ Specificity , Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , RNA, Neoplasm/analysis , RNA, Neoplasm/genetics , Reproducibility of Results , Sensitivity and Specificity , Sex Characteristics , T-Lymphocytes/metabolism
16.
Physiol Genomics ; 4(2): 109-126, 2000 Dec 18.
Artículo en Inglés | MEDLINE | ID: mdl-11120872

RESUMEN

A modular framework is proposed for modeling and understanding the relationships between molecular profile data and other domain knowledge using a combination of generative (here, graphical models) and discriminative [Support Vector Machines (SVMs)] methods. As illustration, naive Bayes models, simple graphical models, and SVMs were applied to published transcription profile data for 1,988 genes in 62 colon adenocarcinoma tissue specimens labeled as tumor or nontumor. These unsupervised and supervised learning methods identified three classes or subtypes of specimens, assigned tumor or nontumor labels to new specimens, and detected six potentially mislabeled specimens. The probability parameters of the three classes were utilized to develop a novel gene relevance, ranking, and selection method. SVMs trained to discriminate nontumor from tumor specimens using only the 50-200 top-ranked genes generalized as well as or better than SVMs trained on the full repertoire of 1,988 genes. Approximately 90 marker genes were pinpointed for use in understanding the basic biology of colon adenocarcinoma, defining targets for therapeutic intervention, and developing diagnostic tools. These potential markers highlight the importance of tissue biology in the etiology of cancer. Comparative analysis of molecular profile data is proposed as a mechanism for predicting the physiological function of genes in instances when comparative sequence analysis proves uninformative, such as with human and yeast translationally controlled tumour protein. Graphical models and SVMs hold promise as the foundations for developing decision support systems for diagnosis, prognosis, and monitoring as well as for inferring biological networks.


Subject(s)
Gene Expression Profiling , Genes/genetics , Bayes Theorem , Humans , Models, Genetic , Neoplasms/genetics
17.
Physiol Genomics ; 4(2): 127-135, 2000 Dec 18.
Article in English | MEDLINE | ID: mdl-11120873

ABSTRACT

A novel suite of analytical techniques and visualization tools is applied to 78 published transcription profiling experiments monitoring 5,687 Saccharomyces cerevisiae genes in studies examining cell cycle, responses to stress, and diauxic shift. A naive Bayes model discovered and characterized 45 classes of gene profile vectors. An enrichment measure quantified the association between these classes and specific external knowledge defined by four sets of categories to which genes can be assigned: 106 protein functions, 5 stages of the cell cycle, 265 transcription factors, and 16 chromosomal locations. Many of the 38 genes in class 42 are known to play roles in copper and iron homeostasis. The 17 uncharacterized open reading frames in this class may be involved in similar homeostatic processes; human homologs of two of them could be associated with as yet undefined disease states arising from aberrant metal ion regulation. The Met4, Met31, and Met32 transcription factors may play a role in coregulating genes involved in copper and iron metabolism. Extensions of the simple graphical model used for clustering to learning more complex models of genetic networks are discussed.
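
The abstract does not say which enrichment measure was used. As an illustration only, one common way to quantify the association between a gene class and an external category is a hypergeometric tail probability; the sketch below assumes that choice, and the function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_category, n_class, n_overlap):
    """Probability of seeing at least n_overlap category members in a class.

    Assumes the n_class genes were drawn at random, without replacement, from
    n_total genes, of which n_category belong to the external category.
    A small value suggests the class is enriched for the category.
    """
    upper = min(n_category, n_class)
    tail = sum(
        comb(n_category, k) * comb(n_total - n_category, n_class - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_class)
```

For example, with 10 genes in total, 3 of them in a category, and a class of 3 genes that are all in that category, `hypergeom_enrichment_p(10, 3, 3, 3)` evaluates to 1/120, since only one of the 120 possible classes of size 3 contains all three category members.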


Subject(s)
Copper/metabolism , Iron/metabolism , Saccharomyces cerevisiae/genetics , Bayes Theorem , Gene Expression Profiling , Gene Expression Regulation, Fungal , Genes, Fungal/genetics , Homeostasis , Models, Genetic , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae/metabolism
18.
Mol Cell Biol ; 20(14): 5196-207, 2000 Jul.
Article in English | MEDLINE | ID: mdl-10866675

ABSTRACT

Telomerase is a ribonucleoprotein reverse transcriptase responsible for the maintenance of one strand of telomere terminal repeats. The key protein subunit of the telomerase complex, known as TERT, possesses reverse transcriptase-like motifs that presumably mediate catalysis. These motifs are located in the C-terminal region of the polypeptide. Hidden Markov model-based sequence analysis revealed four conserved motifs, named GQ, CP, QFP, and T, in the N-terminal region of all TERTs. Point mutation analysis of conserved residues confirmed the functional importance of the GQ motif. In addition, the distinct phenotypes of the GQ mutants suggest that this motif may serve at least two distinct functions in telomere maintenance. Deletion analysis indicates that even the most N-terminal nonconserved region of yeast TERT (N region) is required for telomerase function. This N region exhibits a nonspecific nucleic acid binding activity that probably reflects an important physiologic function. Expression studies of various portions of the yeast TERT in Escherichia coli suggest that the N region and the GQ motif together may constitute a stable domain. We propose that all TERTs may have a bipartite organization, with an N-GQ domain connected to the other motifs through a flexible linker.


Subject(s)
RNA , Telomerase/genetics , Telomerase/metabolism , Amino Acid Sequence , Base Sequence , Binding Sites , Conserved Sequence , DNA-Binding Proteins , Endopeptidases/metabolism , Enzyme Stability , Molecular Sequence Data , Mutation , Nucleic Acids/metabolism , Sequence Homology, Amino Acid
19.
Mol Vis ; 6: 30-9, 2000 Apr 07.
Article in English | MEDLINE | ID: mdl-10756179

ABSTRACT

PURPOSE: We compared the structure and function of interphotoreceptor retinoid-binding protein (IRBP)-related proteins and predicted the domain and secondary structure within each repeat of IRBP and its relatives. We tested whether tail-specific protease (Tsp), which bears sequence similarity to IRBP Domain B, binds fatty acids or retinoids, and whether IRBP possesses protease activity resembling Tsp's catalytic function. These tests helped us learn whether the primary sequence similarities of family members extend to higher-order structural and functional levels. METHODS: Predictions were derived from multiple sequence alignments among IRBP and Tsp family members and from secondary structure prediction programs. The first repeat of human IRBP (EcR1) and Tsp were expressed, purified, and tested for binding properties. Tsp was examined for fluorescence enhancement of retinol or 16-anthroyloxy-palmitic acid (16-AP) to test for ligand binding. IRBP was tested for protease activity. RESULTS: Tsp did not exhibit fluorescence enhancement with retinol or 16-AP. IRBP did not exhibit protease activity. The positions of critical residues needed for retinol binding were predicted. Primary sequence and three-dimensional similarities were found between Domain A of IRBP Repeat 3 and eglin c. CONCLUSIONS: The sequence similarity of Tsp and IRBP raised the possibility that each might share the function of the other protein: IRBP might possess protease activity or Tsp might possess retinoid or fatty acid binding activity. Our studies do not support such a shared-function hypothesis and suggest that the sequence similarity reflects maintenance of structure. The finding of similarity to eglin c in Domain A suggests a possible tight interaction between Domain A and Domain B, which may imply that Domain A is needed for retinoid binding and that both domains should be present when testing mutations. The positions of predicted critical amino acids suggest models in which a large binding pocket holds the retinoid or fatty acid ligand. These predictions are tested in a companion paper.


Subject(s)
Endopeptidases/chemistry , Eye Proteins , Retinol-Binding Proteins/chemistry , Cluster Analysis , Humans , Ligands , Markov Chains , Palmitic Acids/chemistry , Protein Binding , Protein Structure, Secondary , Protein Structure, Tertiary , Proteins , Sequence Alignment , Serpins/chemistry , Spectrometry, Fluorescence , Vitamin A/chemistry
20.
Mol Vis ; 6: 40-50, 2000 Apr 07.
Article in English | MEDLINE | ID: mdl-10756180

ABSTRACT

PURPOSE: The purpose of this study was to measure the effects of mutations on the retinol binding capability of human Repeat 1 of interphotoreceptor retinoid-binding protein (IRBP). First, we predicted important functional amino acids using several computer programs. We also noted the lack of shared functions between tail-specific protease (Tsp) and IRBP, which bear sequence similarity, and this aided in predicting functional residues. We analyzed the effects of point substitutions on the retinol and fatty acid binding properties of Repeat 1 of human IRBP at 25 and 50 °C. METHODS: To find residues critical to retinol binding that might affect function, a series of thirteen mutations was created by site-specific mutagenesis between positions 140 and 280 in Repeat 1 of human IRBP. These mutants were expressed, purified, and tested for binding properties. The conformations of the proteins were examined by circular dichroism (CD) scans. RESULTS: Seven of the mutants exhibited reduced binding capacity, and five were not expressed at high enough levels to assess binding activity. Four of the mutants were purified, and their CD scans were very similar to those of Repeat 1. Only one of the mutations did not affect binding, folding, or expression when compared with wild-type Repeat 1. CONCLUSIONS: Several IRBP mutants containing point mutations retained native structure but lost retinol binding function. The data suggest that retinol binding is affected by many different amino acid substitutions in or near a binding pocket. That even a single point substitution can profoundly affect binding without affecting overall conformation suggests that much of Domain B (from amino acid positions 80 to 300) is involved in ligand binding. This excludes three previously proposed IRBP-retinol binding mechanisms: (1) retinol binds to a small portion of the protein repeat, (2) retinol can bind to any hydrophobic patch in the protein, and (3) native conformation is not required for retinol binding to the repeat.


Subject(s)
Eye Proteins , Retinol-Binding Proteins/chemistry , Amino Acid Substitution , Binding Sites , Blotting, Western , Buffers , Circular Dichroism , Endopeptidases/chemistry , Escherichia coli/metabolism , Humans , Mutagenesis, Site-Directed , Point Mutation , Protein Denaturation , Protein Folding , Retinol-Binding Proteins/genetics , Retinol-Binding Proteins/isolation & purification , Retinol-Binding Proteins/metabolism , Spectrometry, Fluorescence