Results 1 - 20 of 27
1.
Mol Cell ; 77(4): 688-708, 2020 02 20.
Article in English | MEDLINE | ID: mdl-32001106

ABSTRACT

Rapidly developing technologies have recently fueled an exciting era of discovery in the field of chromosome structure and nuclear organization. In addition to chromosome conformation capture (3C) methods, new alternative techniques have emerged to study genome architecture and biological processes in the nucleus, often in single or living cells. This sets an unprecedented stage for exploring the mechanisms that link chromosome structure and biological function. Here we review popular as well as emerging approaches to study chromosome organization, focusing on the contribution of complementary methodologies to our understanding of structures revealed by 3C methods and their biological implications, and discuss the next technical and conceptual frontiers.


Subjects
Chromosomes, Mammalian/chemistry; Animals; Cell Nucleus/genetics; DNA Repair; DNA Replication Timing; Genetic Techniques; Models, Genetic; Single-Cell Analysis; Transcription, Genetic
2.
Nucleic Acids Res ; 51(3): 1103-1119, 2023 02 22.
Article in English | MEDLINE | ID: mdl-36629266

ABSTRACT

The Hi-C method has revolutionized the study of genome organization, yet interpretation of Hi-C interaction frequency maps remains a major challenge. Genomic compartments are a checkered Hi-C interaction pattern suggested to represent the partitioning of the genome into two self-interacting states associated with active and inactive chromatin. Based on a few elementary mechanistic assumptions, we derive a generative probabilistic model of genomic compartments, called deGeco. Testing our model, we find it can explain observed Hi-C interaction maps in a highly robust manner, allowing accurate inference of interaction probability maps from extremely sparse data without any training of parameters. Taking advantage of the interpretability of the model parameters, we then test hypotheses regarding the nature of genomic compartments. We find clear evidence of multiple states, and that these states self-interact with different affinities. We also find that the interaction rules of chromatin states differ considerably within and between chromosomes. Inspecting the molecular underpinnings of a four-state model, we show that a simple classifier can use histone marks to predict the underlying states with 87% accuracy. Finally, we observe instances of mixed-state loci and analyze these loci in single-cell Hi-C maps, finding that mixing of states occurs mainly at the cell level.


Subjects
Chromatin; Genome; Genomics/methods; Chromosomes; Probability
3.
Nature ; 563(7729): 121-125, 2018 11.
Article in English | MEDLINE | ID: mdl-30333624

ABSTRACT

Many evolutionarily distant pathogenic organisms have evolved similar survival strategies to evade the immune responses of their hosts. These include antigenic variation, through which an infecting organism prevents clearance by periodically altering the identity of proteins that are visible to the immune system of the host [1]. Antigenic variation requires large reservoirs of immunologically diverse antigen genes, which are often generated through homologous recombination, as well as mechanisms to ensure the expression of one or very few antigens at any given time. Both homologous recombination and gene expression are affected by three-dimensional genome architecture and local DNA accessibility [2,3]. Factors that link three-dimensional genome architecture, local chromatin conformation and antigenic variation have, to our knowledge, not yet been identified in any organism. One of the major obstacles to studying the role of genome architecture in antigenic variation has been the highly repetitive nature and heterozygosity of antigen-gene arrays, which has precluded complete genome assembly in many pathogens. Here we report the de novo haplotype-specific assembly and scaffolding of the long antigen-gene arrays of the model protozoan parasite Trypanosoma brucei, using long-read sequencing technology and conserved features of chromosome folding [4]. Genome-wide chromosome conformation capture (Hi-C) reveals a distinct partitioning of the genome, with antigen-encoding subtelomeric regions that are folded into distinct, highly compact compartments. In addition, we performed a range of analyses (Hi-C, fluorescence in situ hybridization, assays for transposase-accessible chromatin using sequencing and single-cell RNA sequencing) that showed that deletion of the histone variants H3.V and H4.V increases antigen-gene clustering, DNA accessibility across sites of antigen expression and switching of the expressed antigen isoform, via homologous recombination. Our analyses identify histone variants as a molecular link between global genome architecture, local chromatin conformation and antigenic variation.


Subjects
Antigenic Variation/genetics; Chromatin/genetics; Chromatin/metabolism; DNA, Protozoan/metabolism; Genome/genetics; Trypanosoma brucei brucei/genetics; Trypanosoma brucei brucei/immunology; DNA, Protozoan/genetics; Haplotypes/genetics; Histones/deficiency; Histones/genetics; Multigene Family/genetics; Protein Isoforms/biosynthesis; Protein Isoforms/genetics; Variant Surface Glycoproteins, Trypanosoma/biosynthesis; Variant Surface Glycoproteins, Trypanosoma/genetics
4.
Nature ; 535(7613): 575-9, 2016 07 28.
Article in English | MEDLINE | ID: mdl-27437574

ABSTRACT

X-chromosome inactivation (XCI) involves major reorganization of the X chromosome as it becomes silent and heterochromatic. During female mammalian development, XCI is triggered by upregulation of the non-coding Xist RNA from one of the two X chromosomes. Xist coats the chromosome in cis and induces silencing of almost all genes via its A-repeat region, although some genes (constitutive escapees) avoid silencing in most cell types, and others (facultative escapees) escape XCI only in specific contexts. A role for Xist in organizing the inactive X (Xi) chromosome has been proposed. Recent chromosome conformation capture approaches have revealed global loss of local structure on the Xi chromosome and formation of large mega-domains, separated by a region containing the DXZ4 macrosatellite. However, the molecular architecture of the Xi chromosome, in both the silent and expressed regions, remains unclear. Here we investigate the structure, chromatin accessibility and expression status of the mouse Xi chromosome in highly polymorphic clonal neural progenitors (NPCs) and embryonic stem cells. We demonstrate a crucial role for Xist and the DXZ4-containing boundary in shaping Xi chromosome structure using allele-specific genome-wide chromosome conformation capture (Hi-C) analysis, an assay for transposase-accessible chromatin with high throughput sequencing (ATAC-seq) and RNA sequencing. Deletion of the boundary disrupts mega-domain formation, and induction of Xist RNA initiates formation of the boundary and the loss of DNA accessibility. We also show that in NPCs, the Xi chromosome lacks active/inactive compartments and topologically associating domains (TADs), except around genes that escape XCI. Escapee gene clusters display TAD-like structures and retain DNA accessibility at promoter-proximal and CTCF-binding sites. Furthermore, altered patterns of facultative escape genes in different neural progenitor clones are associated with the presence of different TAD-like structures after XCI. These findings suggest a key role for transcription and CTCF in the formation of TADs in the context of the Xi chromosome in neural progenitors.


Subjects
Chromosomes, Mammalian/metabolism; X Chromosome Inactivation; X Chromosome/metabolism; Alleles; Animals; Binding Sites; CCCTC-Binding Factor; Chromatin/chemistry; Chromatin/genetics; Chromatin/metabolism; Chromosomes, Mammalian/chemistry; Chromosomes, Mammalian/genetics; Embryonic Stem Cells/metabolism; Female; Gene Silencing; Male; Mice; Neural Stem Cells/metabolism; Promoter Regions, Genetic/genetics; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; Repressor Proteins/metabolism; Sequence Analysis; Transcription, Genetic; X Chromosome/chemistry; X Chromosome/genetics; X Chromosome Inactivation/genetics
5.
Genome Res ; 28(10): 1455-1466, 2018 10.
Article in English | MEDLINE | ID: mdl-30166406

ABSTRACT

Mitosis encompasses key molecular changes including chromatin condensation, nuclear envelope breakdown, and reduced transcription levels. Immediately after mitosis, the interphase chromatin structure is reestablished and transcription resumes. The reestablishment of the interphase chromatin is probably achieved by "bookmarking," i.e., the retention of at least partial information during mitosis. To gain a deeper understanding of the contribution of histone modifications to the mitotic bookmarking process, we merged proteomics, immunofluorescence, and ChIP-seq approaches. We focused on key histone modifications and employed HeLa-S3 cells as a model system. Generally, in spite of the general hypoacetylation observed during mitosis, we observed a global concordance between the genomic organization of histone modifications in interphase and mitosis, suggesting that the epigenomic landscape may serve as a component of the mitotic bookmarking process. Next, we investigated the nucleosome that enters nucleosome depleted regions (NDRs) during mitosis. We observed that in ∼60% of the NDRs, the entering nucleosome is distinct from the surrounding highly acetylated nucleosomes and appears to have either low levels of acetylation or high levels of phosphorylation in adjacent residues (since adjacent phosphorylation may interfere with the ability to detect acetylation). Inhibition of histone deacetylases (HDACs) by the small molecule TSA reverts this pattern, suggesting that these nucleosomes are specifically deacetylated during mitosis. Altogether, by merging multiple approaches, our study provides evidence to support a model where histone modifications may play a role in mitotic bookmarking and uncovers new insights into the deposition of nucleosomes during mitosis.


Subjects
Histones/metabolism; Mitosis; Nucleosomes/genetics; Acetylation/drug effects; Chromatin Immunoprecipitation; HeLa Cells; Histone Code; Histone Deacetylase Inhibitors/pharmacology; Histone Deacetylases/metabolism; Humans; Nucleosomes/drug effects; Nucleosomes/metabolism; Phosphorylation; Proteomics
6.
Methods ; 142: 89-99, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29684640

ABSTRACT

Assembly of reference-quality genomes from next-generation sequencing data is a key challenge in genomics. Recently, we and others have shown that Hi-C data can be used to address several outstanding challenges in the field of genome assembly. This principle has since been developed in academia and industry, and has been used in the assembly of several major genomes. In this paper, we explore the central principles underlying Hi-C-based assembly approaches, by quantitatively defining and characterizing three invariant Hi-C interaction patterns on which these approaches can build: Intrachromosomal interaction enrichment, distance-dependent interaction decay and local interaction smoothness. Specifically, we evaluate to what degree each invariant pattern holds on a single locus level in different species, cell types and Hi-C map resolutions. We find that these patterns are generally consistent across species and cell types but are affected by sequencing depth, and that matrix balancing improves consistency of loci with all three invariant patterns. Finally, we overview current Hi-C-based assembly approaches in light of these invariant patterns and demonstrate how local interaction smoothness can be used to easily detect scaffolding errors in extremely sparse Hi-C maps. We suggest that simultaneously considering all three invariant patterns may lead to better Hi-C-based genome assembly methods.


Subjects
Chromosome Mapping/methods; High-Throughput Nucleotide Sequencing/methods; Metagenomics/methods; Models, Genetic; Molecular Sequence Annotation/methods; Animals; Chromosome Mapping/instrumentation; DNA/chemistry; DNA/genetics; Genome/genetics; High-Throughput Nucleotide Sequencing/instrumentation; Humans; Imaging, Three-Dimensional/instrumentation; Imaging, Three-Dimensional/methods; Metagenomics/instrumentation; Models, Statistical; Molecular Imaging/instrumentation; Molecular Imaging/methods; Nucleic Acid Conformation; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods
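
Illustrative sketch for entry 6 (not taken from the paper): the abstract names distance-dependent interaction decay as one of three invariant Hi-C patterns. Assuming a symmetric contact matrix stored as a list of rows, one simple way to summarize that pattern is the mean contact value on each diagonal. The function name `distance_decay` and the toy matrix are hypothetical.

```python
def distance_decay(contacts):
    """Return the mean contact value at each bin separation d = 0, 1, 2, ...

    `contacts` is assumed to be a square, symmetric matrix (list of lists)
    whose entry [i][j] is the interaction count between bins i and j.
    """
    n = len(contacts)
    decay = []
    for d in range(n):
        # Entries on the d-th diagonal are pairs of bins separated by d.
        values = [contacts[i][i + d] for i in range(n - d)]
        decay.append(sum(values) / len(values))
    return decay


# Toy map (an assumption for illustration): contacts fall off as 1 / (1 + distance).
toy_map = [[1.0 / (1 + abs(i - j)) for j in range(6)] for i in range(6)]
decay = distance_decay(toy_map)
```

On a real map the profile would be noisier, and the abstract notes that sequencing depth and matrix balancing affect how consistently such patterns hold.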
7.
Genome Res ; 25(11): 1727-38, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26330564

ABSTRACT

A conserved hallmark of eukaryotic chromatin architecture is the distinctive array of well-positioned nucleosomes downstream from transcription start sites (TSS). Recent studies indicate that trans-acting factors establish this stereotypical array. Here, we present the first genome-wide in vitro and in vivo nucleosome maps for the ciliate Tetrahymena thermophila. In contrast with previous studies in yeast, we find that the stereotypical nucleosome array is preserved in the in vitro reconstituted map, which is governed only by the DNA sequence preferences of nucleosomes. Remarkably, this average in vitro pattern arises from the presence of subsets of nucleosomes, rather than the whole array, in individual Tetrahymena genes. Variation in GC content contributes to the positioning of these sequence-directed nucleosomes and affects codon usage and amino acid composition in genes. Given that the AT-rich Tetrahymena genome is intrinsically unfavorable for nucleosome formation, we propose that these "seed" nucleosomes, together with trans-acting factors, may facilitate the establishment of nucleosome arrays within genes in vivo, while minimizing changes to the underlying coding sequences.


Subjects
Genome, Protozoan; Nucleosomes/genetics; Open Reading Frames; Tetrahymena thermophila/genetics; Chromosome Mapping; DNA, Protozoan/genetics; Genetic Association Studies; Sequence Analysis, DNA; Transcription, Genetic
8.
Methods ; 72: 65-75, 2015 Jan 15.
Article in English | MEDLINE | ID: mdl-25448293

ABSTRACT

Over the last decade, development and application of a set of molecular genomic approaches based on the chromosome conformation capture method (3C), combined with increasingly powerful imaging approaches, have enabled high resolution and genome-wide analysis of the spatial organization of chromosomes. The aim of this paper is to provide guidelines for analyzing and interpreting data obtained with genome-wide 3C methods such as Hi-C and 3C-seq that rely on deep sequencing to detect and quantify pairwise chromatin interactions.


Subjects
Chromatin/metabolism; Genomics/methods; Nucleic Acid Conformation; Chromatin/chemistry; Chromosome Mapping/methods; Data Interpretation, Statistical; Datasets as Topic; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Molecular Conformation
9.
Nature ; 458(7236): 362-6, 2009 Mar 19.
Article in English | MEDLINE | ID: mdl-19092803

ABSTRACT

Nucleosome organization is critical for gene regulation. In living cells this organization is determined by multiple factors, including the action of chromatin remodellers, competition with site-specific DNA-binding proteins, and the DNA sequence preferences of the nucleosomes themselves. However, it has been difficult to estimate the relative importance of each of these mechanisms in vivo, because in vivo nucleosome maps reflect the combined action of all influencing factors. Here we determine the importance of nucleosome DNA sequence preferences experimentally by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map, in which nucleosome occupancy is governed only by the intrinsic sequence preferences of nucleosomes, is similar to in vivo nucleosome maps generated in three different growth conditions. In vitro, nucleosome depletion is evident at many transcription factor binding sites and around gene start and end sites, indicating that nucleosome depletion at these sites in vivo is partly encoded in the genome. We confirm these results with a micrococcal nuclease-independent experiment that measures the relative affinity of nucleosomes for approximately 40,000 double-stranded 150-base-pair oligonucleotides. Using our in vitro data, we devise a computational model of nucleosome sequence preferences that is significantly correlated with in vivo nucleosome occupancy in Caenorhabditis elegans. Our results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.


Subjects
Eukaryotic Cells/metabolism; Genome, Fungal/genetics; Nucleosomes/genetics; Saccharomyces cerevisiae/genetics; Animals; Base Sequence; Caenorhabditis elegans/genetics; Chickens; Computational Biology; Computer Simulation; Micrococcal Nuclease/metabolism; Nucleosomes/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Saccharomyces cerevisiae/growth & development; Sequence Analysis, DNA; Transcription Factors/metabolism
10.
Commun Biol ; 6(1): 1110, 2023 11 02.
Article in English | MEDLINE | ID: mdl-37919399

ABSTRACT

The noisy and high-dimensional nature of biological data has spawned advanced clustering algorithms that are tailored for specific biological datatypes. However, the performance of such methods varies greatly between datasets and they require post hoc tuning of cryptic hyperparameters. We present k minimal distance (KMD) clustering, a general-purpose method based on a generalization of single and average linkage hierarchical clustering. We introduce a generalized silhouette-like function to eliminate the cryptic hyperparameter k, and use sampling to enable application to million-object datasets. Rigorous comparisons to general and specialized clustering methods on simulated, mass cytometry and scRNA-seq datasets show consistent high performance of KMD clustering across all datasets.


Subjects
Algorithms; Sequence Analysis, RNA/methods; Cluster Analysis
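
Illustrative sketch for entry 10 (an interpretation, not the authors' implementation): the abstract describes KMD as a generalization of single and average linkage. One reading consistent with that description is a linkage equal to the mean of the k smallest pairwise distances between two clusters, so that k = 1 gives single linkage and k equal to the number of pairs gives average linkage. The function name `kmd_linkage`, the Euclidean metric, and the example points are assumptions.

```python
import math
from itertools import product


def kmd_linkage(cluster_a, cluster_b, k):
    """Mean of the k smallest Euclidean distances between two clusters.

    k = 1 reduces to single linkage; k >= len(cluster_a) * len(cluster_b)
    reduces to average linkage. This is an assumed reading of the abstract.
    """
    dists = sorted(math.dist(p, q) for p, q in product(cluster_a, cluster_b))
    k = min(k, len(dists))
    return sum(dists[:k]) / k


# Hypothetical 2-D points, for illustration only.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]
single = kmd_linkage(cluster_a, cluster_b, k=1)
average = kmd_linkage(cluster_a, cluster_b, k=4)
```

Intermediate values of k sit between the two extremes, which is why the abstract's silhouette-like selection of k matters in practice.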
11.
bioRxiv ; 2023 Dec 02.
Article in English | MEDLINE | ID: mdl-38076840

ABSTRACT

Spermatogenesis is a unidirectional differentiation process that generates haploid sperm, but how the gene expression program that directs this process is established is largely unknown. Here we determine the high-resolution 3D chromatin architecture of male germ cells during spermatogenesis and show that CTCF-mediated 3D chromatin predetermines the gene expression program required for spermatogenesis. In undifferentiated spermatogonia, CTCF-mediated chromatin contacts on autosomes pre-establish meiosis-specific super-enhancers (SE). These meiotic SE recruit the master transcription factor A-MYB in meiotic spermatocytes, which strengthens their 3D contacts and instructs a burst of meiotic gene expression. We also find that at the mitosis-to-meiosis transition, the germline-specific Polycomb protein SCML2 resolves chromatin loops that are specific to mitotic spermatogonia. Moreover, SCML2 and A-MYB establish the unique 3D chromatin organization of sex chromosomes during meiotic sex chromosome inactivation. We propose that CTCF-mediated 3D chromatin organization enforces epigenetic priming that directs unidirectional differentiation, thereby determining the cellular identity of the male germline.

12.
PLoS Comput Biol ; 4(11): e1000216, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18989395

ABSTRACT

The detailed positions of nucleosomes profoundly impact gene regulation and are partly encoded by the genomic DNA sequence. However, less is known about the functional consequences of this encoding. Here, we address this question using a genome-wide map of approximately 380,000 yeast nucleosomes that we sequenced in their entirety. Utilizing the high resolution of our map, we refine our understanding of how nucleosome organizations are encoded by the DNA sequence and demonstrate that the genomic sequence is highly predictive of the in vivo nucleosome organization, even across new nucleosome-bound sequences that we isolated from fly and human. We find that Poly(dA:dT) tracts are an important component of these nucleosome positioning signals and that their nucleosome-disfavoring action results in large nucleosome depletion over them and over their flanking regions and enhances the accessibility of transcription factors to their cognate sites. Our results suggest that the yeast genome may utilize these nucleosome positioning signals to regulate gene expression with different transcriptional noise and activation kinetics and DNA replication with different origin efficiency. These distinct functions may be achieved by encoding both relatively closed (nucleosome-covered) chromatin organizations over some factor binding sites, where factors must compete with nucleosomes for DNA access, and relatively open (nucleosome-depleted) organizations over other factor sites, where factors bind without competition.


Subjects
DNA, Fungal/genetics; Locus Control Region; Nucleosomes/genetics; Saccharomyces cerevisiae/genetics; Transcription, Genetic/genetics; Animals; Base Sequence/genetics; Binding Sites/genetics; Chromatin Assembly and Disassembly/genetics; Drosophila melanogaster/genetics; Gene Expression Regulation, Fungal/genetics; HeLa Cells; Humans; Saccharomyces cerevisiae/metabolism; Transcription Factors/metabolism
13.
Nat Struct Mol Biol ; 26(3): 175-184, 2019 03.
Article in English | MEDLINE | ID: mdl-30778237

ABSTRACT

Germ cells manifest a unique gene expression program and regain totipotency in the zygote. Here, we perform Hi-C analysis to examine 3D chromatin organization in male germ cells during spermatogenesis. We show that the highly compartmentalized 3D chromatin organization characteristic of interphase nuclei is attenuated in meiotic prophase. Meiotic prophase is predominated by short-range intrachromosomal interactions that represent a condensed form akin to that of mitotic chromosomes. However, unlike mitotic chromosomes, meiotic chromosomes display weak genomic compartmentalization, weak topologically associating domains, and localized point interactions in prophase. In postmeiotic round spermatids, genomic compartmentalization increases and gives rise to the strong compartmentalization seen in mature sperm. The X chromosome lacks domain organization during meiotic sex-chromosome inactivation. We propose that male meiosis occurs amid global reprogramming of 3D chromatin organization and that strengthening of chromatin compartmentalization takes place in spermiogenesis to prepare the next generation of life.


Subjects
Chromatin Assembly and Disassembly/physiology; Meiosis/physiology; Spermatids/growth & development; Spermatocytes/growth & development; Spermatogenesis/physiology; Animals; Chromatin/metabolism; Chromosomes/metabolism; Interphase/physiology; Male; Meiotic Prophase I/physiology; Mice; Mice, Inbred C57BL; Protein Domains/physiology
14.
J Mol Biol ; 369(2): 553-66, 2007 Jun 01.
Article in English | MEDLINE | ID: mdl-17433819

ABSTRACT

Most animal toxins are short proteins that appear in venom and vary in sequence, structure and function. A common characteristic of many such toxins is their apparent structural stability. Sporadic instances of endogenous toxin-like proteins that function in a non-venom context have been reported. We have utilized machine learning methodology, based on sequence-derived features and guided by the notion of structural stability, in order to conduct a large-scale search for toxin and toxin-like proteins. Application of the method to insect and mammalian sequences revealed novel families of toxin-like proteins. One of these proteins shows significant similarity to ion channel inhibitors that are expressed in cone snail and assassin bug venom, and is surprisingly expressed in the bee brain. A toxicity assay in which the protein was injected into fish induced a strong yet reversible paralytic effect. We suggest that the protein may function as an endogenous modulator of voltage-gated Ca²⁺ channels. Additionally, we have identified a novel mammalian cluster of toxin-like proteins that are expressed in the testis. We suggest that these proteins might be involved in regulation of nicotinic acetylcholine receptors that affect the acrosome reaction and sperm motility. Finally, we highlight a possible evolutionary link between venom toxins and antibacterial proteins. We expect our methodology to enhance the discovery of additional novel protein families.


Subjects
Computer Simulation; Peptides/genetics; Toxins, Biological/chemistry; Toxins, Biological/genetics; Amino Acid Sequence; Animals; Antimicrobial Cationic Peptides/chemistry; Antimicrobial Cationic Peptides/genetics; Apamin/chemistry; Apamin/genetics; Base Sequence; Bees; Humans; Insect Proteins/chemistry; Insect Proteins/genetics; Insects; Mice; Molecular Sequence Data; Neuropeptides/chemistry; Peptides/chemistry; Peptides/classification; Protein Conformation; Reproducibility of Results; Sequence Alignment; Toxins, Biological/classification
15.
J Magn Reson ; 184(1): 44-50, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17029992

ABSTRACT

In analogy with Nuclear MRI, the ESR signal phase shift of conduction electrons moving in electrical currents along controlled magnetic field gradients can be used to generate spatial electronic current density maps. First two-dimensional images of the current density distribution in quasi-one-dimensional organic conductors are presented.


Subjects
Algorithms; Electron Spin Resonance Spectroscopy/methods; Image Interpretation, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Signal Processing, Computer-Assisted; Electron Spin Resonance Spectroscopy/instrumentation; Magnetic Resonance Imaging/instrumentation
16.
Nucleic Acids Res ; 33(Database issue): D216-8, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608180

ABSTRACT

ProtoNet is an automatic hierarchical classification of the protein sequence space. In 2004, the ProtoNet (version 4.0) presents the analysis of over one million proteins merged from SwissProt and TrEMBL databases. In addition to rich visualization and analysis tools to navigate the clustering hierarchy, we incorporated several improvements that allow a simplified view of the scaffold of the proteins. An unsupervised, biologically valid method that was developed resulted in a condensation of the ProtoNet hierarchy to only 12% of the clusters. A large portion of these clusters was automatically assigned high confidence biological names according to their correspondence with functional annotations. ProtoNet is available at: http://www.protonet.cs.huji.ac.il.


Subjects
Databases, Protein; Proteins/classification; Sequence Analysis, Protein; Animals; Cluster Analysis; Humans; Internet; Mice; Proteins/chemistry
17.
Protein Sci ; 15(6): 1557-62, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16672244

ABSTRACT

In an era of rapid genome sequencing and high-throughput technology, automatic function prediction for a novel sequence is of utmost importance in bioinformatics. While automatic annotation methods based on local alignment searches can be simple and straightforward, they suffer from several drawbacks, including relatively low sensitivity and assignment of incorrect annotations that are not associated with the region of similarity. ProtoNet is a hierarchical organization of the protein sequences in the UniProt database. Although the hierarchy is constructed in an unsupervised automatic manner, it has been shown to be coherent with several biological data sources. We extend the ProtoNet system in order to assign functional annotations automatically. By leveraging the scaffold of the hierarchical classification, the method is able to overcome some frequent annotation pitfalls.


Subjects
Algorithms; Computational Biology/methods; Proteins/metabolism; Databases, Protein; Proteins/chemistry
18.
Nucleic Acids Res ; 31(19): 5617-26, 2003 Oct 01.
Article in English | MEDLINE | ID: mdl-14500825

ABSTRACT

Recent advances in high-throughput methods and the application of computational tools for automatic classification of proteins have made it possible to carry out large-scale proteomic analyses. Biological analysis and interpretation of sets of proteins is a time-consuming undertaking carried out manually by experts. We have developed PANDORA (Protein ANnotation Diagram ORiented Analysis), a web-based tool that provides an automatic representation of the biological knowledge associated with any set of proteins. PANDORA uses a unique approach of keyword-based graphical analysis that focuses on detecting subsets of proteins that share unique biological properties and the intersections of such sets. PANDORA currently supports SwissProt keywords, NCBI Taxonomy, InterPro entries and the hierarchical classification terms from ENZYME, SCOP and GO databases. The integrated study of several annotation sources simultaneously allows a representation of biological relations of structure, function, cellular location, taxonomy, domains and motifs. PANDORA is also integrated into the ProtoNet system, thus allowing testing thousands of automatically generated clusters. We illustrate how PANDORA enhances the biological understanding of large, non-uniform sets of proteins originating from experimental and computational sources, without the need for prior biological knowledge on individual proteins.


Subjects
Computational Biology/methods; Proteins/classification; Software; Computer Graphics; Databases, Protein; Internet; Proteins/chemistry; Proteins/physiology; Systems Integration; Terminology as Topic
19.
BMC Bioinformatics ; 6: 46, 2005 Mar 08.
Article in English | MEDLINE | ID: mdl-15755318

ABSTRACT

BACKGROUND: Computational protein annotation methods occasionally introduce errors. False-positive (FP) errors are annotations that are mistakenly associated with a protein. Such false annotations introduce errors that may spread into databases through similarity with other proteins. Generally, methods used to minimize the chance for FPs result in decreased sensitivity or low throughput. We present a novel protein-clustering method that enables automatic separation of FP from true hits. The method quantifies the biological similarity between pairs of proteins by examining each protein's annotations, and then proceeds by clustering sets of proteins that received similar annotation into biological groups. RESULTS: Using a test set of all PROSITE signatures that are marked as FPs, we show that the method successfully separates FPs in 69% of the 327 test cases supplied by PROSITE. Furthermore, we constructed an extensive random FP simulation test and show a high degree of success in detecting FP, indicating that the method is not specifically tuned for PROSITE and performs well on larger scales. We also suggest some means of predicting in which cases this approach would be successful. CONCLUSION: Automatic detection of FPs may greatly facilitate the manual validation process and increase annotation sensitivity. With the increasing number of automatic annotations, the tendency of biological properties to be clustered, once a biological similarity measure is introduced, may become exceedingly helpful in the development of such automatic methods.


Subjects
Computational Biology/methods; Algorithms; Amino Acid Motifs; Automation; Cluster Analysis; Computer Graphics; Computer Simulation; Database Management Systems; Databases, Factual; Databases, Genetic; Databases, Protein; Models, Statistical; Protein Binding; Protein Interaction Mapping; Proteins/chemistry; Proteomics; Reproducibility of Results; Sensitivity and Specificity; Sequence Alignment; Sequence Analysis, Protein; Software
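
Illustrative sketch for entry 19 (an assumption, not the paper's measure): the abstract says the method quantifies biological similarity between pairs of proteins by examining each protein's annotations, but it does not state the formula. The Jaccard index over annotation sets is used here purely as a stand-in; the function name `annotation_similarity` and the example annotations are hypothetical.

```python
def annotation_similarity(annotations_a, annotations_b):
    """Jaccard index of two annotation sets (a stand-in similarity measure).

    Returns 0.0 when both proteins have no annotations, so that unannotated
    proteins are never treated as similar to each other.
    """
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical annotation sets, for illustration only.
protein_x = {"kinase", "ATP-binding"}
protein_y = {"kinase", "membrane"}
similarity = annotation_similarity(protein_x, protein_y)
```

A pairwise score of this kind could then feed a clustering step, so that a hit whose annotations differ from the rest of its group stands out as a candidate false positive.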
20.
BMC Bioinformatics ; 5: 196, 2004 Dec 14.
Article in English | MEDLINE | ID: mdl-15596019

ABSTRACT

BACKGROUND: It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually-validated alignments of known protein families. Such methods can achieve high sensitivity, but are limited by the necessary manual labor. This makes our current view of the protein world incomplete and biased. This paper concerns ProtoNet, an automatic unsupervised global clustering system that generates a hierarchical tree of over 1,000,000 proteins, based solely on sequence similarity. RESULTS: In this paper we show that ProtoNet correctly captures functional and structural aspects of the protein world. Furthermore, a novel feature is an automatic procedure that reduces the tree to 12% of its original size. This procedure utilizes only parameters intrinsic to the clustering process. Despite the substantial reduction in size, the system's predictive power concerning biological functions is hardly affected. We then carry out an automatic comparison with existing functional protein annotations. Consequently, 78% of the clusters in the compressed tree (5,300 clusters) get assigned a biological function with a high confidence. The clustering and compression processes are unsupervised, and robust. CONCLUSIONS: We present an automatically generated unbiased method that provides a hierarchical classification of all currently known proteins.


Subjects
Computational Biology/methods; Proteins/chemistry; Proteomics/methods; Sequence Analysis, Protein/methods; Algorithms; Amino Acid Sequence; Artificial Intelligence; Automation; Cations; Cluster Analysis; Databases, Protein; Information Storage and Retrieval; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated; Software; Structural Homology, Protein