1.
Nucleic Acids Res ; 39(4): 1208-19, 2011 Mar.
Article in English | MEDLINE | ID: mdl-20972208

ABSTRACT

According to current estimates, ∼95% of multi-exonic human protein-coding genes undergo alternative splicing (AS). However, of the 4000 human proteins in the PDB, only 14 have structures of at least two alternative isoforms. Surveying these structural isoforms revealed that the largest insertion accommodated by an isoform of a fully ordered protein domain was 5 amino acids; other instances of domain changes involved intrinsic structural disorder. After collecting 505 minor isoforms of human proteins with evidence for their existence, we analyzed their length, protein disorder and exposed hydrophobic surface. We found that strict rules govern the selection of alternative splice variants, aimed at preserving the integrity of globular domains: alternative splice sites (i) tend to avoid globular domains, (ii) affect them only marginally, (iii) tend to coincide with a location where the exposed hydrophobic surface is minimal, or (iv) fall where the protein is disordered. We also observed an inverse correlation between the fraction of the domain lost and the full length of the minor isoform containing the domain, possibly indicating a buffering effect of the isoform protein that counteracts the effect of domain truncation. These observations provide the basis for a method (currently under development) to predict the viability of splice variants.


Subjects
Alternative Splicing, Protein Isoforms/chemistry, Protein Databases, Genetic Variation, Humans, Hydrophobic and Hydrophilic Interactions, Molecular Models, Tertiary Protein Structure, Protein Sequence Analysis
2.
PLoS Comput Biol ; 5(10): e1000552, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19888473

ABSTRACT

Chromosomal translocations, which often generate chimeric proteins by fusing segments of two distinct genes, represent the single major genetic aberration leading to cancer. We suggest that the unifying theme of these events is a high level of intrinsic structural disorder, enabling fusion proteins to evade cellular surveillance mechanisms that eliminate misfolded proteins. Predictions for 406 translocation-related human proteins show that they are significantly enriched in disorder (43.3% vs. 20.7% in all human proteins), have fewer Pfam domains, and have translocation breakpoints that tend to avoid domain splitting. The vicinity of the breakpoint is significantly more disordered than the rest of these already highly disordered fusion proteins. In the rare cases where a fusion does split a domain, the split usually spares much of the domain or occurs at locations where the newly exposed hydrophobic surface area approximates that of an intact domain. The mechanisms of action of fusion proteins suggest that in most cases their structural disorder is also essential to the acquired oncogenic function, enabling the long-range structural communication of remote binding and/or catalytic elements. In this respect, there are three major mechanisms that contribute to generating an oncogenic signal: (i) a phosphorylation site and a tyrosine-kinase domain are fused, and structural disorder of the intervening region enables intramolecular phosphorylation (e.g., BCR-ABL); (ii) a dimerisation domain fuses with a tyrosine kinase domain and disorder enables the two subunits within the homodimer to engage in permanent intermolecular phosphorylations (e.g., TFG-ALK); (iii) the fusion of a DNA-binding element to a transactivator domain results in an aberrant transcription factor that causes severe misregulation of transcription (e.g., EWS-ATF). Our findings also suggest novel strategies of intervention against the ensuing neoplastic transformations.


Subjects
Computational Biology/methods, Oncogenic Fusion Proteins/chemistry, Oncogenic Fusion Proteins/physiology, Algorithms, Cell Survival/physiology, Neoplastic Cell Transformation, Chromosome Breakage, Computer Simulation, Humans, Logistic Models, Molecular Models, Oncogenic Fusion Proteins/genetics, Oncogenes, Protein Conformation, Protein Folding, Genetic Translocation
3.
Nucleic Acids Res ; 36(1): e3, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18079152

ABSTRACT

Oligonucleotide microarrays have been applied to microbial surveillance and discovery where highly multiplexed assays are required to address a wide range of genetic targets. Although printing density continues to increase, the design of comprehensive microbial probe sets remains a daunting challenge, particularly in virology where rapid sequence evolution and database expansion confound static solutions. Here, we present a strategy for probe design based on protein sequences that is responsive to the unique problems posed in virus detection and discovery. The method uses the Protein Families database (Pfam) and motif finding algorithms to identify oligonucleotide probes in conserved amino acid regions and untranslated sequences. In silico testing using an experimentally derived thermodynamic model indicated near complete coverage of the viral sequence database.


Subjects
Oligonucleotide Array Sequence Analysis, Oligonucleotide Probes/chemistry, Protein Sequence Analysis/methods, Viral Proteins/chemistry, Viruses/isolation & purification, Amino Acid Motifs, Amino Acid Sequence, Conserved Sequence, Nucleic Acid Databases, Protein Databases, Viral Genes, Viral Genome, HIV-1/genetics, Humans, Amino Acid Sequence Homology, Viral Proteins/genetics, Viruses/genetics
4.
Bioinformatics ; 24(12): 1469-70, 2008 Jun 15.
Article in English | MEDLINE | ID: mdl-18434342

ABSTRACT

UNLABELLED: The TOPDOM database is a collection of domains and sequence motifs located consistently on the same side of the membrane in alpha-helical transmembrane proteins. The database was created by scanning well-annotated transmembrane protein sequences in the UniProt database with specific domain- or motif-detecting algorithms. The identified domains or motifs were added to the database if they were uniformly annotated on the same side of the membrane in the various proteins in the UniProt database. The information about the location of the collected domains and motifs can be incorporated into constrained topology prediction algorithms, like HMMTOP, increasing the prediction accuracy. AVAILABILITY: The TOPDOM database and the constrained HMMTOP prediction server are available at http://topdom.enzim.hu. CONTACT: tusi@enzim.hu; lkalmar@enzim.hu.


Subjects
Database Management Systems, Protein Databases, Information Storage and Retrieval/methods, Membrane Proteins/chemistry, Sequence Alignment/methods, Protein Sequence Analysis/methods, User-Computer Interface, Algorithms, Amino Acid Motifs, Amino Acid Sequence, Conserved Sequence, Molecular Sequence Data, Tertiary Protein Structure
5.
PLoS Comput Biol ; 4(3): e1000017, 2008 Mar 07.
Article in English | MEDLINE | ID: mdl-18369417

ABSTRACT

Intrinsically disordered/unstructured proteins (IDPs) are extremely sensitive to proteolysis in vitro, but show no enhanced degradation rates in vivo. Their existence and functioning may be explained if IDPs are preferentially associated with chaperones in the cell, which may protect them against degradation by proteases. To test this inference, we took pairwise interaction data from high-throughput interaction studies and analyzed whether predicted disorder correlates with the tendency of proteins to bind chaperones. Our major finding is that disorder predicted by the IUPred algorithm actually correlates negatively with chaperone binding in E. coli, S. cerevisiae, and metazoan species. Since predicted disorder positively correlates with the tendency of partner binding in the interactome, the difference between the disorder of chaperone-binding and non-binding proteins is even more pronounced if normalized to their overall tendency to be involved in pairwise protein-protein interactions. We argue that chaperone binding is primarily required for folding of globular proteins, as reflected in an increased preference for chaperones among proteins containing at least one Pfam domain. As for chaperone binding by mostly disordered proteins, we suggest that its primary role is not to assist folding but to promote assembly with partners. In support of this conclusion, we show that IDPs that bind chaperones also tend to bind other proteins.


Subjects
Chemical Models, Molecular Models, Molecular Chaperones/chemistry, Protein Interaction Mapping/methods, Protein Sequence Analysis/methods, Binding Sites, Computer Simulation, Protein Binding
6.
BMC Bioinformatics ; 9: 353, 2008 Aug 27.
Article in English | MEDLINE | ID: mdl-18752676

ABSTRACT

BACKGROUND: Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i) conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii) presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii) co-occurrence of extracellular and nuclear domains; (iv) violation of domain integrity; (v) chimeras encoded by two or more genes located on different chromosomes. RESULTS: Analyses of predicted EnsEMBL protein sequences of nine deuterostome (Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio and Ciona intestinalis) and two protostome species (Caenorhabditis elegans and Drosophila melanogaster) have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON-predicted entries. 
CONCLUSION: MisPred works efficiently in identifying errors in predictions generated by the most reliable gene prediction tools such as the EnsEMBL and NCBI's GNOMON pipelines and also guides the correction of errors. We suggest that application of the MisPred approach will significantly improve the quality of gene predictions and the associated databases.


Subjects
Database Management Systems, Protein Databases, Information Storage and Retrieval/methods, Internet, Natural Language Processing, Proteins/classification, Terminology as Topic, Artifacts, Proteins/chemistry, Proteins/metabolism, Quality Control, Protein Sequence Analysis/methods
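Rules (ii) and (iii) of the MisPred approach described above reduce to set checks once each domain carries a subcellular-location label and transmembrane segments have been predicted. The sketch below assumes those annotations already exist; `ProteinEntry`, its fields, and the flag names are hypothetical and do not reproduce the MisPred implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProteinEntry:
    """Hypothetical per-entry summary: location labels of its domains and TM-segment count."""
    accession: str
    domain_locations: set[str] = field(default_factory=set)  # e.g. {"extracellular", "nuclear"}
    n_transmembrane: int = 0


def mispred_flags(entry: ProteinEntry) -> list[str]:
    """Return which of the two rules sketched here the entry violates."""
    flags = []
    locs = entry.domain_locations
    # Rule (ii): extracellular plus cytoplasmic domains imply at least one TM segment.
    if {"extracellular", "cytoplasmic"} <= locs and entry.n_transmembrane == 0:
        flags.append("rule_ii")
    # Rule (iii): extracellular and nuclear domains should not occur in one protein.
    if {"extracellular", "nuclear"} <= locs:
        flags.append("rule_iii")
    return flags
```

For example, `mispred_flags(ProteinEntry("X1", {"extracellular", "cytoplasmic"}))` flags `["rule_ii"]`, while the same entry with `n_transmembrane=1` passes with an empty list.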
7.
BMC Struct Biol ; 7: 65, 2007 Oct 08.
Article in English | MEDLINE | ID: mdl-17922903

ABSTRACT

BACKGROUND: The idea that the assembly of protein complexes is linked with protein disorder has been inferred only from a few large complexes, such as the viral capsid or the bacterial flagellar system. The relationship, which suggests that larger complexes have more disorder, has never been systematically tested. Recent high-throughput analyses of protein-protein interactions and protein complexes in the cell have generated data that make it possible to address this issue by bioinformatic means. RESULTS: In this work we predicted structural disorder for both E. coli and S. cerevisiae, and correlated it with the size of complexes. Using IUPred to predict the disorder for each complex, we found a statistically significant correlation between disorder and the number of proteins assembled into complexes. The distribution of disorder has a median value of 10% in yeast for complexes of 2-4 components (6% in E. coli), but 18% for complexes in the size range of 11-100 proteins (12% in E. coli). The level of disorder as assessed for regions longer than 30 consecutive disordered residues shows an even stronger division between small and large complexes (median values about 4% for complexes of 2-4 components, but 12% for complexes of 11-100 components in yeast). The predicted correlation is also supported by experimental evidence: the structural disorder observed in protein components of complexes found in the Protein Data Bank (median values 1.5% for complexes of 2-4 components, and 9.6% for complexes of 11-100 components in yeast). Further analysis shows that this correlation is not directly linked with the increased disorder in hub proteins, but reflects a genuine systemic property of the proteins that make up the complexes. CONCLUSION: Overall, it is suggested and discussed that the assembly of protein-protein complexes is enabled and probably promoted by protein disorder.


Subjects
Escherichia coli Proteins/chemistry, Escherichia coli Proteins/metabolism, Saccharomyces cerevisiae Proteins/chemistry, Saccharomyces cerevisiae Proteins/metabolism, Amino Acid Motifs, Escherichia coli/metabolism, Protein Binding, Saccharomyces cerevisiae/metabolism
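The two disorder measures in the abstract above, the fraction of all residues predicted disordered and the fraction lying in regions longer than 30 consecutive disordered residues, can be computed from per-residue prediction scores and then summarized per complex-size class. This minimal sketch assumes IUPred-style scores in [0, 1] with the commonly used cut-off of 0.5, and scores a complex by pooling the residues of all its components; the function names and the pooling choice are assumptions, not the authors' exact procedure.

```python
from __future__ import annotations

from statistics import median


def disorder_fractions(scores: list[float], threshold: float = 0.5,
                       min_run: int = 31) -> tuple[float, float]:
    """Return (fraction disordered, fraction in long disordered regions).

    A residue is disordered when its score exceeds `threshold`; a long region is a
    run of at least `min_run` consecutive disordered residues (31 = "longer than 30").
    """
    flags = [s > threshold for s in scores]
    in_long, run = 0, 0
    for disordered in flags + [False]:  # trailing sentinel closes the last run
        if disordered:
            run += 1
        else:
            if run >= min_run:
                in_long += run
            run = 0
    n = len(scores)
    return sum(flags) / n, in_long / n


def median_disorder_by_size(complexes: list[list[list[float]]],
                            size_bins=((2, 4), (11, 100))) -> dict:
    """Median overall disorder of complexes grouped by their number of components.

    Each complex is a list of per-protein score lists; residues are pooled across members.
    """
    result = {}
    for low, high in size_bins:
        values = [
            disorder_fractions([s for member in c for s in member])[0]
            for c in complexes
            if low <= len(c) <= high
        ]
        result[(low, high)] = median(values) if values else None
    return result
```

Pooling weights each complex by its total residue count; averaging per-protein fractions instead would be an equally plausible reading of "disorder for each complex".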
8.
Sci Rep ; 7: 45494, 2017 04 06.
Article in English | MEDLINE | ID: mdl-28382934

ABSTRACT

Combining genome-wide mapping of SNP-rich regions in schizophrenics and gene expression data in all brain compartments across the human life span revealed that genes with promoters most frequently mutated in schizophrenia are expression hubs interacting with far more genes than the rest of the genome. We summed up the differentially methylated "expression neighbors" of genes that fall into one of 108 distinct schizophrenia-associated loci with a high number of SNPs. Surprisingly, the number of expression neighbors of the genes in these loci was 35 times higher for the positively correlating genes (32 times higher for the negatively correlating ones) than for the rest of the ~16000 genes. While the genes in the 108 loci have little known impact in schizophrenia, we identified many more important, known schizophrenia-related genes with a high degree of connectedness (e.g. MOBP, SYNGR1 and DGCR6), validating our approach. Both the most connected positive and negative hubs affected synapse-related genes the most, supporting the synaptic origin of schizophrenia. At least half of the top genes in both the correlating and anti-correlating categories are cancer-related, including oncogenes (RRAS and ALDOA), providing further insight into the observed inverse relationship between the two diseases.


Subjects
Myelin Proteins/metabolism, Single Nucleotide Polymorphism, Schizophrenia/pathology, Synapses/metabolism, Brain/metabolism, DNA Methylation, Extracellular Matrix Proteins/genetics, Extracellular Matrix Proteins/metabolism, Gene Regulatory Networks/genetics, Genetic Loci, Humans, Myelin Proteins/genetics, Nuclear Proteins, Genetic Promoter Regions, Schizophrenia/genetics, Synaptogyrins/genetics, Synaptogyrins/metabolism
9.
Sci Rep ; 5: 9165, 2015 Mar 16.
Article in English | MEDLINE | ID: mdl-25772493

ABSTRACT

G-quadruplexes are guanine-rich nucleic acid sequences capable of forming a four-stranded structure through Hoogsteen hydrogen bonding. G-quadruplexes are highly concentrated near promoters and transcription start sites, suggesting a role in gene regulation. They are found less often on the template strand, where they inhibit transcription, than on the non-template strand, where they enhance it. However, their potential role in enhancers and other distal regulatory elements has not yet been assessed. Here we show that DNase hypersensitive (DHS) cis-regulatory elements are also enriched in Gs and that their G-content correlates with that of their respective promoters. Besides local G4s, the distal cis regions may form G-quadruplexes together with the promoters, each contributing half a G4. This model is better supported for the non-template strand, and we hypothesised that the G4-forming capability of the promoter and enhancer non-template strands could facilitate their binding to each other, making the DHS regions accessible to the transcription factory.


Subjects
Genetic Enhancer Elements, G-Quadruplexes, Genetic Promoter Regions, Base Composition, Binding Sites, Guanine, Humans, Biological Models, Protein Binding, Nucleic Acid Regulatory Sequences, Transcription Factors/metabolism
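Putative G-quadruplexes such as those discussed above are usually located by scanning for four G-tracts separated by short loops. The sketch below uses the widely cited consensus of tracts of at least three guanines and loops of 1 to 7 nucleotides; that consensus, and modelling "half a G4" as two G-tracts joined by one loop, are assumptions for illustration rather than the criteria used in the paper. A G4 on the complementary strand appears as C-tracts on the given strand.

```python
from __future__ import annotations

import re

# Assumed consensus: G-tracts of >= 3 nt, loops of 1-7 nt (non-overlapping matches).
LOOP = "[ACGTN]{1,7}"
G4_PLUS = re.compile(f"G{{3,}}{LOOP}G{{3,}}{LOOP}G{{3,}}{LOOP}G{{3,}}")
G4_MINUS = re.compile(f"C{{3,}}{LOOP}C{{3,}}{LOOP}C{{3,}}{LOOP}C{{3,}}")
HALF_G4_PLUS = re.compile(f"G{{3,}}{LOOP}G{{3,}}")


def count_g4_motifs(seq: str) -> dict[str, int]:
    """Count putative full and half G4 motifs in a DNA sequence."""
    s = seq.upper()
    return {
        "g4_plus": len(G4_PLUS.findall(s)),    # G4 on the given strand
        "g4_minus": len(G4_MINUS.findall(s)),  # G4 on the complementary strand
        "half_g4_plus": len(HALF_G4_PLUS.findall(s)),
    }
```

On the human telomeric repeat `GGGTTAGGGTTAGGGTTAGGG` this reports one full motif and two non-overlapping half motifs on the given strand.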
10.
Biol Direct ; 10: 59, 2015 Oct 08.
Article in English | MEDLINE | ID: mdl-26450699

ABSTRACT

BACKGROUND: While hundreds of genes have already been implicated in the etiology of schizophrenia, the exact cause is unknown and the disease is considered multigenic in origin. Recent discoveries of new types of RNAs and the gradual elimination of the "junk DNA" hypothesis have refocused attention on the noncoding part of the human genome. Here we re-analyzed a recent dataset of differentially methylated genes from schizophrenic patients and cross-tabulated them with cis regulatory and repetitive elements and microRNAs known to be involved in schizophrenia. RESULTS: We found that the number of schizophrenia-related (SZ) microRNA targets follows a scale-free distribution with several microRNA hubs and that schizophrenia-related microRNAs with shared targets form a small-world network. The top ten microRNAs with the highest number of SZ gene targets regulate approximately 80% of all microRNA-regulated genes, whereas the top two microRNAs regulate 40-52% of all such genes. We also found that genes that are regulated by the same microRNAs tend to have more protein-protein interactions than randomly selected schizophrenia genes. This highlights the role microRNAs possibly play in coordinating the abundance of interacting proteins, an important function that has not been sufficiently explored before. The analysis revealed that GABBR1 is regulated by both of the top two microRNAs and acts as a hub by interacting with many schizophrenia-related genes and sharing several types of transcription-binding sites with its interactors. We also found that differentially methylated repetitive elements are significantly more methylated in schizophrenia, pointing to their potential role in the disease. CONCLUSIONS: We find that GABBR1 is of central importance in schizophrenia, even though no direct cause-and-effect relationship has been shown for it so far.
In addition to being a hub in microRNA-derived regulatory pathways and protein-protein interactions, its centrality is also supported by the high number of cis regulatory elements and transcription factor-binding sites that regulate its transcription. These findings are in line with several genome-wide association studies that repeatedly find the major histocompatibility region (where GABBR1 is located) to have the highest number of single nucleotide polymorphisms in schizophrenics. Our model also offers an explanation for the downregulation of protein kinase B, another consistent finding in schizophrenic patients. Our observations support the notion that microRNAs fine-tune the amount of proteins acting in the same biological pathways in schizophrenia, giving further support to the emerging theory of competing endogenous RNAs.


Subjects
DNA Methylation, MicroRNAs/genetics, Schizophrenia/genetics, Adult, Aged, Aged 80 and over, Female, Humans, Male, Middle Aged, Proto-Oncogene Proteins c-akt/genetics, Proto-Oncogene Proteins c-akt/metabolism, GABA-B Receptors/genetics, GABA-B Receptors/metabolism
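Figures such as "the top ten microRNAs regulate approximately 80% of all microRNA-regulated genes" in the abstract above correspond to a coverage calculation over a microRNA-to-target map. A minimal sketch follows; the input dictionary is a hypothetical stand-in for whatever target annotation is used, and ranking microRNAs by their number of targets is an assumption about how "top" is defined.

```python
from __future__ import annotations

from collections import Counter


def hub_coverage(targets: dict[str, set[str]], k: int) -> float:
    """Fraction of all microRNA-regulated genes targeted by the k microRNAs with most targets."""
    ranked = sorted(targets, key=lambda mirna: len(targets[mirna]), reverse=True)
    all_genes = set().union(*targets.values())
    covered = set().union(*(targets[m] for m in ranked[:k]))
    return len(covered) / len(all_genes) if all_genes else 0.0


def target_count_distribution(targets: dict[str, set[str]]) -> Counter:
    """How many microRNAs have each number of targets (input for a degree-distribution plot)."""
    return Counter(len(genes) for genes in targets.values())
```

With a toy map `{"miR-a": {"g1", "g2", "g3"}, "miR-b": {"g3", "g4"}, "miR-c": {"g5"}}`, the top one microRNA covers 60% of the five regulated genes and the top two cover 80%; shared targets are counted once, which is why coverage grows more slowly than summed target counts.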
11.
Proteins ; 47(2): 126-41, 2002 May 01.
Article in English | MEDLINE | ID: mdl-11933060

ABSTRACT

We conducted a structural genomics analysis of the folds and structural superfamilies in the first 20 completely sequenced genomes by focusing on the patterns of fold usage and trying to identify structural characteristics of typical and atypical folds. We assigned folds to sequences using PSI-blast, run with a systematic protocol to reduce the amount of computational overhead. On average, folds could be assigned to about a fourth of the ORFs in the genomes and about a fifth of the amino acids in the proteomes. More than 80% of all the folds in the SCOP structural classification were identified in one of the 20 organisms, with worm and E. coli having the largest number of distinct folds. Folds are particularly effective at comprehensively measuring levels of gene duplication, because they group together even very remote homologues. Using folds, we find the average level of duplication varies depending on the complexity of the organism, ranging from 2.4 in M. genitalium to 32 for the worm, values significantly higher than those observed based purely on sequence similarity. We rank the common folds in the 20 organisms, finding that the top three are the P-loop NTP hydrolase, the ferredoxin fold, and the TIM-barrel, and discuss in detail the many factors that affect and bias these rankings. We also identify atypical folds that are "unique" to one of the organisms in our study and compare the characteristics of these folds with the most common ones. We find that common folds tend to be more multifunctional and associated with more regular, "symmetrical" structures than the unique ones. In addition, many of the unique folds are associated with proteins involved in cell defense (e.g., toxins). We analyze specific patterns of fold occurrence in the genomes by associating some of them with instances of horizontal transfer and others with gene loss. In particular, we find three possible examples of transfer between archaea and bacteria and six between eukarya and bacteria.
We make available our detailed results at http://genecensus.org/20.


Subjects
Genomics/methods, Proteins/chemistry, Protein Sequence Analysis/methods, Animals, Conserved Sequence, Molecular Evolution, Gene Deletion, Gene Duplication, Horizontal Gene Transfer, Open Reading Frames, Phylogeny, Protein Folding, Secondary Protein Structure, Proteome/analysis
12.
Proteins ; 56(2): 188-200, 2004 Aug 01.
Article in English | MEDLINE | ID: mdl-15211504

ABSTRACT

A central goal of structural genomics is to experimentally determine representative structures for all protein families. At least 14 structural genomics pilot projects are currently investigating the feasibility of high-throughput structure determination; the National Institutes of Health funded nine of these in the United States. Initiatives differ in the particular subset of "all families" on which they focus. At the NorthEast Structural Genomics consortium (NESG), we target eukaryotic protein domain families. The automatic target selection procedure has three aims: 1) identify all protein domain families from currently five entirely sequenced eukaryotic target organisms based on their sequence homology, 2) discard those families that can be modeled on the basis of structural information already present in the PDB, and 3) target representatives of the remaining families for structure determination. To guarantee that all members of one family share a common foldlike region, we had to begin by dissecting proteins into structural domain-like regions before clustering. Our hierarchical approach, CHOP, utilizing homology to PrISM, Pfam-A, and SWISS-PROT chopped the 103,796 eukaryotic proteins/ORFs into 247,222 fragments. Of these fragments, 122,999 appeared suitable targets that were grouped into >27,000 singletons and >18,000 multifragment clusters. Thus, our results suggested that it might be necessary to determine >40,000 structures to minimally cover the subset of five eukaryotic proteomes.


Subjects
Genomics, Protein Conformation, Tertiary Protein Structure/genetics, Proteins/genetics, Algorithms, Arabidopsis Proteins/genetics, Caenorhabditis elegans Proteins/genetics, Protein Databases, Drosophila Proteins/genetics, Feasibility Studies, Humans, Internet, Multigene Family, National Institutes of Health (U.S.), Peptide Fragments/chemistry, Peptide Fragments/genetics, Pilot Projects, Protein Folding, Proteins/classification, Proteomics, Saccharomyces cerevisiae Proteins/genetics, Sequence Homology, United States
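The abstract above counts singletons and multifragment clusters, and a minimum number of structures follows by taking one representative per cluster. The sketch below groups fragments by single linkage over pairwise homology using a union-find structure; both that grouping rule and the input format (fragment IDs plus homologous pairs) are assumptions and do not describe how CHOP actually clusters fragments.

```python
from __future__ import annotations


def cluster_fragments(fragments: list[str], homologous_pairs: list[tuple[str, str]]):
    """Single-linkage grouping of fragments; returns (singletons, multifragment clusters)."""
    parent = {f: f for f in fragments}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in homologous_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters: dict[str, list[str]] = {}
    for f in fragments:
        clusters.setdefault(find(f), []).append(f)
    groups = list(clusters.values())
    singletons = [g for g in groups if len(g) == 1]
    multi = [g for g in groups if len(g) > 1]
    return singletons, multi
```

Under this one-structure-per-cluster assumption, the minimal number of structures is `len(singletons) + len(multi)`, which is how >27,000 singletons and >18,000 clusters add up to more than 40,000 targets.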
13.
Pharmacogenomics ; 3(3): 393-402, 2002 May.
Article in English | MEDLINE | ID: mdl-12052146

ABSTRACT

SNPs are useful for genome-wide mapping and the study of disease genes. Previous studies have focused on SNPs in specific genes or SNPs pooled from a variety of different sources. Here, a systematic approach to the analysis of SNPs in relation to various features on a genome-wide scale, with emphasis on protein features and pseudogenes, is presented. We have performed a comprehensive analysis of 39,408 SNPs on human chromosomes 21 and 22 from the SNP consortium (TSC) database, where SNPs are obtained by random sequencing using consistent and uniform methods. Our study indicates that the occurrence of SNPs is lowest in exons and higher in repeats, introns and pseudogenes. Moreover, in comparing genes and pseudogenes, we find that the SNP density is higher in pseudogenes and the ratio of nonsynonymous to synonymous changes is also much higher. These observations may be explained by the increased rate of SNP accumulation in pseudogenes, which presumably are not under selective pressure. We have also performed secondary structure prediction on all coding regions and found that there is no preferential distribution of SNPs in α-helices, β-sheets or coils. This could imply that protein structures, in general, can tolerate a wide degree of substitutions. Tables relating to our results are available from http://genecensus.org/pseudogene.


Subjects
Human Chromosome Pair 21/genetics, Human Chromosome Pair 22/genetics, Single Nucleotide Polymorphism/genetics, Proteins/genetics, Pseudogenes/genetics, Algorithms, Databases as Topic, Exons/genetics, Humans, Secondary Protein Structure, Proteins/chemistry, Amino Acid Sequence Homology
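Comparing the ratio of nonsynonymous to synonymous changes between genes and pseudogenes, as in the abstract above, requires classifying each coding SNP by whether it alters the encoded amino acid. A minimal sketch using the standard genetic code follows; it assumes an in-frame coding sequence starting at position 0 and 0-based SNP positions, and it reports a raw count ratio rather than a site-normalized dN/dS estimate. The function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based `pos` of an in-frame coding sequence."""
    cds = cds.upper()
    offset = pos % 3
    codon = cds[pos - offset:pos - offset + 3]
    mutant = codon[:offset] + alt.upper() + codon[offset + 1:]
    return "synonymous" if CODON_TABLE[codon] == CODON_TABLE[mutant] else "nonsynonymous"


def nonsyn_to_syn_ratio(cds: str, snps: list[tuple[int, str]]) -> float:
    """Raw ratio of nonsynonymous to synonymous SNP counts (inf if none are synonymous)."""
    counts = Counter(classify_snp(cds, pos, alt) for pos, alt in snps)
    if counts["synonymous"] == 0:
        return float("inf")
    return counts["nonsynonymous"] / counts["synonymous"]
```

For the toy sequence `ATGCTG` (Met-Leu), a G→A change at position 2 turns ATG into ATA (Ile) and is nonsynonymous, whereas G→A at position 5 keeps CTA as Leu and is synonymous.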
14.
Biol Direct ; 8: 5, 2013 Feb 07.
Article in English | MEDLINE | ID: mdl-23391219

ABSTRACT

Schizophrenia is a complex disease with uncertain aetiology. We suggest that GABBR1 (GABA receptor B1) is implicated in schizophrenia, based on a HERV-W LTR in the regulatory region of GABBR1. Our hypothesis is supported by the following: (i) GABBR1 lies in the 6p22 genomic region most often implicated in schizophrenia; (ii) microarray studies found that only presynaptic pathway-related genes, including GABA receptors, have altered expression in schizophrenic patients; and (iii) it explains how HERV-W elements, which are expressed in schizophrenia, play a role in the disease: by altering the expression of GABBR1 via a long terminal repeat that is also a regulatory element of GABBR1.


Subjects
Endogenous Retroviruses/genetics, env Gene Products/genetics, Pregnancy Proteins/genetics, GABA-B Receptors/genetics, Schizophrenia/genetics, Schizophrenia/virology, Endogenous Retroviruses/metabolism, env Gene Products/metabolism, Humans, Pregnancy Proteins/metabolism, Protein Array Analysis, GABA-B Receptors/metabolism, Terminal Repeat Sequences
15.
Mol Biosyst ; 8(1): 229-36, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22105808

ABSTRACT

Intrinsic protein disorder has been studied with respect to the chromosomal location of each protein in the human proteome and also in other fully sequenced organisms. We found that in all studied mammalian species the sex chromosome-coded proteins were significantly more disordered than the autosome-coded ones, the strongest discrepancy being observed in humans. To explain this phenomenon we analyzed local chromosomal features and found that (1) the autosomes have a stronger correlation between the GC content of the transcripts and the structural disorder of the coded proteins than the sex chromosomes; (2) the disorder of neighboring proteins correlates most strongly on the sex chromosomes; (3) the GO functions on chromosome X are somewhat biased towards functions with higher disorder but do not account for the entire phenomenon; (4) the protein-protein interactions show a non-random chromosomal distribution, the Y chromosome-coded proteins having the lowest overall frequency of interactions but the largest bias towards intra-chromosomal interactions. Tissue-specific distributions showed the most protein disorder for sex chromosome-coded proteins expressed in the testis and the ovary. We raise the possibility that the high disorder of X- and Y-encoded proteins facilitates the fast evolution of testis- and cancer-specific antigenic protein clusters on these chromosomes, in relation to their immunogenic properties and likely contribution to speciation.


Subjects
Human Chromosomes/metabolism, Protein Folding, Proteins/chemistry, Proteins/metabolism, Sex Chromosomes/metabolism, Animals, Cluster Analysis, Protein Databases, Humans, Mice, Organ Specificity, Protein Binding, Protein Conformation, Protein Interaction Maps, Proteome/metabolism, Synteny/genetics
16.
Genome Biol ; 12(12): R120, 2011 Dec 19.
Article in English | MEDLINE | ID: mdl-22182830

ABSTRACT

BACKGROUND: Sequencing the genomes of the first few eukaryotes created the impression that gene number shows no correlation with organism complexity, often referred to as the G-value paradox. Several attempts have previously been made to resolve this paradox, citing multifunctionality of proteins, alternative splicing, microRNAs or non-coding DNA. As intrinsic protein disorder has been linked with complex responses to environmental stimuli and communication between cells, an additional possibility is that structural disorder may effectively increase the complexity of species. RESULTS: We revisited the G-value paradox by analyzing many new proteomes of organisms whose complexity, measured by their number of distinct cell types, is known. We found that complexity and proteome size (measured by the total number of amino acids) correlate significantly and have a power function relationship. We systematically analyzed numerous other features in relation to complexity in several organisms and tissues and found that: the fraction of protein structural disorder increases significantly between prokaryotes and eukaryotes but does not further increase over the course of evolution; the number of predicted binding sites in disordered regions in a proteome increases with complexity; and the fraction of protein disorder, predicted binding sites, alternative splicing and protein-protein interactions all increase with the complexity of human tissues. CONCLUSIONS: We conclude that complexity is a multi-parametric trait, determined by interaction potential, alternative splicing capacity, tissue-specific protein disorder and, above all, proteome size. The G-value paradox is only apparent when plants are grouped with metazoans, as they have a different relationship between complexity and proteome size.


Subjects
Bacteria/genetics , Eukaryota/genetics , Plants/genetics , Proteome/genetics , Proteostasis Deficiencies/genetics , Algorithms , Alternative Splicing , Amino Acids/genetics , Binding Sites , Biological Evolution , Computational Biology , Databases, Protein , Genome Size , Humans , Protein Folding , Protein Interaction Maps
17.
Emerg Infect Dis ; 13(1): 73-81, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17370518

ABSTRACT

To facilitate rapid, unbiased, differential diagnosis of infectious diseases, we designed GreeneChipPm, a panmicrobial microarray comprising 29,455 sixty-mer oligonucleotide probes for vertebrate viruses, bacteria, fungi, and parasites. Methods for nucleic acid preparation, random primed PCR amplification, and labeling were optimized to allow the sensitivity required for application with nucleic acid extracted from clinical materials and cultured isolates. Analysis of nasopharyngeal aspirates, blood, urine, and tissue from persons with various infectious diseases confirmed the presence of viruses and bacteria identified by other methods, and implicated Plasmodium falciparum in an unexplained fatal case of hemorrhagic feverlike disease during the Marburg hemorrhagic fever outbreak in Angola in 2004-2005.


Subjects
Communicable Diseases/diagnosis , Oligonucleotide Array Sequence Analysis/instrumentation , Oligonucleotide Array Sequence Analysis/methods , Virus Diseases/diagnosis , Communicable Diseases/virology , Disease Outbreaks , Fatal Outcome , Humans , Malaria, Falciparum/diagnosis , Phylogeny , Sensitivity and Specificity , Virus Diseases/virology
18.
Proc Natl Acad Sci U S A ; 104(13): 5495-500, 2007 Mar 27.
Article in English | MEDLINE | ID: mdl-17372197

ABSTRACT

Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.


Subjects
Alternative Splicing , RNA Precursors , Databases, Genetic , Gene Expression Regulation , Genome, Human , Humans , Internet , Models, Molecular , Protein Conformation , Protein Isoforms , Protein Sorting Signals , Protein Structure, Tertiary , Proteins/chemistry , RNA Splicing
19.
Genome Res ; 12(2): 272-80, 2002 Feb.
Article in English | MEDLINE | ID: mdl-11827946

ABSTRACT

We have developed an initial approach for annotating and surveying pseudogenes in the human genome. We search human genomic DNA for regions that are similar to known protein sequences and contain obvious disablements (i.e., mid-sequence stop codons or frameshifts), while ensuring minimal overlap with annotations of known genes. Pseudogenes can be divided into "processed" and "nonprocessed"; the former are reverse transcribed from mRNA (and therefore have no intron structure), whereas the latter presumably arise from genomic duplications. We annotate putative processed pseudogenes based on whether there is a continuous span of homology that is >70% of the length of the closest matching human protein (i.e., with introns removed), or whether there is evidence of polyadenylation. We have applied our approach to chromosomes 21 and 22, the first parts of the human genome completely sequenced, finding 190 new pseudogene annotations beyond the 264 reported by the sequencing centers. In total, on chromosomes 21 and 22, there are 189 processed pseudogenes, 195 nonprocessed pseudogenes, and, additionally, 70 pseudogenic immunoglobulin gene segments. (Detailed assignments are available at http://bioinfo.mbb.yale.edu/genome/pseudogene or http://genecensus.org/pseudogene.) By extrapolation, we predict that there could be up to approximately 20,000 pseudogenes in the whole human genome, with a little more than half of them processed. We have determined the main populations and clusters of pseudogenes on chromosomes 21 and 22. There are notable excesses of pseudogenes relative to genes near the centromeres of both chromosomes, indicating the existence of pseudogenic "hot-spots" in the genome. We have looked at the distribution of InterPro families and Gene Ontology (GO) functional categories in our pseudogenes. 
Overall, the families in both processed and nonprocessed pseudogene populations occur according to a similar power-law distribution as that found for the occurrence of gene families, with a few big families and many small ones. The processed population is, in particular, enriched in highly expressed ribosomal-protein sequences (approximately 20%), which appear fairly evenly distributed across the chromosomes. We compared processed pseudogenes of different evolutionary ages, observing a high degree of similarity between "ancient" and "modern" subpopulations. This may be attributable to the consistently high expression of ribosomal proteins over evolutionary time. Finally, we find that the chromosome 22 pseudogene population is dominated by immunoglobulin segments, which have a greater rate of disablement per amino acid than the other pseudogene populations and are also substantially more diverged.
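The annotation rule described in this abstract (a disabled homology hit is called a putative processed pseudogene if its continuous homology span exceeds 70% of the length of the closest matching human protein, or if there is evidence of polyadenylation) can be written as a simple predicate. This is a minimal sketch, not the authors' pipeline: the record type PseudogeneHit, its field names, and the function classify_hit are hypothetical; it assumes the homology span and protein length are both measured in amino acids, and it assumes that disabled hits failing the processed criteria are labeled nonprocessed.

```python
from dataclasses import dataclass
from typing import Optional

# ">70%" of the closest matching protein's length (from the abstract).
PROCESSED_SPAN_FRACTION = 0.70


@dataclass
class PseudogeneHit:
    """Hypothetical record for one genomic region similar to a known protein."""
    homology_span_aa: int   # longest continuous homology span, in amino acids
    protein_length_aa: int  # length of the closest matching human protein
    has_polya: bool         # evidence of polyadenylation
    disabled: bool          # mid-sequence stop codon or frameshift detected


def classify_hit(hit: PseudogeneHit) -> Optional[str]:
    """Return "processed", "nonprocessed", or None if not a pseudogene call."""
    if not hit.disabled:
        # No obvious disablement: this rule does not call a pseudogene.
        return None
    if hit.protein_length_aa <= 0:
        raise ValueError("protein length must be positive")
    span_fraction = hit.homology_span_aa / hit.protein_length_aa
    # Strictly greater than 70%, or polyadenylation evidence.
    if span_fraction > PROCESSED_SPAN_FRACTION or hit.has_polya:
        return "processed"
    # Assumption: remaining disabled hits are treated as nonprocessed.
    return "nonprocessed"


print(classify_hit(PseudogeneHit(300, 400, has_polya=False, disabled=True)))
```

Because either criterion is sufficient, a short homology span with polyadenylation evidence is still called processed. A real pipeline would also need the overlap filtering against known gene annotations that the abstract mentions, which this sketch omits.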


Subjects
Chromosomes, Human, Pair 21/genetics , Chromosomes, Human, Pair 22/genetics , Fossils , Genome, Human , Pseudogenes , Chromosome Mapping/methods , Evolution, Molecular , Genes, Immunoglobulin , Gene Homology , Humans , Multigene Family , RNA Processing, Post-Transcriptional/genetics , Sequence Analysis, DNA/statistics & numerical data