ABSTRACT
Large-scale genome sequencing is poised to provide a substantial increase in the rate of discovery of disease-associated mutations, but the functional interpretation of such mutations remains challenging. Here we show that deletions of a sequence on human chromosome 16 that we term the intestine-critical region (ICR) cause intractable congenital diarrhoea in infants1,2. Reporter assays in transgenic mice show that the ICR contains a regulatory sequence that activates transcription during the development of the gastrointestinal system. Targeted deletion of the ICR in mice caused symptoms that recapitulated the human condition. Transcriptome analysis revealed that an unannotated open reading frame (Percc1) flanks the regulatory sequence, and the expression of this gene was lost in the developing gut of mice that lacked the ICR. Percc1-knockout mice displayed phenotypes similar to those observed upon ICR deletion in mice and patients, whereas an ICR-driven Percc1 transgene was sufficient to rescue the phenotypes found in mice that lacked the ICR. Together, our results identify a gene that is critical for intestinal function and underscore the need for targeted in vivo studies to interpret the growing number of clinical genetic findings that do not affect known protein-coding genes.
Subjects
Diarrhea/congenital , Diarrhea/genetics , Enhancer Elements, Genetic/genetics , Gene Expression Regulation, Developmental , Genes , Intestines/physiology , Sequence Deletion/genetics , Animals , Chromosomes, Human, Pair 16/genetics , Disease Models, Animal , Female , Genes, Reporter , Genetic Loci/genetics , Humans , Male , Mice , Mice, Knockout , Mice, Transgenic , Pedigree , Phenotype , Transcriptional Activation , Transcriptome/genetics , Transgenes/genetics
ABSTRACT
Circulating inflammatory markers are essential to human health and disease, and they are often dysregulated or malfunctioning in cancers as well as in cardiovascular, metabolic, immunologic and neuropsychiatric disorders. However, the genetic contribution to the physiological variation of levels of circulating inflammatory markers is largely unknown. Here we report the results of a genome-wide genetic study of blood concentration of ten cytokines, including the hitherto unexplored calcium-binding protein (S100B). The study leverages a unique sample of neonatal blood spots from 9,459 Danish subjects from the iPSYCH initiative. We estimate the SNP-heritability of marker levels as ranging from essentially zero for Erythropoietin (EPO) up to 73% for S100B. We identify and replicate 16 associated genomic regions (p < 5 × 10⁻⁹), of which four are novel. We show that the associated variants map to enhancer elements, suggesting a possible transcriptional effect of genomic variants on the cytokine levels. The identification of the genetic architecture underlying the basic levels of cytokines is likely to prompt studies investigating the relationship between cytokines and complex disease. Our results also suggest that the genetic architecture of cytokines is stable from neonatal to adult life.
Subjects
Cytokines/genetics , Inflammation/diagnosis , Quantitative Trait Loci , Biomarkers/blood , Cohort Studies , Cytokines/blood , Cytokines/immunology , Denmark , Enhancer Elements, Genetic/genetics , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Infant, Newborn , Inflammation/blood , Inflammation/immunology , Male , Polymorphism, Single Nucleotide , S100 Calcium Binding Protein beta Subunit/blood , S100 Calcium Binding Protein beta Subunit/genetics , S100 Calcium Binding Protein beta Subunit/immunology
ABSTRACT
A widespread dogma asserts that life could not have emerged without biopolymers - RNA and proteins. However, the widely acknowledged implausibility of a spontaneous appearance and proliferation of these complex molecules in primordial messy chemistry casts doubt on this scenario. A proposed alternative is "Lipid-First", based on the evidence that lipid assemblies may spontaneously emerge in heterogeneous environments, and are shown to undergo growth and fission, and to portray autocatalytic self-copying. What seems undecided is whether lipid assemblies have protein-like capacities for stereospecific interactions, a sine qua non of life processes. This Viewpoint aims to alleviate such doubts, pointing to growing experimental evidence that lipid aggregates possess dynamic surface configurations capable of stereospecific molecular recognition. Such findings help support a possible key role of lipids in seeding life's origin.
Subjects
Lipids , Origin of Life , Oligonucleotides , Proteins , RNA
ABSTRACT
BACKGROUND: Olfactory receptors (ORs) are G protein-coupled receptors with a crucial role in odor detection. A typical mammalian genome harbors ~ 1000 OR genes and pseudogenes; however, different gene duplication/deletion events have occurred in each species, resulting in complex orthology relationships. While the human OR nomenclature is widely accepted and based on phylogenetic classification into 18 families and further into subfamilies, for other mammals different and multiple nomenclature systems are currently in use, thus concealing important evolutionary and functional insights. RESULTS: Here, we describe the Mutual Maximum Similarity (MMS) algorithm, a systematic classifier for assigning a human-centric nomenclature to any OR gene based on inter-species hierarchical pairwise similarities. MMS was applied to the OR repertoires of seven mammals and zebrafish. Altogether, we assigned symbols to 10,249 ORs. This nomenclature is supported by both phylogenetic and synteny analyses. The availability of a unified nomenclature provides a framework for diverse studies, where textual symbol comparison allows immediate identification of potential ortholog groups as well as species-specific expansions/deletions; for example, Or52e5 and Or52e5b represent a rat-specific duplication of OR52E5. Another example is the complete absence of OR subfamily OR6Z among primate OR symbols. In other mammals, OR6Z members are located in one genomic cluster, suggesting a large deletion in the great ape lineage. An additional 14 mammalian OR subfamilies are missing from the primate genomes. While in chimpanzee 87% of the symbols were identical to human symbols, this number decreased to ~ 50% in dog and cow and to ~ 30% in rodents, reflecting the adaptive changes of the OR gene superfamily across diverse ecological niches. 
Application of the proposed nomenclature to zebrafish revealed similarity to mammalian ORs that could not be detected from the current zebrafish olfactory receptor gene nomenclature. CONCLUSIONS: We have consolidated a unified standard nomenclature system for the vertebrate OR superfamily. The new nomenclature system will be applied to cow, horse, dog and chimpanzee by the Vertebrate Gene Nomenclature Committee and its implementation is currently under consideration by other relevant species-specific nomenclature committees.
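The abstract does not spell out the MMS procedure, so the following is only a minimal sketch of the general idea of mutual best matching between a query species and human, under stated assumptions. The function name `mutual_best_pairs`, the gene labels `rat_A` and `rat_B`, and all scores are hypothetical and are not taken from the publication; the published algorithm is hierarchical and more elaborate than this.

```python
def mutual_best_pairs(similarity):
    """Return (query, human) pairs that are each other's best match.

    `similarity` maps (query_gene, human_gene) -> similarity score.
    This is a plain reciprocal-best-match rule, used here as a simplified
    stand-in for the mutual-maximum idea, not the published MMS algorithm.
    """
    best_for_query = {}  # query gene -> (best human gene, score)
    best_for_human = {}  # human gene -> (best query gene, score)
    for (query, human), score in similarity.items():
        if query not in best_for_query or score > best_for_query[query][1]:
            best_for_query[query] = (human, score)
        if human not in best_for_human or score > best_for_human[human][1]:
            best_for_human[human] = (query, score)
    # Keep only pairs where the preference is mutual.
    return sorted(
        (query, human)
        for query, (human, _) in best_for_query.items()
        if best_for_human[human][0] == query
    )


# Hypothetical scores, invented for illustration only.
scores = {
    ("rat_A", "OR52E5"): 0.92,
    ("rat_B", "OR52E5"): 0.88,
    ("rat_B", "OR6A2"): 0.41,
}
pairs = mutual_best_pairs(scores)
```

In this toy input, `rat_B` also prefers OR52E5 but is not its mutual best match, which is the kind of case a nomenclature scheme might flag as a species-specific duplicate rather than a one-to-one ortholog.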
Subjects
Algorithms , Receptors, Odorant , Terminology as Topic , Vertebrates , Animals , Cattle , Dogs , Genome , Horses , Humans , Pan troglodytes , Phylogeny , Rats , Receptors, Odorant/genetics , Species Specificity , Synteny , Vertebrates/genetics , Zebrafish
ABSTRACT
The MalaCards human disease database (http://www.malacards.org/) is an integrated compendium of annotated diseases mined from 68 data sources. MalaCards has a web card for each of ~20 000 disease entries, in six global categories. It portrays a broad array of annotation topics in 15 sections, including Summaries, Symptoms, Anatomical Context, Drugs, Genetic Tests, Variations and Publications. The Aliases and Classifications section reflects an algorithm for disease name integration across often-conflicting sources, providing effective annotation consolidation. A central feature is a balanced Genes section, with scores reflecting the strength of disease-gene associations. This is accompanied by other gene-related disease information such as pathways, mouse phenotypes and GO-terms, stemming from MalaCards' affiliation with the GeneCards Suite of databases. MalaCards' capacity to inter-link information from complementary sources, along with its elaborate search function, relational database infrastructure and convenient data dumps, allows it to tackle its rich disease annotation landscape, and facilitates systems analyses and genome sequence interpretation. MalaCards adopts a 'flat' disease-card approach, but each card is mapped to popular hierarchical ontologies (e.g. International Classification of Diseases, Human Phenotype Ontology and Unified Medical Language System) and also contains information about multi-level relations among diseases, thereby providing an optimal tool for disease representation and scrutiny.
Subjects
Computational Biology , Databases, Genetic , Genetic Association Studies/methods , Algorithms , Computational Biology/methods , Genetic Predisposition to Disease , Genetic Variation , Genomics/methods , Humans , Molecular Sequence Annotation , Web Browser
ABSTRACT
Pemphigus vulgaris (PV) is a life-threatening autoimmune mucocutaneous blistering disease caused by disruption of intercellular adhesion due to auto-antibodies directed against epithelial components. Treatment is limited to immunosuppressive agents, which are associated with serious adverse effects. The propensity to develop the disease is in part genetically determined. We therefore reasoned that the delineation of PV genetic basis may point to novel therapeutic strategies. Using a genome-wide association approach, we recently found that genetic variants in the vicinity of the ST18 gene confer a significant risk for the disease. Here, using targeted deep sequencing, we identified a PV-associated variant residing within the ST18 promoter region (p<0.0002; odds ratio = 2.03). This variant was found to drive increased gene transcription in a p53/p63-dependent manner, which may explain the fact that ST18 is up-regulated in the skin of PV patients. We then discovered that when overexpressed, ST18 stimulates PV serum-induced secretion of key inflammatory molecules and contributes to PV serum-induced disruption of keratinocyte cell-cell adhesion, two processes previously implicated in the pathogenesis of PV. Thus, the present findings indicate that ST18 may play a direct role in PV and consequently represents a potential target for the treatment of this disease.
Subjects
Pemphigus/genetics , Promoter Regions, Genetic/genetics , Repressor Proteins/genetics , Autoantibodies/genetics , Autoantibodies/immunology , Cytokines/genetics , Cytokines/metabolism , Female , Genetic Variation , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Immunosuppressive Agents/adverse effects , Keratinocytes/metabolism , Keratinocytes/pathology , Male , Pedigree , Pemphigus/blood , Pemphigus/immunology , Pemphigus/therapy , Polymorphism, Single Nucleotide , Repressor Proteins/blood , Risk Factors , Skin/metabolism , Skin/pathology
ABSTRACT
BACKGROUND: A key challenge in the realm of human disease research is next generation sequencing (NGS) interpretation, whereby identified filtered variant-harboring genes are associated with a patient's disease phenotypes. This necessitates bioinformatics tools linked to comprehensive knowledgebases. The GeneCards suite databases, which include GeneCards (human genes), MalaCards (human diseases) and PathCards (human pathways) together with additional tools, are presented with the focus on MalaCards utility for NGS interpretation as well as for large scale bioinformatic analyses. RESULTS: VarElect, our NGS interpretation tool, leverages the broad information in the GeneCards suite databases. MalaCards algorithms unify disease-related terms and annotations from 69 sources. Further, MalaCards defines hierarchical relatedness: aliases, disease families, a related diseases network, categories and ontological classifications. GeneCards and MalaCards delineate and share a multi-tiered, scored gene-disease network, with stringency levels, including the definition of elite status: high-quality gene-disease pairs from manually curated, trustworthy sources, covering 4500 genes for 8000 diseases. This unique resource is key to NGS interpretation by VarElect. VarElect, a comprehensive search tool that helps infer both direct and indirect links between genes and user-supplied disease/phenotype terms, is robustly strengthened by the information found in MalaCards. The indirect mode benefits from GeneCards' diverse gene-to-gene relationships, including SuperPaths, integrated biological pathways from 12 information sources. We are currently adding an important information layer in the form of "disease SuperPaths", generated from the gene-disease matrix by an algorithm similar to that previously employed for biological pathway unification. This allows the discovery of novel gene-disease and disease-disease relationships. 
The advent of whole genome sequencing necessitates capacities to go beyond protein coding genes. GeneCards is highly useful in this respect, as it also addresses 101,976 non-protein-coding RNA genes. In a more recent development, we are currently adding an inclusive map of regulatory elements and their inferred target genes, generated by integration from 4 resources. CONCLUSIONS: MalaCards provides a rich big-data scaffold for in silico biomedical discovery within the gene-disease universe. VarElect, which depends significantly on both GeneCards and MalaCards power, is a potent tool for supporting the interpretation of wet-lab experiments, notably NGS analyses of disease. The GeneCards suite has thus transcended its 2-decade role in biomedical research, maturing into a key player in clinical investigation.
Subjects
Computational Biology/methods , Disease/genetics , High-Throughput Nucleotide Sequencing , Databases, Genetic , Genomics , Humans , Phenotype
ABSTRACT
BACKGROUND: Olfaction is a versatile sensory mechanism for detecting thousands of volatile odorants. Although the molecular basis of odorant signaling is relatively well understood, considerable gaps remain in the complete charting of all relevant gene products. To address this challenge, we applied RNAseq to four well-characterized human olfactory epithelial samples and compared the results to novel and published mouse olfactory epithelium as well as 16 human control tissues. RESULTS: We identified 194 non-olfactory receptor (OR) genes that are overexpressed in human olfactory tissues vs. controls. The highest overexpression is seen for lipocalins and bactericidal/permeability-increasing (BPI)-fold proteins, which in other species include secreted odorant carriers. Mouse-human discordance in orthologous lipocalin expression suggests different mammalian evolutionary paths in this family. Of the overexpressed genes, 36 have documented olfactory function, while for 158 there is little or no previous such functional evidence. The latter group includes GPCRs, neuropeptides, solute carriers, transcription factors and biotransformation enzymes. Many of them may be indirectly implicated in sensory function, and ~70 % are also overexpressed in mouse olfactory epithelium, corroborating their olfactory role. Nearly 90 % of the intact OR repertoire, and ~60 % of the OR pseudogenes are expressed in the olfactory epithelium, with the latter showing a 3-fold lower expression. OR transcription levels show a 1000-fold inter-paralog variation, as well as significant inter-individual differences. We assembled 160 transcripts representing 100 intact OR genes. These include 1-4 short 5' non-coding exons with considerable alternative splicing and long last exons that contain the coding region and 3' untranslated region of highly variable length. 
Notably, we identified 10 ORs with an intact open reading frame but with seemingly non-functional transcripts, suggesting a yet unreported OR pseudogenization mechanism. Analysis of the OR upstream regions indicated an enrichment of the homeobox family transcription factor binding sites and a consensus localization of a specific transcription factor binding site subfamily (Olf/EBF). CONCLUSIONS: We provide an overview of expression levels of ORs and auxiliary genes in human olfactory epithelium. This forms a transcriptomic view of the entire OR repertoire, and reveals a large number of over-expressed uncharacterized human non-receptor genes, providing a platform for future discovery.
Subjects
Lipocalins/genetics , Olfactory Mucosa/metabolism , RNA, Messenger/genetics , Receptors, Odorant/genetics , Smell/genetics , Transcriptome , Animals , Autoantigens/genetics , Autoantigens/metabolism , Fatty Acid-Binding Proteins , Gene Expression Profiling , Gene Expression Regulation , High-Throughput Nucleotide Sequencing , Humans , Lipocalins/classification , Lipocalins/metabolism , Membrane Transport Proteins/genetics , Membrane Transport Proteins/metabolism , Mice , Neuropeptides/genetics , Neuropeptides/metabolism , Phylogeny , Protein Isoforms/genetics , Protein Isoforms/metabolism , Proteins/genetics , Proteins/metabolism , Pseudogenes , RNA, Messenger/metabolism , Receptors, Odorant/metabolism , Signal Transduction , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
BACKGROUND: Next generation sequencing (NGS) provides a key technology for deciphering the genetic underpinnings of human diseases. Typical NGS analyses of a patient depict tens of thousands of non-reference coding variants, but only one or very few are expected to be significant for the relevant disorder. In a filtering stage, one employs family segregation, rarity in the population, predicted protein impact and evolutionary conservation as a means for shortening the variation list. However, narrowing down further towards culprit disease genes usually entails laborious seeking of gene-phenotype relationships, consulting numerous separate databases. Thus, a major challenge is to transition from the few hundred shortlisted genes to the most viable disease-causing candidates. RESULTS: We describe a novel tool, VarElect ( http://ve.genecards.org ), a comprehensive phenotype-dependent variant/gene prioritizer, based on the widely-used GeneCards, which helps rapidly identify causal mutations with extensive evidence. The GeneCards suite offers an effective and speedy alternative, whereby >120 gene-centric automatically-mined data sources are jointly available for the task. VarElect cashes in on this wealth of information, as well as on GeneCards' powerful free-text Boolean search and scoring capabilities, proficiently matching variant-containing genes to submitted disease/symptom keywords. The tool also leverages the rich disease and pathway information of MalaCards, the human disease database, and PathCards, the unified pathway (SuperPaths) database, both within the GeneCards Suite. The VarElect algorithm infers direct as well as indirect links between genes and phenotypes, the latter benefitting from GeneCards' diverse gene-to-gene data links in GenesLikeMe. Finally, our tool offers an extensive gene-phenotype evidence portrayal ("MiniCards") and hyperlinks to the parent databases. 
CONCLUSIONS: We demonstrate that VarElect compares favorably with several often-used NGS phenotyping tools, thus providing a robust facility for ranking genes, pointing out their likelihood to be related to a patient's disease. VarElect's capacity to automatically process numerous NGS cases, either in stand-alone format or in VCF-analyzer mode (TGex and VarAnnot), is indispensable for emerging clinical projects that involve thousands of whole exome/genome NGS analyses.
Subjects
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Algorithms , Data Mining , Databases, Genetic , Genome, Human , Humans , Phenotype
ABSTRACT
The zebra finch is an important model organism in several fields with unique relevance to human neuroscience. Like other songbirds, the zebra finch communicates through learned vocalizations, an ability otherwise documented only in humans and a few other animals and lacking in the chicken, the only bird with a sequenced genome until now. Here we present a structural, functional and comparative analysis of the genome sequence of the zebra finch (Taeniopygia guttata), which is a songbird belonging to the large avian order Passeriformes. We find that the overall structures of the genomes are similar in zebra finch and chicken, but they differ in many intrachromosomal rearrangements, lineage-specific gene family expansions, the number of long-terminal-repeat-based retrotransposons, and mechanisms of sex chromosome dosage compensation. We show that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets. We also show evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience. These results indicate an active involvement of the genome in neural processes underlying vocal communication and identify potential genetic substrates for the evolution and regulation of this behaviour.
Subjects
Finches/genetics , Genome/genetics , 3' Untranslated Regions/genetics , Animals , Auditory Perception/genetics , Brain/physiology , Chickens/genetics , Evolution, Molecular , Female , Finches/physiology , Gene Duplication , Gene Regulatory Networks/genetics , Male , MicroRNAs/genetics , Models, Animal , Multigene Family/genetics , Retroelements/genetics , Sex Chromosomes/genetics , Terminal Repeat Sequences/genetics , Transcription, Genetic/genetics , Vocalization, Animal/physiology
ABSTRACT
We studied five individuals from three Jewish Bukharian families affected by an apparently autosomal-recessive form of hereditary spastic paraparesis accompanied by severe intellectual disability, fluctuating central hypoventilation, gastroesophageal reflux disease, wake apnea, areflexia, and unique dysmorphic features. Exome sequencing identified one homozygous variant shared among all affected individuals and absent in controls: a 1 bp frameshift TECPR2 deletion leading to a premature stop codon and predicting significant degradation of the protein. TECPR2 has been reported as a positive regulator of autophagy. We thus examined the autophagy-related fate of two key autophagic proteins, SQSTM1 (p62) and MAP1LC3B (LC3), in skin fibroblasts of an affected individual, as compared to a healthy control, and found that both protein levels were decreased and that there was a more pronounced decrease in the lipidated form of LC3 (LC3II). siRNA knockdown of TECPR2 showed similar changes, consistent with aberrant autophagy. Our results are strengthened by the fact that autophagy dysfunction has been implicated in a number of other neurodegenerative diseases. The discovered TECPR2 mutation implicates autophagy, a central intracellular mechanism, in spastic paraparesis.
Subjects
Autophagy/genetics , Carrier Proteins/genetics , Mutation , Nerve Tissue Proteins/genetics , Paraparesis, Spastic/genetics , Brain/pathology , Exons , Female , Fibroblasts/metabolism , Fibroblasts/ultrastructure , Genotype , HeLa Cells , Humans , Jews/genetics , Magnetic Resonance Imaging , Male , Neuroimaging , Paraparesis, Spastic/diagnosis , Paraparesis, Spastic/metabolism , Pedigree , Phenotype , Sequence Analysis, DNA
ABSTRACT
PURPOSE: Despite the recognized clinical value of exome-based diagnostics, methods for comprehensive genomic interpretation remain immature. Diagnoses are based on known or presumed pathogenic variants in genes already associated with a similar phenotype. Here, we extend this paradigm by evaluating novel bioinformatics approaches to aid identification of new gene-disease associations. METHODS: We analyzed 119 trios to identify both diagnostic genotypes in known genes and candidate genotypes in novel genes. We considered qualifying genotypes based on their population frequency and in silico predicted effects; we also characterized the patterns of genotypes enriched among this collection of patients. RESULTS: We obtained a genetic diagnosis for 29 (24%) of our patients. We showed that patients carried an excess of damaging de novo mutations in intolerant genes, particularly those shown to be essential in mice (P = 3.4 × 10⁻⁸). This enrichment is only partially explained by mutations found in known disease-causing genes. CONCLUSION: This work indicates that the application of appropriate bioinformatics analyses to clinical sequence data can also help implicate novel disease genes and suggest expanded phenotypes for known disease genes. These analyses further suggest that some cases resolved by whole-exome sequencing will have direct therapeutic implications.
Subjects
Exome , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , High-Throughput Nucleotide Sequencing , Computational Biology/methods , Female , Genetic Association Studies , Genomics/methods , Genotype , Humans , Male , Mutation , Phenotype
ABSTRACT
Diabetic nephropathy (DN), the most prevalent chronic disease of the kidney, has also become the primary cause of end-stage renal disease, with the incidence of kidney disease in type 2 diabetics continuously rising. As with most chronic diseases, the pathophysiology is multifactorial, with a number of deregulated molecular processes contributing to disease manifestation and progression. Current therapy mainly involves interfering in the renin-angiotensin-aldosterone system using angiotensin-converting enzyme inhibitors or angiotensin-receptor blockers. Better understanding of molecular processes deregulated in the early stages and progression of disease holds the key for development of novel therapeutics addressing this complex disease. With the advent of high-throughput omics technologies, researchers set out to systematically study the disease on a molecular level. Results of the first omics studies were mainly focused on reporting the highest deregulated molecules between diseased and healthy subjects, with recent attempts to integrate findings of multiple studies on the level of molecular pathways and processes. In this review, we will outline key omics studies on the genome, transcriptome, proteome and metabolome level in the context of DN. We will also provide concepts on how to integrate findings of these individual studies (i) on the level of functional processes using the gene-ontology vocabulary, (ii) on the level of molecular pathways and (iii) on the level of phenotype molecular models constructed based on protein-protein interaction data.
Subjects
Biomarkers/analysis , Diabetic Nephropathies/diagnosis , Chronic Disease , Diabetic Nephropathies/metabolism , Disease Progression , Humans
ABSTRACT
The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is an expanding proteomics resource to enable biological and biomedical discoveries. MOPED aggregates simple, standardized and consistently processed summaries of protein expression and metadata from proteomics (mass spectrometry) experiments from human and model organisms (mouse, worm, and yeast). The latest version of MOPED adds new estimates of protein abundance and concentration as well as relative (differential) expression data. MOPED provides a new updated query interface that allows users to explore information by organism, tissue, localization, condition, experiment, or keyword. MOPED supports the Human Proteome Project's efforts to generate chromosome- and diseases-specific proteomes by providing links from proteins to chromosome and disease information as well as many complementary resources. MOPED supports a new omics metadata checklist to harmonize data integration, analysis, and use. MOPED's development is driven by the user community, which spans 90 countries and guides future development that will transform MOPED into a multiomics resource. MOPED encourages users to submit data in a simple format. They can use the metadata checklist to generate a data publication for this submission. As a result, MOPED will provide even greater insights into complex biological processes and systems and enable deeper and more comprehensive biological and biomedical discoveries.
Subjects
Databases, Protein , Proteomics , Animals , Humans , User-Computer Interface
ABSTRACT
BACKGROUND: The quasispecies model refers to information carriers that undergo self-replication with errors. A quasispecies is a steady-state population of biopolymer sequence variants generated by mutations from a master sequence. A quasispecies error threshold is a minimal replication accuracy below which the population structure breaks down. Theory and experimentation of this model often refer to biopolymers, e.g. RNA molecules or viral genomes, while its prebiotic context is often associated with an RNA world scenario. Here, we study the possibility that compositional entities which code for compositional information, intrinsically different from biopolymers coding for sequential information, could show quasispecies dynamics. RESULTS: We employed a chemistry-based model, graded autocatalysis replication domain (GARD), which simulates the network dynamics within compositional molecular assemblies. In GARD, a compotype represents a population of similar assemblies that constitute a quasi-stationary state in compositional space. A compotype's center-of-mass is found to be analogous to a master sequence for a sequential quasispecies. Using single-cycle GARD dynamics, we measured the quasispecies transition matrix (Q) for the probabilities of transition from one center-of-mass Euclidean distance to another. Similarly, the quasispecies' growth rate vector (A) was obtained. This allowed computing a steady state distribution of distances to the center of mass, as derived from the quasispecies equation. In parallel, a steady state distribution was obtained via the GARD equation kinetics. Rewardingly, a significant correlation was observed between the distributions obtained by these two methods. This was only seen for distances to the compotype center-of-mass, and not to randomly selected compositions. A similar correspondence was found when comparing the quasispecies time dependent dynamics towards steady state. 
Further, changing the error rate by modifying basal assembly joining rate of GARD kinetics was found to display an error catastrophe, similar to the standard quasispecies model. Additional augmentation of compositional mutations leads to the complete disappearance of the master-like composition. CONCLUSIONS: Our results show that compositional assemblies, as simulated by the GARD formalism, portray significant attributes of quasispecies dynamics. This expands the applicability of the quasispecies model beyond sequence-based entities, and potentially enhances validity of GARD as a model for prebiotic evolution.
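As a rough illustration of how a steady-state distribution follows from a transition matrix Q and a growth rate vector A, the sketch below iterates the standard discrete form of the quasispecies equation, x_i ← Σ_j Q[i][j]·A[j]·x[j], renormalizing at each step. The convention that Q[i][j] is the probability of moving from state j to state i, the function name, and the 2×2 numbers are all assumptions made for this example; they are not values from the study.

```python
def quasispecies_steady_state(Q, A, iterations=1000):
    """Approximate the steady-state distribution by power iteration.

    Q[i][j]: probability that replication of state j yields state i
             (each column sums to 1; this index convention is an assumption).
    A[j]:    growth rate of state j.
    The fixed point is the dominant eigenvector of W = Q * diag(A),
    normalized so that its entries sum to 1.
    """
    n = len(A)
    x = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(iterations):
        y = [sum(Q[i][j] * A[j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        x = [value / total for value in y]
    return x


# Hypothetical two-state example: state 0 plays the role of the master.
Q = [[0.9, 0.2],
     [0.1, 0.8]]
A = [2.0, 1.0]
steady = quasispecies_steady_state(Q, A)
```

Lowering the copying fidelity (the diagonal of Q) in such a toy model shifts weight away from the master-like state, which is the qualitative behaviour behind an error threshold.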
Subjects
Cells/chemistry , Computer Simulation , Models, Chemical , Mutation , RNA/chemistry , RNA/genetics
ABSTRACT
MOTIVATION: Non-coding RNA (ncRNA) genes are increasingly acknowledged for their importance in the human genome. However, there is no comprehensive non-redundant database for all such human genes. RESULTS: We leveraged the effective platform of GeneCards, the human gene compendium, together with the power of fRNAdb and additional primary sources, to judiciously unify all ncRNA gene entries obtainable from 15 different primary sources. Overlapping entries were clustered to unified locations based on an algorithm employing genomic coordinates. This allowed GeneCards' gamut of relevant entries to rise ~5-fold, resulting in ~80,000 human non-redundant ncRNAs, belonging to 14 classes. Such 'grand unification' within a regularly updated data structure will assist future ncRNA research. AVAILABILITY AND IMPLEMENTATION: All of these non-coding RNAs are included among the ~122,500 entries in GeneCards V3.09, along with pertinent annotation, automatically mined by its built-in pipeline from 100 data sources. This information is available at www.genecards.org. CONTACT: Frida.Belinky@weizmann.ac.il SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
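The abstract gives no detail on the coordinate-based clustering, so the following is a minimal sketch of one plausible approach: merging entries whose genomic intervals overlap on the same chromosome. The function name, the tuple layout, the source identifiers and the decision to ignore strand are all assumptions for illustration, not the published algorithm.

```python
def cluster_by_overlap(entries):
    """Merge entries with overlapping coordinates on the same chromosome.

    entries: list of (chrom, start, end, source_id) tuples.
    Returns a list of dicts, one per unified location.
    Strand and ncRNA class are ignored here (a simplifying assumption).
    """
    clusters = []
    # Sorting by chromosome, then start, lets one pass detect overlaps.
    for chrom, start, end, source_id in sorted(entries):
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            last["end"] = max(last["end"], end)
            last["ids"].append(source_id)
        else:
            clusters.append(
                {"chrom": chrom, "start": start, "end": end, "ids": [source_id]}
            )
    return clusters


# Hypothetical entries from three made-up sources.
entries = [
    ("chr1", 100, 200, "srcA:1"),
    ("chr1", 150, 260, "srcB:7"),
    ("chr1", 400, 450, "srcA:2"),
    ("chr2", 120, 180, "srcC:3"),
]
clusters = cluster_by_overlap(entries)
```

A production pipeline would likely also require a minimum overlap fraction and matching strand before unifying two entries; those thresholds are not stated in the abstract.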
Assuntos
Bases de Dados Genéticas , RNA não Traduzido/genética , Algoritmos , Análise por Conglomerados , Genes , Genoma Humano , Genômica , Humanos , Internet , Anotação de Sequência Molecular
RESUMO
Present life portrays a two-tier phenomenology: molecules compose supramolecular structures, such as cells or organisms, which in turn portray population behaviors, including selection, evolution and ecological dynamics. Prebiotic models have often focused on evolution in populations of self-replicating molecules, without explicitly invoking the intermediate molecular-to-supramolecular transition. Here, we explore a prebiotic model that allows one to relate parameters of chemical interaction networks within molecular assemblies to emergent population dynamics. We use the graded autocatalysis replication domain (GARD) model, which simulates the network dynamics within amphiphile-containing molecular assemblies, and exhibits quasi-stationary compositional states termed compotype species. These grow by catalyzed accretion, divide and propagate their compositional information to progeny in a replication-like manner. The model allows us to ask how molecular network parameters influence assembly evolution and population dynamics parameters. In 1000 computer simulations, each embodying a different parameter set for the global chemical interaction network, we observed a wide range of behaviors. These were analyzed by a multi-species logistic model often used for analyzing population ecology (r-K or Lotka-Volterra competition model). We found that compotypes with a larger intrinsic molecular repertoire show a higher intrinsic growth rate (r) and lower carrying capacity (K), as well as lower replication fidelity. This supports a prebiotic scenario initiated by fast-replicating assemblies with a high molecular diversity, evolving into more faithful replicators with narrower molecular repertoires.
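To make the r-K terminology concrete, the sketch below integrates a two-species Lotka-Volterra competition model, dN_i/dt = r_i N_i (1 - sum_j alpha_ij N_j / K_i), with a simple Euler scheme. All parameter values, the competition matrix and the function name are hypothetical and chosen only to illustrate the roles of r and K; they are not fitted values from the simulations described above.

```python
def simulate_competition(r, K, alpha, n0, dt=0.01, steps=20000):
    """Euler integration of the Lotka-Volterra competition model.

    r[i]: intrinsic growth rate, K[i]: carrying capacity,
    alpha[i][j]: competitive effect of species j on species i
    (alpha[i][i] is 1 by convention), n0: initial population sizes.
    """
    n = list(n0)
    species = range(len(n))
    for _ in range(steps):
        # Total competitive pressure felt by each species.
        pressure = [sum(alpha[i][j] * n[j] for j in species) for i in species]
        n = [
            n[i] + dt * r[i] * n[i] * (1.0 - pressure[i] / K[i])
            for i in species
        ]
    return n


# Hypothetical parameters: species 0 grows fast, species 1 grows slowly.
r = [1.0, 0.5]
K = [100.0, 80.0]
alpha = [
    [1.0, 0.5],
    [0.5, 1.0],
]
final = simulate_competition(r, K, alpha, n0=[1.0, 1.0])
print(final)
```

With weak mutual competition (off-diagonal alpha below 1 and each K larger than the competitor's scaled K), both species persist and the populations settle at a coexistence equilibrium set by K and alpha, while r controls how quickly each species approaches it.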
Assuntos
Simulação por Computador , Modelos Químicos , Prebióticos
RESUMO
We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.
Assuntos
Evolução Molecular , Genoma/genética , Ornitorrinco/genética , Animais , Composição de Bases , Dentição , Feminino , Impressão Genômica/genética , Humanos , Imunidade/genética , Masculino , Mamíferos/genética , MicroRNAs/genética , Proteínas do Leite/genética , Filogenia , Ornitorrinco/imunologia , Ornitorrinco/fisiologia , Receptores Odorantes/genética , Sequências Repetitivas de Ácido Nucleico/genética , Répteis/genética , Análise de Sequência de DNA , Espermatozoides/metabolismo , Peçonhas/genética , Zona Pelúcida/metabolismo
RESUMO
Large numbers of mass spectrometry proteomics studies are being conducted to understand all types of biological processes. The size and complexity of proteomics data hinder efforts to easily share, integrate, query and compare the studies. The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is a new and expanding proteomics resource that enables rapid browsing of protein expression information from publicly available studies on humans and model organisms. MOPED is designed to simplify the comparison and sharing of proteomics data for the greater research community. MOPED uniquely provides protein-level expression data, meta-analysis capabilities and quantitative data from standardized analysis. Data can be queried for specific proteins; browsed by organism, tissue, localization and condition; and sorted by false discovery rate and expression. MOPED empowers users to visualize their own expression data and compare it with existing studies. Further, MOPED links to various protein and pathway databases, including GeneCards, Entrez, UniProt, KEGG and Reactome. The current version of MOPED contains over 43,000 proteins with at least one spectral match and more than 11 million high-certainty spectra.