ABSTRACT
The evolutionarily conserved minor spliceosome (MiS) is required for protein expression of ~714 minor intron-containing genes (MIGs) crucial for cell-cycle regulation, DNA repair, and MAP-kinase signaling. We explored the role of MIGs and MiS in cancer, taking prostate cancer (PCa) as an exemplar. Both androgen receptor signaling and elevated levels of U6atac, a MiS small nuclear RNA, regulate MiS activity, which is highest in advanced metastatic PCa. siU6atac-mediated MiS inhibition in PCa in vitro model systems resulted in aberrant minor intron splicing leading to cell-cycle G1 arrest. Small interfering RNA knocking down U6atac was ~50% more efficient in lowering tumor burden in models of advanced therapy-resistant PCa compared with standard antiandrogen therapy. In lethal PCa, siU6atac disrupted the splicing of a crucial lineage dependency factor, the RE1-silencing factor (REST). Taken together, we have nominated MiS as a vulnerability for lethal PCa and potentially other cancers.
Subjects
Castration-Resistant Prostatic Neoplasms , Prostatic Neoplasms , Male , Humans , Introns/genetics , Prostatic Neoplasms/metabolism , RNA Splicing/genetics , Spliceosomes/metabolism , Signal Transduction , Androgen Receptors/genetics , Androgen Receptors/metabolism , Tumor Cell Line , Castration-Resistant Prostatic Neoplasms/genetics
ABSTRACT
Substance use disorders (SUD) and drug addiction are major threats to public health, impacting not only the millions of individuals struggling with SUD, but also surrounding families and communities. One of the seminal challenges in treating and studying addiction in human populations is the high prevalence of co-morbid conditions, including an increased risk of contracting a human immunodeficiency virus (HIV) infection. Of the ~15 million people who inject drugs globally, 17% are persons with HIV. Conversely, HIV is a risk factor for SUD because chronic pain syndromes, often encountered in persons with HIV, can lead to an increased use of opioid pain medications that in turn can increase the risk for opioid addiction. We hypothesize that SUD and HIV exert shared effects on brain cell types, including adaptations related to neuroplasticity, neurodegeneration, and neuroinflammation. Basic research is needed to refine our understanding of these affected cell types and adaptations. Studying the effects of SUD in the context of HIV at the single-cell level represents a compelling strategy to understand the reciprocal interactions among both conditions, made feasible by the availability of large, extensively-phenotyped human brain tissue collections that have been amassed by the Neuro-HIV research community. In addition, sophisticated animal models that have been developed for both conditions provide a means to precisely evaluate specific exposures and stages of disease. We propose that single-cell genomics is a uniquely powerful technology to characterize the effects of SUD and HIV in the brain, integrating data from human cohorts and animal models. We have formed the Single-Cell Opioid Responses in the Context of HIV (SCORCH) consortium to carry out this strategy.
ABSTRACT
Large-scale exome sequencing of tumors has enabled the identification of cancer drivers using recurrence-based approaches. Some of these methods also employ 3D protein structures to identify mutational hotspots in cancer-associated genes. In determining such mutational clusters in structures, existing approaches overlook protein dynamics, despite its essential role in protein function. We present a framework to identify cancer driver genes using a dynamics-based search of mutational hotspot communities. Mutations are mapped to protein structures, which are partitioned into distinct residue communities. These communities are identified in a framework where residue-residue contact edges are weighted by correlated motions (as inferred by dynamics-based models). We then search for signals of positive selection among these residue communities to identify putative driver genes, while applying our method to the TCGA (The Cancer Genome Atlas) PanCancer Atlas missense mutation catalog. Overall, we predict 1 or more mutational hotspots within the resolved structures of proteins encoded by 434 genes. These genes were enriched among biological processes associated with tumor progression. Additionally, a comparison between our approach and existing cancer hotspot detection methods using structural data suggests that including protein dynamics significantly increases the sensitivity of driver detection.
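The partition-then-count idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: thresholded connected components stand in for a real community-detection algorithm, and the contact weights, residue numbers, threshold, and function names are all hypothetical.

```python
from collections import Counter


def communities(contacts, min_weight):
    """Group residues into communities.

    contacts maps a residue pair (i, j) to a correlated-motion weight.
    Simplification (assumption): contacts below min_weight are dropped and
    each remaining connected component is treated as one community.
    """
    adjacency = {}
    for (i, j), weight in contacts.items():
        adjacency.setdefault(i, set())
        adjacency.setdefault(j, set())
        if weight >= min_weight:
            adjacency[i].add(j)
            adjacency[j].add(i)

    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups


def mutation_counts(groups, mutated_residues):
    """Count observed missense mutations falling in each community."""
    counts = Counter(mutated_residues)
    return [sum(counts[residue] for residue in group) for group in groups]


# Hypothetical toy protein: five residues, four contacts, four mutations.
contacts = {(1, 2): 0.9, (2, 3): 0.8, (3, 4): 0.1, (4, 5): 0.7}
groups = communities(contacts, min_weight=0.5)
counts = mutation_counts(groups, mutated_residues=[2, 2, 3, 5])
```

Here the weak 3-4 contact splits the residues into two communities. A test for positive selection would then ask whether a community carries more mutations than expected by chance; that step is omitted from this sketch.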
Subjects
Computational Biology/methods , Genomics/methods , Neoplasm Proteins/chemistry , Neoplasm Proteins/genetics , Neoplasms/genetics , Genetic Databases , Exome/genetics , Humans , Mutation , Protein Conformation , Reproducibility of Results , Workflow
ABSTRACT
Acetylserotonin O-methyltransferase (ASMT) is a key enzyme in the synthesis of melatonin. Although melatonin has been shown to exhibit anticancer activity and prevent endocrine resistance in breast cancer, the role of ASMT in breast cancer progression remains unclear. In this retrospective study, we analyzed gene expression profiles in 27 data sets on 7244 patients from 11 countries. We found that ASMT expression was significantly reduced in breast cancer tumors relative to healthy tissue. Among breast cancer patients, those with higher levels of ASMT expression had better relapse-free survival outcomes and longer metastasis-free survival times. Following treatment with tamoxifen, patients with greater ASMT expression experienced longer periods before relapse or distant recurrence. Motivated by these results, we devised an ASMT gene signature that can correctly identify low-risk cases with a sensitivity and specificity of 0.997 and 0.916, respectively. This signature was robustly validated using 23 independent breast cancer mRNA array data sets from different platforms (consisting of 5800 patients) and an RNAseq data set from TCGA (comprising 1096 patients). Intriguingly, patients who are classified as high-risk by the signature benefit from adjuvant chemotherapy, and those with grade II tumors who are classified as low-risk exhibit improved overall survival and distant relapse-free outcomes following endocrine therapy. Together, our findings more clearly elucidate the roles of ASMT, provide strategies for improving the efficacy of tamoxifen treatment and help to identify those patients who may maximally benefit from adjuvant or endocrine therapies.
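For readers unfamiliar with the two metrics quoted for the signature, the sketch below shows how sensitivity and specificity are computed from binary calls. It assumes "low-risk" is the positive class; the labels are invented toy data, not values from the study.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for paired boolean labels.

    True marks the positive class (assumed here to be "low-risk").
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels for eight patients.
truth = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, False, False, True, False, True]
sensitivity, specificity = sensitivity_specificity(truth, predicted)
```

Sensitivity is the fraction of truly low-risk cases that the signature calls low-risk, and specificity is the fraction of the remaining cases that it correctly does not call low-risk.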
Subjects
Acetylserotonin O-Methyltransferase/genetics , Breast Neoplasms/drug therapy , RNA Sequence Analysis/methods , Tamoxifen/therapeutic use , Up-Regulation , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Genetic Databases , Female , Gene Expression Profiling , Neoplastic Gene Expression Regulation , Humans , Neoplasm Grading , Oligonucleotide Array Sequence Analysis , Retrospective Studies , Survival Analysis , Treatment Outcome
ABSTRACT
Composite biomaterial scaffolds consisting of natural polymers and bioceramics may offer an alternative to autologous grafts for applications such as bone repair. Herein, we sought to investigate the possibility of incorporating marine coral microparticles into a collagen-based scaffold, a process which we hypothesised would enhance the mechanical properties of the scaffold as well as its capacity to promote osteogenesis of human mesenchymal stromal cells. Cryomilling and sieving were utilised to achieve coral microparticles of mean diameters 14 µm and 64 µm which were separately incorporated into collagen-based slurries and freeze-dried to form porous scaffolds. X-ray diffraction and Fourier transform infrared spectroscopy determined the coral microparticles to be comprised of calcium carbonate whereas collagen/coral composite scaffolds were shown to have a crystalline calcium ethanoate structure. Crosslinked collagen/coral scaffolds demonstrated enhanced compressive properties when compared to collagen-only scaffolds and also promoted more robust osteogenic differentiation of mesenchymal stromal cells, as indicated by increased expression of bone morphogenetic protein 2 at the gene level, and enhanced alkaline phosphatase activity and calcium accumulation at the protein level. Only subtle differences were observed when comparing the effect of coral microparticles of different sizes, with improved osteogenesis occurring as a result of calcium ion signalling delivered from collagen/coral composite scaffolds. These scaffolds, fabricated from entirely natural sources, therefore show promise as novel biomaterials for tissue engineering applications such as bone regeneration.
Subjects
Anthozoa/chemistry , Biocompatible Materials/pharmacology , Osteogenesis/drug effects , Tissue Engineering/methods , Tissue Scaffolds/chemistry , Animals , Biocompatible Materials/chemistry , Biocompatible Materials/isolation & purification , Bone Regeneration/drug effects , Calcium/metabolism , Cultured Cells , Collagen/chemistry , Gene Expression Profiling , Humans , Materials Testing , Mesenchymal Stem Cells/drug effects , Mesenchymal Stem Cells/metabolism , Primary Cell Culture , Fourier Transform Infrared Spectroscopy , X-Ray Diffraction
ABSTRACT
Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
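The feed-forward loop motif mentioned in this abstract (a regulator A targets B, B targets C, and A also targets C directly) can be enumerated in a small directed network as sketched below. The edge list and node names are hypothetical, and the brute-force search is only suitable for toy networks.

```python
from itertools import permutations


def feed_forward_loops(edges):
    """Return sorted (a, b, c) triples where a->b, b->c and a->c all exist."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    return sorted(
        (a, b, c)
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


# Hypothetical regulatory edges (regulator, target).
edges = [("TF1", "TF2"), ("TF2", "gene"), ("TF1", "gene"), ("TF2", "TF3")]
loops = feed_forward_loops(edges)
```

Motif enrichment would then compare this count with the counts obtained from randomized networks of the same size and degree distribution; that comparison is not shown here.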
Subjects
DNA/genetics , Encyclopedias as Topic , Gene Regulatory Networks/genetics , Human Genome/genetics , Molecular Sequence Annotation , Nucleic Acid Regulatory Sequences/genetics , Transcription Factors/metabolism , Alleles , Cell Line , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genomics , Humans , K562 Cells , Organ Specificity , Phosphorylation/genetics , Single Nucleotide Polymorphism/genetics , Protein Interaction Maps , Untranslated RNA/genetics , Untranslated RNA/metabolism , Genetic Selection/genetics , Transcription Initiation Site
ABSTRACT
Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype-genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impact protein functionality without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, quantifying unfavorable local interactions, is employed as a metric to investigate such effects. Using this workflow on the Protein Data Bank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease-related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, those associated with TSGs change the frustration more in the core than the surface (by introducing loss-of-function events), whereas those associated with oncogenes manifest the opposite pattern, creating gain-of-function events.
Subjects
Computational Biology/methods , Genetic Variation , Oncogenes , Proteins/chemistry , Protein Databases , Molecular Evolution , Tumor Suppressor Genes , Humans , Neoplasms/genetics , Neoplasms/pathology , Single Nucleotide Polymorphism , Proteins/genetics , Proteins/metabolism , Workflow
ABSTRACT
Neuropsychiatric genome-wide association studies (GWASs), including those for autism spectrum disorder and schizophrenia, show strong enrichment for regulatory elements in the developing brain. However, prioritizing risk genes and mechanisms is challenging without a unified regulatory atlas. Across 672 diverse developing human brains, we identified 15,752 genes harboring gene, isoform, and/or splicing quantitative trait loci, mapping 3739 to cellular contexts. Gene expression heritability drops during development, likely reflecting both increasing cellular heterogeneity and the intrinsic properties of neuronal maturation. Isoform-level regulation, particularly in the second trimester, mediated the largest proportion of GWAS heritability. Through colocalization, we prioritized mechanisms for about 60% of GWAS loci across five disorders, exceeding adult brain findings. Finally, we contextualized results within gene and isoform coexpression networks, revealing the comprehensive landscape of transcriptome regulation in development and disease.
Subjects
Alternative Splicing , Brain , Developmental Gene Expression Regulation , Mental Disorders , Humans , Atlases as Topic , Autism Spectrum Disorder/genetics , Brain/metabolism , Brain/growth & development , Brain/embryology , Gene Regulatory Networks , Genome-Wide Association Study , Protein Isoforms/genetics , Protein Isoforms/metabolism , Quantitative Trait Loci , Schizophrenia/genetics , Transcriptome , Mental Disorders/genetics
ABSTRACT
Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet, little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multi-omics datasets into a resource comprising >2.8M nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550K cell-type-specific regulatory elements and >1.4M single-cell expression-quantitative-trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.
ABSTRACT
Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multiomics datasets into a resource comprising >2.8 million nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550,000 cell type-specific regulatory elements and >1.4 million single-cell expression quantitative trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.
Subjects
Brain , Gene Regulatory Networks , Mental Disorders , Single-Cell Analysis , Humans , Aging/genetics , Brain/metabolism , Cell Communication/genetics , Chromatin/metabolism , Chromatin/genetics , Genomics , Mental Disorders/genetics , Prefrontal Cortex/metabolism , Prefrontal Cortex/physiology , Quantitative Trait Loci
ABSTRACT
UNLABELLED: The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. AVAILABILITY AND IMPLEMENTATION: VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.
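The point that one variant can have different effects on alternatively spliced transcripts can be illustrated with a minimal sketch. This is not VAT's implementation or interface; the transcript names, exon coordinates, and the three-way classification are hypothetical simplifications.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    exons: list  # list of (start, end) pairs, 1-based inclusive (assumed)


def annotate(position, transcripts):
    """Classify one variant position against each transcript separately."""
    result = {}
    for transcript in transcripts:
        first = min(start for start, _ in transcript.exons)
        last = max(end for _, end in transcript.exons)
        if any(start <= position <= end for start, end in transcript.exons):
            result[transcript.name] = "exonic"
        elif first <= position <= last:
            result[transcript.name] = "intronic"
        else:
            result[transcript.name] = "outside"
    return result


# Two hypothetical isoforms of one gene; tx2 skips the middle exon.
transcripts = [
    Transcript("tx1", [(100, 200), (300, 400), (500, 600)]),
    Transcript("tx2", [(100, 200), (500, 600)]),
]
annotation = annotate(350, transcripts)
```

The same position is exonic in `tx1` but intronic in `tx2`, which is why a single intersection of coordinates with gene-level features can be misleading.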
Subjects
Human Genome , Genomics/methods , Information Storage and Retrieval/methods , Molecular Sequence Annotation/methods , Software , Genetic Variation , Genotype , Humans , Internet
ABSTRACT
Genomic regulatory elements active in the developing human brain are notably enriched in genetic risk for neuropsychiatric disorders, including autism spectrum disorder (ASD), schizophrenia, and bipolar disorder. However, prioritizing the specific risk genes and candidate molecular mechanisms underlying these genetic enrichments has been hindered by the lack of a single unified large-scale gene regulatory atlas of human brain development. Here, we uniformly process and systematically characterize gene, isoform, and splicing quantitative trait loci (xQTLs) in 672 fetal brain samples from unique subjects across multiple ancestral populations. We identify 15,752 genes harboring a significant xQTL and map 3,739 eQTLs to a specific cellular context. We observe a striking drop in gene expression and splicing heritability as the human brain develops. Isoform-level regulation, particularly in the second trimester, mediates the greatest proportion of heritability across multiple psychiatric GWAS, compared with eQTLs. Via colocalization and TWAS, we prioritize biological mechanisms for ~60% of GWAS loci across five neuropsychiatric disorders, nearly two-fold that observed in the adult brain. Finally, we build a comprehensive set of developmentally regulated gene and isoform co-expression networks capturing unique genetic enrichments across disorders. Together, this work provides a comprehensive view of genetic regulation across human brain development as well as the stage- and cell type-informed mechanistic underpinnings of neuropsychiatric disorders.
ABSTRACT
In recent years, major advances in genomics, proteomics, macromolecular structure determination, and the computational resources capable of processing and disseminating the large volumes of data generated by each have played major roles in advancing a more systems-oriented appreciation of biological organization. One product of systems biology has been the delineation of graph models for describing genome-wide protein-protein interaction networks. The network organization and topology which emerges in such models may be used to address fundamental questions in an array of cellular processes, as well as biological features intrinsic to the constituent proteins (or "nodes") themselves. However, graph models alone constitute an abstraction which neglects the underlying biological and physical reality that the network's nodes and edges are highly heterogeneous entities. Here, we explore some of the advantages of introducing a protein structural dimension to such models, as the marriage of conventional network representations with macromolecular structural data helps to place static node and edge constructs in a biologically more meaningful context. We emphasize that 3D protein structures constitute a valuable conceptual and predictive framework by discussing examples of the insights provided, such as enabling in silico predictions of protein-protein interactions, providing rational and compelling classification schemes for network elements, as well as revealing interesting intrinsic differences between distinct node types, such as disorder and evolutionary features, which may then be rationalized in light of their respective functions within networks.
Subjects
Computer Simulation , Protein Interaction Maps/genetics , Proteins/genetics , Animals , Genomics , Humans , Genetic Models , Molecular Models , Protein Binding , Protein Conformation , Proteins/chemistry , Structural Homology of Proteins , Support Vector Machine
ABSTRACT
BACKGROUND: Neuropsychiatric disorders afflict a large portion of the global population and constitute a significant source of disability worldwide. Although Genome-wide Association Studies (GWAS) have identified many disorder-associated variants, the underlying regulatory mechanisms linking them to disorders remain elusive, especially those involving distant genomic elements. Expression quantitative trait loci (eQTLs) constitute a powerful means of providing this missing link. However, most eQTL studies in human brains have focused exclusively on cis-eQTLs, which link variants to nearby genes (i.e., those within 1 Mb of a variant). A complete understanding of disease etiology requires a clearer understanding of trans-regulatory mechanisms, which, in turn, entails a detailed analysis of the relationships between variants and expression changes in distant genes. METHODS: By leveraging large datasets from the PsychENCODE consortium, we conducted a genome-wide survey of trans-eQTLs in the human dorsolateral prefrontal cortex. We also performed colocalization and mediation analyses to identify mediators in trans-regulation and use trans-eQTLs to link GWAS loci to schizophrenia risk genes. RESULTS: We identified ~80,000 candidate trans-eQTLs (at FDR<0.25) that influence the expression of ~10K target genes (i.e., "trans-eGenes"). We found that many variants associated with these candidate trans-eQTLs overlap with known cis-eQTLs. Moreover, for >60% of these variants (by colocalization), the cis-eQTL's target gene acts as a mediator for the trans-eQTL SNP's effect on the trans-eGene, highlighting examples of cis-mediation as essential for trans-regulation. Furthermore, many of these colocalized variants fall into a discernable pattern wherein cis-eQTL's target is a transcription factor or RNA-binding protein, which, in turn, targets the gene associated with the candidate trans-eQTL. 
Finally, we show that trans-regulatory mechanisms provide valuable insights into psychiatric disorders: beyond what had been possible using only cis-eQTLs, we link an additional 23 GWAS loci and 90 risk genes (using colocalization between candidate trans-eQTLs and schizophrenia GWAS loci). CONCLUSIONS: We demonstrate that the transcriptional architecture of the human brain is orchestrated by both cis- and trans-regulatory variants and find that trans-eQTLs provide insights into brain-disease biology.
Subjects
Genome-Wide Association Study , Quantitative Trait Loci , Humans , Single Nucleotide Polymorphism , Gene Expression Regulation , Prefrontal Cortex
ABSTRACT
ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.
Subjects
Genetic Databases , Genomics , Neoplasms/genetics , Tumor Cell Line , Neoplastic Cell Transformation/genetics , Gene Regulatory Networks , Humans , Mutation/genetics , Reproducibility of Results , Transcription Factors/metabolism
ABSTRACT
Despite progress in defining genetic risk for psychiatric disorders, their molecular mechanisms remain elusive. Addressing this, the PsychENCODE Consortium has generated a comprehensive online resource for the adult brain across 1866 individuals. The PsychENCODE resource contains ~79,000 brain-active enhancers, sets of Hi-C linkages, and topologically associating domains; single-cell expression profiles for many cell types; expression quantitative-trait loci (QTLs); and further QTLs associated with chromatin, splicing, and cell-type proportions. Integration shows that varying cell-type proportions largely account for the cross-population variation in expression (with >88% reconstruction accuracy). It also allows building of a gene regulatory network, linking genome-wide association study variants to genes (e.g., 321 for schizophrenia). We embed this network into an interpretable deep-learning model, which improves disease prediction by ~6-fold versus polygenic risk scores and identifies key genes and pathways in psychiatric disorders.
Subjects
Brain/metabolism , Gene Expression Regulation , Mental Disorders/genetics , Datasets as Topic , Deep Learning , Genetic Enhancer Elements , Genetic Epigenesis , Epigenomics , Gene Regulatory Networks , Genome-Wide Association Study , Humans , Quantitative Trait Loci , Single-Cell Analysis , Transcriptome
ABSTRACT
The rapidly growing volume of data being produced by next-generation sequencing initiatives is enabling more in-depth analyses of conservation than previously possible. Deep sequencing is uncovering disease loci and regions under selective constraint, despite the fact that intuitive biophysical reasons for such constraint are sometimes absent. Allostery may often provide the missing explanatory link. We use models of protein conformational change to identify allosteric residues by finding essential surface pockets and information-flow bottlenecks, and we develop a software tool that enables users to perform this analysis on their own proteins of interest. Though fundamentally 3D-structural in nature, our analysis is computationally fast, thereby allowing us to run it across the PDB and to evaluate general properties of predicted allosteric residues. We find that these tend to be conserved over diverse evolutionary time scales. Finally, we highlight examples of allosteric residues that help explain poorly understood disease-associated variants.
Subjects
Allosteric Site , Conserved Sequence , Protein Sequence Analysis/methods , Software , Animals , Humans , Proteome/chemistry
ABSTRACT
Structure has traditionally been interrelated with sequence, usually in the framework of comparing sequences across species sharing a common fold. However, the nature of information within the sequence and structure databases is evolving, changing the type of comparisons possible. In particular, we now have a vast number of personal genome sequences from human populations, and a greater fraction of new structures contain interacting proteins within large complexes. Consequently, we have to recast our conception of sequence conservation and its relation to structure, for example by focusing more on selection within the human population. Moreover, within structural biology there is less emphasis on the discovery of novel folds and more on relating structures to networks of protein interactions. We cover this changing mindset here.
Subjects
High-Throughput Nucleotide Sequencing/methods , Proteins/chemistry , Proteins/genetics , Humans , Isomerism , Mutation , Proteins/metabolism
ABSTRACT
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations ("ultrasensitive") and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, "motif-breakers"). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.