Results 1 - 20 of 38
1.
Nucleic Acids Res ; 52(D1): D174-D182, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37962376

ABSTRACT

JASPAR (https://jaspar.elixir.no/) is a widely-used open-access database presenting manually curated high-quality and non-redundant DNA-binding profiles for transcription factors (TFs) across taxa. In this 10th release and 20th-anniversary update, the CORE collection has expanded with 329 new profiles. We updated three existing profiles and provided orthogonal support for 72 profiles from the previous release's UNVALIDATED collection. Altogether, the JASPAR 2024 update provides a 20% increase in CORE profiles from the previous release. A trimming algorithm enhanced profiles by removing low information content flanking base pairs, which were likely uninformative (within the capacity of the PFM models) for TFBS predictions and modelling TF-DNA interactions. This release includes enhanced metadata, featuring a refined classification for plant TFs' structural DNA-binding domains. The new JASPAR collections prompt updates to the genomic tracks of predicted TF binding sites (TFBSs) in 8 organisms, with human and mouse tracks available as native tracks in the UCSC Genome browser. All data are available through the JASPAR web interface and programmatically through its API and the updated Bioconductor and pyJASPAR packages. Finally, a new TFBS extraction tool enables users to retrieve predicted JASPAR TFBSs intersecting their genomic regions of interest.
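
The flank trimming mentioned in the abstract can be illustrated with a minimal sketch. It assumes a uniform background, a per-column information content in bits, and a hypothetical cutoff (`min_ic=0.5`); the abstract does not state the actual threshold or procedure used by JASPAR, so this is not the database's implementation.

```python
import math


def information_content(column):
    """Information content (bits) of one PFM column of A, C, G, T counts.

    Assumes a uniform background, so the maximum is 2 bits.
    """
    total = sum(column)
    if total == 0:
        return 0.0  # an empty column carries no information
    ic = 2.0
    for count in column:
        if count > 0:
            p = count / total
            ic += p * math.log2(p)
    return ic


def trim_flanks(pfm, min_ic=0.5):
    """Drop leading and trailing columns whose information content is below min_ic.

    min_ic is a hypothetical cutoff chosen for illustration only.
    Internal low-information columns are kept.
    """
    start, end = 0, len(pfm)
    while start < end and information_content(pfm[start]) < min_ic:
        start += 1
    while end > start and information_content(pfm[end - 1]) < min_ic:
        end -= 1
    return pfm[start:end]


# Toy profile: two uninformative columns, a three-position core, one uninformative column.
pfm = [
    [25, 25, 25, 25],
    [30, 20, 25, 25],
    [97, 1, 1, 1],
    [1, 1, 97, 1],
    [2, 96, 1, 1],
    [26, 24, 25, 25],
]
trimmed = trim_flanks(pfm)
```

In this toy profile only the three central columns survive trimming; a real profile would need a cutoff chosen against the model's intended use.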


Subject(s)
Databases, Genetic; Protein Binding; Transcription Factors; Animals; Humans; Mice; Databases, Genetic/standards; Databases, Genetic/trends; Transcription Factors/genetics; Transcription Factors/metabolism; Plants/genetics
2.
Nat Commun ; 14(1): 6947, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37935654

ABSTRACT

Disease-causing mutations in genes encoding transcription factors (TFs) can affect TF interactions with their cognate DNA-binding motifs. Whether and how TF mutations impact upon the binding to TF composite elements (CE) and the interaction with other TFs is unclear. Here, we report a distinct mechanism of TF alteration in human lymphomas with perturbed B cell identity, in particular classic Hodgkin lymphoma. It is caused by a recurrent somatic missense mutation c.295 T > C (p.Cys99Arg; p.C99R) targeting the center of the DNA-binding domain of Interferon Regulatory Factor 4 (IRF4), a key TF in immune cells. IRF4-C99R fundamentally alters IRF4 DNA-binding, with loss-of-binding to canonical IRF motifs and neomorphic gain-of-binding to canonical and non-canonical IRF CEs. IRF4-C99R thoroughly modifies IRF4 function by blocking IRF4-dependent plasma cell induction, and up-regulates disease-specific genes in a non-canonical Activator Protein-1 (AP-1)-IRF-CE (AICE)-dependent manner. Our data explain how a single mutation causes a complex switch of TF specificity and gene regulation and open the perspective to specifically block the neomorphic DNA-binding activities of a mutant TF.


Subject(s)
Interferon Regulatory Factors; Lymphoma; Humans; B-Lymphocytes/metabolism; DNA; Gene Expression Regulation; Interferon Regulatory Factors/genetics; Interferon Regulatory Factors/metabolism; Lymphoma/genetics
3.
Bioinformatics ; 39(10), 2023 Oct 03.
Article in English | MEDLINE | ID: mdl-37796837

ABSTRACT

SUMMARY: The SBILib Python library provides an integrated platform for the analysis of macromolecular structures and interactions. It combines simple 3D file parsing and workup methods with more advanced analytical tools. SBILib includes modules for macromolecular interactions, loops, super-secondary structures, and biological sequences, as well as wrappers for external tools with which to integrate their results and facilitate the comparative analysis of protein structures and their complexes. The library can handle macromolecular complexes formed by proteins and/or nucleic acid molecules (i.e. DNA and RNA). It is uniquely capable of parsing and calculating protein super-secondary structure and loop geometry. We have compiled a list of example scenarios which SBILib may be applied to and provided access to these within the library. AVAILABILITY AND IMPLEMENTATION: SBILib is made available on Github at https://github.com/structuralbioinformatics/SBILib.


Subject(s)
RNA; Software; Molecular Structure; Proteins; Macromolecular Substances
4.
NAR Genom Bioinform ; 5(2): lqad052, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37260510

ABSTRACT

X-chromosome inactivation (XCI) epigenetically silences one X chromosome in every cell in female mammals. Although the majority of X-linked genes are silenced, in humans 20% or more are able to escape inactivation and continue to be expressed. Such escape genes are important contributors to sex differences in gene expression, and may impact the phenotypes of X aneuploidies; yet the mechanisms regulating escape from XCI are not understood. We have performed an enrichment analysis of transcription factor binding on the X chromosome, providing new evidence for enriched factors at the transcription start sites of escape genes. The top escape-enriched transcription factors were detected at the RPS4X promoter, a well-described human escape gene previously demonstrated to escape from XCI in a transgenic mouse model. Using a cell line model system that allows for targeted integration and inactivation of transgenes on the mouse X chromosome, we further assessed combinations of RPS4X promoter and genic elements for their ability to drive escape from XCI. We identified a small transgenic construct of only 6 kb capable of robust escape from XCI, establishing that gene-proximal elements are sufficient to permit escape, and highlighting the additive effect of multiple elements that work together in a context-specific fashion.

5.
Genome Biol ; 24(1): 154, 2023 Jun 27.
Article in English | MEDLINE | ID: mdl-37370113

ABSTRACT

Deep learning models such as convolutional neural networks (CNNs) excel in genomic tasks but lack interpretability. We introduce ExplaiNN, which combines the expressiveness of CNNs with the interpretability of linear models. ExplaiNN can predict TF binding, chromatin accessibility, and de novo motifs, achieving performance comparable to state-of-the-art methods. Its predictions are transparent, providing global (cell state level) as well as local (individual sequence level) biological insights into the data. ExplaiNN can serve as a plug-and-play platform for pretrained models and annotated position weight matrices. ExplaiNN aims to accelerate the adoption of deep learning in genomic sequence analysis by domain experts.


Subject(s)
Genomics; Neural Networks, Computer; Genomics/methods; Chromatin/genetics; Protein Binding
6.
Nucleic Acids Res ; 51(W1): W379-W386, 2023 Jul 05.
Article in English | MEDLINE | ID: mdl-37166953

ABSTRACT

MiniPromoters, or compact promoters, are short DNA sequences that can drive expression in specific cells and tissues. While broadly useful, they are of high relevance to gene therapy due to their role in enabling precise control of where a therapeutic gene will be expressed. Here, we present OnTarget (http://ontarget.cmmt.ubc.ca), a webserver that streamlines the MiniPromoter design process. Users only need to specify a gene of interest or custom genomic coordinates on which to focus the identification of promoters and enhancers, and can also provide relevant cell-type-specific genomic evidence (e.g. accessible chromatin regions, histone modifications, etc.). OnTarget combines the provided data with internal data to identify candidate promoters and enhancers and design MiniPromoters. To illustrate the utility of OnTarget, we designed and characterized two MiniPromoters targeting different cell populations relevant to Parkinson Disease.


Subject(s)
Computational Biology; Computer Simulation; Promoter Regions, Genetic; Software; Enhancer Elements, Genetic/genetics; Genome; Genomics; Promoter Regions, Genetic/genetics; Internet; Computational Biology/instrumentation; Computational Biology/methods
7.
J Exp Med ; 220(5), 2023 May 01.
Article in English | MEDLINE | ID: mdl-36884218

ABSTRACT

STAT6 (signal transducer and activator of transcription 6) is a transcription factor that plays a central role in the pathophysiology of allergic inflammation. We have identified 16 patients from 10 families spanning three continents with a profound phenotype of early-life onset allergic immune dysregulation, widespread treatment-resistant atopic dermatitis, hypereosinophilia with eosinophilic gastrointestinal disease, asthma, elevated serum IgE, IgE-mediated food allergies, and anaphylaxis. The cases were either sporadic (seven kindreds) or followed an autosomal dominant inheritance pattern (three kindreds). All patients carried monoallelic rare variants in STAT6 and functional studies established their gain-of-function (GOF) phenotype with sustained STAT6 phosphorylation, increased STAT6 target gene expression, and TH2 skewing. Precision treatment with the anti-IL-4Rα antibody, dupilumab, was highly effective improving both clinical manifestations and immunological biomarkers. This study identifies heterozygous GOF variants in STAT6 as a novel autosomal dominant allergic disorder. We anticipate that our discovery of multiple kindreds with germline STAT6 GOF variants will facilitate the recognition of more affected individuals and the full definition of this new primary atopic disorder.


Subject(s)
Asthma; Food Hypersensitivity; Humans; STAT6 Transcription Factor; Gain of Function Mutation; Immunoglobulin E/genetics
8.
Stem Cell Reports ; 18(3): 765-781, 2023 Mar 14.
Article in English | MEDLINE | ID: mdl-36801003

ABSTRACT

Improving methods for human embryonic stem cell differentiation represents a challenge in modern regenerative medicine research. Using drug repurposing approaches, we discover small molecules that regulate the formation of definitive endoderm. Among them are inhibitors of known processes involved in endoderm differentiation (mTOR, PI3K, and JNK pathways) and a new compound, with an unknown mechanism of action, capable of inducing endoderm formation in the absence of growth factors in the media. Optimization of the classical protocol by inclusion of this compound achieves the same differentiation efficiency with a 90% cost reduction. The presented in silico procedure for candidate molecule selection has broad potential for improving stem cell differentiation protocols.


Subject(s)
Endoderm; Human Embryonic Stem Cells; Humans; Cell Differentiation/physiology
9.
Sci Immunol ; 8(79): eade7953, 2023 Jan 20.
Article in English | MEDLINE | ID: mdl-36662884

ABSTRACT

Interferon regulatory factor 4 (IRF4) is a transcription factor (TF) and key regulator of immune cell development and function. We report a recurrent heterozygous mutation in IRF4, p.T95R, causing an autosomal dominant combined immunodeficiency (CID) in seven patients from six unrelated families. The patients exhibited profound susceptibility to opportunistic infections, notably Pneumocystis jirovecii, and presented with agammaglobulinemia. Patients' B cells showed impaired maturation, decreased immunoglobulin isotype switching, and defective plasma cell differentiation, whereas their T cells contained reduced TH17 and TFH populations and exhibited decreased cytokine production. A knock-in mouse model of heterozygous T95R showed a severe defect in antibody production both at the steady state and after immunization with different types of antigens, consistent with the CID observed in these patients. The IRF4T95R variant maps to the TF's DNA binding domain, alters its canonical DNA binding specificities, and results in a simultaneous multimorphic combination of loss, gain, and new functions for IRF4. IRF4T95R behaved as a gain-of-function hypermorph by binding to DNA with higher affinity than IRF4WT. Despite this increased affinity for DNA, the transcriptional activity on IRF4 canonical genes was reduced, showcasing a hypomorphic activity of IRF4T95R. Simultaneously, IRF4T95R functions as a neomorph by binding to noncanonical DNA sites to alter the gene expression profile, including the transcription of genes exclusively induced by IRF4T95R but not by IRF4WT. This previously undescribed multimorphic IRF4 pathophysiology disrupts normal lymphocyte biology, causing human disease.


Subject(s)
Gene Expression Regulation; Interferon Regulatory Factors; Mice; Animals; Humans; B-Lymphocytes; DNA/metabolism; Mutation
10.
Nucleic Acids Res ; 50(D1): D165-D173, 2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34850907

ABSTRACT

JASPAR (http://jaspar.genereg.net/) is an open-access database containing manually curated, non-redundant transcription factor (TF) binding profiles for TFs across six taxonomic groups. In this 9th release, we expanded the CORE collection with 341 new profiles (148 for plants, 101 for vertebrates, 85 for urochordates, and 7 for insects), which corresponds to a 19% expansion over the previous release. We added 298 new profiles to the Unvalidated collection when no orthogonal evidence was found in the literature. All the profiles were clustered to provide familial binding profiles for each taxonomic group. Moreover, we revised the structural classification of DNA binding domains to consider plant-specific TFs. This release introduces word clouds to represent the scientific knowledge associated with each TF. We updated the genome tracks of TFBSs predicted with JASPAR profiles in eight organisms; the human and mouse TFBS predictions can be visualized as native tracks in the UCSC Genome Browser. Finally, we provide a new tool to perform JASPAR TFBS enrichment analysis in user-provided genomic regions. All the data is accessible through the JASPAR website, its associated RESTful API, the R/Bioconductor data package, and a new Python package, pyJASPAR, that facilitates serverless access to the data.


Subject(s)
Databases, Genetic; Genomics/classification; Software; Transcription Factors/genetics; Animals; Binding Sites/genetics; Computational Biology; Genome/genetics; Humans; Mice; Plants/genetics; Protein Binding/genetics; Transcription Factors/classification; Vertebrates/genetics
11.
J Med Genet ; 59(1): 46-55, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33257509

ABSTRACT

Strabismus is a common condition, affecting 1%-4% of individuals. Isolated strabismus has been studied in families with Mendelian inheritance patterns. Despite the identification of multiple loci via linkage analyses, no specific genes have been identified from these studies. The current study is based on a seven-generation family with isolated strabismus inherited in an autosomal dominant manner. A total of 13 individuals from a common ancestor have been included for linkage analysis. Among these, nine are affected and four are unaffected. A single linkage signal has been identified at an 8.5 Mb region of chromosome 14q12 with a multipoint LOD (logarithm of the odds) score of 4.69. Disruption of this locus is known to cause FOXG1 syndrome (or congenital Rett syndrome; OMIM #613454 and *164874), in which 84% of affected individuals present with strabismus. With the incorporation of next-generation sequencing and in-depth bioinformatic analyses, a 4 bp non-coding deletion was prioritised as the top candidate for the observed strabismus phenotype. The deletion is predicted to disrupt regulation of FOXG1, which encodes a transcription factor of the Forkhead family. Suggestive of an autoregulation effect, the disrupted sequence matches the consensus FOXG1 and Forkhead family transcription factor binding site and has been observed in previous ChIP-seq studies to be bound by Foxg1 in early mouse brain development. Future study of this specific deletion may shed light on the regulation of FOXG1 expression and may enhance our understanding of the mechanisms contributing to strabismus and FOXG1 syndrome.
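
For readers unfamiliar with LOD scores, the following sketch shows the simpler two-point, phase-known form of the statistic: the base-10 log of the likelihood at recombination fraction θ over the likelihood under free recombination (θ = 0.5). The study reports a multipoint LOD, which is computed differently by dedicated linkage software; the function name and the meiosis counts below are hypothetical and do not come from the paper.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    where n is the number of meioses and k the number of recombinants.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant excludes complete linkage
    non_recombinants = meioses - recombinants
    log_l_theta = non_recombinants * math.log10(1.0 - theta)
    if recombinants > 0:
        log_l_theta += recombinants * math.log10(theta)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical example: 10 informative meioses, none recombinant, evaluated at theta = 0.
lod_linked = two_point_lod(recombinants=0, meioses=10, theta=0.0)
# Half of the meioses recombinant at theta = 0.5 gives no evidence either way.
lod_unlinked = two_point_lod(recombinants=5, meioses=10, theta=0.5)
```

Each non-recombinant meiosis at θ = 0 contributes log10(2), about 0.3, so roughly ten such meioses are needed to reach the conventional significance threshold of 3.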


Subject(s)
Forkhead Transcription Factors/genetics; Nerve Tissue Proteins/genetics; Rett Syndrome/genetics; Sequence Deletion; Strabismus/genetics; Adolescent; Aged; Aged, 80 and over; Animals; Genetic Linkage; High-Throughput Nucleotide Sequencing; Humans; Middle Aged; Pedigree; Exome Sequencing; Whole Genome Sequencing; Young Adult
12.
Biochim Biophys Acta Gene Regul Mech ; 1864(11-12): 194765, 2021.
Article in English | MEDLINE | ID: mdl-34673265

ABSTRACT

To control gene transcription, DNA-binding transcription factors recognise specific sequence motifs in gene regulatory regions. A complete and reliable GO annotation of all DNA-binding transcription factors is key to investigating the delicate balance of gene regulation in response to environmental and developmental stimuli. The need for such information is demonstrated by the many lists of transcription factors that have been produced over the past decade. The COST Action Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC) Consortium brought together experts in the field of transcription with the aim of providing high quality and interoperable gene regulatory data. The Gene Ontology (GO) Consortium provides strict definitions for gene product function, including factors that regulate transcription. The collaboration between the GREEKC and GO Consortia has enabled the application of those definitions to produce a new curated catalogue of over 1400 human DNA-binding transcription factors, that can be accessed at https://www.ebi.ac.uk/QuickGO/targetset/dbTF. This catalogue has facilitated an improvement in the GO annotation of human DNA-binding transcription factors and led to the GO annotation of almost sixty thousand DNA-binding transcription factors in over a hundred species. Thus, this work will aid researchers investigating the regulation of transcription in both biomedical and basic science.


Subject(s)
DNA/metabolism; Gene Ontology; Molecular Sequence Annotation; Transcription Factors/classification; Databases, Genetic; Humans; Transcription Factors/metabolism
13.
Genome Biol ; 22(1): 280, 2021 Sep 27.
Article in English | MEDLINE | ID: mdl-34579793

ABSTRACT

BACKGROUND: Deep learning has proven to be a powerful technique for transcription factor (TF) binding prediction but requires large training datasets. Transfer learning can reduce the amount of data required for deep learning, while improving overall model performance, compared to training a separate model for each new task. RESULTS: We assess a transfer learning strategy for TF binding prediction consisting of a pre-training step, wherein we train a multi-task model with multiple TFs, and a fine-tuning step, wherein we initialize single-task models for individual TFs with the weights learned by the multi-task model, after which the single-task models are trained at a lower learning rate. We corroborate that transfer learning improves model performance, especially if in the pre-training step the multi-task model is trained with biologically relevant TFs. We show the effectiveness of transfer learning for TFs with ~ 500 ChIP-seq peak regions. Using model interpretation techniques, we demonstrate that the features learned in the pre-training step are refined in the fine-tuning step to resemble the binding motif of the target TF (i.e., the recipient of transfer learning in the fine-tuning step). Moreover, pre-training with biologically relevant TFs allows single-task models in the fine-tuning step to learn useful features other than the motif of the target TF. CONCLUSIONS: Our results confirm that transfer learning is a powerful technique for TF binding prediction.


Subject(s)
Machine Learning; Transcription Factors/metabolism; Chromatin Immunoprecipitation Sequencing; Genome
14.
NAR Genom Bioinform ; 3(2): lqab027, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33937764

ABSTRACT

Direct-coupling analysis (DCA) for studying the coevolution of residues in proteins has been widely used to predict the three-dimensional structure of a protein from its sequence. We present RADI/raDIMod, a variation of the original DCA algorithm that groups chemically equivalent residues combined with super-secondary structure motifs to model protein structures. Interestingly, the simplification produced by grouping amino acids into only two groups (polar and non-polar) is still representative of the physicochemical nature that characterizes the protein structure and it is in line with the role of hydrophobic forces in protein-folding funneling. As a result of a compressed alphabet, the number of sequences required for the multiple sequence alignment is reduced. The number of long-range contacts predicted is limited; therefore, our approach requires the use of neighboring sequence-positions. We use the prediction of secondary structure and motifs of super-secondary structures to predict local contacts. We use RADI and raDIMod, a fragment-based protein structure modelling, achieving near native conformations when the number of super-secondary motifs covers >30-50% of the sequence. Interestingly, although different contacts are predicted with different alphabets, they produce similar structures.
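
The effect of a compressed alphabet can be sketched as follows. The residue grouping, the symbols `H`/`P`, and the treatment of the gap as a third state are assumptions made for illustration; the abstract does not specify RADI's exact grouping. The parameter count is the standard number of independent fields and couplings in a Potts model with q states, which is the model family DCA fits.

```python
# Assumed polar / non-polar split; the grouping used by RADI may differ.
NON_POLAR = set("AVLIMFWPGC")
POLAR = set("STNQYDEKRH")


def reduce_sequence(aligned_sequence):
    """Map an aligned protein sequence to a two-letter alphabet plus gap.

    H = non-polar, P = polar, '-' = gap, X = any other symbol.
    """
    reduced = []
    for residue in aligned_sequence.upper():
        if residue == "-":
            reduced.append("-")
        elif residue in NON_POLAR:
            reduced.append("H")
        elif residue in POLAR:
            reduced.append("P")
        else:
            reduced.append("X")
    return "".join(reduced)


def potts_parameter_count(length, states):
    """Independent parameters of a Potts model after gauge fixing:
    length * (states - 1) fields plus (states - 1)^2 couplings per position pair.
    """
    pairs = length * (length - 1) // 2
    return length * (states - 1) + pairs * (states - 1) ** 2


reduced = reduce_sequence("MK-LV")
full_alphabet = potts_parameter_count(length=100, states=21)    # 20 amino acids + gap
two_group_alphabet = potts_parameter_count(length=100, states=3)  # polar, non-polar, gap
```

Under these assumptions a 100-column alignment needs about two orders of magnitude fewer parameters with the two-group alphabet, which is consistent with the abstract's statement that fewer aligned sequences are required.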

15.
Gene Ther ; 28(6): 351-372, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33531684

ABSTRACT

Small and cell-type restricted promoters are important tools for basic and preclinical research, and clinical delivery of gene therapies. In clinical gene therapy, ophthalmic trials have been leading the field, with over 50% of ocular clinical trials using promoters that restrict expression based on cell type. Here, 19 human DNA MiniPromoters were bioinformatically designed for rAAV, tested by neonatal intravenous delivery in mouse, and successful MiniPromoters went on to be tested by intravitreal, subretinal, intrastromal, and/or intravenous delivery in adult mouse. We present promoter development as an overview for each cell type, but only show results in detail for the recommended MiniPromoters: Ple265 and Ple341 (PCP2) ON bipolar, Ple349 (PDE6H) cone, Ple253 (PITX3) corneal stroma, Ple32 (CLDN5) endothelial cells of the blood-retina barrier, Ple316 (NR2E1) Müller glia, and Ple331 (PAX6) PAX6 positive. Overall, we present a resource of new, redesigned, and improved MiniPromoters for ocular gene therapy that range in size from 784 to 2484 bp, and from weaker, equal, or stronger in strength relative to the ubiquitous control promoter smCBA. All MiniPromoters will be useful for therapies involving small regulatory RNA and DNA, and proteins ranging from 517 to 1084 amino acids, representing 62.9-90.2% of human proteins.


Subject(s)
Endothelial Cells; Animals; Humans; Mice; Neuroglia; PAX6 Transcription Factor/genetics; Promoter Regions, Genetic; Retina; Retinal Cone Photoreceptor Cells
16.
Epigenetics Chromatin ; 14(1): 12, 2021 Feb 17.
Article in English | MEDLINE | ID: mdl-33597016

ABSTRACT

BACKGROUND: X-chromosome inactivation (XCI) in eutherian mammals is the epigenetic inactivation of one of the two X chromosomes in XX females in order to compensate for dosage differences with XY males. Not all genes are inactivated, and the proportion escaping from inactivation varies between human and mouse (the two species that have been extensively studied). RESULTS: We used DNA methylation to predict the XCI status of X-linked genes with CpG islands across 12 different species: human, chimp, bonobo, gorilla, orangutan, mouse, cow, sheep, goat, pig, horse and dog. We determined the XCI status of 342 CpG islands on average per species, with most species having 80-90% of genes subject to XCI. Mouse was an outlier, with a higher proportion of genes subject to XCI than found in other species. Sixteen genes were found to have discordant X-chromosome inactivation statuses across multiple species, with five of these showing primate-specific escape from XCI. These discordant genes tended to cluster together within the X chromosome, along with genes with similar patterns of escape from XCI. CTCF-binding, ATAC-seq signal and LTR repeats were enriched at genes escaping XCI when compared to genes subject to XCI; however, enrichment was only observed in three or four of the species tested. LINE and DNA repeats showed enrichment around subject genes, but again not in a consistent subset of species. CONCLUSIONS: In this study, we determined XCI status across 12 species, showing mouse to be an outlier with few genes that escape inactivation. Inactivation status is largely conserved across species. The clustering of genes that change XCI status across species implicates a domain-level control. In contrast, the relatively consistent, but not universal correlation of inactivation status with enrichment of repetitive elements or CTCF binding at promoters demonstrates gene-based influences on inactivation state. This study broadens enrichment analysis of regulatory elements to species beyond human and mouse.


Subject(s)
DNA Methylation; X Chromosome Inactivation; Animals; Cattle; CpG Islands; Dogs; Female; Genes, X-Linked; Horses; Male; Mice; Sheep; Swine; X Chromosome/genetics
17.
Neurobiol Dis ; 153: 105314, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33636385

ABSTRACT

The granulin protein (also known as, and hereafter referred to as, progranulin) is a secreted glycoprotein that contributes to overall brain health. Heterozygous loss-of-function mutations in the gene encoding the progranulin protein (Granulin Precursor, GRN) are a common cause of familial frontotemporal dementia (FTD). Gene therapy approaches that aim to increase progranulin expression from a single wild-type allele, an area of active investigation for the potential treatment of GRN-dependent FTD, will benefit from the availability of a mouse model that expresses a genomic copy of the human GRN gene. Here we report the development and characterization of a novel mouse model that expresses the entire human GRN gene in its native genomic context as a single copy inserted into a defined locus (Hprt) in the mouse genome. We show that human and mouse progranulin are expressed in a similar tissue-specific pattern, suggesting that the two genes are regulated by similar mechanisms. Human progranulin rescues a phenotype characteristic of progranulin-null mice, the exaggerated and early deposition of the aging pigment lipofuscin in the brain, indicating that the two proteins are functionally similar. Longitudinal behavioural and neuropathological analyses revealed no significant differences between wild-type and human progranulin-overexpressing mice up to 18 months of age, providing evidence that long-term increase of progranulin levels is well tolerated in mice. Finally, we demonstrate that human progranulin expression can be increased in the brain using an antisense oligonucleotide that inhibits a known GRN-regulating micro-RNA, demonstrating that the transgene is responsive to potential gene therapy drugs. Human progranulin-expressing mice represent a novel and valuable tool to expedite the development of progranulin-modulating therapeutics.


Subject(s)
Brain/metabolism; Frontotemporal Dementia/genetics; Gene Expression/drug effects; Oligonucleotides, Antisense/pharmacology; Progranulins/genetics; Animals; Disease Models, Animal; Gene Expression/genetics; Gene Knock-In Techniques; Genetic Therapy; Humans; Lipofuscin/metabolism; Mice; Mice, Knockout; Mice, Transgenic
18.
BMC Bioinformatics ; 22(1): 4, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407073

ABSTRACT

BACKGROUND: Statistical potentials, also named knowledge-based potentials, are scoring functions derived from empirical data that can be used to evaluate the quality of protein folds and protein-protein interaction (PPI) structures. In previous works we decomposed the statistical potentials in different terms, named Split-Statistical Potentials, accounting for the type of amino acid pairs, their hydrophobicity, solvent accessibility and type of secondary structure. These potentials have been successfully used to identify near-native structures in protein structure prediction, rank protein docking poses, and predict PPI binding affinities. RESULTS: Here, we present the SPServer, a web server that applies the Split-Statistical Potentials to analyze protein folds and protein interfaces. SPServer provides global scores as well as residue/residue-pair profiles presented as score plots and maps. This level of detail allows users to: (1) identify potentially problematic regions on protein structures; (2) identify disrupting amino acid pairs in protein interfaces; and (3) compare and analyze the quality of tertiary and quaternary structural models. CONCLUSIONS: While there are many web servers that provide scoring functions to assess the quality of either protein folds or PPI structures, SPServer integrates both aspects in a unique easy-to-use web server. Moreover, the server permits to locally assess the quality of the structures and interfaces at a residue level and provides tools to compare the local assessment between structures. SERVER ADDRESS: https://sbi.upf.edu/spserver/ .


Subject(s)
Protein Interaction Maps/physiology; Protein Structure, Secondary; Proteins; Software; Amino Acids/chemistry; Amino Acids/metabolism; Internet; Knowledge Bases; Models, Statistical; Proteins/chemistry; Proteins/metabolism
19.
Hum Mutat ; 42(4): 346-358, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33368787

ABSTRACT

Mendelian rare genetic diseases affect 5%-10% of the population, and with over 5300 genes responsible for ∼7000 different diseases, they are challenging to diagnose. The use of whole-genome sequencing (WGS) has bolstered the diagnosis rate significantly. The effective use of WGS relies on the ability to identify the disrupted gene responsible for disease phenotypes. This process involves genomic variant calling and prioritization, and is the beneficiary of improvements to sequencing technology, variant calling approaches, and increased capacity to prioritize genomic variants with potential pathogenicity. As analysis pipelines continue to improve, careful testing of their efficacy is paramount. However, real-life cases typically emerge anecdotally, and utilization of clinically sensitive and identifiable data for testing pipeline improvements is regulated and limiting. We identified the need for a gene-based variant simulation framework that can create mock rare disease scenarios, utilizing known pathogenic variants or through the creation of novel gene-disrupting variants. To fill this need, we present GeneBreaker, a tool that creates synthetic rare disease cases with utility for benchmarking variant calling approaches, testing the efficacy of variant prioritization, and as an educational mechanism for training diagnostic practitioners in the expanding field of genomic medicine. GeneBreaker is freely available at http://GeneBreaker.cmmt.ubc.ca.


Subject(s)
Genomics; Rare Diseases; Computer Simulation; High-Throughput Nucleotide Sequencing; Humans; Phenotype; Rare Diseases/diagnosis; Rare Diseases/genetics; Whole Genome Sequencing
20.
Genome Biol ; 21(1): 114, 2020 May 11.
Article in English | MEDLINE | ID: mdl-32393327

ABSTRACT

BACKGROUND: Positional weight matrix (PWM) is a de facto standard model to describe transcription factor (TF) DNA binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological applications. This calls for comprehensive benchmarking of public PWM models with large experimental reference sets. RESULTS: Here we report results from all-against-all benchmarking of PWM models for DNA binding sites of human TFs on a large compilation of in vitro (HT-SELEX, PBM) and in vivo (ChIP-seq) binding data. We observe that the best performing PWM for a given TF often belongs to another TF, usually from the same family. Occasionally, binding specificity is correlated with the structural class of the DNA binding domain, indicated by good cross-family performance measures. Benchmarking-based selection of family-representative motifs is more effective than motif clustering-based approaches. Overall, there is good agreement between in vitro and in vivo performance measures. However, for some in vivo experiments, the best performing PWM is assigned to an unrelated TF, indicating a binding mode involving protein-protein cooperativity. CONCLUSIONS: In an all-against-all setting, we compute more than 18 million performance measure values for different PWM-experiment combinations and offer these results as a public resource to the research community. The benchmarking protocols are provided via a web interface and as docker images. The methods and results from this study may help others make better use of public TF specificity models, as well as public TF binding data sets.
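
As a reminder of what a PWM model computes, here is a minimal sketch that converts a count matrix into log-odds weights and scores every window of a sequence. The pseudocount, the uniform background, the toy matrix, and the forward-strand-only scan are all assumptions for illustration; this is not the benchmarking protocol described in the abstract.

```python
import math

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert per-position A, C, G, T counts into log2 odds against a uniform background."""
    pwm = []
    for column in pfm:
        total = sum(column) + pseudocount
        pwm.append([
            math.log2(((count + pseudocount * background) / total) / background)
            for count in column
        ])
    return pwm


def scan(pwm, sequence):
    """Score every forward-strand window; returns a list of (start, score) pairs.

    Raises KeyError on symbols other than A, C, G, T.
    """
    width = len(pwm)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][BASE_INDEX[base]] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


# Toy three-position motif favouring A, G, C.
pfm = [[97, 1, 1, 1], [1, 1, 97, 1], [2, 96, 1, 1]]
pwm = pfm_to_pwm(pfm)
hits = scan(pwm, "TTAGCTT")
best_start, best_score = max(hits, key=lambda hit: hit[1])
```

A benchmark of the kind the abstract describes would compare such scores against experimentally bound and unbound sequences; the sketch stops at producing the scores.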


Subject(s)
Protein Interaction Domains and Motifs; Software; Transcription Factors/metabolism; Animals; Benchmarking; Chromatin Immunoprecipitation Sequencing; Humans; Mice