1.
GigaByte ; 2023: 1-10, 2023.
Article in English | MEDLINE | ID: mdl-37732134

ABSTRACT

We present ensemblQueryR, an R package for querying Ensembl linkage disequilibrium (LD) endpoints. This package is flexible, fast and user-friendly, and optimised for high-throughput querying. ensemblQueryR uses functions that are intuitive and amenable to custom code integration, familiar R object types as inputs and outputs as well as providing parallelisation functionality. For each Ensembl LD endpoint, ensemblQueryR provides two functions, permitting both single- and multi-query modes of operation. The multi-query functions are optimised for large query sizes and provide optional parallelisation to leverage available computational resources and minimise processing time. We demonstrate improved computational performance of ensemblQueryR over an existing tool in terms of random access memory (RAM) usage and speed, delivering a 10-fold speed increase whilst using a third of the RAM. Finally, ensemblQueryR is near-agnostic to operating system and computational architecture through Docker and Singularity images, making this tool widely accessible to the scientific community.

2.
GigaByte ; 2023: gigabyte87, 2023.
Article in English | MEDLINE | ID: mdl-37637773

ABSTRACT

Amazon Simple Storage Service (Amazon S3) is a widely used platform for storing large biomedical datasets. Unintended data alterations can occur during data writing and transmission, altering the original content and generating unexpected results. However, no open-source and easy-to-use tool exists to verify end-to-end data integrity. Here, we present aws-s3-integrity-check, a user-friendly, lightweight, and reliable bash tool to verify the integrity of a dataset stored in an Amazon S3 bucket. Using this tool, we only needed ∼114 min to verify the integrity of 1,045 records ranging between 5 bytes and 10 gigabytes and occupying ∼935 gigabytes of the Amazon S3 cloud. Our aws-s3-integrity-check tool also provides file-by-file on-screen and log-file-based information about the status of each integrity check. To our knowledge, this tool is the only open-source one that allows verifying the integrity of a dataset uploaded to the Amazon S3 Storage quickly, reliably, and efficiently. The tool is freely available for download and use at https://github.com/SoniaRuiz/aws-s3-integrity-check and https://hub.docker.com/r/soniaruiz/aws-s3-integrity-check.
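
The abstract above does not say how the tool compares local and remote data, so the following is only an illustrative sketch of one common way to check an upload. It assumes an unencrypted object whose S3 ETag is either the plain MD5 of the content (single-request upload) or the MD5 of the concatenated per-part MD5 digests followed by "-<number of parts>" (multipart upload). The function names and the 8 MiB default part size are assumptions made for this example, not details taken from aws-s3-integrity-check.

```python
import hashlib


def s3_style_etag(data: bytes, part_size: int = 8 * 1024 * 1024) -> str:
    """Compute the ETag S3 would typically report for `data`.

    Assumption: content no larger than `part_size` was sent in a single
    request, and larger content was sent as a multipart upload with
    fixed-size parts of `part_size` bytes.
    """
    if len(data) <= part_size:
        # Single-request upload: the ETag is the hex MD5 of the content.
        return hashlib.md5(data).hexdigest()

    # Multipart upload: MD5 each part, then MD5 the concatenated raw digests.
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    joined = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(joined).hexdigest()}-{len(parts)}"


def matches_remote(data: bytes, remote_etag: str, part_size: int = 8 * 1024 * 1024) -> bool:
    """Return True when the locally computed ETag equals the remote one."""
    # S3 returns ETags wrapped in double quotes, so strip them first.
    return s3_style_etag(data, part_size) == remote_etag.strip('"')
```

In practice the remote ETag would come from the object's metadata, and the part size must match the one used at upload time, otherwise the multipart comparison fails even for intact data.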

3.
Brain ; 146(12): 4974-4987, 2023 12 01.
Article in English | MEDLINE | ID: mdl-37522749

ABSTRACT

Genetic variants conferring risks for Parkinson's disease have been highlighted through genome-wide association studies, yet exploration of their specific disease mechanisms is lacking. Two Parkinson's disease candidate genes, KAT8 and KANSL1, identified through genome-wide studies and a PINK1-mitophagy screen, encode part of the histone acetylating non-specific lethal complex. This complex localizes to the nucleus, where it plays a role in transcriptional activation, and to mitochondria, where it has been suggested to have a role in mitochondrial transcription. In this study, we sought to identify whether the non-specific lethal complex has potential regulatory relationships with other genes associated with Parkinson's disease in human brain. Correlation in the expression of non-specific lethal genes and Parkinson's disease-associated genes was investigated in primary gene co-expression networks using publicly-available transcriptomic data from multiple brain regions (provided by the Genotype-Tissue Expression Consortium and UK Brain Expression Consortium), whilst secondary networks were used to examine cell type specificity. Reverse engineering of gene regulatory networks generated regulons of the complex, which were tested for heritability using stratified linkage disequilibrium score regression. Prioritized gene targets were then validated in vitro using a QuantiGene multiplex assay and publicly-available chromatin immunoprecipitation-sequencing data. Significant clustering of non-specific lethal genes was revealed alongside Parkinson's disease-associated genes in frontal cortex primary co-expression modules, amongst other brain regions. Both primary and secondary co-expression modules containing these genes were enriched for mainly neuronal cell types. Regulons of the complex contained Parkinson's disease-associated genes and were enriched for biological pathways genetically linked to disease. 
When examined in a neuroblastoma cell line, 41% of prioritized gene targets showed significant changes in mRNA expression following KANSL1 or KAT8 perturbation. KANSL1 and H4K8 chromatin immunoprecipitation-sequencing data demonstrated non-specific lethal complex activity at many of these genes. In conclusion, genes encoding the non-specific lethal complex are highly correlated with and regulate genes associated with Parkinson's disease. Overall, these findings reveal a potentially wider role for this protein complex in regulating genes and pathways implicated in Parkinson's disease.


Subject(s)
Parkinson Disease , Humans , Parkinson Disease/genetics , Parkinson Disease/metabolism , Genome-Wide Association Study , Mitochondria/metabolism , Brain/metabolism , Gene Regulatory Networks
4.
Brain ; 146(7): 2869-2884, 2023 07 03.
Article in English | MEDLINE | ID: mdl-36624280

ABSTRACT

Improvements in functional genomic annotation have led to a critical mass of neurogenetic discoveries. This is exemplified in hereditary ataxia, a heterogeneous group of disorders characterised by incoordination from cerebellar dysfunction. Associated pathogenic variants in more than 300 genes have been described, leading to a detailed genetic classification partitioned by age-of-onset. Despite these advances, up to 75% of patients with ataxia remain molecularly undiagnosed even following whole genome sequencing, as exemplified in the 100 000 Genomes Project. This study aimed to understand whether we can improve our knowledge of the genetic architecture of hereditary ataxia by leveraging functional genomic annotations, and as a result, generate insights and strategies that raise the diagnostic yield. To achieve these aims, we used publicly-available multi-omics data to generate 294 genic features, capturing information relating to a gene's structure, genetic variation, tissue-specific, cell-type-specific and temporal expression, as well as protein products of a gene. We studied these features across genes typically causing childhood-onset, adult-onset or both types of disease first individually, then collectively. This led to the generation of testable hypotheses which we investigated using whole genome sequencing data from up to 2182 individuals presenting with ataxia and 6658 non-neurological probands recruited in the 100 000 Genomes Project. Using this approach, we demonstrated a high short tandem repeat (STR) density within childhood-onset genes suggesting that we may be missing pathogenic repeat expansions within this cohort. This was verified in both childhood- and adult-onset ataxia patients from the 100 000 Genomes Project who were unexpectedly found to have a trend for higher repeat sizes even at naturally-occurring STRs within known ataxia genes, implying a role for STRs in pathogenesis. 
Using unsupervised analysis, we found significant similarities in genomic annotation across the gene panels, which suggested adult- and childhood-onset patients should be screened using a common diagnostic gene set. We tested this within the 100 000 Genomes Project by assessing the burden of pathogenic variants among childhood-onset genes in adult-onset patients and vice versa. This demonstrated a significantly higher burden of rare, potentially pathogenic variants in conventional childhood-onset genes among individuals with adult-onset ataxia. Our analysis has implications for the current clinical practice in genetic testing for hereditary ataxia. We suggest that the diagnostic rate for hereditary ataxia could be increased by removing the age-of-onset partition, and through a modified screening for repeat expansions in naturally-occurring STRs within known ataxia-associated genes, in effect treating these regions as candidate pathogenic loci.


Subject(s)
Cerebellar Ataxia , Spinocerebellar Degenerations , Adult , Humans , Spinocerebellar Degenerations/genetics , Cerebellar Ataxia/diagnosis , Cerebellar Ataxia/genetics , Ataxia/diagnosis , Ataxia/genetics , Genomics , Genetic Testing
5.
Nucleic Acids Res ; 51(D1): D167-D178, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36399497

ABSTRACT

Dysregulation of RNA splicing contributes to both rare and complex diseases. RNA-sequencing data from human tissues has shown that this process can be inaccurate, resulting in the presence of novel introns detected at low frequency across samples and within an individual. To enable the full spectrum of intron use to be explored, we have developed IntroVerse, which offers an extensive catalogue on the splicing of 332,571 annotated introns and a linked set of 4,679,474 novel junctions covering 32,669 different genes. This dataset has been generated through the analysis of 17,510 human control RNA samples from 54 tissues provided by the Genotype-Tissue Expression Consortium. IntroVerse has two unique features: (i) it provides a complete catalogue of novel junctions and (ii) each novel junction has been assigned to a specific annotated intron. This unique, hierarchical structure offers multiple uses, including the identification of novel transcripts from known genes and their tissue-specific usage, and the assessment of background splicing noise for introns thought to be mis-spliced in disease states. IntroVerse provides a user-friendly web interface and is freely available at https://rytenlab.com/browser/app/introverse.


Subject(s)
Databases, Genetic , Introns , RNA Splicing , Humans , Alternative Splicing , Base Sequence , Introns/genetics , RNA , RNA Splicing/genetics
6.
medRxiv ; 2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38196618

ABSTRACT

To discover rare disease-gene associations, we developed a gene burden analytical framework and applied it to rare, protein-coding variants from whole genome sequencing of 35,008 cases with rare diseases and their family members recruited to the 100,000 Genomes Project (100KGP). Following in silico triaging of the results, 88 novel associations were identified including 38 with existing experimental evidence. We have published the confirmation of one of these associations, hereditary ataxia with UCHL1, and independent confirmatory evidence has recently been published for four more. We highlight a further seven compelling associations: hypertrophic cardiomyopathy with DYSF and SLC4A3 where both genes show high/specific heart expression and existing associations to skeletal dystrophies or short QT syndrome respectively; monogenic diabetes with UNC13A with a known role in the regulation of β cells and a mouse model with impaired glucose tolerance; epilepsy with KCNQ1 where a mouse model shows seizures and the existing long QT syndrome association may be linked; early onset Parkinson's disease with RYR1 with existing links to tremor pathophysiology and a mouse model with neurological phenotypes; anterior segment ocular abnormalities associated with POMK showing expression in corneal cells and with a zebrafish model with developmental ocular abnormalities; and cystic kidney disease with COL4A3 showing high renal expression and prior evidence for a digenic or modifying role in renal disease. Confirmation of all 88 associations would lead to potential diagnoses in 456 molecularly undiagnosed cases within the 100KGP, as well as other rare disease patients worldwide, highlighting the clinical impact of a large-scale statistical approach to rare disease gene discovery.

7.
Bioinformatics ; 38(15): 3844-3846, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35751589

ABSTRACT

MOTIVATION: The advent of long-read sequencing technologies has increased demand for the visualization and interpretation of transcripts. However, tools that perform such visualizations remain inflexible and lack the ability to easily identify differences between transcript structures. Here, we introduce ggtranscript, an R package that provides a fast and flexible method to visualize and compare transcripts. As a ggplot2 extension, ggtranscript inherits the functionality and familiarity of ggplot2 making it easy to use. AVAILABILITY AND IMPLEMENTATION: ggtranscript is an R package available at https://github.com/dzhang32/ggtranscript (DOI: https://doi.org/10.5281/zenodo.6374061) via an open-source MIT licence. Further documentation is available at https://dzhang32.github.io/ggtranscript/.


Subject(s)
Software , Sequence Analysis, DNA/methods , Protein Isoforms/genetics
8.
Nat Commun ; 13(1): 2270, 2022 04 27.
Article in English | MEDLINE | ID: mdl-35477703

ABSTRACT

There is growing evidence for the importance of 3' untranslated region (3'UTR) dependent regulatory processes. However, our current human 3'UTR catalogue is incomplete. Here, we develop a machine learning-based framework, leveraging both genomic and tissue-specific transcriptomic features to predict previously unannotated 3'UTRs. We identify unannotated 3'UTRs associated with 1,563 genes across 39 human tissues, with the greatest abundance found in the brain. These unannotated 3'UTRs are significantly enriched for RNA binding protein (RBP) motifs and exhibit high human lineage-specificity. We find that brain-specific unannotated 3'UTRs are enriched for the binding motifs of important neuronal RBPs such as TARDBP and RBFOX1, and their associated genes are involved in synaptic function. Our data is shared through an online resource, F3UTER (https://astx.shinyapps.io/F3UTER/). Overall, our data improves 3'UTR annotation and provides additional insights into the mRNA-RBP interactome in the human brain, with implications for our understanding of neurological and neurodevelopmental diseases.


Subject(s)
Transcriptome , 3' Untranslated Regions/genetics , Humans , RNA, Messenger/genetics
9.
Commun Biol ; 4(1): 1262, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34737414

ABSTRACT

Mitochondrial dysfunction contributes to the pathogenesis of many neurodegenerative diseases. The mitochondrial genome encodes core respiratory chain proteins, but the vast majority of mitochondrial proteins are nuclear-encoded, making interactions between the two genomes vital for cell function. Here, we examine these relationships by comparing mitochondrial and nuclear gene expression across different regions of the human brain in healthy and disease cohorts. We find strong regional patterns that are modulated by cell-type and reflect functional specialisation. Nuclear genes causally implicated in sporadic Parkinson's and Alzheimer's disease (AD) show much stronger relationships with the mitochondrial genome than expected by chance, and mitochondrial-nuclear relationships are highly perturbed in AD cases, particularly through synaptic and lysosomal pathways, potentially implicating the regulation of energy balance and removal of dysfunctional mitochondria in the etiology or progression of the disease. Finally, we present MitoNuclearCOEXPlorer, a tool to interrogate key mitochondria-nuclear relationships in multi-dimensional brain data.


Subject(s)
Brain/physiopathology , Cell Nucleus/physiology , Mitochondria/physiology , Neurodegenerative Diseases/physiopathology , Humans , Sequence Analysis, RNA , Signal Transduction
10.
Cell Rep ; 35(10): 109189, 2021 06 08.
Article in English | MEDLINE | ID: mdl-34107263

ABSTRACT

Neuropathological and experimental evidence suggests that the cell-to-cell transfer of α-synuclein has an important role in the pathogenesis of Parkinson's disease (PD). However, the mechanism underlying this phenomenon is not fully understood. We undertook a small interfering RNA (siRNA), genome-wide screen to identify genes regulating the cell-to-cell transfer of α-synuclein. A genetically encoded reporter, GFP-2A-αSynuclein-RFP, suitable for separating donor and recipient cells, was transiently transfected into HEK cells stably overexpressing α-synuclein. We find that 38 genes regulate the transfer of α-synuclein-RFP, one of which is ITGA8, a candidate gene identified through a recent PD genome-wide association study (GWAS). Weighted gene co-expression network analysis (WGCNA) and weighted protein-protein network interaction analysis (WPPNIA) show that those hits cluster in networks that include known PD genes more frequently than expected by random chance. The findings expand our understanding of the mechanism of α-synuclein spread.


Subject(s)
Cell Communication/physiology , Genome-Wide Association Study/methods , Protein Interaction Maps/physiology , alpha-Synuclein/metabolism , Humans
11.
Nat Commun ; 12(1): 2076, 2021 04 06.
Article in English | MEDLINE | ID: mdl-33824317

ABSTRACT

Knowledge of genomic features specific to the human lineage may provide insights into brain-related diseases. We leverage high-depth whole genome sequencing data to generate a combined annotation identifying regions simultaneously depleted for genetic variation (constrained regions) and poorly conserved across primates. We propose that these constrained, non-conserved regions (CNCRs) have been subject to human-specific purifying selection and are enriched for brain-specific elements. We find that CNCRs are depleted from protein-coding genes but enriched within lncRNAs. We demonstrate that per-SNP heritability of a range of brain-relevant phenotypes is enriched within CNCRs. We find that genes implicated in neurological diseases have high CNCR density, including APOE, highlighting an unannotated intron-3 retention event. Using human brain RNA-sequencing data, we show the intron-3-retaining transcript to be more abundant in Alzheimer's disease with more severe tau and amyloid pathological burden. Thus, we demonstrate potential association of human-lineage-specific sequences in brain development and neurological disease.


Subject(s)
Apolipoproteins E/genetics , Genome, Human , Neurodegenerative Diseases/genetics , Phylogeny , Alzheimer Disease/genetics , Alzheimer Disease/pathology , Brain/pathology , Chromosomes, Human, Pair 19/genetics , Conserved Sequence/genetics , DNA, Intergenic/genetics , Gene Ontology , Humans , Introns/genetics , Linkage Disequilibrium/genetics , Molecular Sequence Annotation , Phenotype , Polymorphism, Single Nucleotide/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Regression Analysis
12.
Front Genet ; 12: 630187, 2021.
Article in English | MEDLINE | ID: mdl-33719340

ABSTRACT

Gene co-expression networks are a powerful type of analysis to construct gene groupings based on transcriptomic profiling. Co-expression networks make it possible to discover modules of genes whose mRNA levels are highly correlated across samples. Subsequent annotation of modules often reveals biological functions and/or evidence of cellular specificity for cell types implicated in the tissue being studied. There are multiple ways to perform such analyses with weighted gene co-expression network analysis (WGCNA) amongst one of the most widely used R packages. While managing a few network models can be done manually, it is often more advantageous to study a wider set of models derived from multiple independently generated transcriptomic data sets (e.g., multiple networks built from many transcriptomic sources). However, there is no software tool available that allows this to be easily achieved. Furthermore, the visual nature of co-expression networks in combination with the coding skills required to explore networks, makes the construction of a web-based platform for their management highly desirable. Here, we present the CoExp Web application, a user-friendly online tool that allows the exploitation of the full collection of 109 co-expression networks provided by the CoExpNets suite of R packages. We describe the usage of CoExp, including its contents and the functionality available through the family of CoExpNets packages. All the tools presented, including the web front- and back-ends are available for the research community so any research group can build its own suite of networks and make them accessible through their own CoExp Web application. Therefore, this paper is of interest to both researchers wishing to annotate their genes of interest across different brain network models and specialists interested in the creation of GCNs looking for a tool to appropriately manage, use, publish, and share their networks in a consistent and productive manner.
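
The core idea in the abstract above, grouping genes whose mRNA levels are highly correlated across samples, can be illustrated with a deliberately simplified sketch. It uses a plain Pearson correlation with a hard threshold, which is a swapped-in simplification and not the WGCNA or CoExpNets procedure. The gene names, expression values and the 0.9 threshold are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpressed_pairs(expression, threshold=0.9):
    """Return gene pairs whose correlation across samples is at least `threshold`."""
    genes = sorted(expression)
    return [
        (g1, g2)
        for i, g1 in enumerate(genes)
        for g2 in genes[i + 1:]
        if pearson(expression[g1], expression[g2]) >= threshold
    ]


# Hypothetical expression values for three genes measured in four samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.5],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
pairs = coexpressed_pairs(expression)
```

Here GENE_A and GENE_B rise together across samples and are paired, while GENE_C is anti-correlated with both and is left out. Real network construction works on thousands of genes and groups them into modules rather than listing pairs.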

13.
Bioinformatics ; 37(18): 2905-2911, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33734320

ABSTRACT

MOTIVATION: Co-expression networks are a powerful gene expression analysis method to study how genes co-express together in clusters with functional coherence that usually resemble specific cell type behavior for the genes involved. They can be applied to bulk-tissue gene expression profiling and assign function, and usually cell type specificity, to a high percentage of the gene pool used to construct the network. One of the limitations of this method is that each gene is predicted to play a role in a specific set of coherent functions in a single cell type (i.e. at most we get a single gene, function and cell type triplet for each gene). We present here GMSCA (Gene Multifunctionality Secondary Co-expression Analysis), a software tool that exploits the co-expression paradigm to increase the number of functions and cell types ascribed to a gene in bulk-tissue co-expression networks. RESULTS: We applied GMSCA to 27 co-expression networks derived from bulk-tissue gene expression profiling of a variety of brain tissues. Neurons and glial cells (microglia, astrocytes and oligodendrocytes) were considered the main cell types. Applying this approach, we increase the overall number of predicted triplets by 46.73%. Moreover, GMSCA predicts that the SNCA gene, traditionally thought to work mainly in neurons, also plays a relevant function in oligodendrocytes. AVAILABILITY AND IMPLEMENTATION: The tool is available at GitHub, https://github.com/drlaguna/GMSCA as open-source software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Regulatory Networks , Software , Humans , Brain , Gene Expression Profiling/methods
14.
Sci Adv ; 6(24)2020 06.
Article in English | MEDLINE | ID: mdl-32917675

ABSTRACT

Growing evidence suggests that human gene annotation remains incomplete; however, it is unclear how this affects different tissues and our understanding of different disorders. Here, we detect previously unannotated transcription from Genotype-Tissue Expression RNA sequencing data across 41 human tissues. We connect this unannotated transcription to known genes, confirming that human gene annotation remains incomplete, even among well-studied genes including 63% of the Online Mendelian Inheritance in Man-morbid catalog and 317 neurodegeneration-associated genes. We find the greatest abundance of unannotated transcription in brain and genes highly expressed in brain are more likely to be reannotated. We explore examples of reannotated disease genes, such as SNCA, for which we experimentally validate a previously unidentified, brain-specific, potentially protein-coding exon. We release all tissue-specific transcriptomes through vizER: http://rytenlab.com/browser/app/vizER. We anticipate that this resource will facilitate more accurate genetic analysis, with the greatest impact on our understanding of Mendelian and complex neurogenetic disorders.


Subject(s)
Databases, Genetic , Transcriptome , Exons , Humans , Molecular Sequence Annotation , Sequence Analysis, RNA