Results 1 - 20 of 27
1.
Nat Commun ; 15(1): 3315, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38632311

ABSTRACT

This study investigates the humoral and cellular immune responses and health-related quality of life measures in individuals with mild to moderate long COVID (LC) compared with age- and gender-matched recovered COVID-19 controls (MC) over 24 months. LC participants show elevated nucleocapsid IgG levels at 3 months and higher neutralizing capacity up to 8 months post-infection. Increased spike-specific and nucleocapsid-specific CD4+ T cells, and increased PD-1 and TIM-3 expression on CD4+ and CD8+ T cells, were observed at 3 and 8 months, but these differences do not persist at 24 months. Some LC participants had detectable IFN-γ and IFN-β, which was attributed to reinfection and antigen re-exposure. Single-cell RNA sequencing at the 24-month timepoint shows similar immune cell proportions and reconstitution of naïve T and B cell subsets in LC and MC. No significant differences in exhaustion scores or antigen-specific T cell clones are observed. These findings suggest that immune activation in LC resolves and that immune responses in LC and MC become comparable over time. Improvement in self-reported health-related quality of life at 24 months was also evident in the majority of LC participants (62%). PTX3 and CRP levels and platelet count are associated with improvements in health-related quality of life.


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , Humans , CD8-Positive T-Lymphocytes , Quality of Life , SARS-CoV-2 , Antibodies, Viral
2.
Nat Commun ; 14(1): 7226, 2023 11 09.
Article in English | MEDLINE | ID: mdl-37940702

ABSTRACT

Genetic and environmental variation are key contributors during organism development, but the influence of minor perturbations or noise is difficult to assess. This study focuses on the stochastic variation in allele-specific expression that persists through cell divisions in the nine-banded armadillo (Dasypus novemcinctus). We investigated the blood transcriptome of five wild monozygotic quadruplets over time to explore the influence of developmental stochasticity on gene expression. We identify an enduring signal of autosomal allelic variability that distinguishes individuals within a quadruplet despite their genetic similarity. This stochastic allelic variation, akin to X-inactivation but broader, provides insight into non-genetic influences on phenotype. The presence of stochastically canalized allelic signatures represents a novel axis for characterizing organismal variability, complementing traditional approaches based on genetic and environmental factors. We also developed a model to explain the inconsistent penetrance associated with these stochastically canalized allelic expressions. By elucidating mechanisms underlying the persistence of allele-specific expression, we enhance understanding of development's role in shaping organismal diversity.


Subject(s)
Armadillos , Humans , Animals , Armadillos/physiology , Phenotype , Alleles , Penetrance
3.
Chem Senses ; 48, 2023 01 01.
Article in English | MEDLINE | ID: mdl-37539767

ABSTRACT

The sweet taste receptor (STR) is a G protein-coupled receptor (GPCR) responsible for mediating cellular responses to sweet stimuli. Early evidence suggests that elements of the STR signaling system are present beyond the tongue in metabolically active tissues, where the receptor may act as an extraoral glucose sensor. This study aimed to delineate expression of the STR in extraoral tissues using publicly available RNA-sequencing repositories. Gene expression data were mined for all genes implicated in the structure and function of the STR, and for control genes, including highly expressed metabolic genes in relevant tissues, other GPCRs and effector G proteins with physiological roles in metabolism, and other GPCRs expressed exclusively outside the metabolic tissues. Since the physiological role of the STR in extraoral tissues is likely related to glucose sensing, expression was then examined in diseases related to glucose-sensing impairment, such as type 2 diabetes. An aggregate co-expression network was then generated to determine co-expression patterns among the STR genes in these tissues. We found that STR gene expression was negligible in human pancreatic and adipose tissues, and low in intestinal tissue. Genes encoding the STR did not show significant co-expression or connectivity with other functional genes in these tissues. By contrast, STR expression was higher in mouse pancreatic and adipose tissues, and comparable to human expression in intestinal tissue. Our results suggest that STR expression in mice is not representative of expression in humans, and the receptor is unlikely to be a promising extraoral target in human cardiometabolic disease.


Subject(s)
Cardiovascular Diseases , Diabetes Mellitus, Type 2 , Taste Buds , Mice , Humans , Animals , Taste/physiology , Diabetes Mellitus, Type 2/genetics , Taste Buds/metabolism , Receptors, G-Protein-Coupled/metabolism , Gene Expression Profiling , Glucose/metabolism , Cardiovascular Diseases/metabolism
4.
Dev Cell ; 57(16): 1995-2008.e5, 2022 08 22.
Article in English | MEDLINE | ID: mdl-35914524

ABSTRACT

X-chromosome inactivation (XCI) is a random, permanent, and developmentally early epigenetic event that occurs during mammalian embryogenesis. We harness these features to investigate characteristics of early lineage specification events during human development. We initially assess the consistency of X-inactivation and establish a robust set of XCI-escape genes. By analyzing variance in XCI ratios across tissues and individuals, we find that XCI is shared across all tissues, suggesting that XCI is completed in the epiblast (in at least 6-16 cells) prior to specification of the germ layers. Additionally, we exploit tissue-specific variability to characterize the number of cells present during tissue-lineage commitment, ranging from approximately 20 cells in liver and whole blood tissues to 80 cells in brain tissues. By investigating the variability of XCI ratios using adult tissue, we characterize embryonic features of human XCI and lineage specification that are otherwise difficult to ascertain experimentally.
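The abstract does not state the estimator used to turn XCI-ratio variability into a cell count. The link can be illustrated with a simple binomial sampling model, which is an assumption here and not necessarily the published model: if N founder cells each independently keep the maternal X active with probability p = 0.5, the ratio in a derived tissue has variance p(1 - p)/N, so N ≈ p(1 - p)/Var(ratio). The function name `estimate_founder_cells` and the input ratios are hypothetical.

```python
import statistics
from typing import List


def estimate_founder_cells(xci_ratios: List[float], p: float = 0.5) -> float:
    """Estimate the number of founder cells N from the spread of XCI ratios.

    Assumes (hypothetically) that each of N founder cells independently keeps
    the maternal X active with probability p, so the ratio observed in a
    derived tissue has variance p * (1 - p) / N.
    """
    if len(xci_ratios) < 2:
        raise ValueError("need at least two ratios to estimate a variance")
    var = statistics.pvariance(xci_ratios)  # population variance of the ratios
    if var == 0:
        raise ValueError("zero variance: N is unbounded under this model")
    return p * (1 - p) / var


# Toy ratios for one tissue across four individuals (made-up numbers)
ratios = [0.4, 0.6, 0.4, 0.6]
print(round(estimate_founder_cells(ratios)))  # 25
```

Under this model, tissues whose ratios vary more across individuals imply fewer founder cells, which matches the direction of the liver/blood versus brain contrast described above.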


Subject(s)
Embryo, Mammalian , X Chromosome Inactivation , Adult , Animals , Chromosomes, Human, X/genetics , Humans , Mammals/genetics , X Chromosome Inactivation/genetics
5.
Genome Res ; 32(4): 738-749, 2022 04.
Article in English | MEDLINE | ID: mdl-35256454

ABSTRACT

The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor to the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. To find the best haploid genome representation, we constructed consensus genomes at the pan-human, superpopulation, and population levels, using variant information from The 1000 Genomes Project Consortium. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of approximately two to three when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensus genomes resulted in little to no improvement over the pan-human consensus, suggesting a limit to the utility of incorporating more specific genomic variation. Replacing the reference with consensus genomes affects functional analyses, such as differential expression of isoforms, genes, and splice junctions.
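A consensus genome can be thought of as the reference with each variant site set to its most common allele in the chosen population. The sketch below illustrates that idea for single-nucleotide variants only; the major-allele rule, the input format, and the name `consensus_sequence` are assumptions for illustration, not the study's pipeline, which also has to handle indels and coordinate shifts.

```python
from typing import Dict, Tuple


def consensus_sequence(reference: str,
                       variants: Dict[int, Tuple[str, float]]) -> str:
    """Build a haploid consensus by substituting population-major alleles.

    reference: reference sequence, indexed from 0
    variants: position -> (alternate base, alternate-allele frequency);
              SNVs only in this simplified sketch
    """
    seq = list(reference)
    for pos, (alt, alt_freq) in variants.items():
        if alt_freq > 0.5:  # the alternate allele is the major allele
            seq[pos] = alt
    return "".join(seq)


# Made-up reference and allele frequencies
ref = "ACGTACGT"
variants = {1: ("T", 0.8), 4: ("G", 0.3), 7: ("A", 0.51)}
print(consensus_sequence(ref, variants))  # ATGTACGA
```

Computing the frequencies within a superpopulation or population instead of across all samples would give the more specific consensus genomes compared in the study.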


Subject(s)
Genome, Human , Genomics , Consensus , Genomics/methods , Humans , RNA-Seq , Exome Sequencing
6.
Cardiovasc Res ; 117(10): 2216-2227, 2021 08 29.
Article in English | MEDLINE | ID: mdl-33002116

ABSTRACT

AIMS: Cardiac electrical activity is extraordinarily robust. However, when it goes wrong it can have fatal consequences. Electrical activity in the heart is controlled by the carefully orchestrated activity of more than a dozen different ion conductances. While there is considerable variability in cardiac ion channel expression levels between individuals, studies in rodents have indicated that there are modules of ion channels whose expression co-varies. The aim of this study was to investigate whether meta-analytic co-expression analysis of large-scale gene expression datasets could identify modules of co-expressed cardiac ion channel genes in human hearts that are of functional importance. METHODS AND RESULTS: Meta-analysis of 3653 public human RNA-seq datasets identified a strong correlation between expression of CACNA1C (L-type calcium current, ICaL) and KCNH2 (rapid delayed rectifier K+ current, IKr), which was also observed in human adult heart tissue samples. In silico modelling suggested that co-expression of CACNA1C and KCNH2 would limit the variability in action potential duration seen with variations in expression of ion channel genes and reduce susceptibility to early afterdepolarizations, a surrogate marker for proarrhythmia. We also found that levels of KCNH2 and CACNA1C expression are correlated in human induced pluripotent stem cell-derived cardiac myocytes, and that the levels of CACNA1C and KCNH2 expression were inversely correlated with the magnitude of changes in repolarization duration following inhibition of IKr. CONCLUSION: Meta-analytic approaches applied to multiple independent human gene expression datasets can be used to identify gene modules that are important for regulating heart function. Specifically, we have verified that there is co-expression of CACNA1C and KCNH2 ion channel genes in human heart tissue, and in silico analyses suggest that CACNA1C-KCNH2 co-expression increases the robustness of cardiac electrical activity.


Subject(s)
Action Potentials , Arrhythmias, Cardiac/metabolism , Calcium Channels, L-Type/metabolism , ERG1 Potassium Channel/metabolism , Heart Rate , Induced Pluripotent Stem Cells/metabolism , Myocytes, Cardiac/metabolism , Arrhythmias, Cardiac/genetics , Arrhythmias, Cardiac/physiopathology , Arrhythmias, Cardiac/prevention & control , Calcium Channels, L-Type/genetics , Cells, Cultured , Databases, Genetic , ERG1 Potassium Channel/genetics , Humans , Models, Cardiovascular , RNA-Seq , Signal Transduction , Time Factors
7.
Mol Cell Proteomics ; 19(11): 1876-1895, 2020 11.
Article in English | MEDLINE | ID: mdl-32817346

ABSTRACT

Co-fractionation MS (CF-MS) is a technique with potential to characterize endogenous and unmanipulated protein complexes on an unprecedented scale. However, this potential has been offset by a lack of guidelines for best-practice CF-MS data collection and analysis. To obtain such guidelines, this study thoroughly evaluates novel and published Saccharomyces cerevisiae CF-MS data sets using very high proteome coverage libraries of yeast gold standard complexes. A new method for identifying gold standard complexes in CF-MS data, Reference Complex Profiling, and the Extending 'Guilt-by-Association' by Degree (EGAD) R package are used for these evaluations, which are verified with concurrent analyses of published human data. By evaluating data collection designs, which involve fractionation of cell lysates, it is found that near-maximum recall of complexes can be achieved with fewer samples than in published studies. Distributing sample collection across orthogonal fractionation methods, rather than a single high resolution data set, leads to particularly efficient recall. By evaluating 17 different similarity scoring metrics, which are central to CF-MS data analysis, it is found that two metrics rarely used in past CF-MS studies (Spearman and Kendall correlations) and the recently introduced Co-apex metric frequently maximize recall, whereas a popular metric, Euclidean distance, delivers poor recall. The common practice of integrating external genomic data into CF-MS data analysis is also evaluated, revealing that this practice may improve the precision and recall of known complexes but is generally unsuitable for predicting novel complexes in model organisms. When studying nonmodel organisms using orthologous genomic data, it is found that particular subsets of fractionation profiles (e.g. the lowest abundance quartile) should be excluded to minimize false discovery. 
These assessments are summarized in a series of universally applicable guidelines for precise, sensitive and efficient CF-MS studies of known complexes, and effective predictions of novel complexes for orthogonal experimental validation.
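The metrics named above score pairs of elution profiles (a protein's intensity across fractions). The toy comparison below, with made-up profiles and a hypothetical helper `profile_similarities`, shows how they can disagree: the rank-based metrics treat a co-eluting pair with a tenfold abundance difference as identical, whereas Euclidean distance judges a profile with a shifted peak to be closer. This is only an illustration of the metrics' behaviour, not an account of the recall differences reported in the study, and it does not implement the Co-apex metric.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr


def profile_similarities(x: np.ndarray, y: np.ndarray) -> dict:
    """Compare two co-fractionation (elution) profiles with three metrics."""
    rho, _ = spearmanr(x, y)   # rank correlation: ignores absolute abundance
    tau, _ = kendalltau(x, y)  # concordant vs. discordant pairs of fractions
    dist = float(np.linalg.norm(x - y))  # Euclidean: sensitive to abundance
    return {"spearman": float(rho), "kendall": float(tau), "euclidean": dist}


# Hypothetical intensities of three proteins across eight fractions
a = np.array([0, 1, 4, 9, 4, 1, 0, 0], dtype=float)
b = 10 * a                                           # same peak, 10x abundance
c = np.array([0, 0, 0, 1, 4, 9, 4, 1], dtype=float)  # peak shifted by two fractions

for name, other in [("a vs b", b), ("a vs c", c)]:
    print(name, profile_similarities(a, other))
```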


Subject(s)
Chemical Fractionation/methods , Mass Spectrometry/methods , Proteome/metabolism , Proteomics/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Chromatography, Gel , Chromatography, Liquid/methods , Gene Ontology , Humans , Reference Standards
8.
Nucleic Acids Res ; 48(W1): W566-W571, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32392296

ABSTRACT

Co-expression analysis has provided insight into gene function in organisms from Arabidopsis to zebrafish. Comparison across species has the potential to enrich these results, for example by prioritizing among candidate human disease genes based on their network properties or by finding alternative model systems where their co-expression is conserved. Here, we present CoCoCoNet as a tool for identifying conserved gene modules and comparing co-expression networks. CoCoCoNet is a resource for both data and methods, providing gold standard networks and sophisticated tools for on-the-fly comparative analyses across 14 species. We illustrate the use of CoCoCoNet with two use cases. In the first, we demonstrate deep conservation of a nucleolus gene module across very divergent organisms, and in the second, we show how the heterogeneity of autism mechanisms in humans can be broken down by functional groups and translated to model organisms. CoCoCoNet is free to use and available to all at https://milton.cshl.edu/CoCoCoNet, with data and R scripts available at ftp://milton.cshl.edu/data.


Subject(s)
Gene Regulatory Networks , Software , Animals , Autism Spectrum Disorder/genetics , Gene Expression , Humans , RNA-Seq , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
9.
Genome Biol ; 20(1): 159, 2019 08 09.
Article in English | MEDLINE | ID: mdl-31399121

ABSTRACT

The use of the human reference genome has shaped methods and data across modern genomics. This has offered many benefits while creating a few constraints. In the following opinion, we outline the history, properties, and pitfalls of the current human reference genome. In a few illustrative analyses, we focus on its use for variant-calling, highlighting its nearness to a 'type specimen'. We suggest that switching to a consensus reference would offer important advantages over the continued use of the current reference with few disadvantages.


Subject(s)
Genomics/standards , Genome, Human , Humans , Reference Standards
10.
Proc Natl Acad Sci U S A ; 116(13): 6491-6500, 2019 03 26.
Article in English | MEDLINE | ID: mdl-30846554

ABSTRACT

Differential expression (DE) is commonly used to explore molecular mechanisms of biological conditions. While many studies report significant results between their groups of interest, the degree to which results are specific to the question at hand is not generally assessed, potentially leading to inaccurate interpretation. This could be particularly problematic for metaanalysis where replicability across datasets is taken as strong evidence for the existence of a specific, biologically relevant signal, but which instead may arise from recurrence of generic processes. To address this, we developed an approach to predict DE based on an analysis of over 600 studies. A predictor based on empirical prior probability of DE performs very well at this task (mean area under the receiver operating characteristic curve, ∼0.8), indicating that a large fraction of DE hit lists are nonspecific. In contrast, predictors based on attributes such as gene function, mutation rates, or network features perform poorly. Genes associated with sex, the extracellular matrix, the immune system, and stress responses are prominent within the "DE prior." In a series of control studies, we show that these patterns reflect shared biology rather than technical artifacts or ascertainment biases. Finally, we demonstrate the application of the DE prior to data interpretation in three use cases: (i) breast cancer subtyping, (ii) single-cell genomics of pancreatic islet cells, and (iii) metaanalysis of lung adenocarcinoma and renal transplant rejection transcriptomics. In all cases, we find hallmarks of generic DE, highlighting the need for nuanced interpretation of gene phenotypic associations.
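A predictor based on the empirical prior probability of DE can be sketched as the fraction of past studies in which each gene was called DE, evaluated by how well that fraction ranks the hits of a new study (AUROC). The hit lists, gene names, and the functions `de_prior` and `auroc` below are hypothetical and only illustrate the idea; the study's own prior was built from over 600 studies and may be computed differently.

```python
from typing import Dict, List, Set


def de_prior(hit_lists: List[Set[str]], genes: List[str]) -> Dict[str, float]:
    """Fraction of past studies in which each gene was reported as DE."""
    n = len(hit_lists)
    return {g: sum(g in hits for hits in hit_lists) / n for g in genes}


def auroc(scores: Dict[str, float], positives: Set[str]) -> float:
    """AUROC via the Mann-Whitney U statistic, using average ranks for ties."""
    items = sorted(scores.items(), key=lambda kv: kv[1])
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(items):
        j = i
        while j + 1 < len(items) and items[j + 1][1] == items[i][1]:
            j += 1
        for k in range(i, j + 1):
            ranks[items[k][0]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    n_pos = sum(g in positives for g in scores)
    n_neg = len(scores) - n_pos
    rank_sum = sum(ranks[g] for g in scores if g in positives)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Made-up hit lists from three past studies, and the hits of a new study
past = [{"A", "B"}, {"A"}, {"A", "C"}]
prior = de_prior(past, ["A", "B", "C", "D"])
print(auroc(prior, positives={"A", "B"}))  # 0.875
```

A high AUROC here means that the new study's hits could largely be predicted without looking at its data, which is the sense in which such hits are nonspecific.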


Subject(s)
Gene Expression Profiling , Gene Expression Regulation , Human Genetics , Probability , Adenocarcinoma/genetics , Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Electronic Data Processing , Female , Gene Regulatory Networks , Genes, Essential , Genomics , Graft Rejection , Humans , Kidney Transplantation , Lung Neoplasms , ROC Curve , Recurrence , Sensitivity and Specificity , Transcriptome
11.
Nucleic Acids Res ; 46(10): 5125-5138, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29718481

ABSTRACT

Many tools are available for RNA-seq alignment and expression quantification, with comparative value being hard to establish. Benchmarking assessments often highlight methods' good performance, but they either focus on model data or fail to explain variation in performance. This leaves two questions: what is the most meaningful way to assess different alignment choices, and, importantly, where is there room for progress? In this work, we explore the answers to these two questions by performing an exhaustive assessment of the STAR aligner. We assess STAR's performance across a range of alignment parameters using common metrics, and then on biologically focused tasks. We find technical metrics such as fraction mapping or expression profile correlation to be uninformative, capturing properties unlikely to have any role in biological discovery. Surprisingly, we find that changes in alignment parameters within a wide range have little impact on either technical or biological performance. Yet, when performance finally does break, it happens in difficult regions, such as X-Y paralogs and MHC genes. We believe improved reporting by developers will help establish where results are likely to be robust or fragile, providing a better baseline to establish where methodological progress can still occur.


Subject(s)
Gene Expression , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software , Algorithms , Chromosomes, Human, Y , Databases, Genetic , Female , Humans , Male , Sex Factors
12.
Nat Commun ; 9(1): 884, 2018 02 28.
Article in English | MEDLINE | ID: mdl-29491377

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) technology provides a new avenue to discover and characterize cell types; however, the experiment-specific technical biases and analytic variability inherent to current pipelines may undermine its replicability. Meta-analysis is further hampered by the use of ad hoc naming conventions. Here we demonstrate our replication framework, MetaNeighbor, that quantifies the degree to which cell types replicate across datasets, and enables rapid identification of clusters with high similarity. We first measure the replicability of neuronal identity, comparing results across eight technically and biologically diverse datasets to define best practices for more complex assessments. We then apply this to novel interneuron subtypes, finding that 24/45 subtypes have evidence of replication, which enables the identification of robust candidate marker genes. Across tasks we find that large sets of variably expressed genes can identify replicable cell types with high accuracy, suggesting a general route forward for large-scale evaluation of scRNA-seq data.
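The replication idea can be illustrated with a simplified cross-dataset neighbor vote: build a cell-cell correlation network, hide the labels of one dataset, score its cells by their similarity to labelled cells of a type in the other datasets, and report an AUROC. The sketch below is our own minimal Python approximation, not MetaNeighbor's API; the shift of correlations to [0, 1], the toy profiles, and the names `cross_dataset_auroc` and `auroc` are assumptions.

```python
import numpy as np
from scipy.stats import rankdata


def auroc(scores: np.ndarray, labels: np.ndarray) -> float:
    """AUROC via the Mann-Whitney U statistic (average ranks for ties)."""
    ranks = rankdata(scores)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    return float((ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))


def cross_dataset_auroc(expr, dataset, is_type, test_ds) -> float:
    """Score cells of one dataset for a cell type using labels from the others.

    expr: cells x genes matrix (e.g. restricted to variably expressed genes)
    dataset: array with the dataset of origin of each cell
    is_type: boolean array, True for cells labelled as the cell type
    """
    ranked = np.apply_along_axis(rankdata, 1, expr)  # rank genes within each cell
    net = (np.corrcoef(ranked) + 1) / 2  # Spearman correlation, shifted to [0, 1]
    np.fill_diagonal(net, 0.0)
    train = is_type & (dataset != test_ds)  # hide labels of the test dataset
    scores = net @ train.astype(float) / net.sum(axis=1)  # neighbor voting
    test = dataset == test_ds
    return auroc(scores[test], is_type[test])


# Toy data: two datasets, each with two cells of type X and two of type Y
x_cell = [5, 4, 3, 0, 1, 2]
y_cell = [0, 1, 2, 5, 4, 3]
expr = np.array([x_cell, x_cell, y_cell, y_cell] * 2, dtype=float)
dataset = np.array(["A"] * 4 + ["B"] * 4)
is_type = np.array([True, True, False, False] * 2)
print(cross_dataset_auroc(expr, dataset, is_type, test_ds="B"))  # 1.0
```

An AUROC near 1 indicates that the cell type labelled in one dataset is recovered in the other; values near 0.5 indicate no evidence of replication.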


Subject(s)
Neurons/metabolism , RNA/genetics , Computational Biology , Gene Expression Profiling , Humans , Neurons/cytology , RNA/metabolism , Sequence Analysis, RNA , Single-Cell Analysis
13.
Genome Med ; 9(1): 64, 2017 07 07.
Article in English | MEDLINE | ID: mdl-28687074

ABSTRACT

BACKGROUND: Disagreements over genetic signatures associated with disease have been particularly prominent in the field of psychiatric genetics, creating a sharp divide between disease burdens attributed to common and rare variation, with study designs independently targeting each. Meta-analysis within each of these study designs is routine, whether using raw data or summary statistics, but combining results across study designs is atypical. However, tests of functional convergence are used across all study designs, where candidate gene sets are assessed for overlaps with previously known properties. This suggests one possible avenue for combining not study data, but the functional conclusions that they reach. METHOD: In this work, we test for functional convergence in autism spectrum disorder (ASD) across different study types, and specifically whether the degree to which a gene is implicated in autism is correlated with the degree to which it drives functional convergence. Because different study designs are distinguishable by their differences in effect size, this also provides a unified means of incorporating the impact of study design into the analysis of convergence. RESULTS: We detected remarkably significant positive trends in aggregate (p < 2.2e-16) with 14 individually significant properties (false discovery rate <0.01), many in areas researchers have targeted based on different reasoning, such as the fragile X mental retardation protein (FMRP) interactor enrichment (false discovery rate 0.003). We are also able to detect novel technical effects and we see that network enrichment from protein-protein interaction data is heavily confounded with study design, arising readily in control data. CONCLUSIONS: We see a convergent functional signal for a subset of known and novel functions in ASD from all sources of genetic variation. 
Meta-analytic approaches explicitly accounting for different study designs can be adapted to other diseases to discover novel functional associations and increase statistical power.


Subject(s)
Autism Spectrum Disorder/genetics , Genomics/methods , Meta-Analysis as Topic , Mutation , Polymorphism, Genetic , Female , Fragile X Mental Retardation Protein/genetics , Genetic Predisposition to Disease , Humans , Male , Models, Genetic
14.
Nucleic Acids Res ; 45(4): e20, 2017 02 28.
Article in English | MEDLINE | ID: mdl-28204549

ABSTRACT

Gene set analysis, which translates gene lists into enriched functions, is among the most common bioinformatic methods. Yet few would advocate taking the results at face value. Not only is there no agreement on the algorithms themselves, there is no agreement on how to benchmark them. In this paper, we evaluate the robustness and uniqueness of enrichment results as a means of assessing methods even where correctness is unknown. We show that heavily annotated ('multifunctional') genes are likely to appear in genomics study results and drive the generation of biologically non-specific enrichment results as well as highly fragile significances. By providing a means of determining where enrichment analyses report non-specific and non-robust findings, we are able to assess where we can be confident in their use. We find significant progress in recent bias correction methods for enrichment and provide our own software implementation. Our approach can be readily adapted to any pre-existing package.


Subject(s)
Genes , Genomics/methods , Algorithms , Animals , Autistic Disorder/genetics , Cell Hypoxia/genetics , Gene Expression , Gene Ontology , Genome-Wide Association Study , Humans , Mice , Molecular Sequence Annotation , Octamer Transcription Factor-3/metabolism , Schizophrenia/genetics , Software
15.
Bioinformatics ; 33(4): 612-614, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27993773

ABSTRACT

Summary: Evaluating gene networks with respect to known biology is a common task but often a computationally costly one. Many computational experiments are difficult to apply exhaustively in network analysis due to run-times. To permit high-throughput analysis of gene networks, we have implemented a set of very efficient tools to calculate functional properties in networks based on guilt-by-association methods. EGAD (Extending 'Guilt-by-Association' by Degree) allows gene networks to be evaluated with respect to hundreds or thousands of gene sets. The methods predict novel members of gene groups, assess how well a gene network groups known sets of genes, and determine the degree to which generic predictions drive performance. By allowing fast evaluations, whether of random sets or real functional ones, EGAD provides the user with an assessment of performance that can easily be used in controlled evaluations across many parameters. Availability and Implementation: The software package is freely available at https://github.com/sarbal/EGAD and implemented for use in R and Matlab. The package is also freely available under the LGPL license from the Bioconductor web site ( http://bioconductor.org ). Contact: JGillis@cshl.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
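Guilt-by-association evaluation can be illustrated with degree-normalized neighbor voting under cross-validation: hide the annotations of one fold of a gene set, score every gene by the summed edge weight to the remaining members divided by its degree, and measure how well the hidden members are ranked (AUROC). EGAD itself is an R/Matlab package and this Python sketch does not reproduce its API; the three folds, the toy network, and the names `neighbor_voting_auroc` and `auroc` are assumptions.

```python
import numpy as np
from scipy.stats import rankdata


def auroc(scores: np.ndarray, labels: np.ndarray) -> float:
    """AUROC via the Mann-Whitney U statistic (average ranks for ties)."""
    ranks = rankdata(scores)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    return float((ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))


def neighbor_voting_auroc(network, in_set, n_folds=3, seed=0) -> float:
    """Cross-validated guilt-by-association performance for one gene set.

    network: symmetric genes x genes weight matrix with a zero diagonal
    in_set: boolean array, True for genes annotated to the gene set
    """
    positives = np.flatnonzero(in_set)
    if positives.size < n_folds:
        raise ValueError("gene set must have at least n_folds members")
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(positives), n_folds)
    degree = network.sum(axis=1)
    degree[degree == 0] = 1.0  # isolated genes: avoid division by zero
    results = []
    for held_out in folds:
        train = in_set.copy()
        train[held_out] = False  # hide the held-out annotations
        scores = network @ train.astype(float) / degree  # degree-normalized votes
        candidates = ~train  # rank held-out members against all non-training genes
        labels = np.zeros(in_set.size, dtype=bool)
        labels[held_out] = True
        results.append(auroc(scores[candidates], labels[candidates]))
    return float(np.mean(results))


# Toy network: two disconnected modules of six genes; the gene set is module 1
net = np.zeros((12, 12))
net[:6, :6] = 1.0
net[6:, 6:] = 1.0
np.fill_diagonal(net, 0.0)
in_set = np.zeros(12, dtype=bool)
in_set[:6] = True
print(neighbor_voting_auroc(net, in_set))  # 1.0
```

Dividing by degree keeps highly connected hub genes from ranking near the top for every gene set, which relates to the generic-prediction effects the package is described as assessing.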


Subject(s)
Computational Biology/methods , Gene Regulatory Networks , Software , Animals , Humans , Saccharomyces cerevisiae/genetics
16.
PLoS One ; 11(7): e0160098, 2016.
Article in English | MEDLINE | ID: mdl-27467773

ABSTRACT

The expansion of protein-ligand annotation databases has enabled large-scale networking of proteins by ligand similarity. These ligand-based protein networks, which implicitly predict the ability of neighboring proteins to bind related ligands, may complement biologically-oriented gene networks, which are used to predict functional or disease relevance. To quantify the degree to which such ligand-based protein associations might complement functional genomic associations, including sequence similarity, physical protein-protein interactions, co-expression, and disease gene annotations, we calculated a network based on the Similarity Ensemble Approach (SEA: sea.docking.org), where protein neighbors reflect the similarity of their ligands. We also measured the similarity with functional genomic networks over a common set of 1,131 genes, and found that the networks had only small overlaps, which were significant only due to the large scale of the data. Consistent with the view that the networks contain different information, combining them substantially improved Molecular Function prediction within GO (from AUROC~0.63-0.75 for the individual data modalities to AUROC~0.8 in the aggregate). We investigated the boost in guilt-by-association gene function prediction when the networks are combined and describe underlying properties that can be further exploited.


Subject(s)
Databases, Protein , Gene Regulatory Networks , Ligands
17.
Genome Biol ; 17: 101, 2016 May 06.
Article in English | MEDLINE | ID: mdl-27165153

ABSTRACT

BACKGROUND: Co-expression networks have been a useful tool for functional genomics, providing important clues about the cellular and biochemical mechanisms that are active in normal and disease processes. However, co-expression analysis is often treated as a black box with results being hard to trace to their basis in the data. Here, we use both published and novel single-cell RNA sequencing (RNA-seq) data to understand fundamental drivers of gene-gene connectivity and replicability in co-expression networks. RESULTS: We perform the first major analysis of single-cell co-expression, sampling from 31 individual studies. Using neighbor voting in cross-validation, we find that single-cell network connectivity is less likely to overlap with known functions than co-expression derived from bulk data, with functional variation within cell types strongly resembling that also occurring across cell types. To identify features and analysis practices that contribute to this connectivity, we perform our own single-cell RNA-seq experiment of 126 cortical interneurons in an experimental design targeted to co-expression. By assessing network replicability, semantic similarity and overall functional connectivity, we identify technical factors influencing co-expression and suggest how they can be controlled for. Many of the technical effects we identify are expression-level dependent, making expression level itself highly predictive of network topology. We show this occurs generally through re-analysis of the BrainSpan RNA-seq data. CONCLUSIONS: Technical properties of single-cell RNA-seq data create confounds in co-expression networks which can be identified and explicitly controlled for in any supervised analysis. This is useful both in improving co-expression performance and in characterizing single-cell data in generally applicable terms, permitting cross-laboratory comparison within a common framework.


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Animals , Cell Separation/methods , Gene Expression Profiling/standards , Gene Regulatory Networks , Mice , Reproducibility of Results , Sequence Analysis, RNA/standards , Single-Cell Analysis/standards
18.
PLoS Comput Biol ; 12(4): e1004868, 2016 Apr.
Article in English | MEDLINE | ID: mdl-27082953

ABSTRACT

In addition to detecting novel transcripts and higher dynamic range, a principal claim for RNA-sequencing has been greater replicability, typically measured in sample-sample correlations of gene expression levels. Through a re-analysis of ENCODE data, we show that replicability of transcript abundances will provide misleading estimates of the replicability of conditional variation in transcript abundances (i.e., most expression experiments). Heuristics which implicitly address this problem have emerged in quality control measures to obtain 'good' differential expression results. However, these methods involve strict filters such as discarding low expressing genes or using technical replicates to remove discordant transcripts, and are costly or simply ad hoc. As an alternative, we model gene-level replicability of differential activity using co-expressing genes. We find that sets of housekeeping interactions provide a sensitive means of estimating the replicability of expression changes, where the co-expressing pair can be regarded as pseudo-replicates of one another. We model the effects of noise that perturbs a gene's expression within its usual distribution of values and show that perturbing expression by only 5% within that range is readily detectable (AUROC~0.73). We have made our method available as a set of easily implemented R scripts.


Subject(s)
Sequence Analysis, RNA/statistics & numerical data , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Gene Expression , Humans , Models, Statistical , Quality Control , Reproducibility of Results , Sequence Analysis, RNA/standards , Signal-To-Noise Ratio
19.
BMC Med Genomics ; 8 Suppl 2: S1, 2015.
Article in English | MEDLINE | ID: mdl-26044129

ABSTRACT

BACKGROUND: Coronary artery disease (CAD), one of the leading causes of death globally, is influenced by both environmental and genetic risk factors. Gene-centric genome-wide association studies (GWAS) involving cases and controls have been remarkably successful in identifying genetic loci contributing to CAD. Modern in silico platforms, such as candidate gene prediction tools, permit a systematic analysis of GWAS data to identify candidate genes for complex diseases like CAD. Subsequent integration of drug-target data from drug databases with the predicted candidate genes can potentially identify novel therapeutics suitable for repositioning towards treatment of CAD. METHODS: Previously, we were able to predict 264 candidate genes and 104 potential therapeutic targets for CAD using Gentrepid (http://www.gentrepid.org), a candidate gene prediction platform with two bioinformatic modules to reanalyze Wellcome Trust Case-Control Consortium GWAS data. In an expanded study, using five bioinformatic modules on the same data, Gentrepid predicted 647 candidate genes and successfully replicated 55% of the candidate genes identified by the more powerful CARDIoGRAMplusC4D consortium meta-analysis. Hence, Gentrepid was capable of enhancing lower quality genotype-phenotype data, using an independent knowledgebase of existing biological data. Here, we used our methodology to integrate drug data from three drug databases, the Therapeutic Target Database, PharmGKB and DrugBank, with the 647 candidate gene predictions from Gentrepid. We utilized known CAD targets, the scientific literature, existing drug data and the CARDIoGRAMplusC4D meta-analysis study as benchmarks to validate Gentrepid predictions for CAD. RESULTS: Our analysis identified a total of 184 predicted candidate genes as novel therapeutic targets for CAD, and 981 novel therapeutics feasible for repositioning in clinical trials towards treatment of CAD. 
The benchmarks based on known CAD targets and the scientific literature showed that our results were significant (p < 0.05). CONCLUSIONS: We have demonstrated that available drugs may potentially be repositioned as novel therapeutics for the treatment of CAD. Drug repositioning can save valuable time and money spent on preclinical and phase I clinical studies.


Subject(s)
Coronary Artery Disease/genetics , Coronary Artery Disease/therapy , Genome-Wide Association Study , Case-Control Studies , Clinical Trials as Topic , Databases as Topic , Humans , Molecular Targeted Therapy , Reproducibility of Results , Software
20.
BMC Med Genomics ; 7 Suppl 1: S8, 2014.
Article in English | MEDLINE | ID: mdl-25077696

ABSTRACT

BACKGROUND: Human genome sequencing has enabled the association of phenotypes with genetic loci, but our ability to effectively translate this data to the clinic has not kept pace. Over the past 60 years, pharmaceutical companies have successfully demonstrated the safety and efficacy of over 1,200 novel therapeutic drugs via costly clinical studies. While this process must continue, better use can be made of the existing valuable data. In silico tools such as candidate gene prediction systems allow rapid identification of disease genes by identifying the most probable candidate genes linked to genetic markers of the disease or phenotype under investigation. Integration of drug-target data with candidate gene prediction systems can identify novel phenotypes which may benefit from current therapeutics. Such a drug repositioning tool can save valuable time and money spent on preclinical studies and phase I clinical trials. METHODS: We previously used Gentrepid (http://www.gentrepid.org) as a platform to predict 1,497 candidate genes for the seven complex diseases considered in the Wellcome Trust Case-Control Consortium genome-wide association study; namely Type 2 Diabetes, Bipolar Disorder, Crohn's Disease, Hypertension, Type 1 Diabetes, Coronary Artery Disease and Rheumatoid Arthritis. Here, we adopted a simple approach to integrate drug data from three publicly available drug databases: the Therapeutic Target Database, the Pharmacogenomics Knowledgebase and DrugBank; with candidate gene predictions from Gentrepid at the systems level. RESULTS: Using the publicly available drug databases as sources of drug-target association data, we identified a total of 428 candidate genes as novel therapeutic targets for the seven phenotypes of interest, and 2,130 drugs feasible for repositioning against the predicted novel targets. 
CONCLUSIONS: By integrating genetic, bioinformatic and drug data, we have demonstrated that currently available drugs may be repositioned as novel therapeutics for the seven diseases studied here, quickly taking advantage of prior work in pharmaceutics to translate ground-breaking results in genetics to clinical treatments.


Subject(s)
Disease/genetics , Genome-Wide Association Study , Molecular Targeted Therapy/methods , Databases, Pharmaceutical , Drug Approval , Drug Discovery , Feasibility Studies , Genetic Loci/genetics , Humans , United States , United States Food and Drug Administration