Results 1 - 20 of 170
1.
Cell ; 173(2): 321-337.e10, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29625050

ABSTRACT

Genetic alterations in signaling pathways that control cell-cycle progression, apoptosis, and cell growth are common hallmarks of cancer, but the extent, mechanisms, and co-occurrence of alterations in these pathways differ between individual tumors and tumor types. Using mutations, copy-number changes, mRNA expression, gene fusions and DNA methylation in 9,125 tumors profiled by The Cancer Genome Atlas (TCGA), we analyzed the mechanisms and patterns of somatic alterations in ten canonical pathways: cell cycle, Hippo, Myc, Notch, Nrf2, PI-3-Kinase/Akt, RTK-RAS, TGFβ signaling, p53 and β-catenin/Wnt. We charted the detailed landscape of pathway alterations in 33 cancer types, stratified into 64 subtypes, and identified patterns of co-occurrence and mutual exclusivity. Eighty-nine percent of tumors had at least one driver alteration in these pathways, and 57% of tumors had at least one alteration potentially targetable by currently available drugs. Thirty percent of tumors had multiple targetable alterations, indicating opportunities for combination therapy.
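The summary figures above (share of tumors with at least one pathway alteration, co-occurrence versus mutual exclusivity) start from a tumor-by-pathway binary matrix. The sketch below, on invented toy data with shortened pathway labels, computes that share and the raw 2x2 counts for each pathway pair. It is an illustration only; the study's actual co-occurrence and mutual-exclusivity statistics are not reproduced here.

```python
from itertools import combinations

# Hypothetical toy data: 1 = tumor carries a driver alteration in that pathway.
alterations = {
    "tumor_1": {"RTK-RAS": 1, "PI3K": 0, "p53": 1},
    "tumor_2": {"RTK-RAS": 0, "PI3K": 1, "p53": 1},
    "tumor_3": {"RTK-RAS": 1, "PI3K": 0, "p53": 0},
    "tumor_4": {"RTK-RAS": 0, "PI3K": 0, "p53": 0},
}


def fraction_with_any(matrix):
    """Fraction of tumors with at least one altered pathway."""
    hits = sum(1 for row in matrix.values() if any(row.values()))
    return hits / len(matrix)


def pair_counts(matrix, a, b):
    """2x2 counts (both, only a, only b, neither) for pathways a and b."""
    both = only_a = only_b = neither = 0
    for row in matrix.values():
        x, y = row[a], row[b]
        if x and y:
            both += 1
        elif x:
            only_a += 1
        elif y:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


print(fraction_with_any(alterations))
for a, b in combinations(["RTK-RAS", "PI3K", "p53"], 2):
    print(a, b, pair_counts(alterations, a, b))
```

A pair that is never co-altered (a zero in the "both" cell) is only the raw pattern. Whether it reflects true mutual exclusivity is a question for a statistical test over many tumors.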


Subject(s)
Databases, Genetic , Neoplasms/pathology , Signal Transduction/genetics , Genes, Neoplasm , Humans , Neoplasms/genetics , Phosphatidylinositol 3-Kinases/genetics , Phosphatidylinositol 3-Kinases/metabolism , Transforming Growth Factor beta/genetics , Transforming Growth Factor beta/metabolism , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism , Wnt Proteins/genetics , Wnt Proteins/metabolism
2.
Am J Hum Genet ; 111(1): 11-23, 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38181729

ABSTRACT

Precision medicine initiatives across the globe have led to a revolution of repositories linking large-scale genomic data with electronic health records, enabling genomic analyses across the entire phenome. Many of these initiatives focus solely on research insights, leading to limited direct benefit to patients. We describe the biobank at the Colorado Center for Personalized Medicine (CCPM Biobank) that was jointly developed by the University of Colorado Anschutz Medical Campus and UCHealth to serve as a unique, dual-purpose research and clinical resource accelerating personalized medicine. This living resource currently has more than 200,000 participants with ongoing recruitment. We highlight the clinical, laboratory, regulatory, and HIPAA-compliant informatics infrastructure along with our stakeholder engagement, consent, recontact, and participant engagement strategies. We characterize aspects of genetic and geographic diversity unique to the Rocky Mountain region, the primary catchment area for CCPM Biobank participants. We leverage linked health and demographic information of the CCPM Biobank participant population to demonstrate the utility of the CCPM Biobank to replicate complex trait associations in the first 33,674 genotyped individuals across multiple disease domains. Finally, we describe our current efforts toward return of clinical genetic test results, including high-impact pathogenic variants and pharmacogenetic information, and our broader goals as the CCPM Biobank continues to grow. Bringing clinical and research interests together fosters unique clinical and translational questions that can be addressed from the large EHR-linked CCPM Biobank resource within a HIPAA- and CLIA-certified environment.


Subject(s)
Learning Health System , Precision Medicine , Humans , Biological Specimen Banks , Colorado , Genomics
3.
Nat Methods ; 20(6): 803-814, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37248386

ABSTRACT

High-throughput profiling methods (such as genomics or imaging) have accelerated basic research and made deep molecular characterization of patient samples routine. These approaches provide a rich portrait of genes, molecular pathways and cell types involved in disease phenotypes. Machine learning (ML) can be a useful tool for extracting disease-relevant patterns from high-dimensional datasets. However, depending upon the complexity of the biological question, machine learning often requires many samples to identify recurrent and biologically meaningful patterns. Rare diseases are inherently limited in clinical cases, leading to few samples to study. In this Perspective, we outline the challenges and emerging solutions for using ML for small sample sets, specifically in rare diseases. Advances in ML methods for rare diseases are likely to be informative for applications beyond rare diseases for which few samples exist with high-dimensional data. We propose that the methods community prioritize the development of ML techniques for rare disease research.


Subject(s)
Machine Learning , Rare Diseases , Humans , Rare Diseases/genetics , Genomics/methods
4.
Nat Rev Genet ; 21(10): 615-629, 2020 10.
Article in English | MEDLINE | ID: mdl-32694666

ABSTRACT

Data sharing anchors reproducible science, but expectations and best practices are often nebulous. Communities of funders, researchers and publishers continue to grapple with what should be required or encouraged. To illuminate the rationales for sharing data, the technical challenges and the social and cultural challenges, we consider the stakeholders in the scientific enterprise. In biomedical research, participants are key among those stakeholders. Ethical sharing requires considering both the value of research efforts and the privacy costs for participants. We discuss current best practices for various types of genomic data, as well as opportunities to promote ethical data sharing that accelerates science by aligning incentives.


Subject(s)
Biomedical Research/methods , Biomedical Research/trends , Genomics/ethics , Information Dissemination/ethics , Research Personnel/trends , Cooperative Behavior , Humans , Privacy
5.
PLoS Biol ; 20(2): e3001470, 2022 02.
Article in English | MEDLINE | ID: mdl-35104289

ABSTRACT

Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. One element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare and contrast linguistic features within bioRxiv preprints to published biomedical text as a whole, as this is an excellent opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer-reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify the journals and articles that are most linguistically similar to a bioRxiv or medRxiv preprint and to observe where the preprint would be positioned within a published article landscape.
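One common way to turn word2vec vectors into document embeddings is to average the vectors of a document's words and then rank candidates by cosine similarity. This approach is assumed here for illustration, not taken from the paper. The tiny three-dimensional vectors and the journal names below are made up.

```python
import math

# Made-up word vectors standing in for a trained word2vec model.
word_vectors = {
    "protein": [0.9, 0.1, 0.0],
    "structure": [0.8, 0.2, 0.1],
    "neuron": [0.0, 0.9, 0.3],
    "cortex": [0.1, 0.8, 0.4],
}


def embed(tokens):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


preprint = embed(["protein", "structure", "folding"])  # "folding" is out of vocabulary
journals = {
    "journal_A": embed(["protein", "structure"]),
    "journal_B": embed(["neuron", "cortex"]),
}
ranked = sorted(journals, key=lambda j: cosine(preprint, journals[j]), reverse=True)
print(ranked)
```

Out-of-vocabulary words are simply skipped, which is one reason a model trained on preprint text itself, as described above, can matter.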


Subject(s)
Language , Peer Review, Research , Preprints as Topic , Biomedical Research , Publications/standards , Terminology as Topic
6.
Nucleic Acids Res ; 51(W1): W350-W356, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37070209

ABSTRACT

Gene definitions and identifiers can be painful to manage, more so when trying to include gene function annotations, as these can be highly context-dependent. Creating groups of genes or gene sets can help provide such context, but it compounds the issue as each gene within the gene set can map to multiple identifiers and have annotations derived from multiple sources. We developed MyGeneset.info to provide an API for integrated annotations for gene sets suitable for use in analytical pipelines or web servers. Leveraging our previous work with MyGene.info (a server that provides gene-centric annotations and identifiers), MyGeneset.info addresses the challenge of managing gene sets from multiple resources. With our API, users readily have read-only access to gene sets imported from commonly used resources such as Wikipathways, CTD, Reactome, SMPDB, MSigDB, GO, and DO. In addition to supporting the access and reuse of approximately 180k gene sets from humans, common model organisms (mice, yeast, etc.), and less-common ones (e.g. black cottonwood tree), MyGeneset.info supports user-created gene sets, providing an important means for making gene sets more FAIR. User-created gene sets can serve as a way to store and manage collections for analysis or easy dissemination through a consistent API.


Subject(s)
Internet , Software , Humans , Animals , Mice , Molecular Sequence Annotation , User-Computer Interface
7.
PLoS Biol ; 19(10): e3001419, 2021 10.
Article in English | MEDLINE | ID: mdl-34618807

ABSTRACT

Evolving in sync with the computation revolution over the past 30 years, computational biology has emerged as a mature scientific field. While the field has made major contributions toward improving scientific knowledge and human health, individual computational biology practitioners at various institutions often languish in career development. As optimistic biologists passionate about the future of our field, we propose solutions for both eager and reluctant individual scientists, institutions, publishers, funding agencies, and educators to fully embrace computational biology. We believe that in order to pave the way for the next generation of discoveries, we need to improve recognition for computational biologists and better align pathways of career success with pathways of scientific progress. With 10 outlined steps, we call on all adjacent fields to move away from the traditional individual, single-discipline investigator research model and embrace multidisciplinary, data-driven, team science.


Subject(s)
Computational Biology , Budgets , Cooperative Behavior , Humans , Interdisciplinary Research , Mentoring , Motivation , Publications , Reward , Software
8.
PLoS Comput Biol ; 19(3): e1010984, 2023 03.
Article in English | MEDLINE | ID: mdl-36972227

ABSTRACT

Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines, prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed that the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because while biological systems are high-dimensional, effective dividing lines for predictive models may not be.
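The idea behind removing linear signal can be illustrated with a simplified analogue, which is not limma's implementation: regressing a gene on a binary label and keeping the residuals (re-centered on the grand mean) equalizes the class means, the signal a linear decision boundary relies on most directly, while a difference in spread between classes, a non-linear signal, survives. The expression values below are invented.

```python
from statistics import mean, pvariance

# Toy expression for one gene: classes "a" and "b" differ in mean AND in spread.
expr = {"a": [1.0, 2.0, 3.0], "b": [4.0, 8.0, 12.0]}


def remove_class_means(groups):
    """Shift each class so that its mean equals the grand mean of all samples."""
    grand = mean(x for values in groups.values() for x in values)
    return {
        label: [x - mean(values) + grand for x in values]
        for label, values in groups.items()
    }


adjusted = remove_class_means(expr)
print({k: mean(v) for k, v in adjusted.items()})       # class means now equal
print({k: pvariance(v) for k, v in adjusted.items()})  # variance difference remains
```

After adjustment a threshold on this gene's value no longer separates the classes by location, but a model that can use spread (for example, distance from the center) still could.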


Subject(s)
Gene Expression , Nonlinear Dynamics , Gene Expression Profiling , Neural Networks, Computer , Linear Models
9.
Bioinformatics ; 38(22): 5129-5130, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36193991

ABSTRACT

MOTIVATION: Domain adaptation allows for the development of predictive models even in cases with limited sample data. Weighted elastic net domain adaptation specifically leverages features of genomic data to maximize transferability, but the method is too computationally demanding to apply to many genome-sized datasets. RESULTS: We developed wenda_gpu, which uses GPyTorch to train models on genomic data within hours on a single GPU-enabled machine. We show that wenda_gpu returns results comparable to the original wenda implementation, and that it improves prediction of cancer mutation status on small sample sizes compared with regular elastic net. AVAILABILITY AND IMPLEMENTATION: wenda_gpu is available on GitHub at https://github.com/greenelab/wenda_gpu/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
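For orientation, a generic weighted elastic net for a linear model with squared loss gives each feature j its own penalty factor w_j, so features judged less transferable can be shrunk harder. The objective below is the standard form (n samples, p features, mixing parameter α, overall strength λ). How wenda derives its feature weights is not reproduced here.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta\right)^2
  + \lambda \sum_{j=1}^{p} w_j \left( \alpha\,\lvert\beta_j\rvert + \frac{1-\alpha}{2}\,\beta_j^{2} \right)
```

Setting every w_j = 1 recovers the ordinary elastic net, which is the baseline the abstract compares against.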


Subject(s)
Neoplasms , Software , Humans , Genomics/methods , Neoplasms/genetics , Sample Size
10.
Proc Natl Acad Sci U S A ; 117(6): 3167-3173, 2020 02 11.
Article in English | MEDLINE | ID: mdl-31980538

ABSTRACT

Pseudomonas aeruginosa strains with loss-of-function mutations in the transcription factor LasR are frequently encountered in the clinic and the environment. Among the characteristics common to LasR-defective (LasR-) strains is increased activity of the transcription factor Anr, relative to their LasR+ counterparts, in low-oxygen conditions. One of the Anr-regulated genes found to be highly induced in LasR- strains was PA14_42860 (PA1673), which we named mhr for microoxic hemerythrin. Purified P. aeruginosa Mhr protein contained the predicted di-iron center and bound molecular oxygen with an apparent Kd of ∼1 µM. Both Anr and Mhr were necessary for fitness in lasR+ and lasR mutant strains in colony biofilms grown in microoxic conditions, and the effects were more striking in the lasR mutant. Among genes in the Anr regulon, mhr was most closely coregulated with the Anr-controlled high-affinity cytochrome c oxidase genes. In the absence of high-affinity cytochrome c oxidases, deletion of mhr no longer caused a fitness disadvantage, suggesting that Mhr works in concert with microoxic respiration. We demonstrate that Anr and Mhr contribute to LasR- strain fitness even in biofilms grown in normoxic conditions. Furthermore, metabolomics data indicate that, in a lasR mutant, expression of Anr-regulated mhr leads to differences in metabolism in cells grown on lysogeny broth or artificial sputum medium. We propose that increased Anr activity leads to higher levels of the oxygen-binding protein Mhr, which confers an advantage to lasR mutants in microoxic conditions.
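One way to read the reported apparent Kd of about 1 µM is through a single-site binding isotherm. That model is an assumption here, since the abstract does not state it: the fraction of Mhr with O2 bound is θ = [O2] / (Kd + [O2]), so at an O2 concentration equal to Kd roughly half of the protein is occupied. The concentrations in the loop are illustrative, not measured values.

```python
def fraction_bound(o2_um, kd_um=1.0):
    """Single-site occupancy theta = [O2] / (Kd + [O2]); concentrations in µM."""
    if o2_um < 0 or kd_um <= 0:
        raise ValueError("O2 must be non-negative and Kd positive")
    return o2_um / (kd_um + o2_um)


for o2 in (0.1, 1.0, 10.0):
    print(o2, round(fraction_bound(o2), 3))
```

A low Kd means substantial occupancy persists even at the low O2 concentrations found in microoxic colony biofilms.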


Subject(s)
Bacterial Proteins/metabolism , Cell Hypoxia/genetics , Genetic Fitness/genetics , Hemerythrin/metabolism , Pseudomonas aeruginosa , Trans-Activators/metabolism , Bacterial Proteins/genetics , Hemerythrin/genetics , Oxygen/metabolism , Pseudomonas aeruginosa/genetics , Pseudomonas aeruginosa/metabolism , Pseudomonas aeruginosa/physiology , Trans-Activators/genetics
11.
PLoS Comput Biol ; 17(8): e1009290, 2021 08.
Article in English | MEDLINE | ID: mdl-34428202

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) has made it possible to profile gene expression in tissues at high resolution. An important preprocessing step prior to performing downstream analyses is to identify and remove cells with poor or degraded sample quality using quality control (QC) metrics. Two widely used QC metrics to identify a 'low-quality' cell are (i) if the cell includes a high proportion of reads that map to mitochondrial DNA (mtDNA) encoded genes and (ii) if a small number of genes are detected. Current best practices use these QC metrics independently with either arbitrary, uniform thresholds (e.g. 5%) or biological context-dependent (e.g. species) thresholds, and fail to jointly model these metrics in a data-driven manner. Current practices are often overly stringent and especially untenable on certain types of tissues, such as archived tumor tissues, or tissues associated with mitochondrial function, such as kidney tissue [1]. We propose a data-driven QC metric (miQC) that jointly models both the proportion of reads mapping to mtDNA genes and the number of detected genes with mixture models in a probabilistic framework to predict the low-quality cells in a given dataset. We demonstrate how our QC metric easily adapts to different types of single-cell datasets to remove low-quality cells while preserving high-quality cells that can be used for downstream analyses. Our software package is available at https://bioconductor.org/packages/miQC.
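The joint, data-driven model can be sketched in the spirit of miQC, though this is not its implementation: a two-component mixture of linear regressions of mitochondrial percentage on genes detected, fit by expectation-maximization (EM), with cells removed when their posterior probability of the high-mitochondrial ("compromised") component exceeds a cutoff. The 0.75 cutoff, the starting split and the deterministic toy data below are choices made for illustration.

```python
import math


def wls(x, y, w):
    """Weighted least squares fit of y = a + b * x; returns (a, b)."""
    sw = max(sum(w), 1e-12)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def normal_pdf(r, sd):
    """Normal density of residual r with standard deviation sd."""
    return math.exp(-0.5 * (r / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_line_mixture(genes, mito, n_iter=50):
    """EM for two regression lines; returns P(compromised component) per cell."""
    avg = sum(mito) / len(mito)
    resp = [1.0 if m > avg else 0.0 for m in mito]  # start: above-average mito%
    for _ in range(n_iter):
        # M-step: weighted line, residual SD and mixing weight per component.
        params = []
        for w in (resp, [1.0 - r for r in resp]):
            sw = max(sum(w), 1e-12)
            a, b = wls(genes, mito, w)
            rss = sum(wi * (m - a - b * g) ** 2 for wi, g, m in zip(w, genes, mito))
            params.append((a, b, max(math.sqrt(rss / sw), 1e-3), sw / len(mito)))
        # E-step: posterior of the compromised component for every cell.
        resp = []
        for g, m in zip(genes, mito):
            dens = [p * normal_pdf(m - a - b * g, sd) for a, b, sd, p in params]
            total = dens[0] + dens[1]
            resp.append(dens[0] / total if total > 0 else 0.5)
    return resp


# Deterministic toy data: genes detected (thousands) and percent mitochondrial reads.
intact_genes = [1.0 + 0.1 * i for i in range(40)]
intact_mito = [2.0 + 0.3 * math.sin(i) for i in range(40)]
broken_genes = [0.2 + 0.06 * i for i in range(10)]
broken_mito = [25.0 + 1.5 * math.cos(i) for i in range(10)]

post = fit_two_line_mixture(intact_genes + broken_genes, intact_mito + broken_mito)
keep = [p < 0.75 for p in post]  # illustrative posterior cutoff
print(sum(keep), "of", len(keep), "cells kept")
```

Because the threshold is a posterior probability rather than a fixed percentage, the same code adapts when a tissue's intact cells naturally carry more mitochondrial reads.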


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Probability , Quality Control , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , DNA, Mitochondrial/genetics , Humans
12.
Nucleic Acids Res ; 48(9): 4709-4724, 2020 05 21.
Article in English | MEDLINE | ID: mdl-32319526

ABSTRACT

Alternative splicing (AS) is frequent during early mouse embryonic development. Specific histone post-translational modifications (hPTMs) have been shown to regulate exon splicing by either directly recruiting splice machinery or indirectly modulating transcriptional elongation. In this study, we hypothesized that hPTMs regulate expression of alternatively spliced genes for specific processes during differentiation. To address this notion, we applied an innovative machine learning approach to relate global hPTM enrichment to AS regulation during mammalian tissue development. We found that specific hPTMs, H3K36me3 and H3K4me1, play a role in skipped exon selection among all the tissues and developmental time points examined. In addition, we used an iterative random forest model and found that interactions of multiple hPTMs most strongly predicted splicing when they included H3K36me3 and H3K4me1. Collectively, our data demonstrate a link between hPTMs and alternative splicing, which will drive further experimental studies on the functional relevance of these modifications to alternative splicing.


Subject(s)
Alternative Splicing , Embryonic Development/genetics , Exons , Histone Code , Animals , Logistic Models , Machine Learning , Mice , Protein Processing, Post-Translational
13.
Genet Epidemiol ; 44(1): 52-66, 2020 01.
Article in English | MEDLINE | ID: mdl-31583758

ABSTRACT

Genetic interactions have been recognized as a potentially important contributor to the heritability of complex diseases. Nevertheless, due to small effect sizes and stringent multiple-testing correction, identifying genetic interactions in complex diseases is particularly challenging. To address these challenges, many genomic research initiatives collaborate to form large-scale consortia and develop open access to enable sharing of genome-wide association study (GWAS) data. Despite the perceived benefits of data sharing from large consortia, a number of practical issues have arisen, such as privacy concerns on individual genomic information and heterogeneous data sources from distributed GWAS databases. In the context of large consortia, we demonstrate that the heterogeneously appearing marginal effects over distributed GWAS databases can offer new insights into genetic interactions for which conventional methods have had limited success. In this paper, we develop a novel two-stage testing procedure, named phylogenY-based effect-size tests for interactions using first 2 moments (YETI2), to detect genetic interactions through both pooled marginal effects, in terms of averaging site-specific marginal effects, and heterogeneity in marginal effects across sites, using a meta-analytic framework. YETI2 can not only be applied to large consortia without shared personal information but can also be used to leverage underlying heterogeneity in marginal effects to prioritize potential genetic interactions. We investigate the performance of YETI2 through simulation studies and apply YETI2 to bladder cancer data from dbGaP.
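The two ingredients named above, pooled site-specific marginal effects and their heterogeneity across sites, can be illustrated with standard fixed-effect meta-analysis quantities: an inverse-variance weighted mean and Cochran's Q. These textbook statistics stand in for the idea and are not the YETI2 test statistics. The per-site effects and standard errors are made up. Only summary statistics are needed, so no individual-level data leave a site.

```python
def pooled_effect_and_q(effects, std_errors):
    """Inverse-variance pooled effect, its standard error, and Cochran's Q."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total
    pooled_se = (1.0 / total) ** 0.5
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    return pooled, pooled_se, q


# Hypothetical marginal log-odds ratios for one SNP at three sites.
effects = [0.10, 0.30, 0.50]
std_errors = [0.10, 0.10, 0.10]
pooled, se, q = pooled_effect_and_q(effects, std_errors)
print(round(pooled, 3), round(se, 3), round(q, 2))
```

Under homogeneity, Q is approximately chi-square with k - 1 degrees of freedom (here 2), so a large Q flags site-to-site variation in marginal effects. That kind of variation is what YETI2 leverages to prioritize candidate interactions.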


Subject(s)
Epistasis, Genetic/genetics , Genome-Wide Association Study/methods , Urinary Bladder Neoplasms/genetics , Humans , Information Dissemination , Models, Genetic , Polymorphism, Single Nucleotide/genetics
14.
Trends Genet ; 34(10): 790-805, 2018 10.
Article in English | MEDLINE | ID: mdl-30143323

ABSTRACT

Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods and their limitations, and we focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge, answering questions from high-dimensional data that we have not yet thought to ask.
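As one concrete MF technique among those such reviews cover, non-negative matrix factorization approximates a genes-by-samples matrix X by a product WH with non-negative factors. The sketch below uses the Lee-Seung multiplicative updates in pure Python on a tiny made-up matrix built from two gene "modules". It is a minimal illustration, not a recommended implementation for real omics data.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frob_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0])))


def nmf(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimizing ||X - WH||_F^2."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)  # H <- H * (W^T X) / (W^T W H)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))  # W <- W * (X H^T) / (W H H^T)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


# Made-up 4 genes x 4 samples matrix: genes 1-2 active in samples 1-2, genes 3-4 in 3-4.
x = [[1, 1, 0, 0], [2, 2, 0, 0], [0, 0, 2, 2], [0, 0, 6, 6]]
w, h = nmf(x, rank=2)
print(round(frob_error(x, w, h), 4))
```

Columns of W can be read as gene programs and rows of H as their activity per sample, which is the kind of low-dimensional structure the review discusses interpreting.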


Subject(s)
Data Interpretation, Statistical , Genomics/statistics & numerical data , Proteomics/statistics & numerical data , Algorithms , Humans , Systems Biology/statistics & numerical data
16.
Bioinformatics ; 35(9): 1518-1526, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30247517

ABSTRACT

MOTIVATION: Decreasing costs are making it feasible to perform time series proteomics and genomics experiments with more replicates and higher resolution than ever before. With more replicates and time points, proteome and genome-wide patterns of expression are more readily discernible. These larger experiments require more batches, exacerbating batch effects and increasing the number of bias trends. In the case of proteomics, where methods frequently result in missing data, this increasing scale is also decreasing the number of peptides observed in all samples. The sources of batch effects and missing data are incompletely understood, necessitating novel techniques. RESULTS: Here we show that by exploiting the structure of time series experiments, it is possible to accurately and reproducibly model and remove batch effects. We implement Learning and Imputation for Mass-spec Bias Reduction (LIMBR) software, which builds on previous block-based models of batch effects and includes features specific to time series and circadian studies. To aid in the analysis of time series proteomics experiments, which are often plagued with missing data points, we also integrate an imputation system. By building imputation and time-series-tailored bias modeling into one straightforward software package, LIMBR, we expect that the quality and ease of large-scale proteomics and genomics time series experiments will be significantly increased. AVAILABILITY AND IMPLEMENTATION: Python code and documentation are available for download at https://github.com/aleccrowell/LIMBR and LIMBR can be downloaded and installed with dependencies using 'pip install limbr'. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
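The abstract does not specify how LIMBR's imputation works, so the sketch below shows only a generic k-nearest-neighbor imputation for a peptides-by-samples table with missing values (None). A missing entry is filled with the mean of the k most similar peptides, with similarity measured on the samples both peptides observe. Function name, distance choice and data are illustrative assumptions, not LIMBR's API.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows that observe that sample.

    Distance between two rows is the RMS difference over samples both observe.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        candidates = []
        for n, other in enumerate(rows):
            if n == i:
                continue
            shared = [j for j in range(len(row)) if row[j] is not None and other[j] is not None]
            if not shared:
                continue
            dist = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
            candidates.append((dist, n))
        candidates.sort()
        for j in missing:
            donors = [rows[n][j] for _, n in candidates if rows[n][j] is not None][:k]
            if donors:
                filled[i][j] = sum(donors) / len(donors)
    return filled


# Made-up peptide abundances (rows) across four samples (columns).
rows = [
    [1.0, 2.0, 3.0, None],
    [1.1, 2.1, 3.1, 4.1],
    [0.9, 1.9, 2.9, 3.9],
    [5.0, 1.0, 0.0, 9.0],
]
print(knn_impute(rows))
```

The dissimilar fourth peptide is not used as a donor, so the filled value follows the trend of the two peptides that behave like the incomplete one.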


Subject(s)
Software , Genome , Genomics , Mass Spectrometry , Proteomics
17.
PLoS Comput Biol ; 15(6): e1007128, 2019 06.
Article in English | MEDLINE | ID: mdl-31233491

ABSTRACT

Open, collaborative research is a powerful paradigm that can immensely strengthen the scientific process by integrating broad and diverse expertise. However, traditional research and multi-author writing processes break down at scale. We present new software named Manubot, available at https://manubot.org, to address the challenges of open scholarly writing. Manubot adopts the contribution workflow used by many large-scale open source software projects to enable collaborative authoring of scholarly manuscripts. With Manubot, manuscripts are written in Markdown and stored in a Git repository to precisely track changes over time. By hosting manuscript repositories publicly, such as on GitHub, multiple authors can simultaneously propose and review changes. A cloud service automatically evaluates proposed changes to catch errors. Publication with Manubot is continuous: When a manuscript's source changes, the rendered outputs are rebuilt and republished to a web page. Manubot automates bibliographic tasks by implementing citation by identifier, where users cite persistent identifiers (e.g. DOIs, PubMed IDs, ISBNs, URLs), whose metadata is then retrieved and converted to a user-specified style. Manubot modernizes publishing to align with the ideals of open science by making it transparent, reproducible, immediate, versioned, collaborative, and free of charge.
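Citation by identifier means an author writes a prefixed persistent identifier where a formatted reference would go, and software later fetches the metadata. The regular expression below is a rough, hypothetical extractor for keys such as `@doi:...` or `@pmid:...` inside bracketed citations, written only to illustrate the idea. It is not Manubot's parser, and Manubot's exact key syntax should be checked in its own documentation. The example identifiers are this article's DOI and PubMed ID.

```python
import re

# Hypothetical pattern for prefixed citation keys such as @doi:10.xxxx/yyyy.
CITE_KEY = re.compile(r"@(doi|pmid|isbn|url):([^\s;\]]+)")


def extract_citations(markdown):
    """Return (prefix, identifier) pairs in order of first appearance, deduplicated."""
    seen = []
    for prefix, ident in CITE_KEY.findall(markdown):
        key = (prefix, ident)
        if key not in seen:
            seen.append(key)
    return seen


text = (
    "Manuscripts cite sources inline [@doi:10.1371/journal.pcbi.1007128; @pmid:31233491]. "
    "Repeat citations [@pmid:31233491] resolve to one reference."
)
print(extract_citations(text))
```

Deduplicating by identifier is what lets a bibliography list each source once no matter how often it is cited in the Markdown.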


Subject(s)
Publishing , Software , Writing , Humans , Manuscripts, Medical as Topic
19.
J Bacteriol ; 200(8), 2018 04 15.
Article in English | MEDLINE | ID: mdl-29311282

ABSTRACT

The Pseudomonas fluorescens genome encodes more than 50 proteins predicted to be involved in c-di-GMP signaling. Here, we demonstrated that, tested across 188 nutrients, these enzymes and effectors appeared capable of impacting biofilm formation. Transcriptional analysis of network members across ∼50 nutrient conditions indicates that altered gene expression can explain some, but not all, of the biofilm formation responses to the nutrients. Additional organization of the network is likely achieved through physical interaction, as determined via probing ∼2,000 interactions by bacterial two-hybrid assays. Our analysis revealed a multimodal regulatory strategy using combinations of ligand-mediated signals, protein-protein interaction, and/or transcriptional regulation to fine-tune c-di-GMP-mediated responses. These results create a profile of a large c-di-GMP network that is used to make important cellular decisions, opening the door to future model building and the ability to engineer this complex circuitry in other bacteria. IMPORTANCE: Cyclic diguanylate (c-di-GMP) is a key signaling molecule regulating bacterial biofilm formation, and many microbes have up to dozens of proteins that make, break, or bind this dinucleotide. A major open issue in the field is how signaling specificity is conferred in the unpartitioned space of a bacterial cell. Here, we took a systems approach, using mutational analysis, transcriptional studies, and bacterial two-hybrid analysis to interrogate this network. We found that a majority of enzymes are capable of impacting biofilm formation in a context-dependent manner, and we revealed examples of two or more modes of regulation (i.e., transcriptional control with protein-protein interaction) being utilized to generate an observable impact on biofilm formation.


Subject(s)
Biofilms/growth & development , Cyclic GMP/analogs & derivatives , Gene Expression Regulation, Bacterial , Pseudomonas fluorescens/growth & development , Cyclic GMP/genetics , Gene Expression Profiling , Pseudomonas fluorescens/genetics , Signal Transduction , Two-Hybrid System Techniques
20.
Hum Mol Genet ; 25(R2): R94-R98, 2016 Oct 01.
Article in English | MEDLINE | ID: mdl-27340225

ABSTRACT

One way to design a drug is to attempt to phenocopy a genetic variant that is known to have the desired effect. In general, drugs that are supported by genetic associations progress further in the development pipeline. However, the number of associations that are candidates for development into drugs is limited because many associations are in non-coding regions or in difficult-to-target genes. Approaches that overlay information from pathway databases or biological networks can expand the potential target list. In cases where the initial variant is not targetable or there is no variant with the desired effect, this may reveal new means to target a disease. In this review, we discuss recent examples in the domain of pathway and network-based drug repositioning from genetic associations. We highlight important caveats and challenges for the field, and we discuss opportunities for further development.
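The network-overlay idea can be made concrete with a breadth-first search: starting from the gene implicated by a genetic association, collect druggable genes within a few interaction hops. The network, the druggable set and the gene names below are placeholders, not real biology, and real analyses would score candidates with curated pathway or interaction data rather than hop count alone.

```python
from collections import deque

# Placeholder interaction network and druggable set (hypothetical gene names).
network = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_A", "GENE_D"],
    "GENE_C": ["GENE_A"],
    "GENE_D": ["GENE_B"],
}
druggable = {"GENE_C", "GENE_D"}


def candidate_targets(seed, max_hops=2):
    """Breadth-first search from the associated gene; return (hops, gene) for druggable genes."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        gene = queue.popleft()
        if dist[gene] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in network.get(gene, []):
            if neighbor not in dist:
                dist[neighbor] = dist[gene] + 1
                queue.append(neighbor)
    return sorted((d, g) for g, d in dist.items() if g in druggable)


print(candidate_targets("GENE_A"))
```

Here the associated gene itself is not druggable, yet the search surfaces druggable neighbors, ranked by proximity, as alternative ways to act on the same biology.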
