1.
Cell ; 173(2): 321-337.e10, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29625050

ABSTRACT

Genetic alterations in signaling pathways that control cell-cycle progression, apoptosis, and cell growth are common hallmarks of cancer, but the extent, mechanisms, and co-occurrence of alterations in these pathways differ between individual tumors and tumor types. Using mutations, copy-number changes, mRNA expression, gene fusions and DNA methylation in 9,125 tumors profiled by The Cancer Genome Atlas (TCGA), we analyzed the mechanisms and patterns of somatic alterations in ten canonical pathways: cell cycle, Hippo, Myc, Notch, Nrf2, PI-3-Kinase/Akt, RTK-RAS, TGFβ signaling, p53 and β-catenin/Wnt. We charted the detailed landscape of pathway alterations in 33 cancer types, stratified into 64 subtypes, and identified patterns of co-occurrence and mutual exclusivity. Eighty-nine percent of tumors had at least one driver alteration in these pathways, and 57% of tumors had at least one alteration potentially targetable by currently available drugs. Thirty percent of tumors had multiple targetable alterations, indicating opportunities for combination therapy.


Subjects
Databases, Genetic , Neoplasms/pathology , Signal Transduction/genetics , Genes, Neoplasm , Humans , Neoplasms/genetics , Phosphatidylinositol 3-Kinases/genetics , Phosphatidylinositol 3-Kinases/metabolism , Transforming Growth Factor beta/genetics , Transforming Growth Factor beta/metabolism , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism , Wnt Proteins/genetics , Wnt Proteins/metabolism
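The headline figures in this abstract (89% of tumors with at least one pathway driver alteration, 57% with at least one potentially targetable alteration, 30% with several) are all fractions of tumors whose set of alterations reaches a size threshold. The following is a minimal sketch of that arithmetic. It assumes a hypothetical per-tumor mapping to altered pathways and to targetable alterations. The tumors and labels below are invented toy data, not TCGA calls.

```python
from typing import Dict, Set

def fraction_with_at_least(alterations: Dict[str, Set[str]], k: int) -> float:
    """Return the fraction of tumors whose alteration set has at least k members."""
    if not alterations:
        return 0.0
    hits = sum(1 for alts in alterations.values() if len(alts) >= k)
    return hits / len(alterations)

# Hypothetical toy data: tumor -> pathways carrying a driver alteration.
pathway_hits = {
    "tumor_1": {"RTK-RAS", "p53"},
    "tumor_2": {"PI3K/Akt"},
    "tumor_3": set(),
    "tumor_4": {"Cell cycle", "p53", "Wnt"},
}
# Hypothetical toy data: tumor -> alterations flagged as targetable (labels are placeholders).
targetable = {
    "tumor_1": {"target_A"},
    "tumor_2": {"target_B", "target_C"},
    "tumor_3": set(),
    "tumor_4": set(),
}

print(fraction_with_at_least(pathway_hits, 1))  # 0.75
print(fraction_with_at_least(targetable, 1))    # 0.5
print(fraction_with_at_least(targetable, 2))    # 0.25
```

Counting "at least k" rather than "exactly k" is what lets a single helper express both the single-alteration and multiple-alteration percentages.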
2.
Am J Hum Genet ; 111(1): 11-23, 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38181729

ABSTRACT

Precision medicine initiatives across the globe have led to a revolution of repositories linking large-scale genomic data with electronic health records, enabling genomic analyses across the entire phenome. Many of these initiatives focus solely on research insights, leading to limited direct benefit to patients. We describe the biobank at the Colorado Center for Personalized Medicine (CCPM Biobank) that was jointly developed by the University of Colorado Anschutz Medical Campus and UCHealth to serve as a unique, dual-purpose research and clinical resource accelerating personalized medicine. This living resource currently has more than 200,000 participants with ongoing recruitment. We highlight the clinical, laboratory, regulatory, and HIPAA-compliant informatics infrastructure along with our stakeholder engagement, consent, recontact, and participant engagement strategies. We characterize aspects of genetic and geographic diversity unique to the Rocky Mountain region, the primary catchment area for CCPM Biobank participants. We leverage linked health and demographic information of the CCPM Biobank participant population to demonstrate the utility of the CCPM Biobank to replicate complex trait associations in the first 33,674 genotyped individuals across multiple disease domains. Finally, we describe our current efforts toward return of clinical genetic test results, including high-impact pathogenic variants and pharmacogenetic information, and our broader goals as the CCPM Biobank continues to grow. Bringing clinical and research interests together fosters unique clinical and translational questions that can be addressed from the large EHR-linked CCPM Biobank resource within a HIPAA- and CLIA-certified environment.


Subjects
Learning Health System , Precision Medicine , Humans , Biological Specimen Banks , Colorado , Genomics
3.
Nat Methods ; 20(6): 803-814, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37248386

ABSTRACT

High-throughput profiling methods (such as genomics or imaging) have accelerated basic research and made deep molecular characterization of patient samples routine. These approaches provide a rich portrait of genes, molecular pathways and cell types involved in disease phenotypes. Machine learning (ML) can be a useful tool for extracting disease-relevant patterns from high-dimensional datasets. However, depending upon the complexity of the biological question, machine learning often requires many samples to identify recurrent and biologically meaningful patterns. Rare diseases are inherently limited in clinical cases, leading to few samples to study. In this Perspective, we outline the challenges and emerging solutions for using ML for small sample sets, specifically in rare diseases. Advances in ML methods for rare diseases are likely to be informative for applications beyond rare diseases for which few samples exist with high-dimensional data. We propose that the methods community prioritize the development of ML techniques for rare disease research.


Subjects
Machine Learning , Rare Diseases , Humans , Rare Diseases/genetics , Genomics/methods
4.
Nat Rev Genet ; 21(10): 615-629, 2020 10.
Article in English | MEDLINE | ID: mdl-32694666

ABSTRACT

Data sharing anchors reproducible science, but expectations and best practices are often nebulous. Communities of funders, researchers and publishers continue to grapple with what should be required or encouraged. To illuminate the rationales for sharing data, the technical challenges and the social and cultural challenges, we consider the stakeholders in the scientific enterprise. In biomedical research, participants are key among those stakeholders. Ethical sharing requires considering both the value of research efforts and the privacy costs for participants. We discuss current best practices for various types of genomic data, as well as opportunities to promote ethical data sharing that accelerates science by aligning incentives.


Subjects
Biomedical Research/methods , Biomedical Research/trends , Genomics/ethics , Information Dissemination/ethics , Research Personnel/trends , Cooperative Behavior , Humans , Privacy
5.
PLoS Biol ; 20(2): e3001470, 2022 02.
Article in English | MEDLINE | ID: mdl-35104289

ABSTRACT

Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. A missing element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare and contrast linguistic features within bioRxiv preprints with published biomedical text as a whole, as this is an excellent opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer-reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify the journals and articles that are most linguistically similar to a bioRxiv or medRxiv preprint, as well as to observe where the preprint would be positioned within a published article landscape.


Subjects
Language , Peer Review, Research , Preprints as Topic , Biomedical Research , Publications/standards , Terminology as Topic
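A common way to build document embeddings from a word2vec model is to average the vectors of a document's words and then compare documents by cosine similarity. The sketch below assumes that averaging scheme and substitutes a tiny hypothetical vocabulary for a trained model. It illustrates how journals might be ranked by similarity to a preprint. It does not reproduce the authors' pipeline.

```python
import math
from typing import Dict, List, Optional

# Hypothetical 3-dimensional word vectors standing in for a trained word2vec model.
WORD_VECTORS: Dict[str, List[float]] = {
    "protein": [0.9, 0.1, 0.0],
    "binding": [0.8, 0.2, 0.1],
    "neuron": [0.1, 0.9, 0.2],
    "cortex": [0.0, 0.8, 0.3],
}

def document_embedding(tokens: List[str]) -> Optional[List[float]]:
    """Average the vectors of in-vocabulary tokens; return None if no token is known."""
    vecs = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

preprint = document_embedding("protein binding assay".split())  # "assay" is out of vocabulary
journal_a = document_embedding("binding protein".split())
journal_b = document_embedding("neuron cortex".split())
print(cosine(preprint, journal_a) > cosine(preprint, journal_b))  # True
```

Out-of-vocabulary words are simply skipped, so a document with no known words has no embedding rather than a zero vector.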
6.
Nucleic Acids Res ; 51(W1): W350-W356, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37070209

ABSTRACT

Gene definitions and identifiers can be painful to manage, more so when trying to include gene function annotations, as these can be highly context-dependent. Creating groups of genes or gene sets can help provide such context, but it compounds the issue, as each gene within the gene set can map to multiple identifiers and have annotations derived from multiple sources. We developed MyGeneset.info to provide an API for integrated annotations for gene sets suitable for use in analytical pipelines or web servers. Leveraging our previous work with MyGene.info (a server that provides gene-centric annotations and identifiers), MyGeneset.info addresses the challenge of managing gene sets from multiple resources. With our API, users readily have read-only access to gene sets imported from commonly-used resources such as Wikipathways, CTD, Reactome, SMPDB, MSigDB, GO, and DO. In addition to supporting the access and reuse of approximately 180k gene sets from humans, common model organisms (mice, yeast, etc.), and less-common ones (e.g. black cottonwood tree), MyGeneset.info supports user-created gene sets, providing an important means for making gene sets more FAIR. User-created gene sets can serve as a way to store and manage collections for analysis or easy dissemination through a consistent API.


Subjects
Internet , Software , Humans , Animals , Mice , Molecular Sequence Annotation , User-Computer Interface
7.
PLoS Biol ; 19(10): e3001419, 2021 10.
Article in English | MEDLINE | ID: mdl-34618807

ABSTRACT

Evolving in sync with the computation revolution over the past 30 years, computational biology has emerged as a mature scientific field. While the field has made major contributions toward improving scientific knowledge and human health, individual computational biology practitioners at various institutions often languish in career development. As optimistic biologists passionate about the future of our field, we propose solutions for both eager and reluctant individual scientists, institutions, publishers, funding agencies, and educators to fully embrace computational biology. We believe that in order to pave the way for the next generation of discoveries, we need to improve recognition for computational biologists and better align pathways of career success with pathways of scientific progress. With 10 outlined steps, we call on all adjacent fields to move away from the traditional individual, single-discipline investigator research model and embrace multidisciplinary, data-driven, team science.


Subjects
Computational Biology , Budgets , Cooperative Behavior , Humans , Interdisciplinary Research , Mentoring , Motivation , Publications , Reward , Software
8.
PLoS Comput Biol ; 19(3): e1010984, 2023 03.
Article in English | MEDLINE | ID: mdl-36972227

ABSTRACT

Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines, prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because while biological systems are high-dimensional, effective dividing lines for predictive models may not be.


Subjects
Gene Expression , Nonlinear Dynamics , Gene Expression Profiling , Neural Networks, Computer , Linear Models
9.
Bioinformatics ; 38(22): 5129-5130, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36193991

ABSTRACT

MOTIVATION: Domain adaptation allows for the development of predictive models even in cases with limited sample data. Weighted elastic net domain adaptation specifically leverages features of genomic data to maximize transferability, but the method is too computationally demanding to apply to many genome-sized datasets. RESULTS: We developed wenda_gpu, which uses GPyTorch to train models on genomic data within hours on a single GPU-enabled machine. We show that wenda_gpu returns comparable results to the original wenda implementation, and that it can be used for improved prediction of cancer mutation status on small sample sizes compared with regular elastic net. AVAILABILITY AND IMPLEMENTATION: wenda_gpu is available on GitHub at https://github.com/greenelab/wenda_gpu/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neoplasms , Software , Humans , Genomics/methods , Neoplasms/genetics , Sample Size
10.
Proc Natl Acad Sci U S A ; 117(6): 3167-3173, 2020 02 11.
Article in English | MEDLINE | ID: mdl-31980538

ABSTRACT

Pseudomonas aeruginosa strains with loss-of-function mutations in the transcription factor LasR are frequently encountered in the clinic and the environment. Among the characteristics common to LasR-defective (LasR-) strains is increased activity of the transcription factor Anr, relative to their LasR+ counterparts, in low-oxygen conditions. One of the Anr-regulated genes found to be highly induced in LasR- strains was PA14_42860 (PA1673), which we named mhr for microoxic hemerythrin. Purified P. aeruginosa Mhr protein contained the predicted di-iron center and bound molecular oxygen with an apparent Kd of ∼1 µM. Both Anr and Mhr were necessary for fitness in lasR+ and lasR mutant strains in colony biofilms grown in microoxic conditions, and the effects were more striking in the lasR mutant. Among genes in the Anr regulon, mhr was most closely coregulated with the Anr-controlled high-affinity cytochrome c oxidase genes. In the absence of high-affinity cytochrome c oxidases, deletion of mhr no longer caused a fitness disadvantage, suggesting that Mhr works in concert with microoxic respiration. We demonstrate that Anr and Mhr contribute to LasR- strain fitness even in biofilms grown in normoxic conditions. Furthermore, metabolomics data indicate that, in a lasR mutant, expression of Anr-regulated mhr leads to differences in metabolism in cells grown on lysogeny broth or artificial sputum medium. We propose that increased Anr activity leads to higher levels of the oxygen-binding protein Mhr, which confers an advantage to lasR mutants in microoxic conditions.


Subjects
Bacterial Proteins/metabolism , Cell Hypoxia/genetics , Genetic Fitness/genetics , Hemerythrin/metabolism , Pseudomonas aeruginosa , Trans-Activators/metabolism , Bacterial Proteins/genetics , Hemerythrin/genetics , Oxygen/metabolism , Pseudomonas aeruginosa/genetics , Pseudomonas aeruginosa/metabolism , Pseudomonas aeruginosa/physiology , Trans-Activators/genetics
11.
PLoS Comput Biol ; 17(8): e1009290, 2021 08.
Article in English | MEDLINE | ID: mdl-34428202

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) has made it possible to profile gene expression in tissues at high resolution. An important preprocessing step prior to performing downstream analyses is to identify and remove cells with poor or degraded sample quality using quality control (QC) metrics. Two widely used QC metrics to identify a 'low-quality' cell are (i) if the cell includes a high proportion of reads that map to mitochondrial DNA (mtDNA) encoded genes and (ii) if a small number of genes are detected. Current best practices use these QC metrics independently with either arbitrary, uniform thresholds (e.g. 5%) or biological context-dependent (e.g. species) thresholds, and fail to jointly model these metrics in a data-driven manner. Current practices are often overly stringent and especially untenable on certain types of tissues, such as archived tumor tissues, or tissues associated with mitochondrial function, such as kidney tissue [1]. We propose a data-driven QC metric (miQC) that jointly models both the proportion of reads mapping to mtDNA genes and the number of detected genes with mixture models in a probabilistic framework to predict the low-quality cells in a given dataset. We demonstrate how our QC metric easily adapts to different types of single-cell datasets to remove low-quality cells while preserving high-quality cells that can be used for downstream analyses. Our software package is available at https://bioconductor.org/packages/miQC.


Subjects
High-Throughput Nucleotide Sequencing/methods , Probability , Quality Control , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , DNA, Mitochondrial/genetics , Humans
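miQC fits mixture models so that each dataset determines which of its cells are low quality, rather than applying a fixed cutoff such as 5%. The sketch below is a deliberately simplified stand-in and not the miQC algorithm. It drops the detected-gene count, which miQC models jointly with the mitochondrial fraction. Instead, it fits a two-component one-dimensional Gaussian mixture to the mitochondrial read percentage by expectation-maximization. A cell is flagged when its posterior probability of belonging to the high-mitochondrial component exceeds 0.75, a threshold chosen here for illustration. The per-cell values are invented.

```python
import math
from typing import List, Tuple

def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fit_two_gaussians(xs: List[float], iters: int = 200,
                      min_sd: float = 1e-3) -> Tuple[List[float], List[float], List[float]]:
    """Fit a two-component 1D Gaussian mixture by EM; return (weights, means, sds)."""
    n = len(xs)
    mean = sum(xs) / n
    sd0 = max(math.sqrt(sum((x - mean) ** 2 for x in xs) / n), min_sd)
    weights, mus, sds = [0.5, 0.5], [min(xs), max(xs)], [sd0, sd0]
    for _ in range(iters):
        # E-step: responsibility of each component for each cell.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, mus[k], sds[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mixing weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), min_sd)  # floor keeps a component from collapsing
    return weights, mus, sds

def flag_low_quality(mito_pct: List[float], threshold: float = 0.75) -> List[bool]:
    """Flag cells whose posterior for the high-mitochondrial component exceeds threshold."""
    weights, mus, sds = fit_two_gaussians(mito_pct)
    high = 0 if mus[0] > mus[1] else 1
    flags = []
    for x in mito_pct:
        p = [weights[k] * normal_pdf(x, mus[k], sds[k]) for k in range(2)]
        flags.append(p[high] / (sum(p) or 1e-300) > threshold)
    return flags

# Hypothetical percentages of reads mapping to mtDNA-encoded genes, one value per cell.
mito_pct = [2.1, 3.0, 2.5, 3.4, 2.8, 1.9, 3.1, 2.6, 18.0, 22.5, 25.1, 20.3]
print(flag_low_quality(mito_pct))
```

Because the component boundary is learned from the data, a tissue with a naturally higher mitochondrial baseline moves the boundary with it. A fixed percentage cutoff cannot do that.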
12.
Nucleic Acids Res ; 48(9): 4709-4724, 2020 05 21.
Article in English | MEDLINE | ID: mdl-32319526

ABSTRACT

Alternative splicing (AS) is frequent during early mouse embryonic development. Specific histone post-translational modifications (hPTMs) have been shown to regulate exon splicing by either directly recruiting splice machinery or indirectly modulating transcriptional elongation. In this study, we hypothesized that hPTMs regulate expression of alternatively spliced genes for specific processes during differentiation. To address this notion, we applied an innovative machine learning approach to relate global hPTM enrichment to AS regulation during mammalian tissue development. We found that specific hPTMs, H3K36me3 and H3K4me1, play a role in skipped exon selection among all the tissues and developmental time points examined. In addition, we used an iterative random forest model and found that interactions of multiple hPTMs most strongly predicted splicing when they included H3K36me3 and H3K4me1. Collectively, our data demonstrated a link between hPTMs and alternative splicing, which will drive further experimental studies on the functional relevance of these modifications to alternative splicing.


Subjects
Alternative Splicing , Embryonic Development/genetics , Exons , Histone Code , Animals , Logistic Models , Machine Learning , Mice , Protein Processing, Post-Translational
13.
Genet Epidemiol ; 44(1): 52-66, 2020 01.
Article in English | MEDLINE | ID: mdl-31583758

ABSTRACT

Genetic interactions have been recognized as a potentially important contributor to the heritability of complex diseases. Nevertheless, due to small effect sizes and stringent multiple-testing correction, identifying genetic interactions in complex diseases is particularly challenging. To address the above challenges, many genomic research initiatives collaborate to form large-scale consortia and develop open access to enable sharing of genome-wide association study (GWAS) data. Despite the perceived benefits of data sharing from large consortia, a number of practical issues have arisen, such as privacy concerns on individual genomic information and heterogeneous data sources from distributed GWAS databases. In the context of large consortia, we demonstrate that the heterogeneously appearing marginal effects over distributed GWAS databases can offer new insights into genetic interactions for which conventional methods have had limited success. In this paper, we develop a novel two-stage testing procedure, named phylogenY-based effect-size tests for interactions using first 2 moments (YETI2), to detect genetic interactions through both pooled marginal effects, in terms of averaging site-specific marginal effects, and heterogeneity in marginal effects across sites, using a meta-analytic framework. YETI2 can not only be applied to large consortia without shared personal information but also can be used to leverage underlying heterogeneity in marginal effects to prioritize potential genetic interactions. We investigate the performance of YETI2 through simulation studies and apply YETI2 to bladder cancer data from dbGaP.


Subjects
Epistasis, Genetic/genetics , Genome-Wide Association Study/methods , Urinary Bladder Neoplasms/genetics , Humans , Information Dissemination , Models, Genetic , Polymorphism, Single Nucleotide/genetics
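YETI2 builds on two summaries of site-specific marginal effects: their pooled average and their heterogeneity across sites. The phylogeny-based tests themselves are not reproduced here. As a sketch of the two underlying quantities, the code computes a standard fixed-effect inverse-variance pooled estimate and Cochran's Q with I². It is applied to hypothetical per-site effect sizes and standard errors. Only summary statistics are exchanged, which matches the abstract's point that no personal-level data need to be shared.

```python
import math
from typing import List, Tuple

def pooled_effect(betas: List[float], ses: List[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted mean of per-site effects, and its SE."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    return pooled, math.sqrt(1.0 / total)

def heterogeneity(betas: List[float], ses: List[float]) -> Tuple[float, int, float]:
    """Cochran's Q, its degrees of freedom, and I^2 (share of variation beyond chance)."""
    pooled, _ = pooled_effect(betas, ses)
    q = sum(((b - pooled) / se) ** 2 for b, se in zip(betas, ses))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

# Hypothetical marginal effects (log odds ratios) of one SNP at three sites.
betas = [0.10, 0.12, 0.35]
ses = [0.05, 0.05, 0.05]
print(pooled_effect(betas, ses))
print(heterogeneity(betas, ses))
```

Here the third site's effect sits well above the others, so Q (15.44 on 2 degrees of freedom) is large even though the pooled effect alone looks modest. This kind of heterogeneity is the signal the abstract proposes to exploit.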
14.
Trends Genet ; 34(10): 790-805, 2018 10.
Article in English | MEDLINE | ID: mdl-30143323

ABSTRACT

Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.


Subjects
Data Interpretation, Statistical , Genomics/statistics & numerical data , Proteomics/statistics & numerical data , Algorithms , Humans , Systems Biology/statistics & numerical data
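Non-negative matrix factorization (NMF) is one concrete matrix factorization technique. It approximates a non-negative genes-by-samples matrix X as the product WH. The columns of W can be read as gene programs and the rows of H as each program's activity per sample. The sketch below implements the Lee and Seung multiplicative updates for the squared Frobenius loss in plain Python, on a small synthetic rank-2 matrix. It shows only one member of the broader family of MF methods the review surveys.

```python
import random
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def frobenius_error(x: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0])))

def nmf(x: Matrix, rank: int, iters: int = 500, seed: int = 0, eps: float = 1e-9):
    """Lee-Seung multiplicative updates for min ||X - WH||_F^2 subject to W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H); eps avoids division by zero.
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h

# Synthetic 4 genes x 4 samples matrix built from two non-negative "programs".
w_true = [[1, 0], [2, 0], [0, 1], [0, 3]]
h_true = [[1, 2, 0, 1], [0, 1, 2, 1]]
x = matmul(w_true, h_true)
w, h = nmf(x, rank=2)
print(round(frobenius_error(x, w, h), 4))
```

The multiplicative form keeps every entry non-negative without an explicit projection step, because each update multiplies a non-negative value by a non-negative ratio.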
16.
Bioinformatics ; 35(9): 1518-1526, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30247517

ABSTRACT

MOTIVATION: Decreasing costs are making it feasible to perform time series proteomics and genomics experiments with more replicates and higher resolution than ever before. With more replicates and time points, proteome and genome-wide patterns of expression are more readily discernible. These larger experiments require more batches, exacerbating batch effects and increasing the number of bias trends. In the case of proteomics, where methods frequently result in missing data, this increasing scale is also decreasing the number of peptides observed in all samples. The sources of batch effects and missing data are incompletely understood, necessitating novel techniques. RESULTS: Here we show that by exploiting the structure of time series experiments, it is possible to accurately and reproducibly model and remove batch effects. We implement Learning and Imputation for Mass-spec Bias Reduction (LIMBR) software, which builds on previous block-based models of batch effects and includes features specific to time series and circadian studies. To aid in the analysis of time series proteomics experiments, which are often plagued with missing data points, we also integrate an imputation system. By combining imputation and time-series-tailored bias modeling in one straightforward software package, LIMBR, we expect that the quality and ease of large-scale proteomics and genomics time series experiments will be significantly increased. AVAILABILITY AND IMPLEMENTATION: Python code and documentation are available for download at https://github.com/aleccrowell/LIMBR and LIMBR can be downloaded and installed with dependencies using 'pip install limbr'. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Genome , Genomics , Mass Spectrometry , Proteomics
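The abstract names two problems: missing peptide measurements and batch offsets. To make them concrete, the sketch below applies the simplest possible corrections. It fills each missing value with the peptide's observed mean, then subtracts each peptide's per-batch mean. This is not LIMBR's block-based, time-series-aware model. It only shows the data layout (peptides as rows, samples as columns) and the quantities a batch correction operates on. The intensities and batch labels are hypothetical.

```python
from typing import Dict, List, Optional

def impute_row_means(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace each missing value (None) with the mean of that peptide's observed values."""
    filled = []
    for row in rows:
        observed = [v for v in row if v is not None]
        fill = sum(observed) / len(observed) if observed else 0.0
        filled.append([fill if v is None else v for v in row])
    return filled

def center_within_batches(rows: List[List[float]], batches: List[str]) -> List[List[float]]:
    """Subtract each peptide's per-batch mean, removing additive batch offsets."""
    centered = []
    for row in rows:
        means: Dict[str, float] = {}
        for b in set(batches):
            vals = [v for v, label in zip(row, batches) if label == b]
            means[b] = sum(vals) / len(vals)
        centered.append([v - means[label] for v, label in zip(row, batches)])
    return centered

# Hypothetical log intensities: one peptide (row) across six samples in two batches.
rows = [[10.0, None, 12.0, 20.0, 21.0, 22.0]]
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
print(center_within_batches(impute_row_means(rows), batches))
```

In this toy row, the imputed value (17.0) is pulled toward batch b2 and becomes an outlier (4.0) after centering. Naive per-batch centering would also erase genuine differences whenever batches coincide with time points. Both problems illustrate why imputation and batch modeling benefit from the time-series structure the abstract describes.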
17.
PLoS Comput Biol ; 15(6): e1007128, 2019 06.
Article in English | MEDLINE | ID: mdl-31233491

ABSTRACT

Open, collaborative research is a powerful paradigm that can immensely strengthen the scientific process by integrating broad and diverse expertise. However, traditional research and multi-author writing processes break down at scale. We present new software named Manubot, available at https://manubot.org, to address the challenges of open scholarly writing. Manubot adopts the contribution workflow used by many large-scale open source software projects to enable collaborative authoring of scholarly manuscripts. With Manubot, manuscripts are written in Markdown and stored in a Git repository to precisely track changes over time. By hosting manuscript repositories publicly, such as on GitHub, multiple authors can simultaneously propose and review changes. A cloud service automatically evaluates proposed changes to catch errors. Publication with Manubot is continuous: When a manuscript's source changes, the rendered outputs are rebuilt and republished to a web page. Manubot automates bibliographic tasks by implementing citation by identifier, where users cite persistent identifiers (e.g. DOIs, PubMed IDs, ISBNs, URLs), whose metadata is then retrieved and converted to a user-specified style. Manubot modernizes publishing to align with the ideals of open science by making it transparent, reproducible, immediate, versioned, collaborative, and free of charge.


Subjects
Publishing , Software , Writing , Humans , Manuscripts, Medical as Topic
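In Manubot's citation-by-identifier approach, a citation is written as `@`, an identifier type, a colon, and the identifier itself (for example `@doi:10.1000/xyz123`). Metadata is then retrieved for each identifier. The regular expression below simplifies the step of pulling such keys out of Markdown text before any metadata lookup. It is not Manubot's own parser. It handles only the identifier types named in the abstract, and the identifiers in the sample text are illustrative.

```python
import re
from typing import List, Tuple

# Simplified pattern: "@" + identifier type + ":" + identifier, ending at whitespace, ";" or "]".
CITATION_RE = re.compile(r"@(doi|pmid|isbn|url):([^\s;\]]+)")

def extract_citations(markdown: str) -> List[Tuple[str, str]]:
    """Return (type, identifier) pairs in order of first appearance, without duplicates."""
    seen: List[Tuple[str, str]] = []
    for match in CITATION_RE.finditer(markdown):
        pair = (match.group(1), match.group(2))
        if pair not in seen:
            seen.append(pair)
    return seen

text = (
    "Collaborative writing is described in [@doi:10.1000/xyz123; @pmid:31233491]. "
    "It is cited again here [@doi:10.1000/xyz123]."
)
print(extract_citations(text))
```

Deduplicating the keys means each identifier's metadata only needs to be retrieved once, however many times it is cited.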
19.
J Bacteriol ; 200(8)2018 04 15.
Article in English | MEDLINE | ID: mdl-29311282

ABSTRACT

The Pseudomonas fluorescens genome encodes more than 50 proteins predicted to be involved in c-di-GMP signaling. Here, we demonstrated that, tested across 188 nutrients, these enzymes and effectors appeared capable of impacting biofilm formation. Transcriptional analysis of network members across ∼50 nutrient conditions indicates that altered gene expression can explain a subset of but not all biofilm formation responses to the nutrients. Additional organization of the network is likely achieved through physical interaction, as determined via probing ∼2,000 interactions by bacterial two-hybrid assays. Our analysis revealed a multimodal regulatory strategy using combinations of ligand-mediated signals, protein-protein interaction, and/or transcriptional regulation to fine-tune c-di-GMP-mediated responses. These results create a profile of a large c-di-GMP network that is used to make important cellular decisions, opening the door to future model building and the ability to engineer this complex circuitry in other bacteria. IMPORTANCE: Cyclic diguanylate (c-di-GMP) is a key signaling molecule regulating bacterial biofilm formation, and many microbes have up to dozens of proteins that make, break, or bind this dinucleotide. A major open issue in the field is how signaling specificity is conferred in the unpartitioned space of a bacterial cell. Here, we took a systems approach, using mutational analysis, transcriptional studies, and bacterial two-hybrid analysis to interrogate this network. We found that a majority of enzymes are capable of impacting biofilm formation in a context-dependent manner, and we revealed examples of two or more modes of regulation (i.e., transcriptional control with protein-protein interaction) being utilized to generate an observable impact on biofilm formation.


Subjects
Biofilms/growth & development , Cyclic GMP/analogs & derivatives , Gene Expression Regulation, Bacterial , Pseudomonas fluorescens/growth & development , Cyclic GMP/genetics , Gene Expression Profiling , Pseudomonas fluorescens/genetics , Signal Transduction , Two-Hybrid System Techniques
20.
Hum Mol Genet ; 25(R2): R94-R98, 2016 Oct 01.
Article in English | MEDLINE | ID: mdl-27340225

ABSTRACT

One way to design a drug is to attempt to phenocopy a genetic variant that is known to have the desired effect. In general, drugs that are supported by genetic associations progress further in the development pipeline. However, the number of associations that are candidates for development into drugs is limited because many associations are in non-coding regions or difficult-to-target genes. Approaches that overlay information from pathway databases or biological networks can expand the potential target list. In cases where the initial variant is not targetable or there is no variant with the desired effect, this may reveal new means to target a disease. In this review, we discuss recent examples in the domain of pathway and network-based drug repositioning from genetic associations. We highlight important caveats and challenges for the field, and we discuss opportunities for further development.
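The network-based expansion described here can be pictured as a neighborhood search. Start from the gene implicated by a genetic association, then collect druggable genes within a few interaction steps. The following is a minimal breadth-first-search sketch. The gene network, the set of druggable genes, and the two-hop limit are all hypothetical.

```python
from collections import deque
from typing import Dict, Set

def expand_targets(network: Dict[str, Set[str]], seed: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search returning each reachable gene and its hop distance from seed."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        gene = queue.popleft()
        if dist[gene] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in network.get(gene, set()):
            if nbr not in dist:
                dist[nbr] = dist[gene] + 1
                queue.append(nbr)
    return dist

# Hypothetical undirected interaction network (adjacency sets) and druggable gene set.
network = {
    "GENE_A": {"GENE_B", "GENE_C"},
    "GENE_B": {"GENE_A", "GENE_D"},
    "GENE_C": {"GENE_A"},
    "GENE_D": {"GENE_B", "GENE_E"},
    "GENE_E": {"GENE_D"},
}
druggable = {"GENE_C", "GENE_D", "GENE_E"}

# GENE_A plays the role of an associated gene that is not itself targetable.
reachable = expand_targets(network, "GENE_A", max_hops=2)
candidates = sorted(g for g in reachable if g in druggable and g != "GENE_A")
print(candidates)  # ['GENE_C', 'GENE_D']
```

The hop limit trades coverage against plausibility. GENE_E is druggable but three steps away, so it is excluded. A larger limit would admit it at the cost of a weaker link to the original association.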
