Results 1 - 20 of 38
1.
Cell ; 181(1): 92-101, 2020 04 02.
Article in English | MEDLINE | ID: mdl-32243801

ABSTRACT

This Perspective explores the application of machine learning toward improved diagnosis and treatment. We outline a vision for how machine learning can transform three broad areas of biomedicine: clinical diagnostics, precision treatments, and health monitoring, where the goal is to maintain health through a range of diseases and the normal aging process. For each area, early instances of successful machine learning applications are discussed, as well as opportunities and challenges for machine learning. When these challenges are met, machine learning promises a future of rigorous, outcomes-based medicine with detection, diagnosis, and treatment strategies that are continuously adapted to individual and environmental differences.


Subjects
Machine Learning , Precision Medicine , Humans
2.
Cell ; 181(2): 236-249, 2020 04 16.
Article in English | MEDLINE | ID: mdl-32302568

ABSTRACT

Crucial transitions in cancer (including tumor initiation, local expansion, metastasis, and therapeutic resistance) involve complex interactions between cells within the dynamic tumor ecosystem. Transformative single-cell genomics technologies and spatial multiplex in situ methods now provide an opportunity to interrogate this complexity at unprecedented resolution. The Human Tumor Atlas Network (HTAN), part of the National Cancer Institute (NCI) Cancer Moonshot Initiative, will establish a clinical, experimental, computational, and organizational framework to generate informative and accessible three-dimensional atlases of cancer transitions for a diverse set of tumor types. This effort complements both ongoing efforts to map healthy organs and previous large-scale cancer genomics approaches focused on bulk sequencing at a single point in time. Generating single-cell, multiparametric, longitudinal atlases and integrating them with clinical outcomes should help identify novel predictive biomarkers and features as well as therapeutically relevant cell types, cell states, and cellular interactions across transitions. The resulting tumor atlases should have a profound impact on our understanding of cancer biology and have the potential to improve cancer detection, prevention, and therapeutic discovery for better precision-medicine treatments of cancer patients and those at risk for cancer.


Subjects
Cell Transformation, Neoplastic/metabolism , Neoplasms/metabolism , Tumor Microenvironment/physiology , Atlases as Topic , Cell Transformation, Neoplastic/pathology , Genomics/methods , Humans , Precision Medicine/methods , Single-Cell Analysis/methods
3.
Nat Methods ; 19(3): 311-315, 2022 03.
Article in English | MEDLINE | ID: mdl-34824477

ABSTRACT

Highly multiplexed tissue imaging makes detailed molecular analysis of single cells possible in a preserved spatial context. However, reproducible analysis of large multichannel images poses a substantial computational challenge. Here, we describe a modular and open-source computational pipeline, MCMICRO, for performing the sequential steps needed to transform whole-slide images into single-cell data. We demonstrate the use of MCMICRO on tissue and tumor images acquired using multiple imaging platforms, thereby providing a solid foundation for the continued development of tissue imaging software.


Subjects
Image Processing, Computer-Assisted , Neoplasms , Diagnostic Imaging , Humans , Image Processing, Computer-Assisted/methods , Neoplasms/diagnostic imaging , Neoplasms/pathology , Software
4.
Genome Res ; 30(6): 885-897, 2020 06.
Article in English | MEDLINE | ID: mdl-32660935

ABSTRACT

RNA-seq is widely used for studying gene expression, but commonly used sequencing platforms produce short reads that only span up to two exon junctions per read. This makes it difficult to accurately determine the composition and phasing of exons within transcripts. Although long-read sequencing improves this issue, it is not amenable to precise quantitation, which limits its utility for differential expression studies. We used long-read isoform sequencing combined with a novel analysis approach to compare alternative splicing of large, repetitive structural genes in muscles. Analysis of muscle structural genes that produce medium (Nrap: 5 kb), large (Neb: 22 kb), and very large (Ttn: 106 kb) transcripts in cardiac muscle, and fast and slow skeletal muscles identified unannotated exons for each of these ubiquitous muscle genes. This also identified differential exon usage and phasing for these genes between the different muscle types. By mapping the in-phase transcript structures to known annotations, we also identified and quantified previously unannotated transcripts. Results were confirmed by endpoint PCR and Sanger sequencing, which revealed muscle-type-specific differential expression of these novel transcripts. The improved transcript identification and quantification shown by our approach removes previous impediments to studies aimed at quantitative differential expression of ultralong transcripts.
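The short-read limitation described in this abstract can be illustrated with a toy calculation. The sketch below is not the authors' analysis method: the function name `junctions_spanned`, the exon lengths, and the read positions are hypothetical values chosen only for illustration. It counts how many exon-exon junctions a single read crosses once the read is placed on a spliced transcript.

```python
from itertools import accumulate


def junctions_spanned(exon_lengths, read_start, read_length):
    """Count exon-exon junctions crossed by a read on a spliced transcript.

    Coordinates are 0-based and half-open: the read covers
    [read_start, read_start + read_length). All inputs are illustrative.
    """
    # A junction sits at each cumulative exon boundary except the transcript end.
    boundaries = list(accumulate(exon_lengths))[:-1]
    read_end = read_start + read_length
    # The read crosses a junction when it covers bases on both sides of it.
    return sum(1 for b in boundaries if read_start < b < read_end)


# Hypothetical transcript with four exons (770 bases in total).
exons = [200, 120, 150, 300]
short_read_a = junctions_spanned(exons, read_start=150, read_length=150)  # 1
short_read_b = junctions_spanned(exons, read_start=190, read_length=150)  # 2
full_length = junctions_spanned(exons, read_start=0, read_length=770)     # 3
```

With these made-up numbers, a 150-base read crosses at most two junctions, while a read covering the whole transcript crosses all three, which is why only the long read can place every exon in phase.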


Subjects
Gene Expression Profiling , High-Throughput Nucleotide Sequencing , RNA, Messenger , Sequence Analysis, RNA , Transcriptome , Alternative Splicing , Computational Biology/methods , Exons , Gene Expression Profiling/methods , Humans , Molecular Sequence Annotation , Organ Specificity/genetics , Repetitive Sequences, Nucleic Acid
5.
PLoS Comput Biol ; 17(6): e1009014, 2021 06.
Article in English | MEDLINE | ID: mdl-34061826

ABSTRACT

Supervised machine learning is an essential but difficult-to-use approach in biomedical data analysis. The Galaxy-ML toolkit (https://galaxyproject.org/community/machine-learning/) makes supervised machine learning more accessible to biomedical scientists by enabling them to perform end-to-end reproducible machine learning analyses at large scale using only a web browser. Galaxy-ML extends Galaxy (https://galaxyproject.org), a biomedical computational workbench used by tens of thousands of scientists across the world, with a suite of tools for all aspects of supervised machine learning.


Subjects
Computational Biology/methods , Machine Learning , Reproducibility of Results , Software
6.
Nucleic Acids Res ; 48(W1): W395-W402, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32479607

ABSTRACT

Galaxy (https://galaxyproject.org) is a web-based computational workbench used by tens of thousands of scientists across the world to analyze large biomedical datasets. Since 2005, the Galaxy project has fostered a global community focused on achieving accessible, reproducible, and collaborative research. Together, this community develops the Galaxy software framework, integrates analysis tools and visualizations into the framework, runs public servers that make Galaxy available via a web browser, performs and publishes analyses using Galaxy, leads bioinformatics workshops that introduce and use Galaxy, and develops interactive training materials for Galaxy. Over the last two years, all aspects of the Galaxy project have grown: code contributions, tools integrated, users, and training materials. Key advances in Galaxy's user interface include enhancements for analyzing large dataset collections as well as interactive tools for exploratory data analysis. Extensions to Galaxy's framework include support for federated identity and access management and increased ability to distribute analysis jobs to remote resources. New community resources include large public servers in Europe and Australia, an increasing number of regional and local Galaxy communities, and substantial growth in the Galaxy Training Network.


Subjects
Software , Biomedical Research , Data Analysis , Datasets as Topic , Metabolomics/methods , Metagenomics/methods , Proteomics/methods , Reproducibility of Results , Single-Cell Analysis/methods
7.
Bioinformatics ; 36(1): 1-9, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31197310

ABSTRACT

MOTIVATION: Large biomedical datasets, such as those from genomics and imaging, are increasingly being stored on commercial and institutional cloud computing platforms. This is because cloud-scale computing resources, from robust backup to high-speed data transfer to scalable compute and storage, are needed to make these large datasets usable. However, one challenge for large-scale biomedical data on the cloud is providing secure access, especially when datasets are distributed across platforms. While there are open Web protocols for secure authentication and authorization, these protocols are not in wide use in bioinformatics and are difficult to use for even technologically sophisticated users. RESULTS: We have developed a generic and extensible approach for securely accessing biomedical datasets distributed across cloud computing platforms. Our approach combines OpenID Connect and OAuth2, best-practice Web protocols for authentication and authorization, together with Galaxy (https://galaxyproject.org), a web-based computational workbench used by thousands of scientists across the world. With our enhanced version of Galaxy, users can access and analyze data distributed across multiple cloud computing providers without any special knowledge of access/authorization protocols. Our approach does not require users to share permanent credentials (e.g. username, password, API key), instead relying on automatically generated temporary tokens that refresh as needed. Our approach is generalizable to most identity providers and cloud computing platforms. To the best of our knowledge, Galaxy is the only computational workbench where users can access biomedical datasets across multiple cloud computing platforms using best-practice Web security approaches and thereby minimize risks of unauthorized data access and credential use. 
AVAILABILITY AND IMPLEMENTATION: Freely available for academic and commercial use under the open-source Academic Free License (https://opensource.org/licenses/AFL-3.0) from the following GitHub repositories: https://github.com/galaxyproject/galaxy and https://github.com/galaxyproject/cloudauthz.
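The idea of temporary tokens that refresh as needed, rather than shared permanent credentials, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Galaxy's or CloudAuthz's actual code: the names `TemporaryToken`, `TokenCache`, and the `fetch` callback are hypothetical, and a real deployment would obtain tokens from an identity provider through OpenID Connect and OAuth2 flows.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TemporaryToken:
    value: str
    expires_at: float  # seconds, measured on the same clock as `now`


class TokenCache:
    """Return a short-lived token, fetching a new one shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[], TemporaryToken],  # hypothetical: asks the provider for a token
        now: Callable[[], float] = time.time,
        margin: float = 30.0,  # assumed safety margin in seconds
    ) -> None:
        self._fetch = fetch
        self._now = now
        self._margin = margin
        self._token: Optional[TemporaryToken] = None

    def get(self) -> str:
        # Refresh when no token is held yet or the current one is about to expire.
        expiring = (
            self._token is None
            or self._now() >= self._token.expires_at - self._margin
        )
        if expiring:
            self._token = self._fetch()
        return self._token.value
```

Callers only ever see `get()`, so no permanent secret has to be passed around; injecting `now` makes the refresh logic testable without waiting for real time to pass.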


Subjects
Cloud Computing , Computational Biology , Computer Security , Computational Biology/standards , Computer Security/trends , Software
8.
PLoS Biol ; 16(12): e3000099, 2018 12.
Article in English | MEDLINE | ID: mdl-30596645

ABSTRACT

A personalized approach based on a patient's or pathogen's unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods. The BioCompute framework (https://w3id.org/biocompute/1.3.0) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that standard and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the "Open-Stand.org principles for collaborative open standards development." With high-throughput sequencing (HTS) studies communicated using a BCO, regulatory agencies (e.g., Food and Drug Administration [FDA]), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next-generation sequencing workflow exchange, reporting, and regulatory reviews.


Subjects
Computational Biology/methods , Sequence Analysis, DNA/methods , Animals , Communication , Computational Biology/standards , Genome , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Precision Medicine/trends , Reproducibility of Results , Sequence Analysis, DNA/standards , Software , Workflow
9.
PLoS Comput Biol ; 16(6): e1007863, 2020 06.
Article in English | MEDLINE | ID: mdl-32497138

ABSTRACT

Scientists are sequencing new genomes at an increasing rate with the goal of associating genome contents with phenotypic traits. After a new genome is sequenced and assembled, structural gene annotation is often the first step in analysis. Despite advances in computational gene prediction algorithms, most eukaryotic genomes still benefit from manual gene annotation. This requires access to good genome browsers to enable annotators to visualize and evaluate multiple lines of evidence (e.g., sequence similarity, RNA sequencing [RNA-Seq] results, gene predictions, repeats) and necessitates many volunteers to participate in the work. To address the technical barriers to creating genome browsers, the Genomics Education Partnership (GEP; https://gep.wustl.edu/) has partnered with the Galaxy Project (https://galaxyproject.org) to develop G-OnRamp (http://g-onramp.org), a web-based platform for creating UCSC Genome Browser Assembly Hubs and JBrowse genome browsers. G-OnRamp also converts a JBrowse instance into an Apollo instance for collaborative genome annotations in research and educational settings. The genome browsers produced can be transferred to the CyVerse Data Store for long-term access. G-OnRamp enables researchers to easily visualize their experimental results, educators to create Course-based Undergraduate Research Experiences (CUREs) centered on genome annotation, and students to participate in genomics research. In the process, students learn about genes/genomes and about how to utilize large datasets. Development of G-OnRamp was guided by extensive user feedback. Sixty-five researchers/educators from >40 institutions participated through in-person workshops, which produced >20 genome browsers now available for research and education. Genome browsers generated for four parasitoid wasp species have been used in a CURE engaging students at 15 colleges and universities. 
Our assessment results in the classroom demonstrate that the genome browsers produced by G-OnRamp are effective tools for engaging undergraduates in research and in enabling their contributions to the scientific literature in genomics. Expansion of such genomics research/education partnerships will be beneficial to researchers, faculty, and students alike.


Subjects
Computational Biology/education , Computational Biology/methods , Genome , Genomics/education , Genomics/methods , Molecular Sequence Annotation , Software , Algorithms , Animals , Base Sequence , Computer Graphics , Databases, Genetic , Drosophila melanogaster , Humans , Sequence Analysis, RNA , Students , User-Computer Interface
10.
Bioinformatics ; 35(21): 4422-4423, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31070714

ABSTRACT

SUMMARY: G-OnRamp provides a user-friendly, web-based platform for collaborative, end-to-end annotation of eukaryotic genomes using UCSC Assembly Hubs and JBrowse/Apollo genome browsers with evidence tracks derived from sequence alignments, ab initio gene predictors, RNA-Seq data and repeat finders. G-OnRamp can be used to visualize large genomics datasets and to perform collaborative genome annotation projects in both research and educational settings. AVAILABILITY AND IMPLEMENTATION: The virtual machine images and tutorials are available on the G-OnRamp web site (http://g-onramp.org/deployments). The source code is available under an Academic Free License version 3.0 through the goeckslab GitHub repository (https://github.com/goeckslab). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Eukaryota , Genome , Genomics , Sequence Alignment , Software
11.
Nucleic Acids Res ; 46(W1): W537-W544, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29790989

ABSTRACT

Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.


Subjects
Genomics/statistics & numerical data , Metabolomics/statistics & numerical data , Molecular Imaging/statistics & numerical data , Proteomics/statistics & numerical data , User-Computer Interface , Datasets as Topic , Humans , Information Dissemination , International Cooperation , Internet , Reproducibility of Results
12.
Mol Biol Evol ; 35(6): 1372-1375, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29688462

ABSTRACT

Research in population genetics and evolutionary biology has always provided a computational backbone for life sciences as a whole. Today, evolutionary and population biology reasoning is essential for interpretation of the large, complex datasets that are characteristic of all domains of today's life sciences, ranging from cancer biology to microbial ecology. This situation makes algorithms and software tools developed by our community more important than ever before. This means that we, developers of software tools for molecular evolutionary analyses, now have a shared responsibility to make these tools accessible using modern technological developments as well as provide adequate documentation and training.


Subjects
Biological Evolution , Computational Biology , Software/standards
13.
Nucleic Acids Res ; 44(W1): W3-W10, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27137889

ABSTRACT

High-throughput data production technologies, particularly 'next-generation' DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale.


Subjects
Computational Biology/statistics & numerical data , Datasets as Topic/statistics & numerical data , User-Computer Interface , Biomedical Research , Computational Biology/methods , Databases, Genetic , Humans , Internet , Reproducibility of Results
15.
PLoS Comput Biol ; 11(2): e1003972, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25654371

ABSTRACT

"Scientific community" refers to a group of people collaborating together on scientific-research-related activities who also share common goals, interests, and values. Such communities play a key role in many bioinformatics activities. Communities may be linked to a specific location or institute, or involve people working at many different institutions and locations. Education and training is typically an important component of these communities, providing a valuable context in which to develop skills and expertise, while also strengthening links and relationships within the community. Scientific communities facilitate: (i) the exchange and development of ideas and expertise; (ii) career development; (iii) coordinated funding activities; (iv) interactions and engagement with professionals from other fields; and (v) other activities beneficial to individual participants, communities, and the scientific field as a whole. It is thus beneficial at many different levels to understand the general features of successful, high-impact bioinformatics communities; how individual participants can contribute to the success of these communities; and the role of education and training within these communities. We present here a quick guide to building and maintaining a successful, high-impact bioinformatics community, along with an overview of the general benefits of participating in such communities. This article grew out of contributions made by organizers, presenters, panelists, and other participants of the ISMB/ECCB 2013 workshop "The 'How To Guide' for Establishing a Successful Bioinformatics Network" at the 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 12th European Conference on Computational Biology (ECCB).


Subjects
Communication , Computational Biology/organization & administration , Humans , Internet , Social Media
16.
Proc Natl Acad Sci U S A ; 110(23): 9427-32, 2013 Jun 04.
Article in English | MEDLINE | ID: mdl-23690612

ABSTRACT

Because parasite virulence factors target host immune responses, identification and functional characterization of these factors can provide insight into poorly understood host immune mechanisms. The fruit fly Drosophila melanogaster is a model system for understanding humoral innate immunity, but Drosophila cellular innate immune responses remain incompletely characterized. Fruit flies are regularly infected by parasitoid wasps in nature and, following infection, flies mount a cellular immune response culminating in the cellular encapsulation of the wasp egg. The mechanistic basis of this response is largely unknown, but wasps use a mixture of virulence proteins derived from the venom gland to suppress cellular encapsulation. To gain insight into the mechanisms underlying wasp virulence and fly cellular immunity, we used a joint transcriptomic/proteomic approach to identify venom genes from Ganaspis sp.1 (G1), a previously uncharacterized Drosophila parasitoid species, and found that G1 venom contains a highly abundant sarco/endoplasmic reticulum calcium ATPase (SERCA) pump. Accordingly, we found that fly immune cells termed plasmatocytes normally undergo a cytoplasmic calcium burst following infection, and that this calcium burst is required for activation of the cellular immune response. We further found that the plasmatocyte calcium burst is suppressed by G1 venom in a SERCA-dependent manner, leading to the failure of plasmatocytes to become activated and migrate toward G1 eggs. Finally, by genetically manipulating plasmatocyte calcium levels, we were able to alter fly immune success against G1 and other parasitoid species. Our characterization of parasitoid wasp venom proteins led us to identify plasmatocyte cytoplasmic calcium bursts as an important aspect of fly cellular immunity.


Subjects
Calcium/metabolism , Drosophila melanogaster/immunology , Drosophila melanogaster/parasitology , Immunity, Cellular/drug effects , Sarcoplasmic Reticulum Calcium-Transporting ATPases/pharmacology , Wasp Venoms/enzymology , Wasps/chemistry , Animals , Base Sequence , Blotting, Western , Drosophila melanogaster/metabolism , Gene Expression Profiling , Hemocytes/immunology , Hemocytes/metabolism , Mass Spectrometry , Molecular Sequence Data , Nucleic Acid Hybridization/methods , Polymerase Chain Reaction , Sarcoplasmic Reticulum Calcium-Transporting ATPases/analysis , Sequence Analysis, DNA , Virulence Factors/pharmacology , Wasps/genetics , Wasps/pathogenicity
17.
Cancer Immunol Res ; 12(5): 544-558, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38381401

ABSTRACT

Tumor molecular data sets are becoming increasingly complex, making it nearly impossible for humans alone to effectively analyze them. Here, we demonstrate the power of using machine learning (ML) to analyze a single-cell, spatial, and highly multiplexed proteomic data set from human pancreatic cancer and reveal underlying biological mechanisms that may contribute to clinical outcomes. We designed a multiplex immunohistochemistry antibody panel to compare T-cell functionality and spatial localization in resected tumors from treatment-naïve patients with localized pancreatic ductal adenocarcinoma (PDAC) with resected tumors from a second cohort of patients treated with neoadjuvant agonistic CD40 (anti-CD40) monoclonal antibody therapy. In total, nearly 2.5 million cells from 306 tissue regions collected from 29 patients across both cohorts were assayed, and over 1,000 tumor microenvironment (TME) features were quantified. We then trained ML models to accurately predict anti-CD40 treatment status and disease-free survival (DFS) following anti-CD40 therapy based on TME features. Through downstream interpretation of the ML models' predictions, we found anti-CD40 therapy reduced canonical aspects of T-cell exhaustion within the TME, as compared with treatment-naïve TMEs. Using automated clustering approaches, we found improved DFS following anti-CD40 therapy correlated with an increased presence of CD44+CD4+ Th1 cells located specifically within cellular neighborhoods characterized by increased T-cell proliferation, antigen experience, and cytotoxicity in immune aggregates. Overall, our results demonstrate the utility of ML in molecular cancer immunology applications, highlight the impact of anti-CD40 therapy on T cells within the TME, and identify potential candidate biomarkers of DFS for anti-CD40-treated patients with PDAC.


Subjects
Carcinoma, Pancreatic Ductal , Immunotherapy , Machine Learning , Neoadjuvant Therapy , Pancreatic Neoplasms , Tumor Microenvironment , Humans , Pancreatic Neoplasms/immunology , Pancreatic Neoplasms/therapy , Pancreatic Neoplasms/pathology , Tumor Microenvironment/immunology , Immunotherapy/methods , Carcinoma, Pancreatic Ductal/immunology , Carcinoma, Pancreatic Ductal/therapy , Carcinoma, Pancreatic Ductal/pathology , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , CD40 Antigens/metabolism , Treatment Outcome , Female , Lymphocytes, Tumor-Infiltrating/immunology , Lymphocytes, Tumor-Infiltrating/metabolism , Male
18.
BMC Genomics ; 14: 397, 2013 Jun 13.
Article in English | MEDLINE | ID: mdl-23758618

ABSTRACT

BACKGROUND: Visualization plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as communicate findings to others. Visual analysis, which combines visualization with analysis tools to enable seamless use of both approaches for scientific investigation, offers a powerful method for performing complex genomic analyses. However, there are numerous challenges that arise when creating rich, interactive Web-based visualizations/visual analysis applications for high-throughput genomics. These challenges include managing data flow from Web server to Web browser, integrating analysis tools and visualizations, and sharing visualizations with colleagues. RESULTS: We have created a platform that simplifies the creation of Web-based visualization/visual analysis applications for high-throughput genomics. This platform provides components that make it simple to efficiently query very large datasets, draw common representations of genomic data, integrate with analysis tools, and share or publish fully interactive visualizations. Using this platform, we have created a Circos-style genome-wide viewer, a generic scatter plot for correlation analysis, an interactive phylogenetic tree, a scalable genome browser for next-generation sequencing data, and an application for systematically exploring tool parameter spaces to find good parameter values. All visualizations are interactive and fully customizable. The platform is integrated with the Galaxy (http://galaxyproject.org) genomics workbench, making it easy to integrate new visual applications into Galaxy. CONCLUSIONS: Visualization and visual analysis play an important role in high-throughput genomics experiments, and approaches are needed to make it easier to create applications for these activities. Our framework provides a foundation for creating Web-based visualizations and integrating them into Galaxy. Finally, the visualizations we have created using the framework are useful tools for high-throughput genomics experiments.


Subjects
Computer Graphics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Internet , Statistics as Topic/methods , Information Dissemination , Phylogeny , Publications
19.
bioRxiv ; 2023 Dec 09.
Article in English | MEDLINE | ID: mdl-38106203

ABSTRACT

Multiplex tissue imaging comprises a collection of increasingly popular single-cell spatial proteomics and transcriptomics assays for characterizing biological tissues both compositionally and spatially. However, several technical issues limit the utility of multiplex tissue imaging, including the limited number of RNAs and proteins that can be assayed, tissue loss, and protein probe failure. In this work, we demonstrate how machine learning methods can address these limitations by imputing protein abundance at the single-cell level using multiplex tissue imaging datasets from a breast cancer cohort. We first compared machine learning methods' strengths and weaknesses for imputing single-cell protein abundance. Machine learning methods used in this work include regularized linear regression, gradient-boosted regression trees, and deep learning autoencoders. We also incorporated cellular spatial information to improve imputation performance. Using machine learning, single-cell protein expression can be imputed with mean absolute error ranging from 0.05 to 0.3 on a [0,1] scale. Our results demonstrate (1) the feasibility of imputing single-cell abundance levels for many proteins using machine learning to overcome the technical constraints of multiplex tissue imaging and (2) how including cellular spatial information can substantially enhance imputation results.
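To make the reported metric concrete, the sketch below shows how a mean absolute error on a [0,1] scale could be computed. It assumes min-max scaling, which the abstract does not specify, and the function names and sample values are hypothetical; it is not the authors' evaluation code.

```python
def min_max_scale(values):
    """Rescale values linearly onto [0, 1] (assumed scaling, for illustration)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant marker carries no spread; map every cell to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mean_absolute_error(observed, imputed):
    """Average absolute difference between observed and imputed values."""
    if len(observed) != len(imputed):
        raise ValueError("observed and imputed must have the same length")
    return sum(abs(o - p) for o, p in zip(observed, imputed)) / len(observed)


# Made-up per-cell intensities for one protein and a made-up imputation.
observed_scaled = min_max_scale([0.0, 5.0, 10.0])  # [0.0, 0.5, 1.0]
imputed_scaled = [0.1, 0.4, 0.9]
mae = mean_absolute_error(observed_scaled, imputed_scaled)  # approximately 0.1
```

On this scale an error of 0.1 means the imputed value is off, on average, by one tenth of the observed range for that protein.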

20.
bioRxiv ; 2023 Oct 23.
Article in English | MEDLINE | ID: mdl-37961410

ABSTRACT

Tumor molecular datasets are becoming increasingly complex, making it nearly impossible for humans alone to effectively analyze them. Here, we demonstrate the power of using machine learning to analyze a single-cell, spatial, and highly multiplexed proteomic dataset from human pancreatic cancer and reveal underlying biological mechanisms that may contribute to clinical outcome. A novel multiplex immunohistochemistry antibody panel was used to audit T cell functionality and spatial localization in resected tumors from treatment-naive patients with localized pancreatic ductal adenocarcinoma (PDAC) compared to a second cohort of patients treated with neoadjuvant agonistic CD40 (αCD40) monoclonal antibody therapy. In total, nearly 2.5 million cells from 306 tissue regions collected from 29 patients across both treatment cohorts were assayed, and more than 1,000 tumor microenvironment (TME) features were quantified. We then trained machine learning models to accurately predict αCD40 treatment status and disease-free survival (DFS) following αCD40 therapy based upon TME features. Through downstream interpretation of the machine learning models' predictions, we found αCD40 therapy to reduce canonical aspects of T cell exhaustion within the TME, as compared to treatment-naive TMEs. Using automated clustering approaches, we found improved DFS following αCD40 therapy to correlate with the increased presence of CD44+ CD4+ Th1 cells located specifically within cellular spatial neighborhoods characterized by increased T cell proliferation, antigen-experience, and cytotoxicity in immune aggregates. Overall, our results demonstrate the utility of machine learning in molecular cancer immunology applications, highlight the impact of αCD40 therapy on T cells within the TME, and identify potential candidate biomarkers of DFS for αCD40-treated patients with PDAC.
