1.
J Pathol ; 262(3): 271-288, 2024 03.
Article in English | MEDLINE | ID: mdl-38230434

ABSTRACT

Recent advances in the field of immuno-oncology have brought transformative changes in the management of cancer patients. The immune profile of tumours has been found to have key value in predicting disease prognosis and treatment response in various cancers. Multiplex immunohistochemistry and immunofluorescence have emerged as potent tools for the simultaneous detection of multiple protein biomarkers in a single tissue section, thereby expanding opportunities for molecular and immune profiling while preserving tissue samples. By establishing the phenotype of individual tumour cells when distributed within a mixed cell population, the identification of clinically relevant biomarkers with high-throughput multiplex immunophenotyping of tumour samples has great potential to guide appropriate treatment choices. Moreover, the emergence of novel multi-marker imaging approaches can now provide unprecedented insights into the tumour microenvironment, including the potential interplay between various cell types. However, there are significant challenges to widespread integration of these technologies in daily research and clinical practice. This review addresses the challenges and potential solutions within a structured framework of action from a regulatory and clinical trial perspective. New developments within the field of immunophenotyping using multiplexed tissue imaging platforms and associated digital pathology are also described, with a specific focus on translational implications across different subtypes of cancer. © 2024 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.


Subjects
Breast Neoplasms, Humans, Female, Tumor Biomarkers/genetics, Prognosis, Phenotype, United Kingdom, Tumor Microenvironment
2.
J Pathol ; 260(5): 514-532, 2023 08.
Article in English | MEDLINE | ID: mdl-37608771

ABSTRACT

Modern histologic imaging platforms coupled with machine learning methods have provided new opportunities to map the spatial distribution of immune cells in the tumor microenvironment. However, there exists no standardized method for describing or analyzing spatial immune cell data, and most reported spatial analyses are rudimentary. In this review, we provide an overview of two approaches for reporting and analyzing spatial data (raster versus vector-based). We then provide a compendium of spatial immune cell metrics that have been reported in the literature, summarizing prognostic associations in the context of a variety of cancers. We conclude by discussing two well-described clinical biomarkers, the breast cancer stromal tumor infiltrating lymphocytes score and the colon cancer Immunoscore, and describe investigative opportunities to improve clinical utility of these spatial biomarkers. © 2023 The Pathological Society of Great Britain and Ireland.
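To make the raster versus vector distinction concrete, here is a minimal Python sketch (not taken from the review) that summarizes the same hypothetical cell centroids both ways: a raster view that bins lymphocytes into square tiles and counts them, and a vector view that computes each lymphocyte's distance to the nearest tumour cell. The coordinates, tile size, and function names are illustrative assumptions.

```python
import math
from collections import Counter

Point = tuple[float, float]

def raster_counts(cells: list[Point], tile: float) -> Counter:
    """Raster view: bin cell centroids into square tiles and count cells per tile."""
    return Counter((int(x // tile), int(y // tile)) for x, y in cells)

def mean_nearest_distance(sources: list[Point], targets: list[Point]) -> float:
    """Vector view: mean distance from each source cell to its nearest target cell."""
    total = 0.0
    for sx, sy in sources:
        total += min(math.hypot(sx - tx, sy - ty) for tx, ty in targets)
    return total / len(sources)

# Hypothetical centroids (micrometres); real data would come from cell segmentation.
lymphocytes = [(10.0, 10.0), (12.0, 40.0), (55.0, 52.0)]
tumour = [(10.0, 13.0), (50.0, 50.0)]

print(raster_counts(lymphocytes, tile=50.0))
print(round(mean_nearest_distance(lymphocytes, tumour), 2))
```

The raster summary discards exact positions in exchange for a fixed-size grid, while the vector metric keeps per-cell geometry at the cost of pairwise distance computations.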


Subjects
Colonic Neoplasms, Humans, Biomarkers, Benchmarking, Tumor-Infiltrating Lymphocytes, Spatial Analysis, Tumor Microenvironment
3.
J Pathol ; 260(5): 498-513, 2023 08.
Article in English | MEDLINE | ID: mdl-37608772

ABSTRACT

The clinical significance of the tumor-immune interaction in breast cancer is now established, and tumor-infiltrating lymphocytes (TILs) have emerged as predictive and prognostic biomarkers for patients with triple-negative (estrogen receptor, progesterone receptor, and HER2-negative) breast cancer and HER2-positive breast cancer. How computational assessments of TILs might complement manual TIL assessment in trial and daily practices is currently debated. Recent efforts to use machine learning (ML) to automatically evaluate TILs have shown promising results. We review state-of-the-art approaches and identify pitfalls and challenges of automated TIL evaluation by studying the root cause of ML discordances in comparison to manual TIL quantification. We categorize our findings into four main topics: (1) technical slide issues, (2) ML and image analysis aspects, (3) data challenges, and (4) validation issues. The main reason for discordant assessments is the inclusion of false-positive areas or cells identified by performance on certain tissue patterns or design choices in the computational implementation. To aid the adoption of ML for TIL assessment, we provide an in-depth discussion of ML and image analysis, including validation issues that need to be considered before reliable computational reporting of TILs can be incorporated into the trial and routine clinical management of patients with triple-negative breast cancer. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.


Subjects
Animal Mammary Neoplasms, Triple Negative Breast Neoplasms, Humans, Animals, Tumor-Infiltrating Lymphocytes, Biomarkers, Machine Learning
4.
Am J Epidemiol ; 192(6): 995-1005, 2023 06 02.
Article in English | MEDLINE | ID: mdl-36804665

ABSTRACT

Data sharing is essential for reproducibility of epidemiologic research, replication of findings, pooled analyses in consortia efforts, and maximizing study value to address multiple research questions. However, barriers related to confidentiality, costs, and incentives often limit the extent and speed of data sharing. Epidemiological practices that follow Findable, Accessible, Interoperable, Reusable (FAIR) principles can address these barriers by making data resources findable with the necessary metadata, accessible to authorized users, and interoperable with other data, to optimize the reuse of resources with appropriate credit to their creators. We provide an overview of these principles and describe approaches for implementation in epidemiology. Increasing degrees of FAIRness can be achieved by moving data and code from on-site locations to remote, accessible ("Cloud") data servers, using machine-readable and nonproprietary files, and developing open-source code. Adoption of these practices will improve daily work and collaborative analyses and facilitate compliance with data sharing policies from funders and scientific journals. Achieving a high degree of FAIRness will require funding, training, organizational support, recognition, and incentives for sharing research resources, both data and code. However, these costs are outweighed by the benefits of making research more reproducible, impactful, and equitable by facilitating the reuse of precious research resources by the scientific community.


Subjects
Confidentiality, Information Dissemination, Humans, Reproducibility of Results, Software, Epidemiologic Studies
5.
Bioinformatics ; 38(18): 4434-4436, 2022 09 15.
Article in English | MEDLINE | ID: mdl-35900159

ABSTRACT

MOTIVATION: The Division of Cancer Epidemiology and Genetics (DCEG) and the Division of Cancer Prevention (DCP) at the National Cancer Institute (NCI) have recently generated genome-wide association study (GWAS) data for multiple traits in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Genomic Atlas project. The GWAS included 110 000 participants. The dissemination of the genetic association data through a data portal called GWAS Explorer, in a manner that addresses the modern expectations of FAIR reusability by data scientists and engineers, is the main motivation for the development of the open-source JavaScript software development kit (SDK) reported here. RESULTS: The PLCO GWAS Explorer resource relies on a public stateless HTTP application programming interface (API) deployed as the sole backend service for both the landing page's web application and third-party analytical workflows. The core PLCOjs SDK is mapped to each of the API methods, and also to each of the reference graphic visualizations in the GWAS Explorer. A few additional visualization methods extend it. As is the norm with web SDKs, no download or installation is needed and modularization supports targeted code injection for web applications, reactive notebooks (Observable) and node-based web services. AVAILABILITY AND IMPLEMENTATION: code at https://github.com/episphere/plco; project page at https://episphere.github.io/plco.


Subjects
Colorectal Neoplasms, Ovarian Neoplasms, United States, Male, Humans, Female, Genome-Wide Association Study, National Cancer Institute (U.S.), Prostate, Software, Ovarian Neoplasms/genetics, Lung
6.
BMC Med Inform Decis Mak ; 23(1): 238, 2023 10 25.
Article in English | MEDLINE | ID: mdl-37880712

ABSTRACT

BACKGROUND: Online questionnaires are commonly used to collect information from participants in epidemiological studies. This requires building questionnaires using machine-readable formats that can be delivered to study participants using web-based technologies such as progressive web applications. However, the paucity of open-source markup standards with support for complex logic makes collaborative development of web-based questionnaire modules difficult. This often prevents interoperability and reusability of questionnaire modules across epidemiological studies. RESULTS: We developed Quest, an open-source markup language for presenting questionnaire content and logic, within a real-time renderer that enables the user to test logic (e.g., skip patterns) and view the structure of data collection. We provide the Quest markup language, an in-browser markup rendering tool, a questionnaire development tool, and an example web application that embeds the renderer, developed for The Connect for Cancer Prevention Study. CONCLUSION: A markup language can specify both the content and logic of a questionnaire as plain text. Questionnaire markup, such as Quest, can become a standard format for storing questionnaires or sharing questionnaires across the web. Quest is a step towards generation of FAIR data in epidemiological studies by facilitating reusability of questionnaires and data interoperability using open-source tools.


Subjects
Software, Humans, Surveys and Questionnaires, Epidemiologic Studies
7.
Bioinformatics ; 37(14): 2073-2074, 2021 08 04.
Article in English | MEDLINE | ID: mdl-33135727

ABSTRACT

MOTIVATION: Mortality Tracker is an in-browser application for data wrangling, analysis, dissemination and visualization of public time series of mortality in the United States. It was developed in response to requests by epidemiologists for portable real-time assessment of the effect of COVID-19 on other causes of death and all-cause mortality. This is performed by comparing 2020 real-time values with observations from the same week in the previous 5 years, and by enabling the extraction of temporal snapshots of mortality series that facilitate modeling the interdependence between its causes. RESULTS: Our solution employs a scalable 'Data Commons at Web Scale' approach that abstracts all stages of the data cycle as in-browser components. Specifically, the data wrangling computation, not just the orchestration of data retrieval, takes place in the browser, without any requirement to download or install software. This approach, where operations that would normally be computed server-side are mapped to in-browser SDKs, is sometimes loosely described as Web APIs, a designation adopted here. AVAILABILITY AND IMPLEMENTATION: https://episphere.github.io/mortalitytracker; webcast demo: youtu.be/ZsvCe7cZzLo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
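The comparison described above (each 2020 week against the same week in the previous 5 years) can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not Mortality Tracker's in-browser code: the baseline is taken to be the simple mean of the five prior years, and the weekly counts are hypothetical.

```python
from statistics import mean

def weekly_excess(series: dict[int, list[float]], year: int, baseline_years: int = 5) -> list[float]:
    """For each week of `year`, return the observed count minus the mean count
    of the same week over the preceding `baseline_years` years (assumed baseline)."""
    previous = [series[y] for y in range(year - baseline_years, year)]
    excess = []
    for week, observed in enumerate(series[year]):
        expected = mean(values[week] for values in previous)
        excess.append(observed - expected)
    return excess

# Hypothetical weekly death counts for two weeks of each year.
deaths = {
    2015: [100.0, 100.0],
    2016: [102.0, 98.0],
    2017: [98.0, 102.0],
    2018: [101.0, 99.0],
    2019: [99.0, 101.0],
    2020: [130.0, 95.0],
}
print(weekly_excess(deaths, 2020))
```

A per-week baseline keeps seasonal patterns (for example, winter peaks) from being mistaken for excess mortality.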


Subjects
COVID-19, Computers, Humans, Information Storage and Retrieval, SARS-CoV-2, Software
8.
Ann Intern Med ; 174(4): 437-443, 2021 04.
Article in English | MEDLINE | ID: mdl-33316174

ABSTRACT

BACKGROUND: Excess death estimates quantify the full impact of the coronavirus disease 2019 (COVID-19) pandemic. Widely reported U.S. excess death estimates have not accounted for recent population changes, especially increases in the population older than 65 years. OBJECTIVE: To estimate excess deaths in the United States in 2020, after accounting for population changes. DESIGN: Surveillance study. SETTING: United States, March to August 2020. PARTICIPANTS: All decedents. MEASUREMENTS: Age-specific excess deaths in the United States from 1 March to 31 August 2020 compared with 2015 to 2019 were estimated, after changes in population size and age were taken into account, by using Centers for Disease Control and Prevention provisional death data and U.S. Census Bureau population estimates. Cause-specific excess deaths were estimated by month and age. RESULTS: From March through August 2020, 1 671 400 deaths were registered in the United States, including 173 300 COVID-19 deaths. An average of 1 370 000 deaths were reported over the same months during 2015 to 2019, for a crude excess of 301 400 deaths (128 100 non-COVID-19 deaths). However, the 2020 U.S. population includes 5.04 million more persons aged 65 years and older than the average population in 2015 to 2019 (a 10% increase). After population changes were taken into account, an estimated 217 900 excess deaths occurred from March through August 2020 (173 300 COVID-19 and 44 600 non-COVID-19 deaths). Most excess non-COVID-19 deaths occurred in April, July, and August, and 34 900 (78%) were in persons aged 25 to 64 years. Diabetes, Alzheimer disease, and heart disease caused the most non-COVID-19 excess deaths. LIMITATION: Provisional death data are underestimated because of reporting delays. CONCLUSION: The COVID-19 pandemic resulted in an estimated 218 000 excess deaths in the United States between March and August 2020, and 80% of those deaths had COVID-19 as the underlying cause. 
Accounting for population changes substantially reduced the excess non-COVID-19 death estimates, providing important information for guiding future clinical and public health interventions. PRIMARY FUNDING SOURCE: National Cancer Institute.
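The abstract's crude figure is simple arithmetic: 1 671 400 - 1 370 000 = 301 400 excess deaths, which falls to an estimated 217 900 once the larger older population is accounted for. The Python sketch below shows one plausible way such an adjustment works: apply each age group's baseline death rate to that group's current population to get expected deaths. The two age groups and all counts are hypothetical, and this is not presented as the authors' exact procedure.

```python
def crude_excess(observed: dict[str, float], baseline_deaths: dict[str, float]) -> float:
    """Observed deaths minus the baseline average, ignoring population change."""
    return sum(observed.values()) - sum(baseline_deaths.values())

def population_adjusted_excess(
    observed: dict[str, float],
    baseline_deaths: dict[str, float],
    baseline_pop: dict[str, float],
    current_pop: dict[str, float],
) -> float:
    """Observed deaths minus the deaths expected if each age group's baseline
    death rate applied to that group's current population size."""
    expected = sum(
        baseline_deaths[age] * current_pop[age] / baseline_pop[age]
        for age in baseline_deaths
    )
    return sum(observed.values()) - expected

# Hypothetical counts: the 65+ group is 10% larger than in the baseline period.
observed = {"<65": 450, "65+": 1150}
baseline_deaths = {"<65": 400, "65+": 1000}
baseline_pop = {"<65": 100_000, "65+": 20_000}
current_pop = {"<65": 100_000, "65+": 22_000}

print(crude_excess(observed, baseline_deaths))  # 200
print(population_adjusted_excess(observed, baseline_deaths, baseline_pop, current_pop))  # 100.0
```

In this toy example half of the crude excess is explained by the growth of the older population alone, which mirrors the direction of the adjustment reported above.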


Subjects
Aging, COVID-19/mortality, Mortality/trends, Viral Pneumonia/mortality, Population Growth, Adult, Aged, Aged 80 and over, Female, Humans, Male, Middle Aged, Pandemics, Viral Pneumonia/virology, Population Surveillance, Risk Factors, SARS-CoV-2, United States/epidemiology
9.
Ann Intern Med ; 174(12): 1693-1699, 2021 12.
Article in English | MEDLINE | ID: mdl-34606321

ABSTRACT

BACKGROUND: Although racial/ethnic disparities in U.S. COVID-19 death rates are striking, focusing on COVID-19 deaths alone may underestimate the true effect of the pandemic on disparities. Excess death estimates capture deaths both directly and indirectly caused by COVID-19. OBJECTIVE: To estimate U.S. excess deaths by racial/ethnic group. DESIGN: Surveillance study. SETTING: United States. PARTICIPANTS: All decedents. MEASUREMENTS: Excess deaths and excess deaths per 100 000 persons from March to December 2020 were estimated by race/ethnicity, sex, age group, and cause of death, using provisional death certificate data from the Centers for Disease Control and Prevention (CDC) and U.S. Census Bureau population estimates. RESULTS: An estimated 2.88 million deaths occurred between March and December 2020. Compared with the number of expected deaths based on 2019 data, 477 200 excess deaths occurred during this period, with 74% attributed to COVID-19. Age-standardized excess deaths per 100 000 persons among Black, American Indian/Alaska Native (AI/AN), and Latino males and females were more than double those in White and Asian males and females. Non-COVID-19 excess deaths also disproportionately affected Black, AI/AN, and Latino persons. Compared with White males and females, non-COVID-19 excess deaths per 100 000 persons were 2 to 4 times higher in Black, AI/AN, and Latino males and females, including deaths due to diabetes, heart disease, cerebrovascular disease, and Alzheimer disease. Excess deaths in 2020 resulted in substantial widening of racial/ethnic disparities in all-cause mortality from 2019 to 2020. LIMITATIONS: Completeness and availability of provisional CDC data; no estimates of precision around results. CONCLUSION: There were profound racial/ethnic disparities in excess deaths in the United States in 2020 during the COVID-19 pandemic, resulting in rapid increases in racial/ethnic disparities in all-cause mortality between 2019 and 2020. 
PRIMARY FUNDING SOURCE: National Institutes of Health Intramural Research Program.
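Age-standardized rates per 100 000 persons, as used above, are commonly computed by direct standardization: a weighted average of age-specific rates with weights taken from a fixed standard population. The Python sketch below illustrates that formula with two hypothetical age groups and hypothetical standard weights; the study's actual age groups and standard population are not stated in the abstract.

```python
def age_standardized_rate(
    events: dict[str, float],
    population: dict[str, float],
    std_weights: dict[str, float],
    per: float = 100_000,
) -> float:
    """Direct standardization: weighted average of age-specific rates,
    using a fixed standard-population weight for each age group."""
    total_weight = sum(std_weights.values())
    weighted = sum(std_weights[a] * events[a] / population[a] for a in std_weights)
    return per * weighted / total_weight

# Hypothetical standard-population shares and group-level counts.
weights = {"<65": 0.8, "65+": 0.2}
pop = {"<65": 100_000, "65+": 10_000}
group_a = age_standardized_rate({"<65": 10, "65+": 100}, pop, weights)
group_b = age_standardized_rate({"<65": 20, "65+": 200}, pop, weights)
print(round(group_a, 6), round(group_b / group_a, 6))
```

Because both groups are weighted by the same standard population, the resulting ratio reflects differences in age-specific rates rather than differences in age structure.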


Subjects
COVID-19/ethnology, COVID-19/mortality, Ethnic and Racial Minorities/statistics & numerical data, Health Status Disparities, Pandemics, Adolescent, Adult, Age Distribution, Aged, Aged 80 and over, Cause of Death, Child, Preschool Child, Female, Humans, Infant, Newborn Infant, Male, Middle Aged, Population Surveillance, SARS-CoV-2, Sex Distribution, United States/epidemiology, Young Adult
10.
Am J Pathol ; 190(7): 1491-1504, 2020 07.
Article in English | MEDLINE | ID: mdl-32277893

ABSTRACT

Quantitative assessment of spatial relations between tumor and tumor-infiltrating lymphocytes (TIL) is increasingly important in both basic science and clinical aspects of breast cancer research. We have developed and evaluated convolutional neural network analysis pipelines to generate combined maps of cancer regions and TILs in routine diagnostic breast cancer whole slide tissue images. The combined maps provide insight about the structural patterns and spatial distribution of lymphocytic infiltrates and facilitate improved quantification of TILs. Both tumor and TIL analyses were evaluated by using three convolutional neural networks (34-layer ResNet, 16-layer VGG, and Inception v4); the results compared favorably with those obtained by using the best published methods. We have produced open-source tools and a public data set consisting of tumor/TIL maps for 1090 invasive breast cancer images from The Cancer Genome Atlas. The maps can be downloaded for further downstream analyses.


Subjects
Breast Neoplasms/pathology, Deep Learning, Tumor-Infiltrating Lymphocytes/pathology, Breast Neoplasms/immunology, Female, Humans, Tumor-Infiltrating Lymphocytes/immunology, SEER Program
11.
PLoS Biol ; 16(12): e3000099, 2018 12.
Article in English | MEDLINE | ID: mdl-30596645

ABSTRACT

A personalized approach based on a patient's or pathogen's unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods. The BioCompute framework (https://w3id.org/biocompute/1.3.0) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that standard and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the "Open-Stand.org principles for collaborative open standards development." With high-throughput sequencing (HTS) studies communicated using a BCO, regulatory agencies (e.g., Food and Drug Administration [FDA]), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next-generation sequencing workflow exchange, reporting, and regulatory reviews.


Subjects
Computational Biology/methods, DNA Sequence Analysis/methods, Animals, Communication, Computational Biology/standards, Genome, Genomics/methods, High-Throughput Nucleotide Sequencing, Humans, Precision Medicine/trends, Reproducibility of Results, DNA Sequence Analysis/standards, Software, Workflow
12.
Bioinformatics ; 33(4): 547-548, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27797761

ABSTRACT

Summary: The move of computational genomics workflows to Cloud Computing platforms is associated with a new level of integration and interoperability that challenges existing data representation formats. The Variant Call Format (VCF) is in a particularly sensitive position in that regard, with both clinical and consumer-facing analysis tools relying on this self-contained description of genomic variation in Next Generation Sequencing (NGS) results. In this report we identify an isomorphic map between VCF and the reference Resource Description Framework (RDF). RDF is advanced by the World Wide Web Consortium (W3C) to enable representations of linked data that are both distributed and discoverable. The resulting ability to decompose VCF reports of genomic variation without loss of context addresses the need to modularize and govern NGS pipelines for Precision Medicine. Specifically, it provides the flexibility (i.e. the indexing) needed to support the wide variety of clinical scenarios and patient-facing governance where only part of the VCF data is relevant. Availability and Implementation: Software libraries with a claim to be both domain-facing and consumer-facing have to pass the test of portability across the variety of devices that those consumers in fact adopt. That is, ideally the implementation should itself take place within the space defined by web technologies. Consequently, the isomorphic mapping function was implemented in JavaScript, and was tested in a variety of environments and devices, client and server side alike. These range from web browsers in mobile phones to the most popular micro service platform, NodeJS. The code is publicly available at https://github.com/ibl/VCFr , with a live deployment at: http://ibl.github.io/VCFr/ . Contact: jonas.almeida@stonybrookmedicine.edu.
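The idea of an isomorphic, lossless map between a VCF record and subject-predicate-object triples can be sketched in Python as below. This is not VCFr's mapping (VCFr is written in JavaScript): the predicate namespace `http://example.org/vcf#` and the function names are hypothetical, and only the eight fixed VCF columns are handled, not per-sample genotype columns.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
NS = "http://example.org/vcf#"  # hypothetical vocabulary, not a published one

Triple = tuple[str, str, str]

def line_to_triples(line: str, record_id: str) -> list[Triple]:
    """Map one tab-separated VCF data line to (subject, predicate, object) triples."""
    fields = line.rstrip("\n").split("\t")
    return [(record_id, NS + col, value) for col, value in zip(VCF_COLUMNS, fields)]

def triples_to_line(triples: list[Triple]) -> str:
    """Inverse map: rebuild the tab-separated line from the record's triples."""
    by_predicate = {p: o for _, p, o in triples}
    return "\t".join(by_predicate[NS + col] for col in VCF_COLUMNS)

# A record in the style of the example in the VCF specification.
line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
triples = line_to_triples(line, "record1")

# Decomposition: select only the allele triples; the shared subject keeps their context.
alleles = [t for t in triples if t[1] in (NS + "REF", NS + "ALT")]
print(alleles)
print(triples_to_line(triples) == line)
```

Because every triple carries the record identifier as its subject, a subset such as the allele triples can be shared on its own and still be joined back to the full record.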


Subjects
Genetic Variation, Genome, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Genomics/methods, Humans, Information Storage and Retrieval, Semantics
13.
Cancer Control ; 24(1): 102-110, 2017 Jan.
Article in English | MEDLINE | ID: mdl-28178722

ABSTRACT

BACKGROUND: The molecular signature of ductal carcinoma in situ (DCIS) in the breast is not well understood. Erb-b2 receptor tyrosine kinase 2 (ERBB2 [formerly known as HER2/neu]) positivity in DCIS is predictive of coexistent early invasive breast carcinoma. The aim of this study is to identify the gene-expression signature profiles of estrogen receptor (ER)/progesterone receptor (PR)-positive, ERBB2, and triple-negative subtypes of DCIS. METHODS: Based on ER, PR, and ERBB2 status, a total of 18 high nuclear grade DCIS cases with no evidence of invasive breast carcinoma were selected along with 6 non-neoplastic controls. The 3 study groups were defined as ER/PR-positive, ERBB2, and triple-negative subtypes. RESULTS: A total of 49 genes were differentially expressed in the ERBB2 subtype compared with the ER/PR-positive and triple-negative groups. PROM1 was overexpressed in the ERBB2 subtype compared with ER/PR-positive and triple-negative subtypes. Other genes differentially expressed included TAOK1, AREG, AGR3, PEG10, and MMP9. CONCLUSIONS: Our study identified unique gene signatures in ERBB2-positive DCIS, which may be associated with the development of invasive breast carcinoma. The results may enhance our understanding of the progression of breast cancer and become the basis for developing new predictive biomarkers and therapeutic targets for DCIS.


Subjects
Tumor Biomarkers/metabolism, Breast Neoplasms/genetics, Ductal Breast Carcinoma/genetics, Noninfiltrating Intraductal Carcinoma/genetics, Gene Expression Profiling, ErbB-2 Receptor/metabolism, Adult, Aged, Breast Neoplasms/metabolism, Breast Neoplasms/pathology, Ductal Breast Carcinoma/metabolism, Ductal Breast Carcinoma/pathology, Noninfiltrating Intraductal Carcinoma/metabolism, Noninfiltrating Intraductal Carcinoma/pathology, Female, Humans, Middle Aged, Neoplasm Invasiveness, Neoplasm Staging, Prognosis, Estrogen Receptors/metabolism, Progesterone Receptors/metabolism, Survival Rate
14.
Brief Bioinform ; 15(3): 369-75, 2014 May.
Article in English | MEDLINE | ID: mdl-24162172

ABSTRACT

Among alignment-free methods, Iterated Maps (IMs) are at a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage now appears to be set for a recasting of IMs with a central role in processing next-generation sequencing results.
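The Chaos Game Representation mentioned above is an iterated map: starting at the centre of the unit square, each nucleotide moves the current point halfway towards that nucleotide's corner. The "order free" property follows directly: the last k symbols determine which sub-square of side 2^-k a point lies in, so a single coordinate encodes its whole preceding context. The Python sketch below assumes one common corner assignment (A, C, G, T at (0,0), (0,1), (1,1), (1,0)); other papers use different assignments.

```python
# Assumed corner assignment; the choice of corners is a convention.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}
INVERSE = {corner: base for base, corner in CORNERS.items()}

def cgr(sequence: str) -> list[tuple[float, float]]:
    """Chaos Game Representation: move halfway towards each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in sequence:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

def suffix_from_point(point: tuple[float, float], k: int) -> str:
    """Recover the last k bases from one CGR point by inverting the map:
    the quadrant gives the latest base, then x -> 2x - cx undoes one step."""
    x, y = point
    suffix = []
    for _ in range(k):
        cx = 1.0 if x >= 0.5 else 0.0
        cy = 1.0 if y >= 0.5 else 0.0
        suffix.append(INVERSE[(cx, cy)])
        x, y = 2 * x - cx, 2 * y - cy
    return "".join(reversed(suffix))

last = cgr("GATTACA")[-1]
print(last, suffix_from_point(last, 4))
```

Recovering long suffixes from floating-point coordinates is limited by precision (each step consumes one bit per axis), so this sketch is only reliable for short contexts.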


Subjects
Computational Biology/methods, Sequence Analysis/methods, Computational Biology/trends, Fractals, Statistical Models, Nonlinear Dynamics, Sequence Alignment, Sequence Analysis/statistics & numerical data
15.
Am J Pathol ; 185(3): 600-1, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25701882

ABSTRACT

This editorial discusses the rise of computational pathology as a major driver of experimental pathology research.


Subjects
Pathology, Computational Biology, Humans, Research
16.
BMC Bioinformatics ; 15: 176, 2014 Jun 09.
Article in English | MEDLINE | ID: mdl-24913605

ABSTRACT

BACKGROUND: Ongoing advancements in cloud computing provide novel opportunities in scientific computing, especially for distributed workflows. Modern web browsers can now be used as high-performance workstations for querying, processing, and visualizing genomics' "Big Data" from sources like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) without local software installation or configuration. The design of QMachine (QM) was driven by the opportunity to use this pervasive computing model in the context of the Web of Linked Data in Biomedicine. RESULTS: QM is an open-sourced, publicly available web service that acts as a messaging system for posting tasks and retrieving results over HTTP. The illustrative application described here distributes the analyses of 20 Streptococcus pneumoniae genomes for shared suffixes. Because all analytical and data retrieval tasks are executed by volunteer machines, few server resources are required. Any modern web browser can submit those tasks and/or volunteer to execute them without installing any extra plugins or programs. A client library provides high-level distribution templates including MapReduce. This stark departure from the current reliance on expensive server hardware running "download and install" software has already gathered substantial community interest, as QM received more than 2.2 million API calls from 87 countries in 12 months. CONCLUSIONS: QM was found adequate to deliver the sort of scalable bioinformatics solutions that computation- and data-intensive workflows require. Paradoxically, the sandboxed execution of code by web browsers was also found to enable them, as compute nodes, to address critical privacy concerns that characterize biomedical environments.
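The pattern described above, where submitters post tasks to a message board and volunteer machines claim tasks and post results back, can be illustrated with a small in-memory Python sketch. It does not reproduce QMachine's HTTP API or client library: `TaskBoard`, its methods, and the toy shared k-mer analysis (a simplified stand-in for the shared-suffix analysis) are hypothetical.

```python
from collections import deque
from typing import Any, Callable

class TaskBoard:
    """Hypothetical in-memory stand-in for an HTTP message board:
    submitters post tasks, volunteers claim them and post results back."""

    def __init__(self) -> None:
        self._pending: deque = deque()
        self._results: dict[int, Any] = {}

    def submit(self, task_id: int, fn: Callable[[Any], Any], arg: Any) -> None:
        self._pending.append((task_id, fn, arg))

    def volunteer_step(self) -> bool:
        """One volunteer claims the next pending task, runs it, posts the result."""
        if not self._pending:
            return False
        task_id, fn, arg = self._pending.popleft()
        self._results[task_id] = fn(arg)
        return True

    def result(self, task_id: int) -> Any:
        return self._results[task_id]

def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmers(board: TaskBoard, genomes: list[str], k: int = 3) -> set[str]:
    # Map: one task per genome, executed by volunteers.
    for i, genome in enumerate(genomes):
        board.submit(i, lambda s: kmers(s, k), genome)
    while board.volunteer_step():
        pass
    # Reduce: intersect the per-genome k-mer sets on the submitting client.
    return set.intersection(*(board.result(i) for i in range(len(genomes))))

print(sorted(shared_kmers(TaskBoard(), ["ACGTAC", "TTACGT", "GACGTT"])))
```

The board itself only stores tasks and results, which is why such a design needs few server resources: the computation happens wherever `volunteer_step` runs.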


Subjects
Software Design, Web Browser, Computational Biology/methods, Genome, Genomics/methods, Humans, Streptococcus pneumoniae/genetics
17.
BMC Bioinformatics ; 15: 28, 2014 Jan 27.
Article in English | MEDLINE | ID: mdl-24467687

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) technologies have resulted in petabytes of scattered data, decentralized in archives, databases and sometimes in isolated hard disks which are inaccessible for browsing and analysis. It is expected that curated secondary databases will help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it. RESULTS: To address the above challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine which is used upstream of an integrated High-performance Integrated Virtual Environment (HIVE) that encapsulates Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof-of-concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program - The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR available clinical information on patients, mapping of the reads to the reference followed by identification of non-synonymous Single Nucleotide Variations (nsSNVs) and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP and therefore provides a more comprehensive understanding of the human variome. 
Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr). CONCLUSIONS: Availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual level SNVs and their effect on the human proteome beyond what the dbSNP database provides.


Subjects
High-Throughput Nucleotide Sequencing/methods, Neoplasms/genetics, Proteome/genetics, Proteomics/methods, Algorithms, Biomedical Research, Database Management Systems, Genetic Databases, Humans, Neoplasms/metabolism, Phylogeny, Single Nucleotide Polymorphism, Proteome/classification, Proteome/metabolism, User-Computer Interface
18.
Bioinformatics ; 29(10): 1333-40, 2013 May 15.
Article in English | MEDLINE | ID: mdl-23595662

ABSTRACT

MOTIVATION: Since 2011, The Cancer Genome Atlas' (TCGA) files have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that can be fully traced to and reproduced using their source data. However, to realize this possibility, a continually updated road map of files in the TCGA is required. Creation of such a road map represents a significant data modeling challenge, due to the size and fluidity of this resource: each of the 33 cancer types is instantiated in only partially overlapping sets of analytical platforms, while the number of data files available doubles approximately every 7 months. RESULTS: We developed an engine to index and annotate the TCGA files, relying exclusively on third-generation web technologies (Web 3.0). Specifically, this engine uses JavaScript in conjunction with the World Wide Web Consortium's (W3C) Resource Description Framework (RDF), and SPARQL, the query language for RDF, to capture metadata of files in the TCGA open-access HTTP directory. The resulting index may be queried using SPARQL, and enables file-level provenance annotations as well as discovery of arbitrary subsets of files, based on their metadata, using web standard languages. In turn, these abilities enhance the reproducibility and distribution of novel results delivered as elements of a web-based computational ecosystem. The development of the TCGA Roadmap engine was found to provide specific clues about how biomedical big data initiatives should be exposed as public resources for exploratory analysis, data mining and reproducible research. These specific design elements align with the concept of knowledge reengineering and represent a sharp departure from top-down approaches in grid initiatives such as caBIG. 
They also present a much more interoperable and reproducible alternative to the still pervasive use of data portals. AVAILABILITY: A prepared dashboard, including links to source code and a SPARQL endpoint, is available at http://bit.ly/TCGARoadmap. A video tutorial is available at http://bit.ly/TCGARoadmapTutorial. CONTACT: robbinsd@uab.edu.
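Selecting files by metadata through triple patterns that share variables is the core of the RDF/SPARQL approach described above. The sketch below is a toy, standard-library Python matcher for conjunctive (basic graph) patterns only, not the TCGA Roadmap engine or full SPARQL; the file names and the predicates `disease` and `platform` are hypothetical stand-ins for the real RDF vocabulary.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # SPARQL-style variables start with "?".
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.get(p, t) != t:  # variable already bound to another value
                return None
            extended[p] = t
        elif p != t:                     # constant term must match exactly
            return None
    return extended


def select(triples: Sequence[Triple], patterns: Sequence[Triple]) -> List[Binding]:
    """Evaluate a conjunctive pattern by joining one triple pattern at a time."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical file-level metadata as (subject, predicate, object) triples.
TRIPLES = [
    ("file1.txt", "disease", "brca"), ("file1.txt", "platform", "illuminahiseq"),
    ("file2.txt", "disease", "brca"), ("file2.txt", "platform", "genome_wide_snp_6"),
    ("file3.txt", "disease", "lusc"), ("file3.txt", "platform", "illuminahiseq"),
]

# Roughly: SELECT ?file WHERE { ?file :disease "brca" . ?file :platform "illuminahiseq" }
print(select(TRIPLES, [("?file", "disease", "brca"), ("?file", "platform", "illuminahiseq")]))
# prints [{'?file': 'file1.txt'}]
```

Because every fact is a triple, adding a new kind of metadata means adding triples rather than changing a schema, which is the property that lets arbitrary file subsets be selected.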


Subjects
Information Storage and Retrieval , Neoplasms/genetics , Genome, Human , Humans , Internet , Programming Languages
19.
ArXiv ; 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38903738

ABSTRACT

Whole Slide Images (WSI), obtained by high-resolution digital scanning of microscope slides at multiple scales, are the cornerstone of modern Digital Pathology. However, they represent a particular challenge to AI-based/AI-mediated analysis because pathology labeling is typically done at the slide level instead of the tile level. Not only is the medical diagnosis recorded at the specimen level; the detection of oncogene mutations is also obtained experimentally, and recorded by initiatives like The Cancer Genome Atlas (TCGA), at the slide level. This creates a dual challenge: a) accurately predicting the overall cancer phenotype and b) finding out which cellular morphologies are associated with it at the tile level. To address these challenges, a weakly supervised Multiple Instance Learning (MIL) approach was explored for two prevalent cancer types, Invasive Breast Carcinoma (TCGA-BRCA) and Lung Squamous Cell Carcinoma (TCGA-LUSC). This approach was applied to tumor detection at low magnification levels and to TP53 mutation detection at various magnification levels. Our results show that a novel additive implementation of MIL matched the performance of the reference implementation (AUC 0.96), and was only slightly outperformed by Attention MIL (AUC 0.97). More interestingly, from the perspective of the molecular pathologist, these different AI architectures show distinct sensitivities to morphological features (through the detection of Regions of Interest, RoI) at different magnification levels. Tellingly, TP53 mutation was most sensitive to features at the higher magnifications, where cellular morphology is resolved.
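The contrast between attention pooling and an additive formulation can be illustrated with a small, untrained, pure-Python sketch. It is not the authors' implementation: tile features are given vectors, attention scores are simplified to `w . tanh(h)` (the learned projection used in common attention-MIL formulations is omitted), and the instance classifier `c . ReLU(.)` is a hypothetical stand-in. Attention MIL pools tiles before classifying; additive MIL classifies each attention-scaled tile and sums the results, so the slide logit splits exactly into per-tile contributions usable for RoI detection.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def relu(v: Vector) -> List[float]:
    return [max(0.0, x) for x in v]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tiles: Sequence[Vector], w: Vector) -> List[float]:
    """One weight per tile: softmax over simplified scores w . tanh(h_k)."""
    return softmax([dot(w, [math.tanh(x) for x in h]) for h in tiles])


def attention_mil(tiles: Sequence[Vector], w: Vector, c: Vector,
                  b: float = 0.0) -> Tuple[float, List[float]]:
    """Attention MIL: pool tile features first, then classify the pooled vector."""
    a = attention_weights(tiles, w)
    pooled = [sum(a_k * h[j] for a_k, h in zip(a, tiles)) for j in range(len(tiles[0]))]
    return dot(c, relu(pooled)) + b, a


def additive_mil(tiles: Sequence[Vector], w: Vector, c: Vector,
                 b: float = 0.0) -> Tuple[float, List[float]]:
    """Additive MIL: classify each attention-scaled tile, then sum tile logits."""
    a = attention_weights(tiles, w)
    contributions = [dot(c, relu([a_k * x for x in h])) for a_k, h in zip(a, tiles)]
    return sum(contributions) + b, contributions
```

With `tiles = [[1.0, -1.0], [-1.0, 1.0]]`, `w = [0.0, 0.0]` and `c = [1.0, 1.0]`, both tiles get weight 0.5; attention pooling cancels the features and yields a logit of 0.0, while the additive form yields 1.0 as the sum of two per-tile contributions of 0.5.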

20.
Article in English | MEDLINE | ID: mdl-38827109

ABSTRACT

Motivation: The proliferation of genetic testing and consumer genomics represents a logistical challenge to the personalized use of GWAS data in VCF format: specifically, retrieving target genetic variation from large compressed files filled with unrelated variation information. Compounding the data traversal challenge, privacy-sensitive VCF files are typically managed as large stand-alone single files (no companion index file) composed of variable-sized compressed chunks, hosted in consumer-facing environments with no native support for hosted execution. Results: A portable JavaScript module was developed to support in-browser fetching of partial content using byte-range requests. This includes on-the-fly decompression of irregularly positioned compressed chunks, coupled with a binary search algorithm that iteratively identifies chromosome-position ranges. The in-browser zero-footprint solution (no downloads, no installations) enables the interoperability, reusability, and user-facing governance advanced by the FAIR principles for stewardship of scientific data. Availability: https://episphere.github.io/vcf, including supplementary material.
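The retrieval strategy described above (byte-range reads, on-the-fly decompression of whichever compressed chunk is found at an arbitrary offset, and a binary search on chromosome position) can be sketched in Python. This is not the episphere JavaScript module; it is a hedged illustration assuming that each chunk is a complete gzip member holding whole VCF lines, that records are sorted by chromosome (1-22, X, Y, MT) and position, and that a caller-supplied `fetch(offset, length)` stands in for an HTTP `Range` request. All function names are hypothetical.

```python
import gzip  # used only to build the demo file below
import zlib
from typing import Callable, List, Optional, Tuple

MAGIC = b"\x1f\x8b\x08"  # gzip ID1, ID2 and CM=8 (deflate)
Fetch = Callable[[int, int], bytes]  # (offset, length) -> bytes, like a Range request
Member = Tuple[int, bytes, int]  # (offset, decompressed text, compressed length)


def read_member(fetch: Fetch, size: int, start: int, step: int = 4096) -> Tuple[bytes, int]:
    """Decompress the single gzip member that starts at `start`."""
    d = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header and trailer
    out, fed, pos = [], 0, start
    while not d.eof and pos < size:
        chunk = fetch(pos, min(step, size - pos))
        out.append(d.decompress(chunk))
        fed += len(chunk)
        pos += len(chunk)
    if not d.eof:
        raise zlib.error("truncated or invalid gzip member")
    return b"".join(out), fed - len(d.unused_data)


def next_member(fetch: Fetch, size: int, offset: int, step: int = 4096) -> Optional[Member]:
    """Find and decompress the first valid gzip member at or after `offset`."""
    pos = offset
    while pos < size:
        window = fetch(pos, min(step + len(MAGIC) - 1, size - pos))
        i = window.find(MAGIC)
        while i != -1:
            try:
                text, length = read_member(fetch, size, pos + i)
                return pos + i, text, length
            except zlib.error:  # magic bytes occurred by chance inside compressed data
                i = window.find(MAGIC, i + 1)
        pos += step
    return None


def chrom_rank(chrom: str) -> int:
    """Assumed sort order 1-22, X, Y, MT; a 'chr' prefix is tolerated."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    special = {"X": 23, "Y": 24, "M": 25, "MT": 25}
    return special[name] if name in special else int(name)


def record_key(line: str) -> Tuple[int, int]:
    chrom, pos = line.split("\t", 2)[:2]
    return chrom_rank(chrom), int(pos)


def data_lines(text: bytes) -> List[str]:
    return [l for l in text.decode().splitlines() if l and not l.startswith("#")]


def query(fetch: Fetch, size: int, chrom: str, start: int, end: int) -> List[str]:
    lo_key, hi_key = (chrom_rank(chrom), start), (chrom_rank(chrom), end)
    best, lo, hi = 0, 0, size
    # Binary search on byte offsets for the last member whose first record sorts before lo_key.
    while lo < hi:
        mid = (lo + hi) // 2
        found = next_member(fetch, size, mid)
        if found is None:
            hi = mid
            continue
        off, text, _ = found
        rows = data_lines(text)
        if not rows or record_key(rows[0]) < lo_key:  # header-only chunk sorts first
            best, lo = off, off + 1
        else:
            hi = mid
    # Scan forward member by member, collecting records inside the range.
    hits, pos = [], best
    while (found := next_member(fetch, size, pos)) is not None:
        off, text, length = found
        for row in data_lines(text):
            key = record_key(row)
            if key > hi_key:
                return hits
            if key >= lo_key:
                hits.append(row)
        pos = off + length
    return hits


if __name__ == "__main__":
    # Build a small index-free, multi-member gzip "VCF" with variable-sized chunks.
    header = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n"
    rows = [f"{c}\t{p}\t.\tA\tG\n" for c in ("1", "2", "X") for p in range(100, 3100, 100)]
    chunks, i, n = [header], 0, 1
    while i < len(rows):
        chunks.append("".join(rows[i:i + n]))
        i, n = i + n, n % 7 + 1
    data = b"".join(gzip.compress(chunk.encode()) for chunk in chunks)
    print(query(lambda off, length: data[off:off + length], len(data), "2", 1000, 1300))
```

In a browser, `fetch` would issue a request with a `Range` header; here it is a slice of an in-memory byte string. Searching on byte offsets instead of an index means each probe costs a few small range reads plus one chunk decompression, which is what keeps an index-free file tractable.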
