Results 1 - 20 of 133
1.
J Pathol ; 262(3): 271-288, 2024 03.
Article in English | MEDLINE | ID: mdl-38230434

ABSTRACT

Recent advances in the field of immuno-oncology have brought transformative changes in the management of cancer patients. The immune profile of tumours has been found to have key value in predicting disease prognosis and treatment response in various cancers. Multiplex immunohistochemistry and immunofluorescence have emerged as potent tools for the simultaneous detection of multiple protein biomarkers in a single tissue section, thereby expanding opportunities for molecular and immune profiling while preserving tissue samples. By establishing the phenotype of individual tumour cells when distributed within a mixed cell population, the identification of clinically relevant biomarkers with high-throughput multiplex immunophenotyping of tumour samples has great potential to guide appropriate treatment choices. Moreover, the emergence of novel multi-marker imaging approaches can now provide unprecedented insights into the tumour microenvironment, including the potential interplay between various cell types. However, there are significant challenges to widespread integration of these technologies in daily research and clinical practice. This review addresses the challenges and potential solutions within a structured framework of action from a regulatory and clinical trial perspective. New developments within the field of immunophenotyping using multiplexed tissue imaging platforms and associated digital pathology are also described, with a specific focus on translational implications across different subtypes of cancer. © 2024 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.


Subject(s)
Breast Neoplasms; Humans; Female; Biomarkers, Tumor/genetics; Prognosis; Phenotype; United Kingdom; Tumor Microenvironment
2.
J Pathol ; 260(5): 514-532, 2023 08.
Article in English | MEDLINE | ID: mdl-37608771

ABSTRACT

Modern histologic imaging platforms coupled with machine learning methods have provided new opportunities to map the spatial distribution of immune cells in the tumor microenvironment. However, there exists no standardized method for describing or analyzing spatial immune cell data, and most reported spatial analyses are rudimentary. In this review, we provide an overview of two approaches for reporting and analyzing spatial data (raster versus vector-based). We then provide a compendium of spatial immune cell metrics that have been reported in the literature, summarizing prognostic associations in the context of a variety of cancers. We conclude by discussing two well-described clinical biomarkers, the breast cancer stromal tumor infiltrating lymphocytes score and the colon cancer Immunoscore, and describe investigative opportunities to improve clinical utility of these spatial biomarkers. © 2023 The Pathological Society of Great Britain and Ireland.
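
The raster versus vector distinction can be illustrated with a minimal sketch, assuming cell centroids are available as (x, y) coordinates. The function names, tile size, and toy coordinates below are illustrative assumptions and do not come from the review.

```python
import math
from collections import Counter

def raster_counts(points, tile_size):
    """Raster view: count cells falling in each square tile of side tile_size."""
    return Counter((int(x // tile_size), int(y // tile_size)) for x, y in points)

def mean_nearest_distance(sources, targets):
    """Vector view: mean distance from each source cell to its nearest target cell."""
    return sum(min(math.dist(s, t) for t in targets) for s in sources) / len(sources)

# Hypothetical centroids (arbitrary units), not real data
immune = [(10.0, 10.0), (15.0, 12.0), (80.0, 90.0)]
tumor = [(10.0, 13.0), (80.0, 94.0)]

tiles = raster_counts(immune, tile_size=50)       # density per tile
avg_dist = mean_nearest_distance(immune, tumor)   # immune-to-tumor proximity
```

The raster summary discards exact positions within a tile, whereas the vector summary keeps them at the cost of pairwise distance computations.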


Subject(s)
Colonic Neoplasms; Humans; Biomarkers; Benchmarking; Lymphocytes, Tumor-Infiltrating; Spatial Analysis; Tumor Microenvironment
3.
J Pathol ; 260(5): 498-513, 2023 08.
Article in English | MEDLINE | ID: mdl-37608772

ABSTRACT

The clinical significance of the tumor-immune interaction in breast cancer is now established, and tumor-infiltrating lymphocytes (TILs) have emerged as predictive and prognostic biomarkers for patients with triple-negative (estrogen receptor, progesterone receptor, and HER2-negative) breast cancer and HER2-positive breast cancer. How computational assessments of TILs might complement manual TIL assessment in trial and daily practices is currently debated. Recent efforts to use machine learning (ML) to automatically evaluate TILs have shown promising results. We review state-of-the-art approaches and identify pitfalls and challenges of automated TIL evaluation by studying the root cause of ML discordances in comparison to manual TIL quantification. We categorize our findings into four main topics: (1) technical slide issues, (2) ML and image analysis aspects, (3) data challenges, and (4) validation issues. The main reason for discordant assessments is the inclusion of false-positive areas or cells identified by performance on certain tissue patterns or design choices in the computational implementation. To aid the adoption of ML for TIL assessment, we provide an in-depth discussion of ML and image analysis, including validation issues that need to be considered before reliable computational reporting of TILs can be incorporated into the trial and routine clinical management of patients with triple-negative breast cancer. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.


Subject(s)
Mammary Neoplasms, Animal; Triple Negative Breast Neoplasms; Humans; Animals; Lymphocytes, Tumor-Infiltrating; Biomarkers; Machine Learning
4.
Am J Epidemiol ; 192(6): 995-1005, 2023 06 02.
Article in English | MEDLINE | ID: mdl-36804665

ABSTRACT

Data sharing is essential for reproducibility of epidemiologic research, replication of findings, pooled analyses in consortia efforts, and maximizing study value to address multiple research questions. However, barriers related to confidentiality, costs, and incentives often limit the extent and speed of data sharing. Epidemiological practices that follow Findable, Accessible, Interoperable, Reusable (FAIR) principles can address these barriers by making data resources findable with the necessary metadata, accessible to authorized users, and interoperable with other data, to optimize the reuse of resources with appropriate credit to its creators. We provide an overview of these principles and describe approaches for implementation in epidemiology. Increasing degrees of FAIRness can be achieved by moving data and code from on-site locations to remote, accessible ("Cloud") data servers, using machine-readable and nonproprietary files, and developing open-source code. Adoption of these practices will improve daily work and collaborative analyses and facilitate compliance with data sharing policies from funders and scientific journals. Achieving a high degree of FAIRness will require funding, training, organizational support, recognition, and incentives for sharing research resources, both data and code. However, these costs are outweighed by the benefits of making research more reproducible, impactful, and equitable by facilitating the reuse of precious research resources by the scientific community.


Subject(s)
Confidentiality; Information Dissemination; Humans; Reproducibility of Results; Software; Epidemiologic Studies
5.
Bioinformatics ; 38(18): 4434-4436, 2022 09 15.
Article in English | MEDLINE | ID: mdl-35900159

ABSTRACT

MOTIVATION: The Division of Cancer Epidemiology and Genetics (DCEG) and the Division of Cancer Prevention (DCP) at the National Cancer Institute (NCI) have recently generated genome-wide association study (GWAS) data for multiple traits in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Genomic Atlas project. The GWAS included 110 000 participants. The dissemination of the genetic association data through a data portal called GWAS Explorer, in a manner that addresses the modern expectations of FAIR reusability by data scientists and engineers, is the main motivation for the development of the open-source JavaScript software development kit (SDK) reported here. RESULTS: The PLCO GWAS Explorer resource relies on a public stateless HTTP application programming interface (API) deployed as the sole backend service for both the landing page's web application and third-party analytical workflows. The core PLCOjs SDK is mapped to each of the API methods, and also to each of the reference graphic visualizations in the GWAS Explorer. A few additional visualization methods extend it. As is the norm with web SDKs, no download or installation is needed and modularization supports targeted code injection for web applications, reactive notebooks (Observable) and node-based web services. AVAILABILITY AND IMPLEMENTATION: code at https://github.com/episphere/plco; project page at https://episphere.github.io/plco.


Subject(s)
Colorectal Neoplasms; Ovarian Neoplasms; United States; Male; Humans; Female; Genome-Wide Association Study; National Cancer Institute (U.S.); Prostate; Software; Ovarian Neoplasms/genetics; Lung
6.
BMC Med Inform Decis Mak ; 23(1): 238, 2023 10 25.
Article in English | MEDLINE | ID: mdl-37880712

ABSTRACT

BACKGROUND: Online questionnaires are commonly used to collect information from participants in epidemiological studies. This requires building questionnaires using machine-readable formats that can be delivered to study participants using web-based technologies such as progressive web applications. However, the paucity of open-source markup standards with support for complex logic makes collaborative development of web-based questionnaire modules difficult. This often prevents interoperability and reusability of questionnaire modules across epidemiological studies. RESULTS: We developed an open-source markup language for presentation of questionnaire content and logic, Quest, within a real-time renderer that enables the user to test logic (e.g., skip patterns) and view the structure of data collection. We provide the Quest markup language, an in-browser markup rendering tool, a questionnaire development tool, and an example web application that embeds the renderer, developed for The Connect for Cancer Prevention Study. CONCLUSION: A markup language can specify both the content and logic of a questionnaire as plain text. Questionnaire markup, such as Quest, can become a standard format for storing questionnaires or sharing questionnaires across the web. Quest is a step towards generation of FAIR data in epidemiological studies by facilitating reusability of questionnaires and data interoperability using open-source tools.


Subject(s)
Software; Humans; Surveys and Questionnaires; Epidemiologic Studies
7.
Cancer Causes Control ; 33(8): 1107-1120, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35759080

ABSTRACT

Cancer heterogeneities hold the key to a deeper understanding of cancer etiology and progression and the discovery of more precise cancer therapy. Modern pathological and molecular technologies offer a powerful set of tools to profile tumor heterogeneities at multiple levels in large patient populations, from DNA to RNA, protein and epigenetics, and from tumor tissues to tumor microenvironment and liquid biopsy. When coupled with well-validated epidemiologic methodology and well-characterized epidemiologic resources, the rich pathological and molecular tumor information provides new research opportunities at an unprecedented breadth and depth. This is the research space where Molecular Pathological Epidemiology (MPE) emerged over a decade ago and has been thriving since then. As a truly multidisciplinary field, MPE embraces collaborations from diverse fields including epidemiology, pathology, immunology, genetics, biostatistics, bioinformatics, and data science. Since first convened in 2013, the International MPE Meeting series has grown into a dynamic and dedicated platform for experts from these disciplines to communicate novel findings, discuss new research opportunities and challenges, build professional networks, and educate next-generation scientists. Herein, we share the proceedings of the Fifth International MPE Meeting, held virtually on May 24 and 25, 2021. The meeting consisted of 21 presentations organized into three main themes: recent integrative MPE studies, novel cancer profiling technologies, and new statistical and data science approaches. Looking forward to the near future, the meeting attendees anticipated continuous expansion and fruition of MPE research on many research fronts, particularly immune-epidemiology, mutational signatures, liquid biopsy, and health disparities.


Subject(s)
Neoplasms; Pathology, Molecular; Humans; Mutation; Neoplasms/epidemiology; Neoplasms/genetics; Neoplasms/therapy; Pathology, Molecular/methods; Tumor Microenvironment
8.
Bioinformatics ; 37(14): 2073-2074, 2021 08 04.
Article in English | MEDLINE | ID: mdl-33135727

ABSTRACT

MOTIVATION: Mortality Tracker is an in-browser application for data wrangling, analysis, dissemination and visualization of public time series of mortality in the United States. It was developed in response to requests by epidemiologists for portable real-time assessment of the effect of COVID-19 on other causes of death and all-cause mortality. This is performed by comparing 2020 real-time values with observations from the same week in the previous 5 years, and by enabling the extraction of temporal snapshots of mortality series that facilitate modeling the interdependence between its causes. RESULTS: Our solution employs a scalable 'Data Commons at Web Scale' approach that abstracts all stages of the data cycle as in-browser components. Specifically, the data wrangling computation, not just the orchestration of data retrieval, takes place in the browser, without any requirement to download or install software. This approach, where operations that would normally be computed server-side are mapped to in-browser SDKs, is sometimes loosely described as Web APIs, a designation adopted here. AVAILABILITY AND IMPLEMENTATION: https://episphere.github.io/mortalitytracker; webcast demo: youtu.be/ZsvCe7cZzLo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
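
The comparison of a 2020 weekly value with the same week in the previous 5 years can be sketched as below. This is a minimal illustration in Python, not the application's own JavaScript code; the function name and the weekly counts are hypothetical.

```python
from statistics import mean

def weekly_excess(observed, same_week_prior_years):
    """Observed count minus the mean count for the same week in prior years."""
    baseline = mean(same_week_prior_years)
    return observed - baseline

# Hypothetical counts for one calendar week, not real data
prior_years = [52000, 53000, 54000, 55000, 56000]  # five earlier years
excess = weekly_excess(60000, prior_years)
```

A mean baseline is only one possible choice; a median or a trend-adjusted expectation would slot into the same function.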


Subject(s)
COVID-19; Computers; Humans; Information Storage and Retrieval; SARS-CoV-2; Software
9.
Ann Intern Med ; 174(4): 437-443, 2021 04.
Article in English | MEDLINE | ID: mdl-33316174

ABSTRACT

BACKGROUND: Excess death estimates quantify the full impact of the coronavirus disease 2019 (COVID-19) pandemic. Widely reported U.S. excess death estimates have not accounted for recent population changes, especially increases in the population older than 65 years. OBJECTIVE: To estimate excess deaths in the United States in 2020, after accounting for population changes. DESIGN: Surveillance study. SETTING: United States, March to August 2020. PARTICIPANTS: All decedents. MEASUREMENTS: Age-specific excess deaths in the United States from 1 March to 31 August 2020 compared with 2015 to 2019 were estimated, after changes in population size and age were taken into account, by using Centers for Disease Control and Prevention provisional death data and U.S. Census Bureau population estimates. Cause-specific excess deaths were estimated by month and age. RESULTS: From March through August 2020, 1 671 400 deaths were registered in the United States, including 173 300 COVID-19 deaths. An average of 1 370 000 deaths were reported over the same months during 2015 to 2019, for a crude excess of 301 400 deaths (128 100 non-COVID-19 deaths). However, the 2020 U.S. population includes 5.04 million more persons aged 65 years and older than the average population in 2015 to 2019 (a 10% increase). After population changes were taken into account, an estimated 217 900 excess deaths occurred from March through August 2020 (173 300 COVID-19 and 44 600 non-COVID-19 deaths). Most excess non-COVID-19 deaths occurred in April, July, and August, and 34 900 (78%) were in persons aged 25 to 64 years. Diabetes, Alzheimer disease, and heart disease caused the most non-COVID-19 excess deaths. LIMITATION: Provisional death data are underestimated because of reporting delays. CONCLUSION: The COVID-19 pandemic resulted in an estimated 218 000 excess deaths in the United States between March and August 2020, and 80% of those deaths had COVID-19 as the underlying cause. Accounting for population changes substantially reduced the excess non-COVID-19 death estimates, providing important information for guiding future clinical and public health interventions. PRIMARY FUNDING SOURCE: National Cancer Institute.
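
The figures reported in this abstract are internally consistent, as the following arithmetic check shows. It uses only the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the abstract (March through August 2020)
registered_2020 = 1_671_400    # all registered deaths
baseline_average = 1_370_000   # average for the same months, 2015 to 2019
covid_deaths = 173_300

# Crude excess, before any population adjustment
crude_excess = registered_2020 - baseline_average
crude_non_covid = crude_excess - covid_deaths

# Population-adjusted figures as reported
adjusted_non_covid = 44_600
adjusted_excess = covid_deaths + adjusted_non_covid

# Shares reported as percentages in the abstract
covid_share = covid_deaths / adjusted_excess
working_age_share = 34_900 / adjusted_non_covid
```

The adjustment lowers the non-COVID-19 excess from 128 100 to 44 600 deaths, which is the reduction the conclusion refers to.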


Subject(s)
Aging; COVID-19/mortality; Mortality/trends; Pneumonia, Viral/mortality; Population Growth; Adult; Aged; Aged, 80 and over; Female; Humans; Male; Middle Aged; Pandemics; Pneumonia, Viral/virology; Population Surveillance; Risk Factors; SARS-CoV-2; United States/epidemiology
10.
Ann Intern Med ; 174(12): 1693-1699, 2021 12.
Article in English | MEDLINE | ID: mdl-34606321

ABSTRACT

BACKGROUND: Although racial/ethnic disparities in U.S. COVID-19 death rates are striking, focusing on COVID-19 deaths alone may underestimate the true effect of the pandemic on disparities. Excess death estimates capture deaths both directly and indirectly caused by COVID-19. OBJECTIVE: To estimate U.S. excess deaths by racial/ethnic group. DESIGN: Surveillance study. SETTING: United States. PARTICIPANTS: All decedents. MEASUREMENTS: Excess deaths and excess deaths per 100 000 persons from March to December 2020 were estimated by race/ethnicity, sex, age group, and cause of death, using provisional death certificate data from the Centers for Disease Control and Prevention (CDC) and U.S. Census Bureau population estimates. RESULTS: An estimated 2.88 million deaths occurred between March and December 2020. Compared with the number of expected deaths based on 2019 data, 477 200 excess deaths occurred during this period, with 74% attributed to COVID-19. Age-standardized excess deaths per 100 000 persons among Black, American Indian/Alaska Native (AI/AN), and Latino males and females were more than double those in White and Asian males and females. Non-COVID-19 excess deaths also disproportionately affected Black, AI/AN, and Latino persons. Compared with White males and females, non-COVID-19 excess deaths per 100 000 persons were 2 to 4 times higher in Black, AI/AN, and Latino males and females, including deaths due to diabetes, heart disease, cerebrovascular disease, and Alzheimer disease. Excess deaths in 2020 resulted in substantial widening of racial/ethnic disparities in all-cause mortality from 2019 to 2020. LIMITATIONS: Completeness and availability of provisional CDC data; no estimates of precision around results. CONCLUSION: There were profound racial/ethnic disparities in excess deaths in the United States in 2020 during the COVID-19 pandemic, resulting in rapid increases in racial/ethnic disparities in all-cause mortality between 2019 and 2020. PRIMARY FUNDING SOURCE: National Institutes of Health Intramural Research Program.


Subject(s)
COVID-19/ethnology; COVID-19/mortality; Ethnic and Racial Minorities/statistics & numerical data; Health Status Disparities; Pandemics; Adolescent; Adult; Age Distribution; Aged; Aged, 80 and over; Cause of Death; Child; Child, Preschool; Female; Humans; Infant; Infant, Newborn; Male; Middle Aged; Population Surveillance; SARS-CoV-2; Sex Distribution; United States/epidemiology; Young Adult
11.
Am J Pathol ; 190(7): 1491-1504, 2020 07.
Article in English | MEDLINE | ID: mdl-32277893

ABSTRACT

Quantitative assessment of spatial relations between tumor and tumor-infiltrating lymphocytes (TIL) is increasingly important in both basic science and clinical aspects of breast cancer research. We have developed and evaluated convolutional neural network analysis pipelines to generate combined maps of cancer regions and TILs in routine diagnostic breast cancer whole slide tissue images. The combined maps provide insight about the structural patterns and spatial distribution of lymphocytic infiltrates and facilitate improved quantification of TILs. Both tumor and TIL analyses were evaluated by using three convolutional neural networks (34-layer ResNet, 16-layer VGG, and Inception v4); the results compared favorably with those obtained by using the best published methods. We have produced open-source tools and a public data set consisting of tumor/TIL maps for 1090 invasive breast cancer images from The Cancer Genome Atlas. The maps can be downloaded for further downstream analyses.


Subject(s)
Breast Neoplasms/pathology; Deep Learning; Lymphocytes, Tumor-Infiltrating/pathology; Breast Neoplasms/immunology; Female; Humans; Lymphocytes, Tumor-Infiltrating/immunology; SEER Program
12.
PLoS Biol ; 16(12): e3000099, 2018 12.
Article in English | MEDLINE | ID: mdl-30596645

ABSTRACT

A personalized approach based on a patient's or pathogen's unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods. The BioCompute framework (https://w3id.org/biocompute/1.3.0) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that standard and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the "Open-Stand.org principles for collaborative open standards development." With high-throughput sequencing (HTS) studies communicated using a BCO, regulatory agencies (e.g., Food and Drug Administration [FDA]), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next-generation sequencing workflow exchange, reporting, and regulatory reviews.


Subject(s)
Computational Biology/methods; Sequence Analysis, DNA/methods; Animals; Communication; Computational Biology/standards; Genome; Genomics/methods; High-Throughput Nucleotide Sequencing; Humans; Precision Medicine/trends; Reproducibility of Results; Sequence Analysis, DNA/standards; Software; Workflow
13.
Bioinformatics ; 33(4): 547-548, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27797761

ABSTRACT

Summary: The move of computational genomics workflows to Cloud Computing platforms is associated with a new level of integration and interoperability that challenges existing data representation formats. The Variant Calling Format (VCF) is in a particularly sensitive position in that regard, with both clinical and consumer-facing analysis tools relying on this self-contained description of genomic variation in Next Generation Sequencing (NGS) results. In this report we identify an isomorphic map between VCF and the reference Resource Description Framework. RDF is advanced by the World Wide Web Consortium (W3C) to enable representations of linked data that are both distributed and discoverable. The resulting ability to decompose VCF reports of genomic variation without loss of context addresses the need to modularize and govern NGS pipelines for Precision Medicine. Specifically, it provides the flexibility (i.e. the indexing) needed to support the wide variety of clinical scenarios and patient-facing governance where only part of the VCF data is fitting. Availability and Implementation: Software libraries with a claim to be both domain-facing and consumer-facing have to pass the test of portability across the variety of devices that those consumers in fact adopt. That is, ideally the implementation should itself take place within the space defined by web technologies. Consequently, the isomorphic mapping function was implemented in JavaScript, and was tested in a variety of environments and devices, client and server side alike. These range from web browsers in mobile phones to the most popular micro service platform, NodeJS. The code is publicly available at https://github.com/ibl/VCFr , with a live deployment at: http://ibl.github.io/VCFr/ . Contact: jonas.almeida@stonybrookmedicine.edu.
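
Decomposing a VCF record into subject-predicate-object statements can be sketched as follows. This is a simplified illustration in Python and is not the mapping implemented in VCFr: the `ex:` prefix, the subject naming scheme, and the predicate names are assumptions, and the sample line is an abbreviated record in the style of the VCF specification. Only the eight fixed VCF columns are handled.

```python
# The eight fixed columns of a VCF data line
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

def vcf_line_to_triples(line, base="ex:"):
    """Turn one tab-separated VCF data line into (subject, predicate, object) triples."""
    record = dict(zip(VCF_COLUMNS, line.rstrip("\n").split("\t")))
    # Hypothetical subject identifier built from position and alleles
    subject = f"{base}variant/{record['CHROM']}-{record['POS']}-{record['REF']}-{record['ALT']}"
    # One triple per fixed column, except INFO, which is expanded below
    triples = [(subject, f"{base}{col.lower()}", record[col]) for col in VCF_COLUMNS[:7]]
    # INFO holds semicolon-separated key=value pairs; bare keys are flags
    for item in record["INFO"].split(";"):
        key, _, value = item.partition("=")
        triples.append((subject, f"{base}info/{key}", value or "true"))
    return triples

line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB"
triples = vcf_line_to_triples(line)
```

Because every statement carries the same subject, any subset of the triples can be shared or indexed on its own without losing track of which variant it describes.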


Subject(s)
Genetic Variation; Genome; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Software; Genomics/methods; Humans; Information Storage and Retrieval; Semantics
14.
Cancer Control ; 24(1): 102-110, 2017 Jan.
Article in English | MEDLINE | ID: mdl-28178722

ABSTRACT

BACKGROUND: The molecular signature of ductal carcinoma in situ (DCIS) in the breast is not well understood. Erb-b2 receptor tyrosine kinase 2 (ERBB2 [formerly known as HER2/neu]) positivity in DCIS is predictive of coexistent early invasive breast carcinoma. The aim of this study is to identify the gene-expression signature profiles of estrogen receptor (ER)/progesterone receptor (PR)-positive, ERBB2, and triple-negative subtypes of DCIS. METHODS: Based on ER, PR, and ERBB2 status, a total of 18 high nuclear grade DCIS cases with no evidence of invasive breast carcinoma were selected along with 6 non-neoplastic controls. The 3 study groups were defined as ER/PR-positive, ERBB2, and triple-negative subtypes. RESULTS: A total of 49 genes were differentially expressed in the ERBB2 subtype compared with the ER/PR-positive and triple-negative groups. PROM1 was overexpressed in the ERBB2 subtype compared with ER/PR-positive and triple-negative subtypes. Other genes differentially expressed included TAOK1, AREG, AGR3, PEG10, and MMP9. CONCLUSIONS: Our study identified unique gene signatures in ERBB2-positive DCIS, which may be associated with the development of invasive breast carcinoma. The results may enhance our understanding of the progression of breast cancer and become the basis for developing new predictive biomarkers and therapeutic targets for DCIS.


Subject(s)
Biomarkers, Tumor/metabolism; Breast Neoplasms/genetics; Carcinoma, Ductal, Breast/genetics; Carcinoma, Intraductal, Noninfiltrating/genetics; Gene Expression Profiling; Receptor, ErbB-2/metabolism; Adult; Aged; Breast Neoplasms/metabolism; Breast Neoplasms/pathology; Carcinoma, Ductal, Breast/metabolism; Carcinoma, Ductal, Breast/pathology; Carcinoma, Intraductal, Noninfiltrating/metabolism; Carcinoma, Intraductal, Noninfiltrating/pathology; Female; Humans; Middle Aged; Neoplasm Invasiveness; Neoplasm Staging; Prognosis; Receptors, Estrogen/metabolism; Receptors, Progesterone/metabolism; Survival Rate
15.
Brief Bioinform ; 15(3): 369-75, 2014 May.
Article in English | MEDLINE | ID: mdl-24162172

ABSTRACT

Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
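
The iterated map behind the Chaos Game Representation can be sketched in a few lines: each nucleotide moves the current point halfway toward the corner of the unit square assigned to it. The corner assignment and the starting point below follow one common convention and are assumptions here, as is the function name.

```python
# One common assignment of nucleotides to corners of the unit square
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr(sequence, start=(0.5, 0.5)):
    """Return the Chaos Game Representation coordinates of a DNA sequence."""
    x, y = start
    points = []
    for base in sequence:
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

points = cgr("ACGT")
```

Each point encodes the whole preceding suffix at decreasing resolution, so sequences sharing a suffix of length k land in the same sub-square of side 2^-k. This is the sense in which the representation is scale free.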


Subject(s)
Computational Biology/methods; Sequence Analysis/methods; Computational Biology/trends; Fractals; Models, Statistical; Nonlinear Dynamics; Sequence Alignment; Sequence Analysis/statistics & numerical data
16.
Am J Pathol ; 185(3): 600-1, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25701882

ABSTRACT

This editorial discusses the rise of computational pathology as a major driver of experimental pathology research.


Subject(s)
Pathology; Computational Biology; Humans; Research
17.
Cancer Control ; 23(4): 383-389, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27842327

ABSTRACT

BACKGROUND: The scarcity of tissues from racial and ethnic minorities at biobanks poses a scientific constraint to research addressing health disparities in minority populations. METHODS: To address this gap, the Minority Biospecimen/Biobanking Geographic Management Program for region 3 (BMaP-3) established a working infrastructure for a "biobanking" hub in the southeastern United States and Puerto Rico. Herein we describe the steps taken to build this infrastructure, evaluate the feasibility of collecting formalin-fixed, paraffin-embedded tissue blocks and associated data from a single cancer type (breast), and create a web-based database and tissue microarrays (TMAs). RESULTS: Cancer registry data from 6 partner institutions were collected, representing 12,408 entries from 8,279 unique patients with breast cancer (years 2001-2011). Data were harmonized and merged, and deidentified information was made available online. A TMA was constructed from formalin-fixed, paraffin-embedded samples of invasive ductal carcinoma (IDC) representing 427 patients with breast cancer (147 African Americans, 168 Hispanics, and 112 non-Hispanic whites) and was annotated according to biomarker status and race/ethnicity. Biomarker analysis of the TMA was consistent with the literature. CONCLUSIONS: Contributions from participating institutions have facilitated a robust research tool. TMAs of IDC have now been released for 5 projects at 5 different institutions.


Subject(s)
Carcinoma, Ductal, Breast/epidemiology; Adult; Aged; Aged, 80 and over; Ethnicity; Female; Humans; Immunohistochemistry; Middle Aged; Tissue Array Analysis
18.
BMC Bioinformatics ; 15: 176, 2014 Jun 09.
Article in English | MEDLINE | ID: mdl-24913605

ABSTRACT

BACKGROUND: Ongoing advancements in cloud computing provide novel opportunities in scientific computing, especially for distributed workflows. Modern web browsers can now be used as high-performance workstations for querying, processing, and visualizing genomics' "Big Data" from sources like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) without local software installation or configuration. The design of QMachine (QM) was driven by the opportunity to use this pervasive computing model in the context of the Web of Linked Data in Biomedicine. RESULTS: QM is an open-sourced, publicly available web service that acts as a messaging system for posting tasks and retrieving results over HTTP. The illustrative application described here distributes the analyses of 20 Streptococcus pneumoniae genomes for shared suffixes. Because all analytical and data retrieval tasks are executed by volunteer machines, few server resources are required. Any modern web browser can submit those tasks and/or volunteer to execute them without installing any extra plugins or programs. A client library provides high-level distribution templates including MapReduce. This stark departure from the current reliance on expensive server hardware running "download and install" software has already gathered substantial community interest, as QM received more than 2.2 million API calls from 87 countries in 12 months. CONCLUSIONS: QM was found adequate to deliver the sort of scalable bioinformatics solutions that computation- and data-intensive workflows require. Paradoxically, the sandboxed execution of code by web browsers was also found to enable them, as compute nodes, to address critical privacy concerns that characterize biomedical environments.


Subject(s)
Software Design , Web Browser , Computational Biology/methods , Genome , Genomics/methods , Humans , Streptococcus pneumoniae/genetics
19.
BMC Bioinformatics ; 15: 28, 2014 Jan 27.
Article in English | MEDLINE | ID: mdl-24467687

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) technologies have resulted in petabytes of scattered data, decentralized in archives, databases and sometimes in isolated hard disks that are inaccessible for browsing and analysis. It is expected that curated secondary databases will help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it. RESULTS: To address the above challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine which is used upstream of an integrated High-performance Integrated Virtual Environment (HIVE) that encapsulates the Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof-of-concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program - The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR available clinical information on patients, mapping of the reads to the reference followed by identification of non-synonymous Single Nucleotide Variations (nsSNVs) and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP and therefore provides a more comprehensive understanding of the human variome. Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr). CONCLUSIONS: Availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual level SNVs and their effect on the human proteome beyond what the dbSNP database provides.
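
The abstract does not describe how the phylogenetic algorithm works, so the following is only a rough illustration of how SNV positions could be used to compare patients. It swaps in a plain Jaccard distance over sets of positions, which is an assumption and not the authors' method; the patient names and positions are made up.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets of SNV positions."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(samples):
    """Distance for every unordered pair of sample names."""
    return {
        (p, q): jaccard_distance(samples[p], samples[q])
        for p, q in combinations(sorted(samples), 2)
    }


# Hypothetical SNV positions as (chromosome, coordinate) pairs.
samples = {
    "patient_A": {("chr1", 100), ("chr2", 250), ("chr7", 900)},
    "patient_B": {("chr1", 100), ("chr2", 250)},
    "patient_C": {("chr5", 42)},
}

distances = pairwise_distances(samples)
closest = min(distances, key=distances.get)  # most similar pair of patients
print(closest, round(distances[closest], 3))
```

A distance matrix of this kind is the usual input to a tree-building or clustering step, which is where a classification of the patient population would come from.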


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Proteome/genetics , Proteomics/methods , Algorithms , Biomedical Research , Database Management Systems , Databases, Genetic , Humans , Neoplasms/metabolism , Phylogeny , Polymorphism, Single Nucleotide , Proteome/classification , Proteome/metabolism , User-Computer Interface
20.
Bioinformatics ; 29(10): 1333-40, 2013 May 15.
Article in English | MEDLINE | ID: mdl-23595662

ABSTRACT

MOTIVATION: Since 2011, The Cancer Genome Atlas' (TCGA) files have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that can be fully traced to and reproduced using their source data. However, to realize this possibility, a continually updated road map of files in the TCGA is required. Creation of such a road map represents a significant data modeling challenge, due to the size and fluidity of this resource: each of the 33 cancer types is instantiated in only partially overlapping sets of analytical platforms, while the number of data files available doubles approximately every 7 months. RESULTS: We developed an engine to index and annotate the TCGA files, relying exclusively on third-generation web technologies (Web 3.0). Specifically, this engine uses JavaScript in conjunction with the World Wide Web Consortium's (W3C) Resource Description Framework (RDF), and SPARQL, the query language for RDF, to capture metadata of files in the TCGA open-access HTTP directory. The resulting index may be queried using SPARQL, and enables file-level provenance annotations as well as discovery of arbitrary subsets of files, based on their metadata, using web standard languages. In turn, these abilities enhance the reproducibility and distribution of novel results delivered as elements of a web-based computational ecosystem. The development of the TCGA Roadmap engine was found to provide specific clues about how biomedical big data initiatives should be exposed as public resources for exploratory analysis, data mining and reproducible research. These specific design elements align with the concept of knowledge reengineering and represent a sharp departure from top-down approaches in grid initiatives such as CaBIG. They also present a much more interoperable and reproducible alternative to the still pervasive use of data portals. AVAILABILITY: A prepared dashboard, including links to source code and a SPARQL endpoint, is available at http://bit.ly/TCGARoadmap. A video tutorial is available at http://bit.ly/TCGARoadmapTutorial. CONTACT: robbinsd@uab.edu.
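
The idea of describing files as RDF triples and selecting subsets by metadata can be sketched without any RDF library. The example below is a toy, not the TCGA Roadmap engine: the file identifiers and the `ex:` predicate names are hypothetical, and the `match` helper only imitates a single SPARQL triple pattern, with `None` playing the role of a variable.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None behaves like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Hypothetical metadata index: (subject, predicate, object) per file attribute.
triples = [
    ("file:1", "ex:diseaseStudy", "BRCA"),
    ("file:1", "ex:platform", "RNASeq"),
    ("file:2", "ex:diseaseStudy", "BRCA"),
    ("file:2", "ex:platform", "SNP_Array"),
    ("file:3", "ex:diseaseStudy", "GBM"),
    ("file:3", "ex:platform", "RNASeq"),
]

# Roughly what this SPARQL query would ask (predicates are assumptions):
#   SELECT ?f WHERE { ?f ex:diseaseStudy "BRCA" ; ex:platform "RNASeq" }
brca = {s for s, _, _ in match(triples, p="ex:diseaseStudy", o="BRCA")}
rnaseq = {s for s, _, _ in match(triples, p="ex:platform", o="RNASeq")}
selected = sorted(brca & rnaseq)
print(selected)
```

Joining two patterns on a shared subject is done here with a set intersection; a SPARQL engine performs the same join over an index that can also be queried remotely through an endpoint.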


Subject(s)
Information Storage and Retrieval , Neoplasms/genetics , Genome, Human , Humans , Internet , Programming Languages