1.
Am J Pathol ; 2020 Apr 08.
Article in English | MEDLINE | ID: mdl-32277893

ABSTRACT

Quantitative assessment of the spatial relations between tumor and tumor-infiltrating lymphocytes (TILs) is increasingly important in both basic science and clinical aspects of breast cancer research. We have developed and evaluated convolutional neural network analysis pipelines to generate combined maps of cancer regions and TILs in routine diagnostic breast cancer whole slide tissue images. The combined maps provide insight into the structural patterns and spatial distribution of lymphocytic infiltrates and facilitate improved quantification of TILs. Both tumor and TIL analyses were evaluated by using three convolutional neural networks (34-layer ResNet, 16-layer VGG, and Inception v4); the results compared favorably with those obtained by using the best published methods. We have produced open-source tools and a public data set consisting of tumor/TIL maps for 1,090 invasive breast cancer images from The Cancer Genome Atlas. The maps can be downloaded for further downstream analyses.

2.
Genome Biol ; 20(1): 144, 2019 07 25.
Article in English | MEDLINE | ID: mdl-31345254

ABSTRACT

BACKGROUND: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. RESULTS: Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. CONCLUSION: The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.
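As an illustration of what an alignment-free comparison can look like, the sketch below scores two sequences by the cosine distance between their k-mer count vectors. This is one generic AF technique chosen for brevity, not one of the 74 benchmarked methods specifically; the function names and the default k = 3 are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (word of length k) in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a: str, b: str, k: int = 3) -> float:
    """Alignment-free distance: 1 - cosine similarity of k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Only k-mers present in both sequences contribute to the dot product.
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each sequence must be at least k characters long")
    return 1.0 - dot / (norm_a * norm_b)
```

No alignment is computed at any point: the distance depends only on word frequencies, which is why such measures scale to large data sets. Identical sequences give a distance of (numerically) zero, and sequences with no k-mer in common give 1.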


Subjects
Sequence Analysis, Benchmarking, Horizontal Gene Transfer, Internet, Phylogeny, Nucleic Acid Regulatory Sequences, Sequence Alignment, Protein Sequence Analysis, Software
3.
PeerJ ; 7: e6230, 2019.
Article in English | MEDLINE | ID: mdl-30671301

ABSTRACT

In a previous report, we explored the serverless OpenHealth approach to the Web as a Global Compute space. That approach relies on the modern browser full stack, and, in particular, its configuration for application assembly by code injection. The opportunity, and need, to expand this approach has since increased markedly, reflecting a wider adoption of Open Data policies by Public Health Agencies. Here, we describe how the serverless scaling challenge can be met by an isomorphic mapping between the remote data layer API and a local (client-side, in-browser) operator. This solution is validated with an accompanying interactive web application (bit.ly/loadsparcs) capable of real-time traversal of New York's 20 million patient records of the Statewide Planning and Research Cooperative System (SPARCS), and is compared with alternative approaches. The results obtained strengthen the argument that the FAIR reproducibility needed for Population Science applications in the age of P4 Medicine is particularly well served by the Web platform.

4.
PLoS One ; 13(8): e0202139, 2018.
Article in English | MEDLINE | ID: mdl-30130366

ABSTRACT

Kinomics is an emerging field of science that involves the study of global kinase activity. As kinases are essential players in virtually all cellular activities, kinomic testing can directly examine protein function, distinguishing kinomics from more remote, upstream components of the central dogma, such as genomics and transcriptomics. While there exist several different approaches for kinomic research, peptide microarrays are the most widely used and involve kinase activity assessment through measurement of phosphorylation of peptide substrates on the array. Unfortunately, bioinformatic tools for analyzing kinomic data are quite limited, necessitating the development of accessible open access software in order to facilitate standardization and dissemination of kinomic data for scientific use. Here, we examine and present tools for data analysis for the popular PamChip® (PamGene International) kinomic peptide microarray. As a result, we propose (1) a procedural optimization of kinetic curve data capture, (2) new methods for background normalization, (3) guidelines for the detection of outliers during parameterization, and (4) a standardized data model to store array data at various analytical points. In order to utilize the new data model, we developed a series of tools to implement the new methods and to visualize the various data models. In the interest of accessibility, we developed this new toolbox as a series of JavaScript procedures that can be utilized as either server-side resources (easily packaged as web services) or as client-side scripts (web applications running in the browser). The aggregation of these tools within a Kinomics Toolbox provides an extensible web-based analytic platform that researchers can engage directly and web programmers can extend. As a proof of concept, we developed three analytical tools: a technical reproducibility visualizer, an ANOVA-based detector of differentially phosphorylated peptides, and a heatmap display with hierarchical clustering.
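To make the ANOVA-based detection step concrete, the sketch below computes a one-way ANOVA F statistic for one peptide measured across several sample groups. It is a minimal illustration under stated assumptions, not the toolbox's implementation: the toolbox itself is described as JavaScript, Python is used here only for brevity, and the function name `one_way_anova_f` is hypothetical.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups.

    F = (between-group mean square) / (within-group mean square).
    A large F suggests the group means differ more than noise would explain.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

In a peptide-array setting, a function like this would be applied once per peptide, with each group holding that peptide's signal in one experimental condition; converting F to a p-value and correcting for multiple testing are separate steps not shown here.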


Subjects
Computational Biology/methods, Phosphotransferases/metabolism, Protein Array Analysis, Proteome, Proteomics, Software, Web Browser, Algorithms, Cell Line, Enzyme Activation, Humans, Phosphotransferases/chemistry, Protein Array Analysis/methods, Proteomics/methods, Reproducibility of Results
5.
PLoS Biol ; 16(12): e3000099, 2018 12.
Article in English | MEDLINE | ID: mdl-30596645

ABSTRACT

A personalized approach based on a patient's or pathogen's unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods. The BioCompute framework (https://w3id.org/biocompute/1.3.0) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that standard and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the "Open-Stand.org principles for collaborative open standards development." With high-throughput sequencing (HTS) studies communicated using a BCO, regulatory agencies (e.g., Food and Drug Administration [FDA]), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next-generation sequencing workflow exchange, reporting, and regulatory reviews.


Subjects
Computational Biology/methods, DNA Sequence Analysis/methods, Animals, Communication, Computational Biology/standards, Genome, Genomics/methods, High-Throughput Nucleotide Sequencing, Humans, Precision Medicine/trends, Reproducibility of Results, DNA Sequence Analysis/standards, Software, Workflow
6.
Cancer Res ; 77(21): e79-e82, 2017 11 01.
Article in English | MEDLINE | ID: mdl-29092946

ABSTRACT

Well-curated sets of pathology image features will be critical to clinical studies that aim to evaluate and predict treatment responses. Researchers require information synthesized across multiple biological scales, from the patient to the molecular scale, to more effectively study cancer. This article describes a suite of services and web applications that allow users to select regions of interest in whole slide tissue images, run a segmentation pipeline on the selected regions to extract nuclei and compute shape, size, intensity, and texture features, store and index images and analysis results, and visualize and explore images and computed features. All the services are deployed as containers and the user-facing interfaces as web-based applications. The set of containers and web applications presented in this article is used in cancer research studies of morphologic characteristics of tumor tissues. The software is free and open source. Cancer Res; 77(21); e79-82. ©2017 AACR.


Subjects
Computer-Assisted Image Interpretation, Neoplasms/pathology, Software, Humans, Internet, User-Computer Interface
7.
Cancer Control ; 24(1): 102-110, 2017 Jan.
Article in English | MEDLINE | ID: mdl-28178722

ABSTRACT

BACKGROUND: The molecular signature of ductal carcinoma in situ (DCIS) in the breast is not well understood. Erb-b2 receptor tyrosine kinase 2 (ERBB2 [formerly known as HER2/neu]) positivity in DCIS is predictive of coexistent early invasive breast carcinoma. The aim of this study is to identify the gene-expression signature profiles of estrogen receptor (ER)/progesterone receptor (PR)-positive, ERBB2, and triple-negative subtypes of DCIS. METHODS: Based on ER, PR, and ERBB2 status, a total of 18 high nuclear grade DCIS cases with no evidence of invasive breast carcinoma were selected along with 6 non-neoplastic controls. The 3 study groups were defined as ER/PR-positive, ERBB2, and triple-negative subtypes. RESULTS: A total of 49 genes were differentially expressed in the ERBB2 subtype compared with the ER/PR-positive and triple-negative groups. PROM1 was overexpressed in the ERBB2 subtype compared with ER/PR-positive and triple-negative subtypes. Other genes differentially expressed included TAOK1, AREG, AGR3, PEG10, and MMP9. CONCLUSIONS: Our study identified unique gene signatures in ERBB2-positive DCIS, which may be associated with the development of invasive breast carcinoma. The results may enhance our understanding of the progression of breast cancer and become the basis for developing new predictive biomarkers and therapeutic targets for DCIS.


Subjects
Tumor Biomarkers/metabolism, Breast Neoplasms/genetics, Ductal Breast Carcinoma/genetics, Noninfiltrating Intraductal Carcinoma/genetics, Gene Expression Profiling, ErbB-2 Receptor/metabolism, Adult, Aged, Breast Neoplasms/metabolism, Breast Neoplasms/pathology, Ductal Breast Carcinoma/metabolism, Ductal Breast Carcinoma/pathology, Noninfiltrating Intraductal Carcinoma/metabolism, Noninfiltrating Intraductal Carcinoma/pathology, Female, Humans, Middle Aged, Neoplasm Invasiveness, Neoplasm Staging, Prognosis, Estrogen Receptors/metabolism, Progesterone Receptors/metabolism, Survival Rate
8.
Bioinformatics ; 33(4): 547-548, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27797761

ABSTRACT

Summary: The move of computational genomics workflows to Cloud Computing platforms is associated with a new level of integration and interoperability that challenges existing data representation formats. The Variant Calling Format (VCF) is in a particularly sensitive position in that regard, with both clinical and consumer-facing analysis tools relying on this self-contained description of genomic variation in Next Generation Sequencing (NGS) results. In this report we identify an isomorphic map between VCF and the reference Resource Description Framework. RDF is advanced by the World Wide Web Consortium (W3C) to enable representations of linked data that are both distributed and discoverable. The resulting ability to decompose VCF reports of genomic variation without loss of context addresses the need to modularize and govern NGS pipelines for Precision Medicine. Specifically, it provides the flexibility (i.e. the indexing) needed to support the wide variety of clinical scenarios and patient-facing governance where only part of the VCF data is fitting. Availability and Implementation: Software libraries with a claim to be both domain-facing and consumer-facing have to pass the test of portability across the variety of devices that those consumers in fact adopt. That is, ideally the implementation should itself take place within the space defined by web technologies. Consequently, the isomorphic mapping function was implemented in JavaScript, and was tested in a variety of environments and devices, client and server side alike. These range from web browsers in mobile phones to the most popular micro service platform, NodeJS. The code is publicly available at https://github.com/ibl/VCFr , with a live deployment at: http://ibl.github.io/VCFr/ . Contact: jonas.almeida@stonybrookmedicine.edu.
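The idea of decomposing a VCF record into linked-data statements can be sketched as follows: one tab-separated VCF data line is turned into subject-predicate-object triples, one per fixed column plus one per INFO entry. This is an illustrative sketch only, not the mapping implemented in VCFr (which is written in JavaScript); the base URI `http://example.org/vcf/`, the subject naming scheme, and the function name are all assumptions made for this example.

```python
# The eight fixed, tab-separated columns of a VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def vcf_line_to_triples(line, base="http://example.org/vcf/"):
    """Decompose one VCF data line into (subject, predicate, object) triples.

    `base` is a placeholder namespace; a real mapping would use agreed URIs.
    """
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields))
    # Hypothetical subject URI built from the fields that locate the variant.
    subject = (f"{base}variant/{record['CHROM']}-{record['POS']}"
               f"-{record['REF']}-{record['ALT']}")
    # One triple per fixed column other than INFO.
    triples = [(subject, base + col, record[col]) for col in VCF_COLUMNS[:7]]
    # INFO holds semicolon-separated key=value pairs or bare flags.
    info = record.get("INFO", ".")
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            triples.append((subject, base + "INFO/" + key, value or "true"))
    return triples
```

Because every statement carries the variant's subject URI, any subset of the triples can be shared or indexed on its own without losing the context of which variant it describes, which is the modularity the abstract argues for.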


Subjects
Genetic Variation, Genome, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Genomics/methods, Humans, Information Storage and Retrieval, Semantics
9.
J Proteomics Bioinform ; 9(5): 151-157, 2016 May.
Article in English | MEDLINE | ID: mdl-27601856

ABSTRACT

Kinases play a role in every cellular process involved in tumorigenesis ranging from proliferation, migration, and protein synthesis to DNA repair. While genetic sequencing has identified most kinases in the human genome, it does not describe the 'kinome' at the level of activity of kinases against their substrate targets. An attempt to address that limitation and give researchers a more direct view of cellular kinase activity is found in the PamGene PamChip® system, which records and compares the phosphorylation of 144 tyrosine or serine/threonine peptides as they are phosphorylated by cellular kinases. Accordingly, the kinetics of this time dependent kinomic signal needs to be well understood in order to transduce a parameter set into an accurate and meaningful mathematical model. Here we report the analysis and mathematical modeling of kinomic time series, which achieves a more accurate description of the accumulation of phosphorylated product than the current model, which assumes first order enzyme-substrate kinetics. Reproducibility of the proposed solution was of particular attention. Specifically, the non-linear parameterization procedure is delivered as a public open source web application where kinomic time series can be accurately decomposed into the model's two parameter values measuring phosphorylation rate and capacity. The ability to deliver model parameterization entirely as a client side web application is an important result on its own given increasing scientific preoccupation with reproducibility. There is also no need for a potentially transitory and opaque server-side component maintained by the authors, nor of exchanging potentially sensitive data as part of the model parameterization process since the code is transferred to the browser client where it can be inspected and executed.
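For orientation, the first-order baseline that the abstract contrasts against is often written as y(t) = y_max · (1 − e^(−k·t)), with a rate k and a capacity y_max. The sketch below evaluates that generic curve. The abstract gives neither this equation nor the equation of the proposed model, so the functional form, the parameter names and the function names here are all assumptions for illustration.

```python
from math import exp, log


def first_order_signal(t, y_max, k):
    """Signal at time t when product accumulates toward capacity y_max at rate k."""
    return y_max * (1.0 - exp(-k * t))


def initial_velocity(y_max, k):
    """Slope of the first-order curve at t = 0, i.e. y_max * k."""
    return y_max * k


def half_time(k):
    """Time at which the signal reaches half of its capacity: ln(2) / k."""
    return log(2.0) / k
```

Under this form the two parameters separate cleanly: k sets how quickly the curve bends, while y_max sets the plateau. Fitting them to a measured time series requires a non-linear optimizer, which is not shown here.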

10.
AMIA Annu Symp Proc ; 2016: 342-351, 2016.
Article in English | MEDLINE | ID: mdl-28269829

ABSTRACT

The "Box model" allows users with no particular training in informatics, or access to specialized infrastructure, to operate generic cloud computing resources through a temporary URI dereferencing mechanism known as the "drop-file-picker API" ("picker API" for short). This application programming interface (API) was popularized in the web app development community by DropBox, and is now a consumer-facing feature of all major cloud computing platforms such as Box.com, Google Drive and Amazon S3. This report describes a prototype web service application that uses picker APIs to expose a new, "cloudified", API tailored for image analysis, without compromising the private governance of the data exposed. In order to better understand this cross-platform cloud computing landscape, we first measured the time for both transfer and traversal of large image files generated by whole slide imaging (WSI) in Digital Pathology. The verification that there is extensive interconnectivity between cloud resources led to the development of a prototype software application that exposes an image-traversing REST API to image files stored in any of the consumer-facing "boxes". In summary, an image file can be uploaded/synchronized into any cloud resource with a file picker API and the prototype service described here will expose an HTTP REST API that remains within the safety of the user's own governance. The open source prototype is publicly available at sbu-bmi.github.io/imagebox. Availability: The accompanying prototype application is made publicly available, fully functional, with open source, at http://sbu-bmi.github.io/imagebox. An illustrative webcasted use of this Web App is included with the project codebase at https://github.com/SBU-BMI/imagebox.


Subjects
Cloud Computing, Computer Systems, Computer-Assisted Image Processing, Software, Internet
11.
Arch Pathol Lab Med ; 140(1): 41-50, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26098131

ABSTRACT

CONTEXT: We define the scope and needs within the new discipline of computational pathology, a discipline critical to the future of both the practice of pathology and, more broadly, medical practice in general. OBJECTIVE: To define the scope and needs of computational pathology. DATA SOURCES: A meeting was convened in Boston, Massachusetts, in July 2014 prior to the annual Association of Pathology Chairs meeting, and it was attended by a variety of pathologists, including individuals highly invested in pathology informatics as well as chairs of pathology departments. CONCLUSIONS: The meeting made recommendations to promote computational pathology, including clearly defining the field and articulating its value propositions; asserting that the value propositions for health care systems must include means to incorporate robust computational approaches to implement data-driven methods that aid in guiding individual and population health care; leveraging computational pathology as a center for data interpretation in modern health care systems; stating that realizing the value proposition will require working with institutional administrations, other departments, and pathology colleagues; declaring that a robust pipeline should be fostered that trains and develops future computational pathologists, for those with both pathology and nonpathology backgrounds; and deciding that computational pathology should serve as a hub for data-related research in health care systems. The dissemination of these recommendations to pathology and bioinformatics departments should help facilitate the development of computational pathology.


Subjects
Computational Biology/methods, Computational Biology/trends, Clinical Pathology/methods, Clinical Pathology/trends, Humans
12.
Am J Pathol ; 185(3): 600-1, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25701882

ABSTRACT

This editorial discusses the rise of computational pathology as a major driver of experimental pathology research.


Subjects
Pathology, Computational Biology, Humans, Research
13.
Pediatr Dev Pathol ; 18(3): 203-9, 2015.
Article in English | MEDLINE | ID: mdl-25634794

ABSTRACT

Fetal and infant autopsy yields information regarding cause of death and the risk of recurrence, and it provides closure for parents. A significant number of perinatal evaluations are performed by general practice pathologists or trainees, who often find them time-consuming and/or intimidating. We sought to create a program that would enable pathologists to conduct these examinations with greater ease and to produce reliable, informative reports. We developed software that automatically generates a set of expected anthropometric and organ weight ranges by gestational age (GA)/postnatal age (PA) and a correlative table with the GA/PA that best matches the observed anthropometry. The program highlights measurement and organ weight discrepancies, enabling users to identify abnormalities. Furthermore, a Web page provides options for exporting and saving the data. Pathology residents utilized the program to determine ease of usage and benefits. The average time using conventional methods (ie, reference books and Internet sites) was compared to the average time using our Web page. Average time for novice and experienced residents using conventional methods was 26.7 minutes and 15 minutes, respectively. Using the Web page program, these times were reduced to an average of 3.2 minutes (P < 0.046 and P < 0.02, respectively). Participants found our program simple to use and the corrective features beneficial. This novel application saves time and improves the quality of fetal and infant autopsy reports. The software allows data exportation to reports and data storage for future analysis. Finalization of our software to enable usage by both university and private practice groups is in progress.


Subjects
Anthropometry/methods, Autopsy/methods, Clinical Pathology/methods, Software, Fetus, Gestational Age, Humans, Newborn Infant, Organ Size
14.
AMIA Annu Symp Proc ; 2015: 297-305, 2015.
Article in English | MEDLINE | ID: mdl-26958160

ABSTRACT

The financial incentives for data science applications leading to improved health outcomes, such as DSRIP (bit.ly/dsrip), are well-aligned with the broad adoption of Open Data by State and Federal agencies. This creates entirely novel opportunities for analytical applications that make exclusive use of the pervasive Web Computing platform. The framework described here explores this new avenue to contextualize Health data in a manner that relies exclusively on the native JavaScript interpreter and data processing resources of the ubiquitous Web Browser. The OpenHealth platform is made publicly available, and is publicly hosted with version control and open source, at https://github.com/mathbiol/openHealth. The different data/analytics workflow architectures explored are accompanied with live applications ranging from DSRIP, such as Hospital Inpatient Prevention Quality Indicators at http://bit.ly/pqiSuffolk, to The Cancer Genome Atlas (TCGA) as illustrated by http://bit.ly/tcgascopeGBM.


Subjects
Computational Biology, Health Information Systems, Public Health, Access to Information, Humans, Internet, Digital Libraries, Software, User-Computer Interface
15.
BMC Bioinformatics ; 15: 176, 2014 Jun 09.
Article in English | MEDLINE | ID: mdl-24913605

ABSTRACT

BACKGROUND: Ongoing advancements in cloud computing provide novel opportunities in scientific computing, especially for distributed workflows. Modern web browsers can now be used as high-performance workstations for querying, processing, and visualizing genomics' "Big Data" from sources like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) without local software installation or configuration. The design of QMachine (QM) was driven by the opportunity to use this pervasive computing model in the context of the Web of Linked Data in Biomedicine. RESULTS: QM is an open-sourced, publicly available web service that acts as a messaging system for posting tasks and retrieving results over HTTP. The illustrative application described here distributes the analyses of 20 Streptococcus pneumoniae genomes for shared suffixes. Because all analytical and data retrieval tasks are executed by volunteer machines, few server resources are required. Any modern web browser can submit those tasks and/or volunteer to execute them without installing any extra plugins or programs. A client library provides high-level distribution templates including MapReduce. This stark departure from the current reliance on expensive server hardware running "download and install" software has already gathered substantial community interest, as QM received more than 2.2 million API calls from 87 countries in 12 months. CONCLUSIONS: QM was found adequate to deliver the sort of scalable bioinformatics solutions that computation- and data-intensive workflows require. Paradoxically, the sandboxed execution of code by web browsers was also found to enable them, as compute nodes, to address critical privacy concerns that characterize biomedical environments.
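The MapReduce distribution template mentioned above can be illustrated with a purely local sketch: a map step that extracts the suffixes of each sequence, and a reduce step that intersects the results. This does not use QMachine's client library or its HTTP API, and it is not the analysis from the paper; the function names, the toy sequences, and the `min_len` parameter are assumptions. In QM, each map call would instead be posted as a task and executed by a volunteer browser.

```python
from functools import reduce


def suffixes(seq, min_len=3):
    """Map step: the set of all suffixes of `seq` at least `min_len` long."""
    return {seq[i:] for i in range(len(seq) - min_len + 1)}


def shared_suffixes(seqs, min_len=3):
    """Reduce step: suffixes common to every sequence in `seqs`."""
    if not seqs:
        return set()
    # Each map call is independent, so the calls could run on different machines.
    mapped = map(lambda s: suffixes(s, min_len), seqs)
    # Set intersection is associative, so partial results can be merged in any order.
    return reduce(set.intersection, mapped)


print(sorted(shared_suffixes(["GATTACA", "CCTACA", "TTTACA"])))
# prints ['ACA', 'TACA']
```

The property that makes this pattern suitable for volunteer computing is that the map tasks share no state: a coordinator only needs to hand out inputs and collect outputs, which is consistent with the abstract's remark that few server resources are required.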


Subjects
Software Design, Web Browser, Computational Biology/methods, Genome, Genomics/methods, Humans, Streptococcus pneumoniae/genetics
16.
J Pathol Inform ; 5(1): 3, 2014.
Article in English | MEDLINE | ID: mdl-24672738

ABSTRACT

BACKGROUND: Genetics and genomics have radically altered our understanding of breast cancer progression. However, the genomic basis of various histopathologic features of breast cancer is not yet well-defined. MATERIALS AND METHODS: The Cancer Genome Atlas (TCGA) is an international database containing a large collection of human cancer genome sequencing data. cBioPortal is a web tool developed for mining these sequencing data. We performed mining of TCGA sequencing data in an attempt to characterize the genomic features correlated with breast cancer histopathology. We first assessed the quality of the TCGA data using a group of genes with known alterations in various cancers. Both genome-wide gene mutation and copy number changes as well as a group of genes with a high frequency of genetic changes were then correlated with various histopathologic features of invasive breast cancer. RESULTS: Validation of TCGA data using a group of genes with known alterations in breast cancer suggests that the TCGA has accurately documented the genomic abnormalities of multiple malignancies. Further analysis of TCGA breast cancer sequencing data shows that accumulation of specific genomic defects is associated with higher tumor grade, larger tumor size and receptor negativity. Distinct groups of genomic changes were found to be associated with the different grades of invasive ductal carcinoma. The mutator role of the TP53 gene was validated by genomic sequencing data of invasive breast cancer and TP53 mutation was found to play a critical role in defining high tumor grade. CONCLUSIONS: Data mining of the TCGA genome sequencing data is an innovative and reliable method to help characterize the genomic abnormalities associated with histopathologic features of invasive breast cancer.

17.
BMC Bioinformatics ; 15: 28, 2014 Jan 27.
Article in English | MEDLINE | ID: mdl-24467687

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) technologies have resulted in petabytes of scattered data, decentralized in archives, databases and sometimes in isolated hard-disks which are inaccessible for browsing and analysis. It is expected that curated secondary databases will help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it. RESULTS: To address the above challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine which is used upstream of a High-performance Integrated Virtual Environment (HIVE) that encapsulates the Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof-of-concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program, The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR the available clinical information on patients, mapping the reads to the reference, identifying non-synonymous Single Nucleotide Variations (nsSNVs), and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP and therefore provides a more comprehensive understanding of the human variome. Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr). CONCLUSIONS: Availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual level SNVs and their effect on the human proteome beyond what the dbSNP database provides.


Subjects
High-Throughput Nucleotide Sequencing/methods, Neoplasms/genetics, Proteome/genetics, Proteomics/methods, Algorithms, Biomedical Research, Database Management Systems, Genetic Databases, Humans, Neoplasms/metabolism, Phylogeny, Single Nucleotide Polymorphism, Proteome/classification, Proteome/metabolism, User-Computer Interface
18.
PLoS One ; 10(4): e0123295, 2014.
Article in English | MEDLINE | ID: mdl-25919366

ABSTRACT

Domoic acid toxicosis (DAT) in California sea lions (Zalophus californianus) is caused by exposure to the marine biotoxin domoic acid and has been linked to massive stranding events and mortality. Diagnosis is based on clinical signs in addition to the presence of domoic acid in body fluids. Chronic DAT further is characterized by reoccurring seizures progressing to status epilepticus. Diagnosis of chronic DAT is often slow and problematic, and minimally invasive tests for DAT have been the focus of numerous recent biomarker studies. The goal of this study was to retrospectively profile plasma proteins in a population of sea lions with chronic DAT and those without DAT using two dimensional gel electrophoresis to discover whether individual, multiple, or combinations of protein and clinical data could be utilized to identify sea lions with DAT. Using a training set of 32 sea lion sera, 20 proteins and their isoforms were identified that were significantly different between the two groups (p<0.05). Interestingly, 11 apolipoprotein E (ApoE) charge forms were decreased in DAT samples, indicating that ApoE charge form distributions may be important in the progression of DAT. In order to develop a classifier of chronic DAT, an independent blinded test set of 20 sea lions, seven with chronic DAT, was used to validate models utilizing ApoE charge forms and eosinophil counts. The resulting support vector machine had high sensitivity (85.7% with 92.3% negative predictive value) and high specificity (92.3% with 85.7% positive predictive value). These results suggest that ApoE and eosinophil counts along with machine learning can perform as a robust and accurate tool to diagnose chronic DAT. Although this analysis is specifically focused on blood biomarkers and routine clinical data, the results demonstrate promise for future studies combining additional variables in multidimensional space to create robust classifiers.
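The reported percentages can be reproduced from a 2 x 2 confusion matrix. The abstract states only the test-set size (20 animals, seven with chronic DAT) and the four percentages; the individual counts used below (6 true positives, 1 false negative, 12 true negatives, 1 false positive) are inferred from those figures and are an assumption, as is the function name.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased animals correctly flagged
        "specificity": tn / (tn + fp),  # healthy animals correctly cleared
        "ppv": tp / (tp + fp),          # positive calls that are correct
        "npv": tn / (tn + fn),          # negative calls that are correct
    }


# Inferred counts: 7 DAT animals (6 detected), 13 non-DAT animals (12 cleared).
metrics = diagnostic_metrics(tp=6, fn=1, tn=12, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# prints:
# sensitivity: 85.7%
# specificity: 92.3%
# ppv: 85.7%
# npv: 92.3%
```

With a test set this small, each misclassified animal shifts sensitivity by about 14 percentage points (1/7) and specificity by about 8 (1/13), which is worth keeping in mind when reading the headline figures.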


Subjects
Apolipoproteins E/metabolism , Kainic Acid/analogs & derivatives , Neuromuscular Depolarizing Agents/toxicity , Proteomics/methods , Sea Lions/blood , Animals , Eosinophils/metabolism , Female , Kainic Acid/toxicity , Machine Learning , Male , Neurotoxicity Syndromes/diagnosis , Neurotoxicity Syndromes/veterinary , Retrospective Studies , Support Vector Machine
19.
J Biomed Semantics ; 5: 47, 2014.
Article in English | MEDLINE | ID: mdl-25937882

ABSTRACT

BACKGROUND: The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to catalogue genetic mutations responsible for cancer using genome analysis techniques. One of the aims of this project is to create a comprehensive and open repository of cancer-related molecular analysis, to be exploited by bioinformaticians towards advancing cancer knowledge. However, devising bioinformatics applications to analyse such large datasets is still challenging, as it often requires downloading large archives and parsing the relevant text files. This makes it difficult to enable virtual data integration in order to collect the critical co-variates necessary for analysis. METHODS: We address these issues by transforming the TCGA data into the Semantic Web standard Resource Description Framework (RDF), linking it to relevant datasets in the Linked Open Data (LOD) cloud, and proposing an efficient data distribution strategy to host the resulting 20.4 billion triples via several SPARQL endpoints. With the TCGA data distributed across multiple SPARQL endpoints, we enable biomedical scientists to query and retrieve information from these endpoints through TopFed, a TCGA-tailored federated SPARQL query processing engine. RESULTS: We compare TopFed with the well-established federation engine FedX in terms of source selection and query execution time, using 10 different federated SPARQL queries with varying requirements. Our evaluation results show that TopFed selects on average less than half of the sources (with 100% recall), with query execution time equal to one third of that of FedX. CONCLUSION: With TopFed, we aim to offer biomedical scientists a single point of access through which distributed TCGA data can be accessed in unison. We believe the proposed system can greatly help researchers in the biomedical domain to carry out their research effectively with TCGA, as the amount and diversity of data exceeds the ability of local resources to handle its retrieval and parsing.
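
To make the idea of source selection concrete, the following is a minimal sketch of generic index-based source selection. It is not TopFed's actual algorithm, which the abstract does not describe; the endpoint URLs, the `ex:` predicates, and the `select_sources` helper are all hypothetical. The point is only that an index from predicates to endpoints lets a federation engine contact fewer sources without missing a relevant one.

```python
from typing import Dict, Iterable, Set

# Hypothetical endpoints and predicates, for illustration only.
ENDPOINTS: Set[str] = {
    "http://example.org/sparql/a",
    "http://example.org/sparql/b",
    "http://example.org/sparql/c",
}
PREDICATE_INDEX: Dict[str, Set[str]] = {
    "ex:methylationValue": {
        "http://example.org/sparql/a",
        "http://example.org/sparql/b",
    },
    "ex:expressionValue": {"http://example.org/sparql/c"},
}


def select_sources(predicates: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each triple-pattern predicate to the endpoints that may answer it."""
    selection: Dict[str, Set[str]] = {}
    for predicate in predicates:
        # Predicates missing from the index fall back to every endpoint, so no
        # relevant source is skipped (recall is kept, at the cost of requests).
        selection[predicate] = set(PREDICATE_INDEX.get(predicate, ENDPOINTS))
    return selection


chosen = select_sources(["ex:expressionValue", "ex:unknown"])
```

In this toy setup the indexed predicate is routed to one endpoint instead of three, while the unindexed one is still sent everywhere; a real engine would trade index size against how aggressively it can prune.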

20.
Brief Bioinform ; 15(3): 369-75, 2014 May.
Article in English | MEDLINE | ID: mdl-24162172

ABSTRACT

Among alignment-free methods, Iterated Maps (IMs) are at a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets, as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage now appears to be set for a recasting of IMs with a central role in processing next-generation sequencing results.
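
For readers unfamiliar with the Chaos Game Representation mentioned above, here is a minimal sketch of the standard iterated map for DNA. The corner assignment (A, C, G, T at the corners of the unit square) and the starting point at the centre are a common convention assumed here, not something the abstract specifies, and `chaos_game` is a name chosen for illustration.

```python
from typing import List, Tuple

# Assumed corner convention for the unit square.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def chaos_game(seq: str) -> List[Tuple[float, float]]:
    """Return the CGR coordinates visited while reading seq left to right."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Each step moves halfway from the current point to the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


p1 = chaos_game("ACGT")[-1]
p2 = chaos_game("TTGT")[-1]
```

Because every step halves the distance to a corner, two sequences that share their last k symbols end within 2^-k of each other in each coordinate. Here `p1` and `p2` share the suffix "GT", so they fall in the same sub-square of side 1/4. This nesting of suffixes at every resolution is the sense in which the map is scale free.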


Subjects
Computational Biology/methods , Sequence Analysis/methods , Computational Biology/trends , Fractals , Models, Statistical , Nonlinear Dynamics , Sequence Alignment , Sequence Analysis/statistics & numerical data