1.
Bioinformatics ; 36(4): 1241-1251, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31584634

ABSTRACT

MOTIVATION: Graph embedding learning, which aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods have been evaluated on social and information networks and have not been comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding method) have shown promising results, and hence there is a need to systematically evaluate the more recent graph embedding methods (e.g. random walk-based and neural network-based) in terms of their usability and potential to further the state-of-the-art. RESULTS: We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug-drug interaction (DDI) prediction, protein-protein interaction (PPI) prediction; and 2 node classification tasks: medical term semantic type classification, protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in future biomedical graph analysis. Compared with three state-of-the-art methods for DDA, DDI and protein function prediction, the recent graph embedding methods achieve competitive performance without using any biological features, and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks. 
AVAILABILITY AND IMPLEMENTATION: As part of our contributions in the paper, we develop an easy-to-use Python package with detailed instructions, BioNEV, available at: https://github.com/xiangyue9607/BioNEV, including all source code and datasets, to facilitate studying various graph embedding methods on biomedical tasks. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neural Networks, Computer; Software; Drug Interactions; Proteins; Semantics
2.
Epilepsy Behav ; 64(Pt A): 116-121, 2016 11.
Article in English | MEDLINE | ID: mdl-27741462

ABSTRACT

INTRODUCTION: Epilepsy is a common neurological condition. Seizure diary reports and patient- or caregiver-reported seizure counts are often inaccurate and underestimated. Many caregivers express stress and anxiety about the patient with epilepsy having seizures when they are not present. Therefore, a need exists for the ability to recognize and/or detect a seizure in the home setting. However, few studies have inquired about detection device features that are important to patients and their caregivers. METHODS: A survey instrument utilizing a population of patients and caregivers was created to obtain information on the design criteria most desired for patients with epilepsy in regard to wearable devices. RESULTS: One thousand one hundred sixty-eight responses were collected. Respondents thought that sensors for muscle signal (61.4%) and heart rate (58.0%) would be most helpful followed by the O2 sensor (41.4%). There was more interest in these three sensor types than in an accelerometer (25.5%). There was very little interest in a microphone (8.9%), galvanic skin response sensor (8.0%), or a barometer (4.9%). Based on a rating scale of 1-5 with 5 being the most important, respondents felt that "detecting all seizures" (4.73) is the most important device feature followed by "text/email alerts" (4.53), "comfort" (4.46), and "battery life" (4.43) as an equally important group of features. Respondents felt that "not knowing device is for seizures" (2.60) and "multiple uses" (2.57) were equally the least important device features. Average ratings differed significantly across age groups for the following features: button, multiuse, not knowing device is for seizures, alarm, style, and text ability. The p-values were all < 0.002. 
Eighty-two point five percent of respondents [95% confidence interval: 80.0%, 84.7%] were willing to pay more than $100 for a wearable seizure detection device, and 42.8% of respondents [95% confidence interval: 39.8%, 45.9%] were willing to pay more than $200. CONCLUSIONS: Our survey results demonstrated that patients and caregivers have design features that are important to them in regard to a wearable seizure detection device. Overall, the ability to detect all seizures rated highest among respondents, which continues to be an unmet need in the community with epilepsy in regard to seizure detection. Additional uses for a wearable were not as important. Based on our results, it is important that an alert (via text and/or email) for events be a part of the system. A reasonable price point appears to be around $200 to $300. An accelerometer was less important to those surveyed when compared with the use of heart rate, oxygen saturation, or muscle twitches/signals. As further products become developed for use in other health arenas, it will be important to consider patient and caregiver desires in order to meet the need and address the gap in devices that currently exist.


Subjects
Caregivers; Equipment Design/standards; Monitoring, Ambulatory/instrumentation; Neurophysiological Monitoring/instrumentation; Patient Preference; Seizures/diagnosis; Adult; Humans; Monitoring, Ambulatory/economics; Monitoring, Ambulatory/standards; Neurophysiological Monitoring/economics; Neurophysiological Monitoring/standards
3.
J Med Genet ; 52(4): 282-8, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25587064

ABSTRACT

BACKGROUND: Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. It can be a significant challenge to process such data, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, together with conducting family based sequencing data analysis. METHODS: Hadoop is a framework for reliable, scalable, distributed processing of large data sets using MapReduce programming models. Based on Hadoop and HBase, we developed SeqHBase, a big data-based toolset for analysing family based sequencing data to detect de novo, inherited homozygous, or compound heterozygous mutations that may contribute to disease manifestations. SeqHBase takes as input BAM files (for coverage at every site), variant call format (VCF) files (for variant calls) and functional annotations (for variant prioritisation). RESULTS: We applied SeqHBase to a 5-member nuclear family and a 10-member 3-generation family with WGS data, as well as a 4-member nuclear family with WES data. Analysis times were almost linearly scalable with number of data nodes. With 20 data nodes, SeqHBase took about 5 secs to analyse WES familial data and approximately 1 min to analyse WGS familial data. CONCLUSIONS: These results demonstrate SeqHBase's high efficiency and scalability, which is necessary as WGS and WES are rapidly becoming standard methods to study the genetics of familial disorders.
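
The inheritance patterns named in this abstract (de novo, inherited homozygous, compound heterozygous) can be illustrated with a minimal single-machine sketch. This is not SeqHBase's Hadoop/HBase implementation. The genotype encoding (count of alternate alleles), the function names, and the gene labels are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Assumed encoding: a genotype is the number of alternate alleles (0, 1, or 2).
Genotype = int


def classify_trio_variant(child: Genotype, father: Genotype, mother: Genotype) -> str:
    """Classify one variant site in a child/father/mother trio."""
    # Variant present in the child but absent from both parents.
    if child > 0 and father == 0 and mother == 0:
        return "de_novo"
    # Child is homozygous for the variant and both parents are heterozygous carriers.
    if child == 2 and father == 1 and mother == 1:
        return "inherited_homozygous"
    # Child is heterozygous and exactly one parent carries the variant.
    if child == 1 and (father > 0) != (mother > 0):
        return "het_from_father" if father > 0 else "het_from_mother"
    return "other"


def compound_het_genes(
    variants: List[Tuple[str, Genotype, Genotype, Genotype]]
) -> List[str]:
    """Return genes in which the child has a heterozygous variant from each parent.

    Each input tuple is (gene, child, father, mother).
    """
    origins: Dict[str, Set[str]] = {}
    for gene, child, father, mother in variants:
        label = classify_trio_variant(child, father, mother)
        if label.startswith("het_from_"):
            origins.setdefault(gene, set()).add(label)
    # A gene qualifies only if both parental origins were observed.
    return sorted(gene for gene, seen in origins.items() if len(seen) == 2)
```

A production pipeline would also need to check coverage at each site, which is why the abstract lists BAM files among the inputs; this sketch omits that step.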


Subjects
Genomics/methods; Sequence Analysis, DNA/methods; Software; Datasets as Topic; Exome; Genome, Human; Humans; Mutation
4.
Nucleic Acids Res ; 41(Database issue): D553-60, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23197658

ABSTRACT

Disease and Gene Annotations database (DGA, http://dga.nubic.northwestern.edu) is a collaborative effort aiming to provide a comprehensive and integrative annotation of the human genes in disease network context by integrating computable controlled vocabulary of the Disease Ontology (DO version 3 revision 2510, which has 8043 inherited, developmental and acquired human diseases), NCBI Gene Reference Into Function (GeneRIF) and molecular interaction network (MIN). DGA integrates these resources together using semantic mappings to build an integrative set of disease-to-gene and gene-to-gene relationships with excellent coverage based on current knowledge. DGA is kept current by periodically reparsing DO, GeneRIF, and MINs. DGA provides a user-friendly and interactive web interface system enabling users to efficiently query, download and visualize the DO tree structure and annotations as a tree, a network graph or a tabular list. To facilitate integrative analysis, DGA provides a web service Application Programming Interface for integration with external analytic tools.


Subjects
Databases, Genetic; Disease/genetics; Genes; Molecular Sequence Annotation; Humans; Internet; Proteins/genetics; Proteins/metabolism; Vocabulary, Controlled
5.
Genet Med ; 15(10): 802-9, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24008998

ABSTRACT

Health care has become increasingly information intensive. The advent of genomic data, integrated into patient care, significantly accelerates the complexity and amount of clinical data. Translational research in the present day increasingly embraces new biomedical discovery in this data-intensive world, thus entering the domain of "big data." The Electronic Medical Records and Genomics consortium has taught us many lessons, while simultaneously advances in commodity computing methods enable the academic community to affordably manage and process big data. Although great promise can emerge from the adoption of big data methods and philosophy, the heterogeneity and complexity of clinical data, in particular, pose additional challenges for big data inferencing and clinical application. However, the ultimate comparability and consistency of heterogeneous clinical information sources can be enhanced by existing and emerging data standards, which promise to bring order to clinical data chaos. Meaningful Use data standards in particular have already simplified the task of identifying clinical phenotyping patterns in electronic health records.


Subjects
Databases, Factual; Electronic Health Records; Genetic Testing; Medical Informatics/standards; Genetics, Medical; Genomics; Humans; Information Storage and Retrieval; Meaningful Use; Phenotype; Translational Research, Biomedical/trends
7.
BMC Genomics ; 12: 603, 2011 Dec 13.
Article in English | MEDLINE | ID: mdl-22165947

ABSTRACT

BACKGROUND: Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is a valuable but costly endeavor. The Gene Wiki makes use of Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene annotations. The Gene Wiki is comprised of more than 10,000 review articles, each describing one human gene. The goal of this study is to define and assess a computational strategy for translating the text of Gene Wiki articles into ontology-based gene annotations. We specifically explore the generation of structured annotations using the Gene Ontology and the Human Disease Ontology. RESULTS: Our system produced 2,983 candidate gene annotations using the Disease Ontology and 11,022 candidate annotations using the Gene Ontology from the text of the Gene Wiki. Based on manual evaluations and comparisons to reference annotation sets, we estimate a precision of 90-93% for the Disease Ontology annotations and 48-64% for the Gene Ontology annotations. We further demonstrate that this data set can systematically improve the results from gene set enrichment analyses. CONCLUSIONS: The Gene Wiki is a rapidly growing corpus of text focused on human gene function. Here, we demonstrate that the Gene Wiki can be a powerful resource for generating ontology-based gene annotations. These annotations can be used immediately to improve workflows for building curated gene annotation databases and knowledge-based statistical analyses.


Subjects
Genomics; Information Storage and Retrieval; Internet
8.
JAMA Pediatr ; 175(2): 176-184, 2021 02 01.
Article in English | MEDLINE | ID: mdl-33226415

ABSTRACT

Importance: There is limited information on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) testing and infection among pediatric patients across the United States. Objective: To describe testing for SARS-CoV-2 and the epidemiology of infected patients. Design, Setting, and Participants: A retrospective cohort study was conducted using electronic health record data from 135 794 patients younger than 25 years who were tested for SARS-CoV-2 from January 1 through September 8, 2020. Data were from PEDSnet, a network of 7 US pediatric health systems, comprising 6.5 million patients primarily from 11 states. Data analysis was performed from September 8 to 24, 2020. Exposure: Testing for SARS-CoV-2. Main Outcomes and Measures: SARS-CoV-2 infection and coronavirus disease 2019 (COVID-19) illness. Results: A total of 135 794 pediatric patients (53% male; mean [SD] age, 8.8 [6.7] years; 3% Asian patients, 15% Black patients, 11% Hispanic patients, and 59% White patients; 290 per 10 000 population [range, 155-395 per 10 000 population across health systems]) were tested for SARS-CoV-2, and 5374 (4%) were infected with the virus (12 per 10 000 population [range, 7-16 per 10 000 population]). Compared with White patients, those of Black, Hispanic, and Asian race/ethnicity had lower rates of testing (Black: odds ratio [OR], 0.70 [95% CI, 0.68-0.72]; Hispanic: OR, 0.65 [95% CI, 0.63-0.67]; Asian: OR, 0.60 [95% CI, 0.57-0.63]); however, they were significantly more likely to have positive test results (Black: OR, 2.66 [95% CI, 2.43-2.90]; Hispanic: OR, 3.75 [95% CI, 3.39-4.15]; Asian: OR, 2.04 [95% CI, 1.69-2.48]). Older age (5-11 years: OR, 1.25 [95% CI, 1.13-1.38]; 12-17 years: OR, 1.92 [95% CI, 1.73-2.12]; 18-24 years: OR, 3.51 [95% CI, 3.11-3.97]), public payer (OR, 1.43 [95% CI, 1.31-1.57]), outpatient testing (OR, 2.13 [1.86-2.44]), and emergency department testing (OR, 3.16 [95% CI, 2.72-3.67]) were also associated with increased risk of infection. 
In univariate analyses, nonmalignant chronic disease was associated with lower likelihood of testing, and preexisting respiratory conditions were associated with lower risk of positive test results (standardized ratio [SR], 0.78 [95% CI, 0.73-0.84]). However, several other diagnosis groups were associated with a higher risk of positive test results: malignant disorders (SR, 1.54 [95% CI, 1.19-1.93]), cardiac disorders (SR, 1.18 [95% CI, 1.05-1.32]), endocrinologic disorders (SR, 1.52 [95% CI, 1.31-1.75]), gastrointestinal disorders (SR, 2.00 [95% CI, 1.04-1.38]), genetic disorders (SR, 1.19 [95% CI, 1.00-1.40]), hematologic disorders (SR, 1.26 [95% CI, 1.06-1.47]), musculoskeletal disorders (SR, 1.18 [95% CI, 1.07-1.30]), mental health disorders (SR, 1.20 [95% CI, 1.10-1.30]), and metabolic disorders (SR, 1.42 [95% CI, 1.24-1.61]). Among the 5374 patients with positive test results, 359 (7%) were hospitalized for respiratory, hypotensive, or COVID-19-specific illness. Of these, 99 (28%) required intensive care unit services, and 33 (9%) required mechanical ventilation. The case fatality rate was 0.2% (8 of 5374). The number of patients with a diagnosis of Kawasaki disease in early 2020 was 40% lower (259 vs 433 and 430) than in 2018 or 2019. Conclusions and Relevance: In this large cohort study of US pediatric patients, SARS-CoV-2 infection rates were low, and clinical manifestations were typically mild. Black, Hispanic, and Asian race/ethnicity; adolescence and young adulthood; and nonrespiratory chronic medical conditions were associated with identified infection. Kawasaki disease diagnosis is not an effective proxy for multisystem inflammatory syndrome of childhood.


Subjects
COVID-19 Testing/statistics & numerical data; COVID-19/diagnosis; Ethnicity/statistics & numerical data; Adolescent; Age Factors; COVID-19/epidemiology; Child; Child, Preschool; Cohort Studies; Comorbidity; Female; Humans; Male; Retrospective Studies; Risk Factors; SARS-CoV-2/isolation & purification; Socioeconomic Factors; United States; Young Adult
9.
BMC Bioinformatics ; 11: 587, 2010 Nov 30.
Article in English | MEDLINE | ID: mdl-21118553

ABSTRACT

BACKGROUND: High-throughput profiling of DNA methylation status of CpG islands is crucial to understand the epigenetic regulation of genes. The microarray-based Infinium methylation assay by Illumina is one platform for low-cost high-throughput methylation profiling. Both Beta-value and M-value statistics have been used as metrics to measure methylation levels. However, there are no detailed studies of their relations and their strengths and limitations. RESULTS: We demonstrate that the relationship between the Beta-value and M-value methods is a Logit transformation, and show that the Beta-value method has severe heteroscedasticity for highly methylated or unmethylated CpG sites. In order to evaluate the performance of the Beta-value and M-value methods for identifying differentially methylated CpG sites, we designed a methylation titration experiment. The evaluation results show that the M-value method provides much better performance in terms of Detection Rate (DR) and True Positive Rate (TPR) for both highly methylated and unmethylated CpG sites. Imposing a minimum threshold of difference can improve the performance of the M-value method but not the Beta-value method. We also provide guidance for how to select the threshold of methylation differences. CONCLUSIONS: The Beta-value has a more intuitive biological interpretation, but the M-value is more statistically valid for the differential analysis of methylation levels. Therefore, we recommend using the M-value method for conducting differential methylation analysis and including the Beta-value statistics when reporting the results to investigators.
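
The logit relationship stated in the RESULTS can be written as M = log2(Beta / (1 - Beta)), with inverse Beta = 2^M / (2^M + 1). The sketch below assumes base-2 logarithms, the usual convention for M-values, and ignores the small intensity offsets that real pipelines add to stabilize the ratios. The function names are illustrative.

```python
import math


def beta_to_m(beta: float) -> float:
    """Convert a Beta-value in (0, 1) to an M-value via the base-2 logit."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Convert an M-value back to a Beta-value (inverse of beta_to_m)."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

The transform stretches the ends of the Beta scale: Beta-values near 0 or 1 map to large negative or positive M-values, which is where the abstract reports that the Beta-value suffers from heteroscedasticity.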


Subjects
DNA Methylation; Microarray Analysis/methods; CpG Islands; Data Interpretation, Statistical
10.
BMC Bioinformatics ; 11: 237, 2010 May 11.
Article in English | MEDLINE | ID: mdl-20459804

ABSTRACT

BACKGROUND: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. RESULTS: We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. CONCLUSIONS: ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. 
Allowing users to pass their own annotation data such as a different Chromatin immunoprecipitation (ChIP) preparation and a dataset from literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration to the biomaRt package enables up-to-date annotation retrieval from the BioMart database.
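
The core annotation step described here, assigning each peak to its nearest gene and recording the distance, can be sketched independently of R. The following is not ChIPpeakAnno's API. It assumes a single chromosome, ignores strand, and represents each gene by one hypothetical transcription start site (TSS) coordinate.

```python
from bisect import bisect_left
from typing import List, Tuple


def nearest_gene(peak_pos: int, genes: List[Tuple[int, str]]) -> Tuple[str, int]:
    """Return (gene name, signed distance) for the gene nearest to a peak.

    `genes` must be a non-empty list of (tss_position, name) sorted by position.
    The distance is peak_pos - tss, so negative values mean the peak lies
    at a smaller coordinate than the TSS.
    """
    positions = [pos for pos, _ in genes]
    i = bisect_left(positions, peak_pos)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = genes[max(i - 1, 0): i + 1]
    tss, name = min(candidates, key=lambda gene: abs(peak_pos - gene[0]))
    return name, peak_pos - tss
```

Collecting the signed distances for every peak in a set gives the values that the abstract describes plotting as a histogram.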


Subjects
Chromatin Immunoprecipitation/methods; Oligonucleotide Array Sequence Analysis/methods; Software; Binding Sites; Genome
11.
Bioinformatics ; 25(12): i63-8, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19478018

ABSTRACT

Subjective methods have been reported to adapt a general-purpose ontology for a specific application. For example, Gene Ontology (GO) Slim was created from GO to generate a highly aggregated report of the human-genome annotation. We propose statistical methods to adapt the general purpose, OBO Foundry Disease Ontology (DO) for the identification of gene-disease associations. Thus, we need a simplified definition of disease categories derived from implicated genes. On the basis of the assumption that the DO terms having similar associated genes are closely related, we group the DO terms based on the similarity of gene-to-DO mapping profiles. Two types of binary distance metrics are defined to measure the overall and subset similarity between DO terms. A compactness-scalable fuzzy clustering method is then applied to group similar DO terms. To reduce false clustering, the semantic similarities between DO terms are also used to constrain clustering results. As such, the DO terms are aggregated and the redundant DO terms are largely removed. Using these methods, we constructed a simplified vocabulary list from the DO called Disease Ontology Lite (DOLite). We demonstrated that DOLite results in more interpretable results than DO for gene-disease association tests. The resultant DOLite has been used in the Functional Disease Ontology (FunDO) Web application at http://www.projects.bioinformatics.northwestern.edu/fundo.
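
The abstract does not define its two binary distance metrics, so the sketch below substitutes two standard ones as stand-ins. Jaccard distance is used for "overall" similarity, and one minus the overlap coefficient is used for "subset" similarity. Both choices, the function names, and the example gene sets are assumptions, not the paper's definitions.

```python
from typing import Set


def overall_distance(genes_a: Set[str], genes_b: Set[str]) -> float:
    """Jaccard distance between the gene sets mapped to two DO terms."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 1.0 - len(genes_a & genes_b) / len(union)


def subset_distance(genes_a: Set[str], genes_b: Set[str]) -> float:
    """One minus the overlap coefficient.

    The result is 0.0 when the smaller gene set is fully contained in the
    larger one, which flags a candidate parent/child or redundant term pair.
    """
    smaller = min(len(genes_a), len(genes_b))
    if smaller == 0:
        return 1.0
    return 1.0 - len(genes_a & genes_b) / smaller
```

A pairwise matrix of such distances is the kind of input a clustering step, such as the fuzzy clustering mentioned in the abstract, would consume.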


Subjects
Computational Biology/methods; Disease/genetics; Vocabulary, Controlled; Data Interpretation, Statistical; Database Management Systems; Databases, Genetic/classification; Genome; Terminology as Topic
12.
Nucleic Acids Res ; 36(2): e11, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18178591

ABSTRACT

Variance stabilization is a step in the preprocessing of microarray data that can greatly benefit the performance of subsequent statistical modeling and inference. Due to the often limited number of technical replicates for Affymetrix and cDNA arrays, achieving variance stabilization can be difficult. Although the Illumina microarray platform provides a larger number of technical replicates on each array (usually over 30 randomly distributed beads per probe), these replicates have not been leveraged in the current log2 data transformation process. We devised a variance-stabilizing transformation (VST) method that takes advantage of the technical replicates available on an Illumina microarray. We have compared VST with log2 and Variance-stabilizing normalization (VSN) by using the Kruglyak bead-level data (2006) and Barnes titration data (2005). The results of the Kruglyak data suggest that VST stabilizes variances of bead-replicates within an array. The results of the Barnes data show that VST can improve the detection of differentially expressed genes and reduce false-positive identifications. We conclude that although both VST and VSN are built upon the same model of measurement noise, VST stabilizes the variance better and more efficiently for the Illumina platform by leveraging the availability of a larger number of within-array replicates. The algorithms and Supplementary Data are included in the lumi package of Bioconductor, available at: www.bioconductor.org.


Subjects
Gene Expression Profiling/methods; Models, Statistical; Oligonucleotide Array Sequence Analysis/methods; Algorithms; Reproducibility of Results
13.
Adv Exp Med Biol ; 680: 709-15, 2010.
Article in English | MEDLINE | ID: mdl-20865558

ABSTRACT

The functions of a gene are traditionally annotated textually using either free text (Gene Reference Into Function or GeneRIF) or controlled vocabularies (e.g., Gene Ontology or Disease Ontology). Inspired by the latest word cloud tools developed by the Information Visualization Group at IBM Research, we have prototyped a visual system for capturing gene annotations, which we named Gene Graph Into Function or GeneGIF. Fully developing the GeneGIF system would be a significant effort. To justify the necessity and to specify the design requirements of GeneGIF, we first surveyed the end-user preferences. From 53 responses, we found that a majority (64%, p < 0.05) of the users were either positive or neutral toward using GeneGIF in their daily work (acceptance); in terms of preference, a slight majority (51%, p > 0.05) of the users favored visual presentation of information (GeneGIF) compared to textual (GeneRIF) information. The results of this study indicate that a visual presentation tool, such as GeneGIF, can complement standard textual presentation of gene annotations. Moreover, the survey participants provided many constructive comments that will specify the development of a phase-two project (http://128.248.174.241/) to visually annotate each gene in the human genome.


Subjects
Computer Graphics; Molecular Sequence Annotation/statistics & numerical data; Algorithms; Computational Biology; Data Collection; Databases, Genetic; Humans; Software; User-Computer Interface; Vocabulary, Controlled
14.
Front Public Health ; 8: 58, 2020.
Article in English | MEDLINE | ID: mdl-32181236

ABSTRACT

Background: Previous studies revealed that patients with genetic disease have more frequent and longer hospitalizations and therefore higher healthcare costs. To understand the financial impact of genetic disease on a pediatric accountable care organization (ACO), we analyzed medical claims from 2014 provided by Partners for Kids, an ACO in partnership with Nationwide Children's Hospital (NCH; Columbus, OH, USA). Methods: The study population included insurance claims from 258,399 children. We assigned patients to four categories (1-A, 1-B, 2, and 3) based on the strength of the genetic basis of disease. Results: We identified 22.7% of patients as category 1-A or 1-B, having a disease with a "strong genetic basis" (e.g., single-gene diseases, chromosomal abnormalities). Total ACO paid claims in 2014 were $379M, of which $161M (42.5%) was attributed to category 1 patients. Furthermore, we identified 23.3% of patients as category 2, having a disease with a suspected genetic component or predisposition (e.g., asthma, type 1 diabetes), who accounted for an additional 28.6% of 2014 costs. Category 1 patients were more likely to experience at least one hospitalization than category 3 patients, those without genetic disease (odds ratio [OR] = 4.12; 95% confidence interval [CI] = 3.86-4.39; p < 0.0001). Overall, category 1 patients experienced nearly five times the number of inpatient (IP) admissions and twice the number of outpatient (OP) visits compared with category 3 patients (p < 0.0001). Conclusion: Nearly half (42.5%) of healthcare paid claims costs in 2014 for this study population were accounted for by patients with single-gene diseases or chromosomal abnormalities. These findings precede and support a need for an ACO to plan for effective healthcare strategies and capitation models for children with genetic disease.
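
For readers unfamiliar with how an odds ratio and its confidence interval are obtained from a 2x2 table, here is a minimal sketch using the standard Wald interval on the log scale. The abstract does not report the underlying counts or state which interval method the authors used, so the function name, the argument layout, and the choice of method are assumptions.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    a: exposed group with the event      b: exposed group without the event
    c: reference group with the event    d: reference group without the event
    All four counts must be positive. z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

With large samples, such as the 258,399 children in this study, the standard error is small and the interval is correspondingly narrow.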


Subjects
Accountable Care Organizations; Asthma; Child; Health Care Costs; Hospitalization; Humans; Retrospective Studies
15.
BMC Genomics ; 10 Suppl 1: S6, 2009 Jul 07.
Article in English | MEDLINE | ID: mdl-19594883

ABSTRACT

BACKGROUND: The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. RESULTS: We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. CONCLUSION: The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease-containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of the human genome.
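
The recall and precision figures above compare a set of predicted gene-disease pairs against a reference set. A minimal sketch of that calculation follows. The function name and the pair representation are illustrative, not the authors' validation code.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (gene symbol, disease term)


def precision_recall(predicted: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted pairs against a reference set."""
    true_positives = len(predicted & reference)
    # Precision: fraction of predictions that are in the reference.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of the reference that was recovered.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall
```

Read this way, the reported contrast means that both sources are rarely wrong when they assert an association (97% versus 98% precision), but the GeneRIF-based approach recovers far more of the reference associations (91% versus 22% recall).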


Subjects
Databases, Genetic; Genome, Human; Software; Unified Medical Language System; Computational Biology/methods; Humans
16.
Bioinformatics ; 24(13): 1547-8, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18467348

ABSTRACT

The Illumina microarray is becoming a popular microarray platform. The BeadArray technology from Illumina makes its preprocessing and quality control different from other microarray technologies. Unfortunately, most other analyses have not taken advantage of the unique properties of the BeadArray system, and have just incorporated preprocessing methods originally designed for Affymetrix microarrays. lumi is a Bioconductor package especially designed to process Illumina microarray data. It includes data input, quality control, variance stabilization, normalization and gene annotation portions. Specifically, the lumi package includes a variance-stabilizing transformation (VST) algorithm that takes advantage of the technical replicates available on every Illumina microarray. Different normalization method options and multiple quality control plots are provided in the package. To better annotate the Illumina data, a vendor-independent nucleotide universal identifier (nuID) was devised to identify the probes of the Illumina microarray. The nuID annotation packages and the output of lumi-processed results can be easily integrated with other Bioconductor packages to construct a statistical data analysis pipeline for Illumina data. AVAILABILITY: The lumi Bioconductor package, www.bioconductor.org


Subjects
Algorithms; Database Management Systems; Databases, Genetic; Gene Expression Profiling/methods; Information Storage and Retrieval/methods; Oligonucleotide Array Sequence Analysis/methods; Software; Gene Expression Profiling/instrumentation; Oligonucleotide Array Sequence Analysis/instrumentation; Systems Integration
17.
Am J Manag Care ; 25(10): e310-e315, 2019 10 01.
Article in English | MEDLINE | ID: mdl-31622071

ABSTRACT

OBJECTIVES: Current models for patient risk prediction rely on practitioner expertise and domain knowledge. This study presents a deep learning model (a type of machine learning that does not require human inputs) to analyze complex clinical and financial data for population risk stratification. STUDY DESIGN: A comparative predictive analysis of deep learning versus other popular risk prediction modeling strategies using medical claims data from a cohort of 112,641 pediatric accountable care organization members. METHODS: "Skip-Gram," an unsupervised deep learning approach that uses neural networks for prediction modeling, used data from 2014 and 2015 to predict the risk of hospitalization in 2016. The area under the curve (AUC) of the deep learning model was compared with that of both the Clinical Classifications Software and the commercial DxCG Intelligence predictive risk models, each with and without demographic and utilization features. We then calculated costs for patients in the top 1% and 5% of hospitalization risk identified by each model. RESULTS: The deep learning model performed the best across 6 predictive models, with an AUC of 75.1%. The top 1% of members selected by the deep learning model had a combined healthcare cost $5 million higher than that of the group identified by the DxCG Intelligence model. CONCLUSIONS: The deep learning model outperforms the traditional risk models in prospective hospitalization prediction. Thus, deep learning may improve the ability of managed care organizations to perform predictive modeling of financial risk, in addition to improving the accuracy of risk stratification for population health management activities.
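
The AUC used to compare the models can be read as the probability that a randomly chosen hospitalized member receives a higher risk score than a randomly chosen non-hospitalized member. The sketch below computes it directly from that pairwise definition. The function name and inputs are illustrative. It is quadratic in the number of members, so it suits small examples rather than a cohort of this size.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for members with the outcome (e.g. hospitalized), 0 otherwise.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 75.1% sits roughly midway between the two.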


Subjects
Accountable Care Organizations/statistics & numerical data , Deep Learning , Health Services/statistics & numerical data , Age Factors , Child , Health Resources , Humans , Neural Networks, Computer , Prospective Studies , Reproducibility of Results , Residence Characteristics , Risk Assessment , Risk Factors , Sex Factors , Socioeconomic Factors
18.
Bioinformatics ; 22(17): 2059-65, 2006 Sep 01.
Article in English | MEDLINE | ID: mdl-16820428

ABSTRACT

MOTIVATION: A major problem for current peak detection algorithms is that noise in mass spectrometry (MS) spectra gives rise to a high rate of false positives. The false positive rate is especially problematic in detecting peaks with low amplitudes. Usually, various baseline correction algorithms and smoothing methods are applied before attempting peak detection. This approach is very sensitive to the amount of smoothing and the aggressiveness of the baseline correction, which makes peak detection results inconsistent between runs, instrumentation and analysis methods. RESULTS: Most peak detection algorithms simply identify peaks based on amplitude, ignoring the additional information present in the shape of the peaks in a spectrum. In our experience, 'true' peaks have characteristic shapes, and a shape-matching function that yields a 'goodness of fit' coefficient should provide a more robust peak identification method. Based on these observations, a continuous wavelet transform (CWT)-based peak detection algorithm has been devised that identifies peaks with different scales and amplitudes. Transforming the spectrum into wavelet space simplifies the pattern-matching problem and also provides a powerful technique for identifying and separating the signal from the spike noise and colored noise. This transformation, with the additional information provided by the 2D CWT coefficients, can greatly enhance the effective signal-to-noise ratio. Furthermore, with this technique no baseline removal or peak smoothing preprocessing steps are required before peak detection, and this improves the robustness of peak detection under a variety of conditions. The algorithm was evaluated with SELDI-TOF spectra with known polypeptide positions. Comparisons with two other popular algorithms were performed. The results show the CWT-based algorithm can identify both strong and weak peaks while keeping the false positive rate low.
AVAILABILITY: The algorithm is implemented in R and will be included as an open source module in the Bioconductor project.
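
To illustrate why a wavelet transform can find peaks by shape without prior baseline removal, here is a simplified single-scale sketch in Python. It is not the published algorithm, which works across multiple scales and is implemented in R. It assumes a Mexican hat (Ricker) wavelet, edge-replicated padding, and a synthetic spectrum; the names `ricker`, `cwt_coefficients`, `detect_peaks`, and `demo_spectrum` are hypothetical.

```python
import math


def ricker(scale):
    """Mexican hat wavelet sampled at integer offsets -5*scale..5*scale.
    `scale` is assumed to be a positive integer."""
    half = 5 * scale
    norm = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    return [norm * (1.0 - (k / scale) ** 2) * math.exp(-0.5 * (k / scale) ** 2)
            for k in range(-half, half + 1)]


def cwt_coefficients(signal, scale):
    """Correlate the signal with the wavelet at one scale.
    Indices past either end are clamped to the nearest sample."""
    wavelet = ricker(scale)
    half = len(wavelet) // 2
    last = len(signal) - 1
    coeffs = []
    for i in range(len(signal)):
        total = 0.0
        for j, w in enumerate(wavelet):
            idx = min(max(i + j - half, 0), last)
            total += signal[idx] * w
        coeffs.append(total)
    return coeffs


def detect_peaks(signal, scale, min_coeff):
    """Report positions where the wavelet coefficient is a local maximum
    and at least `min_coeff` (a shape-match strength, not a raw amplitude)."""
    c = cwt_coefficients(signal, scale)
    return [i for i in range(1, len(c) - 1)
            if c[i] >= min_coeff and c[i] > c[i - 1] and c[i] >= c[i + 1]]


def demo_spectrum(n=200):
    """Synthetic data: a sloping baseline plus a strong and a weak Gaussian peak."""
    def peak(i, center, height, width):
        return height * math.exp(-0.5 * ((i - center) / width) ** 2)
    return [5.0 + 0.01 * i + peak(i, 50, 10.0, 3.0) + peak(i, 120, 3.0, 3.0)
            for i in range(n)]
```

Because the wavelet sums to approximately zero and is symmetric, a constant or linearly sloping baseline contributes almost nothing to the coefficients. Calling `detect_peaks(demo_spectrum(), 3, 1.0)` therefore locates both the strong and the weak synthetic peak without any baseline correction step.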


Subjects
Algorithms , Artificial Intelligence , Pattern Recognition, Automated/methods , Peptide Mapping/methods , Proteome/analysis , Signal Processing, Computer-Assisted , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Quality Control , Reproducibility of Results , Sensitivity and Specificity
20.
Methods Mol Biol ; 377: 223-42, 2007.
Article in English | MEDLINE | ID: mdl-17634620

ABSTRACT

Methods are described to take a list of genes generated from a microarray experiment and interpret these results using various tools and ontologies. A workflow is described that details how to convert gene identifiers with SOURCE and MatchMiner and then use these converted gene lists to search the gene ontology (GO) and the medical subject headings (MeSH) ontology. Examples of searching GO with DAVID, EASE, and GOMiner are provided along with an interpretation of results. The mining of MeSH using high-density array pattern interpreter with a set of gene identifiers is also described.


Subjects
Genes , Medical Subject Headings , Molecular Biology/methods , Oligonucleotide Array Sequence Analysis/methods , Cluster Analysis , Database Management Systems , Databases, Genetic , Gene Expression Profiling , Humans , Internet