Results 1 - 20 of 53
1.
JAMA Pediatr ; 175(2): 176-184, 2021 02 01.
Article in English | MEDLINE | ID: mdl-33226415

ABSTRACT

Importance: There is limited information on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) testing and infection among pediatric patients across the United States. Objective: To describe testing for SARS-CoV-2 and the epidemiology of infected patients. Design, Setting, and Participants: A retrospective cohort study was conducted using electronic health record data from 135 794 patients younger than 25 years who were tested for SARS-CoV-2 from January 1 through September 8, 2020. Data were from PEDSnet, a network of 7 US pediatric health systems, comprising 6.5 million patients primarily from 11 states. Data analysis was performed from September 8 to 24, 2020. Exposure: Testing for SARS-CoV-2. Main Outcomes and Measures: SARS-CoV-2 infection and coronavirus disease 2019 (COVID-19) illness. Results: A total of 135 794 pediatric patients (53% male; mean [SD] age, 8.8 [6.7] years; 3% Asian patients, 15% Black patients, 11% Hispanic patients, and 59% White patients; 290 per 10 000 population [range, 155-395 per 10 000 population across health systems]) were tested for SARS-CoV-2, and 5374 (4%) were infected with the virus (12 per 10 000 population [range, 7-16 per 10 000 population]). Compared with White patients, those of Black, Hispanic, and Asian race/ethnicity had lower rates of testing (Black: odds ratio [OR], 0.70 [95% CI, 0.68-0.72]; Hispanic: OR, 0.65 [95% CI, 0.63-0.67]; Asian: OR, 0.60 [95% CI, 0.57-0.63]); however, they were significantly more likely to have positive test results (Black: OR, 2.66 [95% CI, 2.43-2.90]; Hispanic: OR, 3.75 [95% CI, 3.39-4.15]; Asian: OR, 2.04 [95% CI, 1.69-2.48]). Older age (5-11 years: OR, 1.25 [95% CI, 1.13-1.38]; 12-17 years: OR, 1.92 [95% CI, 1.73-2.12]; 18-24 years: OR, 3.51 [95% CI, 3.11-3.97]), public payer (OR, 1.43 [95% CI, 1.31-1.57]), outpatient testing (OR, 2.13 [95% CI, 1.86-2.44]), and emergency department testing (OR, 3.16 [95% CI, 2.72-3.67]) were also associated with increased risk of infection.
In univariate analyses, nonmalignant chronic disease was associated with lower likelihood of testing, and preexisting respiratory conditions were associated with lower risk of positive test results (standardized ratio [SR], 0.78 [95% CI, 0.73-0.84]). However, several other diagnosis groups were associated with a higher risk of positive test results: malignant disorders (SR, 1.54 [95% CI, 1.19-1.93]), cardiac disorders (SR, 1.18 [95% CI, 1.05-1.32]), endocrinologic disorders (SR, 1.52 [95% CI, 1.31-1.75]), gastrointestinal disorders (SR, 2.00 [95% CI, 1.04-1.38]), genetic disorders (SR, 1.19 [95% CI, 1.00-1.40]), hematologic disorders (SR, 1.26 [95% CI, 1.06-1.47]), musculoskeletal disorders (SR, 1.18 [95% CI, 1.07-1.30]), mental health disorders (SR, 1.20 [95% CI, 1.10-1.30]), and metabolic disorders (SR, 1.42 [95% CI, 1.24-1.61]). Among the 5374 patients with positive test results, 359 (7%) were hospitalized for respiratory, hypotensive, or COVID-19-specific illness. Of these, 99 (28%) required intensive care unit services, and 33 (9%) required mechanical ventilation. The case fatality rate was 0.2% (8 of 5374). The number of patients with a diagnosis of Kawasaki disease in early 2020 was 40% lower (259 vs 433 and 430) than in 2018 or 2019. Conclusions and Relevance: In this large cohort study of US pediatric patients, SARS-CoV-2 infection rates were low, and clinical manifestations were typically mild. Black, Hispanic, and Asian race/ethnicity; adolescence and young adulthood; and nonrespiratory chronic medical conditions were associated with identified infection. Kawasaki disease diagnosis is not an effective proxy for multisystem inflammatory syndrome of childhood.
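The percentages in the Results above can be rechecked from the raw counts the abstract quotes. The short Python sketch below does only that arithmetic; the variable names and the `pct` helper are illustrative assumptions, not anything taken from the study's own analysis code.

```python
# Counts quoted in the abstract; names are illustrative only.
tested = 135_794
positive = 5_374
hospitalized = 359
icu = 99
ventilated = 33
deaths = 8
kawasaki_2020 = 259
kawasaki_baselines = (433, 430)  # the two earlier-year counts quoted in the abstract


def pct(numerator, denominator, digits=1):
    """Return numerator/denominator as a percentage, rounded to `digits` places."""
    return round(100 * numerator / denominator, digits)


positivity = pct(positive, tested)              # share of tested patients infected
hospitalization = pct(hospitalized, positive)   # share of infected patients hospitalized
icu_share = pct(icu, hospitalized)              # share of hospitalized needing ICU
vent_share = pct(ventilated, hospitalized)      # share of hospitalized ventilated
case_fatality = pct(deaths, positive, digits=2)

# Relative drop in Kawasaki disease diagnoses against each baseline count.
kawasaki_drops = [pct(b - kawasaki_2020, b) for b in kawasaki_baselines]

print(positivity, hospitalization, icu_share, vent_share, case_fatality, kawasaki_drops)
```

The first four ratios round to the 4%, 7%, 28%, and 9% given in the abstract, and both Kawasaki comparisons come out near 40%. The raw case fatality ratio, 8 of 5374, is about 0.15%, which the abstract reports as 0.2%.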


Subject(s)
COVID-19 Testing/statistics & numerical data; COVID-19/diagnosis; Ethnicity/statistics & numerical data; Adolescent; Age Factors; COVID-19/epidemiology; Child; Child, Preschool; Cohort Studies; Comorbidity; Female; Humans; Male; Retrospective Studies; Risk Factors; SARS-CoV-2/isolation & purification; Socioeconomic Factors; United States; Young Adult
2.
Front Public Health ; 8: 58, 2020.
Article in English | MEDLINE | ID: mdl-32181236

ABSTRACT

Background: Previous studies revealed patients with genetic disease have more frequent and longer hospitalizations and therefore higher healthcare costs. To understand the financial impact of genetic disease on a pediatric accountable care organization (ACO), we analyzed medical claims from 2014 provided by Partners for Kids, an ACO in partnership with Nationwide Children's Hospital (NCH; Columbus, OH, USA). Methods: Study population included insurance claims from 258,399 children. We assigned patients to four different categories (1-A, 1-B, 2, & 3) based on the strength of genetic basis of disease. Results: We identified 22.7% of patients as category 1A or 1B, having a disease with a "strong genetic basis" (e.g., single gene diseases, chromosomal abnormalities). Total ACO paid claims in 2014 were $379M, of which $161M (42.5%) was attributed to category 1 patients. Furthermore, we identified 23.3% of patients as category 2, having a disease with a suspected genetic component or predisposition (e.g., asthma, type 1 diabetes), who accounted for an additional 28.6% of 2014 costs. Category 1 patients were more likely to experience at least one hospitalization compared to category 3 patients, those without genetic disease (odds ratio [OR] = 4.12; 95% confidence interval [CI] = 3.86-4.39; p < 0.0001). Overall, category 1 patients experienced nearly five times the number of inpatient (IP) admissions and twice the number of outpatient (OP) visits compared to category 3 patients (p < 0.0001). Conclusion: Nearly half (42.5%) of healthcare paid claims cost in 2014 for this study population were accounted for by patients with single-gene diseases or chromosomal abnormalities. These findings precede and support a need for an ACO to plan for effective healthcare strategies and capitation models for children with genetic disease.
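The cost shares above translate into a rough per-patient comparison. The Python sketch below reproduces the 42.5% figure from the rounded dollar amounts and derives an approximate per-patient cost ratio between category 1 and category 3. It assumes that the categories are mutually exclusive and together cover every patient, so that category 3's shares can be obtained by subtraction; the abstract does not state this explicitly, and all names are illustrative.

```python
# Figures quoted in the abstract (dollar amounts in millions, as rounded there).
total_paid_musd = 379
category1_paid_musd = 161

patient_share = {"cat1": 22.7, "cat2": 23.3}  # percent of patients
cost_share = {"cat1": 42.5, "cat2": 28.6}     # percent of paid claims

# ASSUMPTION: categories are exclusive and exhaustive, so category 3 is the remainder.
patient_share["cat3"] = 100 - patient_share["cat1"] - patient_share["cat2"]
cost_share["cat3"] = 100 - cost_share["cat1"] - cost_share["cat2"]

category1_cost_pct = round(100 * category1_paid_musd / total_paid_musd, 1)


def relative_cost_per_patient(category):
    """Cost share divided by patient share: 1.0 means average cost per patient."""
    return cost_share[category] / patient_share[category]


cat1_vs_cat3 = relative_cost_per_patient("cat1") / relative_cost_per_patient("cat3")
print(category1_cost_pct, round(cat1_vs_cat3, 2))
```

Under that assumption a category 1 patient accounts for roughly three and a half times the paid claims of a category 3 patient, a derived figure that should be read as indicative only.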


Subject(s)
Accountable Care Organizations; Asthma; Child; Health Care Costs; Hospitalization; Humans; Retrospective Studies
3.
Bioinformatics ; 36(4): 1241-1251, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31584634

ABSTRACT

MOTIVATION: Graph embedding learning, which aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods have been evaluated on social and information networks and have not been comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding method) have shown promising results, and hence there is a need to systematically evaluate the more recent graph embedding methods (e.g. random walk-based and neural network-based) in terms of their usability and potential to further the state-of-the-art. RESULTS: We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug-drug interaction (DDI) prediction, protein-protein interaction (PPI) prediction; and 2 node classification tasks: medical term semantic type classification, protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in future biomedical graph analysis. Compared with three state-of-the-art methods for DDAs, DDIs and protein function predictions, the recent graph embedding methods achieve competitive performance without using any biological features and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks.
AVAILABILITY AND IMPLEMENTATION: As part of our contributions in the paper, we develop an easy-to-use Python package with detailed instructions, BioNEV, available at: https://github.com/xiangyue9607/BioNEV, including all source code and datasets, to facilitate studying various graph embedding methods on biomedical tasks. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neural Networks, Computer; Software; Drug Interactions; Proteins; Semantics
4.
Am J Manag Care ; 25(10): e310-e315, 2019 10 01.
Article in English | MEDLINE | ID: mdl-31622071

ABSTRACT

OBJECTIVES: Current models for patient risk prediction rely on practitioner expertise and domain knowledge. This study presents a deep learning model (a type of machine learning that does not require human inputs) to analyze complex clinical and financial data for population risk stratification. STUDY DESIGN: A comparative predictive analysis of deep learning versus other popular risk prediction modeling strategies using medical claims data from a cohort of 112,641 pediatric accountable care organization members. METHODS: "Skip-Gram," an unsupervised deep learning approach that uses neural networks for prediction modeling, used data from 2014 and 2015 to predict the risk of hospitalization in 2016. The area under the curve (AUC) of the deep learning model was compared with that of both the Clinical Classifications Software and the commercial DxCG Intelligence predictive risk models, each with and without demographic and utilization features. We then calculated costs for patients in the top 1% and 5% of hospitalization risk identified by each model. RESULTS: The deep learning model performed the best across 6 predictive models, with an AUC of 75.1%. The top 1% of members selected by the deep learning model had a combined healthcare cost $5 million higher than that of the group identified by the DxCG Intelligence model. CONCLUSIONS: The deep learning model outperforms the traditional risk models in prospective hospitalization prediction. Thus, deep learning may improve the ability of managed care organizations to perform predictive modeling of financial risk, in addition to improving the accuracy of risk stratification for population health management activities.
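The two evaluation steps named above, comparing models by AUC and then selecting the top 1% or 5% of members by predicted risk, can be illustrated with a small Python sketch. It uses the standard pairwise definition of AUC on made-up toy scores; the function names, member IDs, and data are hypothetical and do not reproduce the study's Skip-Gram model or its claims data.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


def top_fraction(member_ids, scores, fraction):
    """Return the IDs of the highest-scoring `fraction` of members (at least one)."""
    k = max(1, int(len(member_ids) * fraction))
    ranked = sorted(zip(scores, member_ids), reverse=True)
    return [member for _, member in ranked[:k]]


# Toy data: 1 = hospitalized in the outcome year, 0 = not hospitalized.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_members = ["m1", "m2", "m3", "m4"]

toy_auc = auc(toy_labels, toy_scores)
highest_risk = top_fraction(toy_members, toy_scores, 0.25)
print(toy_auc, highest_risk)
```

The pairwise loop is quadratic in cohort size, so for a cohort of the size described a rank-based formulation or a library routine would be the more practical choice; the loop is used here only because it mirrors the definition directly.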


Subject(s)
Accountable Care Organizations/statistics & numerical data; Deep Learning; Health Services/statistics & numerical data; Age Factors; Child; Health Resources; Humans; Neural Networks, Computer; Prospective Studies; Reproducibility of Results; Residence Characteristics; Risk Assessment; Risk Factors; Sex Factors; Socioeconomic Factors
6.
Epilepsy Behav ; 64(Pt A): 116-121, 2016 11.
Article in English | MEDLINE | ID: mdl-27741462

ABSTRACT

INTRODUCTION: Epilepsy is a common neurological condition. Seizure diary reports and patient- or caregiver-reported seizure counts are often inaccurate and underestimated. Many caregivers express stress and anxiety about the patient with epilepsy having seizures when they are not present. Therefore, a need exists for the ability to recognize and/or detect a seizure in the home setting. However, few studies have inquired on detection device features that are important to patients and their caregivers. METHODS: A survey instrument utilizing a population of patients and caregivers was created to obtain information on the design criteria most desired for patients with epilepsy in regard to wearable devices. RESULTS: One thousand one hundred sixty-eight responses were collected. Respondents thought that sensors for muscle signal (61.4%) and heart rate (58.0%) would be most helpful followed by the O2 sensor (41.4%). There was more interest in these three sensor types than for an accelerometer (25.5%). There was very little interest in a microphone (8.9%), galvanic skin response sensor (8.0%), or a barometer (4.9%). Based on a rating scale of 1-5 with 5 being the most important, respondents felt that "detecting all seizures" (4.73) is the most important device feature followed by "text/email alerts" (4.53), "comfort" (4.46), and "battery life" (4.43) as an equally important group of features. Respondents felt that "not knowing device is for seizures" (2.60) and "multiple uses" (2.57) were equally the least important device features. Average ratings differed significantly across age groups for the following features: button, multiuse, not knowing device is for seizures, alarm, style, and text ability. The p-values were all<0.002. 
Eighty-two point five percent of respondents [95% confidence interval: 80.0%, 84.7%] were willing to pay more than $100 for a wearable seizure detection device, and 42.8% of respondents [95% confidence interval: 39.8%, 45.9%] were willing to pay more than $200. CONCLUSIONS: Our survey results demonstrated that patients and caregivers have design features that are important to them in regard to a wearable seizure detection device. Overall, the ability to detect all seizures rated highest among respondents, which continues to be an unmet need in the community with epilepsy in regard to seizure detection. Additional uses for a wearable were not as important. Based on our results, it is important that an alert (via text and/or email) for events be a portion of the system. A reasonable price point appears to be around $200 to $300. An accelerometer was less important to those surveyed when compared with the use of heart rate, oxygen saturation, or muscle twitches/signals. As further products become developed for use in other health arenas, it will be important to consider patient and caregiver desires in order to meet the need and address the gap in devices that currently exist.
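The confidence intervals quoted for willingness to pay can be approximated from the response count. The abstract does not say which interval method was used or how many respondents answered the price questions, so the Python sketch below makes two labeled assumptions: a Wilson score interval, and all 1168 respondents answering. The function name is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# ASSUMPTION: every one of the 1168 respondents answered the price question.
respondents = 1168
over_100 = round(0.825 * respondents)  # approximate count behind the 82.5% figure

low, high = wilson_interval(over_100, respondents)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

Under these assumptions the interval lands within a fraction of a percentage point of the reported 80.0% to 84.7%; the small difference is what one would expect if the authors used a different method or a slightly different denominator.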


Subject(s)
Caregivers; Equipment Design/standards; Monitoring, Ambulatory/instrumentation; Neurophysiological Monitoring/instrumentation; Patient Preference; Seizures/diagnosis; Adult; Humans; Monitoring, Ambulatory/economics; Monitoring, Ambulatory/standards; Neurophysiological Monitoring/economics; Neurophysiological Monitoring/standards
7.
Article in English | MEDLINE | ID: mdl-27189611

ABSTRACT

The process of discovering new drugs has been extremely costly and slow in the last decades despite enormous investment in pharmaceutical research. Drug repurposing enables researchers to speed up the process of discovering other conditions that existing drugs can effectively treat, with low cost and fast FDA approval. Here, we introduce 'RE:fine Drugs', a freely available interactive website for integrated search and discovery of drug repurposing candidates from GWAS and PheWAS repurposing datasets constructed using previously reported methods in Nature Biotechnology. 'RE:fine Drugs' demonstrates the possibilities to identify and prioritize novelty of candidates for drug repurposing based on the theory of transitive Drug-Gene-Disease triads. This public website provides a starting point for research, industry, clinical and regulatory communities to accelerate the investigation and validation of new therapeutic use of old drugs. Database URL: http://drug-repurposing.nationwidechildrens.org.


Subject(s)
Computational Biology/methods; Databases, Pharmaceutical; Drug Repositioning; User-Computer Interface; Drug Therapy; Humans
8.
Comput Struct Biotechnol J ; 14: 131-4, 2016.
Article in English | MEDLINE | ID: mdl-27069559

ABSTRACT

New vocabularies are rapidly evolving in the literature relative to the practice of clinical medicine and translational research. To provide integrated access to new terms, we developed a mobile and desktop online reference, the Marshfield Dictionary of Clinical and Translational Science (MD-CTS). It is the first public resource that comprehensively integrates Wiktionary (word definition), BioPortal (ontology), Wiki (image reference), and Medline abstract (word usage) information. MD-CTS is accessible at http://spellchecker.mfldclin.edu/. The website provides a broadened capacity for the wider clinical and translational science community to keep pace with newly emerging scientific vocabulary. An initial evaluation using 63 randomly selected biomedical words suggests that online references generally provided better coverage (73%-95%) than paper-based dictionaries (57%-71%).

9.
Article in English | MEDLINE | ID: mdl-26702340

ABSTRACT

The constant improvement and falling prices of whole human genome Next Generation Sequencing (NGS) has resulted in rapid adoption of genomic information at both clinics and research institutions. Considered together, the complexity of genomics data, due to its large volume and diversity along with the need for genomic data sharing, has resulted in the creation of Application Programming Interface (API) for secure, modular, interoperable access to genomic data from different applications, platforms, and even organizations. The Genomics APIs are a set of special protocols that assist software developers in dealing with multiple genomic data sources for building seamless, interoperable applications leading to the advancement of both genomic and clinical research. These APIs help define a standard for retrieval of genomic data from multiple sources as well as to better package genomic information for integration with Electronic Health Records. This review covers three currently available Genomics APIs: a) Google Genomics, b) SMART Genomics, and c) 23andMe. The functionalities, reference implementations (if available) and authentication protocols of each API are reviewed. A comparative analysis of the different features across the three APIs is provided in the Discussion section. Though Genomics APIs are still under active development and have yet to reach widespread adoption, they hold the promise to make building of complicated genomics applications easier with downstream constructive effects on healthcare.

10.
Genome Biol ; 16: 133, 2015 Jun 25.
Article in English | MEDLINE | ID: mdl-26109056

ABSTRACT

BACKGROUND: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. RESULTS: We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. CONCLUSIONS: We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.


Subject(s)
Gene Expression Profiling; Neuroblastoma/genetics; Oligonucleotide Array Sequence Analysis; Sequence Analysis, RNA; Adolescent; Adult; Child; Child, Preschool; Endpoint Determination; Female; Humans; Infant; Infant, Newborn; Male; Models, Genetic; Neuroblastoma/classification; Neuroblastoma/diagnosis; Tumor Cells, Cultured; Young Adult
12.
J Med Genet ; 52(4): 282-8, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25587064

ABSTRACT

BACKGROUND: Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. It can be a significant challenge to process such data, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, together with conducting family based sequencing data analysis. METHODS: Hadoop is a framework for reliable, scalable, distributed processing of large data sets using MapReduce programming models. Based on Hadoop and HBase, we developed SeqHBase, a big data-based toolset for analysing family based sequencing data to detect de novo, inherited homozygous, or compound heterozygous mutations that may contribute to disease manifestations. SeqHBase takes as input BAM files (for coverage at every site), variant call format (VCF) files (for variant calls) and functional annotations (for variant prioritisation). RESULTS: We applied SeqHBase to a 5-member nuclear family and a 10-member 3-generation family with WGS data, as well as a 4-member nuclear family with WES data. Analysis times were almost linearly scalable with number of data nodes. With 20 data nodes, SeqHBase took about 5 secs to analyse WES familial data and approximately 1 min to analyse WGS familial data. CONCLUSIONS: These results demonstrate SeqHBase's high efficiency and scalability, which is necessary as WGS and WES are rapidly becoming standard methods to study the genetics of familial disorders.
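The three inheritance patterns named above (de novo, inherited homozygous, and compound heterozygous) can be expressed as simple rules over a child and two parents. The Python sketch below is a simplified, single-machine illustration and is not SeqHBase's implementation: it does not use Hadoop or HBase, it ignores coverage, phasing, and sex chromosomes, and it assumes genotypes are encoded as the number of alternate alleles (0, 1, or 2). All function names and gene labels are hypothetical.

```python
from collections import defaultdict


def classify_variant(child, father, mother):
    """Classify one variant from alternate-allele counts (0, 1, or 2) in a trio."""
    if child > 0 and father == 0 and mother == 0:
        return "de_novo"  # present in the child, absent from both parents
    if child == 2 and father >= 1 and mother >= 1:
        return "inherited_homozygous"  # one alternate allele from each parent
    return None


def compound_het_genes(variants):
    """Return genes with at least one paternal-only and one maternal-only het variant.

    `variants` is an iterable of (gene, child, father, mother) tuples.
    """
    paternal = defaultdict(int)
    maternal = defaultdict(int)
    for gene, child, father, mother in variants:
        if child != 1:
            continue  # only heterozygous calls in the child are candidates
        if father >= 1 and mother == 0:
            paternal[gene] += 1
        elif mother >= 1 and father == 0:
            maternal[gene] += 1
    return sorted(gene for gene in paternal if maternal.get(gene, 0) > 0)


toy_variants = [
    ("GENE_A", 1, 1, 0),  # het in child, carried by father only
    ("GENE_A", 1, 0, 1),  # het in child, carried by mother only
    ("GENE_B", 1, 1, 0),  # only a paternal variant, so not compound het
]
print(classify_variant(1, 0, 0), compound_het_genes(toy_variants))
```

A production pipeline would also confirm that the parents have adequate read depth at each site before calling a variant de novo, which is presumably why the toolset described above takes BAM coverage as an input.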


Subject(s)
Genomics/methods; Sequence Analysis, DNA/methods; Software; Datasets as Topic; Exome; Genome, Human; Humans; Mutation
13.
Nat Commun ; 5: 5125, 2014 Sep 25.
Article in English | MEDLINE | ID: mdl-25254650

ABSTRACT

There is a critical need for standard approaches to assess, report and compare the technical performance of genome-scale differential gene expression experiments. Here we assess technical performance with a proposed standard 'dashboard' of metrics derived from analysis of external spike-in RNA control ratio mixtures. These control ratio mixtures with defined abundance ratios enable assessment of diagnostic performance of differentially expressed transcript lists, limit of detection of ratio (LODR) estimates and expression ratio variability and measurement bias. The performance metrics suite is applicable to analysis of a typical experiment, and here we also apply these metrics to evaluate technical performance among laboratories. An interlaboratory study using identical samples shared among 12 laboratories with three different measurement processes demonstrates generally consistent diagnostic power across 11 laboratories. Ratio measurement variability and bias are also comparable among laboratories for the same measurement process. We observe different biases for measurement processes using different mRNA-enrichment protocols.


Subject(s)
Gene Expression Profiling/methods; RNA, Messenger/genetics; Gene Expression Profiling/standards; Humans; Reference Standards; Reproducibility of Results
14.
Biomed Res Int ; 2014: 736798, 2014.
Article in English | MEDLINE | ID: mdl-24995327

ABSTRACT

NAGNAG alternative splicing plays an essential role in biological processes and represents a highly adaptable system for posttranslational regulation of gene function. NAGNAG alternative splicing impacts a myriad of biological processes. Previous studies of NAGNAG largely focused on messenger RNA. To the best of our knowledge, this is the first study testing the hypothesis that NAGNAG alternative splicing is also operative in large intergenic noncoding RNA (lincRNA). The RNA-seq data sets from recent deep sequencing studies were queried to test our hypothesis. NAGNAG alternative splicing of human lincRNA was identified while querying two independent RNA-seq data sets. Within these datasets, 31 NAGNAG alternative splicing sites were identified in lincRNA. Notably, most exons of lincRNA containing NAGNAG acceptors were longer than those from protein-coding genes. Furthermore, presence of CAG coding appeared to participate in the splice site selection. Finally, expression of the isoforms of NAGNAG lincRNA exhibited tissue specificity. Together, this study improves our understanding of the NAGNAG alternative splicing in lincRNA.


Subject(s)
Alternative Splicing/genetics; Computational Biology; RNA, Long Noncoding/genetics; Sequence Analysis, RNA; Genome, Human; Humans; Organ Specificity; RNA, Messenger/genetics
16.
Genet Med ; 15(10): 802-9, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24008998

ABSTRACT

Health care has become increasingly information intensive. The advent of genomic data, integrated into patient care, significantly accelerates the complexity and amount of clinical data. Translational research in the present day increasingly embraces new biomedical discovery in this data-intensive world, thus entering the domain of "big data." The Electronic Medical Records and Genomics consortium has taught us many lessons, while simultaneously advances in commodity computing methods enable the academic community to affordably manage and process big data. Although great promise can emerge from the adoption of big data methods and philosophy, the heterogeneity and complexity of clinical data, in particular, pose additional challenges for big data inferencing and clinical application. However, the ultimate comparability and consistency of heterogeneous clinical information sources can be enhanced by existing and emerging data standards, which promise to bring order to clinical data chaos. Meaningful Use data standards in particular have already simplified the task of identifying clinical phenotyping patterns in electronic health records.


Subject(s)
Databases, Factual; Electronic Health Records; Genetic Testing; Medical Informatics/standards; Genetics, Medical; Genomics; Humans; Information Storage and Retrieval; Meaningful Use; Phenotype; Translational Research, Biomedical/trends
17.
Nucleic Acids Res ; 41(Database issue): D553-60, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23197658

ABSTRACT

Disease and Gene Annotations database (DGA, http://dga.nubic.northwestern.edu) is a collaborative effort aiming to provide a comprehensive and integrative annotation of the human genes in disease network context by integrating computable controlled vocabulary of the Disease Ontology (DO version 3 revision 2510, which has 8043 inherited, developmental and acquired human diseases), NCBI Gene Reference Into Function (GeneRIF) and molecular interaction network (MIN). DGA integrates these resources together using semantic mappings to build an integrative set of disease-to-gene and gene-to-gene relationships with excellent coverage based on current knowledge. DGA is kept current by periodically reparsing DO, GeneRIF, and MINs. DGA provides a user-friendly and interactive web interface system enabling users to efficiently query, download and visualize the DO tree structure and annotations as a tree, a network graph or a tabular list. To facilitate integrative analysis, DGA provides a web service Application Programming Interface for integration with external analytic tools.


Subject(s)
Databases, Genetic; Disease/genetics; Genes; Molecular Sequence Annotation; Humans; Internet; Proteins/genetics; Proteins/metabolism; Vocabulary, Controlled
18.
PLoS One ; 7(12): e49686, 2012.
Article in English | MEDLINE | ID: mdl-23251346

ABSTRACT

Identification of gene-disease association is crucial to understanding disease mechanism. A rapid increase in biomedical literature, led by advances in genome-scale technologies, poses a challenge for manually curated annotation databases to characterize gene-disease associations effectively and in a timely manner. We propose an automatic method, the Disease Ontology Annotation Framework (DOAF), to provide a comprehensive annotation of the human genome using the computable Disease Ontology (DO), the NCBO Annotator service and NCBI Gene Reference Into Function (GeneRIF). DOAF can keep the resulting knowledgebase current by periodically executing an automatic pipeline to re-annotate the human genome using the latest DO and GeneRIF releases at any frequency, such as daily or monthly. Further, DOAF provides a computable and programmable environment which enables large-scale and integrative analysis by working with external analytic software or online service platforms. A user-friendly web interface (doa.nubic.northwestern.edu) is implemented to allow users to efficiently query, download, and view disease annotations and the underlying evidence.


Subject(s)
Computational Biology/methods; Databases, Genetic; Genome, Human; Genetic Predisposition to Disease; Humans; Software; Vocabulary, Controlled
19.
PLoS One ; 7(10): e46450, 2012.
Article in English | MEDLINE | ID: mdl-23049702

ABSTRACT

The advent of next-generation sequencing technologies has greatly promoted the field of metagenomics which studies genetic material recovered directly from an environment. Characterization of genomic composition of a metagenomic sample is essential for understanding the structure of the microbial community. Multiple genomes contained in a metagenomic sample can be identified and quantitated through homology searches of sequence reads with known sequences catalogued in reference databases. Traditionally, reads with multiple genomic hits are assigned to non-specific or high ranks of the taxonomy tree, thereby impacting on accurate estimates of relative abundance of multiple genomes present in a sample. Instead of assigning reads one by one to the taxonomy tree as many existing methods do, we propose a statistical framework to model the identified candidate genomes to which sequence reads have hits. After obtaining the estimated proportion of reads generated by each genome, sequence reads are assigned to the candidate genomes and the taxonomy tree based on the estimated probability by taking into account both sequence alignment scores and estimated genome abundance. The proposed method is comprehensively tested on both simulated datasets and two real datasets. It assigns reads to the low taxonomic ranks very accurately. Our statistical approach of taxonomic assignment of metagenomic reads, TAMER, is implemented in R and available at http://faculty.wcas.northwestern.edu/hji403/MetaR.htm.
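The two-stage idea described above, first estimating the proportion of reads generated by each candidate genome and then assigning each read using both its alignment evidence and those proportions, resembles a finite mixture model fitted by expectation-maximization. The Python sketch below illustrates that generic technique only. It is an assumption that a plain mixture EM is a fair stand-in; it is not the TAMER R implementation, and the likelihood values are invented toy numbers standing in for scores derived from alignments.

```python
def estimate_abundance(likelihoods, iterations=100):
    """Fit genome proportions by EM and return (proportions, responsibilities).

    `likelihoods[r][g]` is a non-negative weight for read r coming from genome g
    (0.0 when the read has no hit to that genome). Every read is assumed to have
    at least one non-zero entry.
    """
    n_genomes = len(likelihoods[0])
    proportions = [1.0 / n_genomes] * n_genomes
    responsibilities = []
    for _ in range(iterations):
        # E-step: posterior probability that each read came from each genome.
        responsibilities = []
        for row in likelihoods:
            weighted = [p * l for p, l in zip(proportions, row)]
            total = sum(weighted)
            responsibilities.append([w / total for w in weighted])
        # M-step: proportions become the average responsibility per genome.
        proportions = [
            sum(resp[g] for resp in responsibilities) / len(responsibilities)
            for g in range(n_genomes)
        ]
    return proportions, responsibilities


# Toy reads: three hit genome 0 only, one hits genome 1 only, two hit both equally.
toy_likelihoods = [
    [1.0, 0.0], [1.0, 0.0], [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0], [1.0, 1.0],
]
toy_proportions, toy_resp = estimate_abundance(toy_likelihoods)
print([round(p, 3) for p in toy_proportions], [round(x, 3) for x in toy_resp[-1]])
```

In this toy case the ambiguous reads are pulled toward the more abundant genome rather than being pushed up to a shared ancestor, which is the behaviour the abstract contrasts with read-by-read assignment to high taxonomic ranks.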


Subject(s)
Algorithms; Classification/methods; High-Throughput Nucleotide Sequencing/methods; Metagenomics/methods; Models, Genetic; Software; Computational Biology/methods; Species Specificity
20.
PLoS One ; 7(9): e44483, 2012.
Article in English | MEDLINE | ID: mdl-22970228

ABSTRACT

During the last several years, high-density genotyping SNP arrays have facilitated genome-wide association studies (GWAS) that successfully identified common genetic variants associated with a variety of phenotypes. However, each of the identified genetic variants only explains a very small fraction of the underlying genetic contribution to the studied phenotypic trait. Moreover, discordance observed in results between independent GWAS indicates the potential for Type I and II errors. High reliability of genotyping technology is needed to have confidence in using SNP data and interpreting GWAS results. Therefore, reproducibility of two widely used genotyping technology platforms from Affymetrix and Illumina was assessed by analyzing four technical replicates from each of the six individuals in five laboratories. Genotype concordance of 99.40% to 99.87% within a laboratory for the same platform, 98.59% to 99.86% across laboratories for the same platform, and 98.80% across genotyping platforms was observed. Moreover, arrays with low quality data were detected when comparing genotyping data from technical replicates, but they could not be detected according to vendors' quality control (QC) suggestions. Our results demonstrated the technical reliability of currently available genotyping platforms but also indicated the importance of incorporating some technical replicates for genotyping QC in order to improve the reliability of GWAS results. The impact of discordant genotypes on association analysis results was simulated and could explain, at least in part, the irreproducibility of some GWAS findings when the effect size (i.e. the odds ratio) and the minor allele frequencies are low.
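Genotype concordance between two technical replicates is, in its simplest form, the fraction of SNPs with identical calls among those called in both. The Python sketch below shows one way to compute it. The two-letter genotype encoding, the `"NN"` no-call marker, and the choice to exclude no-calls from the denominator are assumptions made for illustration; the abstract does not describe how the study handled these details.

```python
def concordance(calls_a, calls_b, missing="NN"):
    """Fraction of SNPs with matching genotypes among those called in both replicates.

    Genotypes are two-letter strings such as "AG"; allele order is ignored,
    so "AG" and "GA" count as the same call.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same SNPs in the same order")
    compared = 0
    matches = 0
    for a, b in zip(calls_a, calls_b):
        if a == missing or b == missing:
            continue  # skip SNPs with a no-call in either replicate
        compared += 1
        if sorted(a) == sorted(b):
            matches += 1
    if compared == 0:
        raise ValueError("no SNPs were called in both replicates")
    return matches / compared


# Toy replicates: two matches, one mismatch, one SNP skipped for a no-call.
replicate_1 = ["AA", "AG", "GG", "NN"]
replicate_2 = ["AA", "GA", "AG", "GG"]
toy_concordance = concordance(replicate_1, replicate_2)
print(round(toy_concordance, 4))
```

Comparing this value across every pair of replicates for a sample is one way an array with poor data could stand out even when it passes per-array quality thresholds.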


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Genotype; Humans; Reproducibility of Results