Results 1 - 11 of 11
1.
Immunity ; 43(6): 1199-211, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26682989

ABSTRACT

Respiratory viral infections are a significant burden to healthcare worldwide. Many whole genome expression profiles have identified different respiratory viral infection signatures, but these have not translated to clinical practice. Here, we performed two integrated, multi-cohort analyses of publicly available transcriptional data of viral infections. First, we identified a common host signature across different respiratory viral infections that could distinguish (1) individuals with viral infections from healthy controls and from those with bacterial infections, and (2) symptomatic from asymptomatic subjects prior to symptom onset in challenge studies. Second, we identified an influenza-specific host response signature that (1) could distinguish influenza-infected samples from those with bacterial and other respiratory viral infections, (2) was a diagnostic and prognostic marker in influenza-pneumonia patients and influenza challenge studies, and (3) was predictive of response to influenza vaccine. Our results have applications in the diagnosis, prognosis, and identification of drug targets in viral infections.


Subjects
Respiratory Tract Infections/diagnosis , Respiratory Tract Infections/genetics , Transcriptome , Virus Diseases/diagnosis , Virus Diseases/genetics , Cohort Studies , Datasets as Topic , Humans
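The abstract above does not state how the host signature is scored. As a hedged illustration only, one common way to collapse a multi-gene signature into a per-sample score is the geometric mean of up-regulated genes minus that of down-regulated genes, with discrimination measured as the area under the ROC curve (AUROC). The scoring rule, the gene names (`GENE_UP1` and so on), and the expression values are all assumptions, not the authors' published method or data.

```python
import math


def signature_score(expr, up_genes, down_genes):
    """Assumed scoring rule: geomean(up genes) - geomean(down genes).

    `expr` maps gene name -> strictly positive expression value.
    """
    def geomean(genes):
        return math.exp(sum(math.log(expr[g]) for g in genes) / len(genes))

    return geomean(up_genes) - geomean(down_genes)


def auroc(pos_scores, neg_scores):
    """AUROC as the probability a random positive outscores a random negative (ties = 0.5)."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical gene names and expression values, for illustration only.
UP = ["GENE_UP1", "GENE_UP2"]
DOWN = ["GENE_DOWN1"]
infected = [
    {"GENE_UP1": 8.0, "GENE_UP2": 2.0, "GENE_DOWN1": 1.0},
    {"GENE_UP1": 9.0, "GENE_UP2": 4.0, "GENE_DOWN1": 2.0},
]
controls = [
    {"GENE_UP1": 1.0, "GENE_UP2": 1.0, "GENE_DOWN1": 2.0},
    {"GENE_UP1": 2.0, "GENE_UP2": 2.0, "GENE_DOWN1": 1.0},
]
pos = [signature_score(s, UP, DOWN) for s in infected]
neg = [signature_score(s, UP, DOWN) for s in controls]
print(auroc(pos, neg))  # 1.0 on this toy data
```

Using a difference of geometric means keeps the score insensitive to a single extreme gene, which is one reason this form is popular for small signatures.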
2.
Pac Symp Biocomput ; 29: 433-445, 2024.
Article in English | MEDLINE | ID: mdl-38160297

ABSTRACT

The incompleteness of race and ethnicity information in real-world data (RWD) hampers its utility in promoting healthcare equity. This study introduces two methods, one heuristic and the other machine-learning-based, to impute race and ethnicity from genetic ancestry using tumor profiling data. Analyzing de-identified data from over 100,000 cancer patients sequenced with the Tempus xT panel, we demonstrate that both methods outperform existing geolocation and surname-based methods, with the machine learning approach achieving high recall (range: 0.859-0.993) and precision (range: 0.932-0.981) across four mutually exclusive race and ethnicity categories. This work presents a novel pathway to enhance RWD utility in studying racial disparities in healthcare.


Subjects
Ethnicity , Names , Humans , Ethnicity/genetics , Racial Groups/genetics , Computational Biology , Genetic Testing
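The recall and precision ranges quoted in the abstract above are per-category metrics across mutually exclusive classes. A minimal sketch of how such per-class values can be computed; the labels `A` to `D` and the predictions are hypothetical placeholders, not the study's categories or data.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for mutually exclusive class labels."""
    true_count, pred_count, hits = Counter(y_true), Counter(y_pred), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            hits[t] += 1
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = hits[label] / pred_count[label] if pred_count[label] else 0.0
        recall = hits[label] / true_count[label] if true_count[label] else 0.0
        result[label] = (precision, recall)
    return result


# Hypothetical labels and predictions (placeholders, not study data).
truth = ["A", "A", "B", "B", "C", "D"]
pred = ["A", "B", "B", "B", "C", "D"]
metrics = per_class_precision_recall(truth, pred)
recalls = [r for _, r in metrics.values()]
print(f"recall range: {min(recalls):.3f}-{max(recalls):.3f}")
```

Reporting the minimum and maximum over classes, as the abstract does, exposes the weakest category instead of hiding it inside an average.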
3.
Genome Med ; 16(1): 99, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39138508

ABSTRACT

BACKGROUND: There are known disparities in incidence and outcomes of colorectal cancer (CRC) by race and ethnicity. Some of these disparities may be mediated by molecular changes in tumors that occur at different rates across populations. Genetic ancestry is a measure complementary to race and ethnicity that can overcome missing data issues and better capture genetic similarity in admixed populations. We aimed to identify somatic mutations and tumor gene expression differences associated with both genetic ancestry and imputed race and ethnicity. METHODS: Sequencing was performed with the Tempus xT NGS 648-gene panel and whole exome capture RNA-Seq for 8454 primarily late-stage CRC patients. Genetic ancestry proportions were estimated using ancestry informative markers for five continental groups: Africa (AFR), American indigenous (AMR), East Asia (EAS), Europe (EUR), and South Asia (SAS). To address data gaps, race and ethnicity categories were imputed, resulting in assignments for 952 Hispanic/Latino, 420 non-Hispanic (NH) Asian, 1061 NH Black, and 5763 NH White individuals. We assessed the association of genetic ancestry proportions and imputed race and ethnicity categories with somatic mutations in relevant CRC genes and in 2608 expression profiles, as well as 1957 consensus molecular subtypes (CMS). RESULTS: Increased AFR ancestry was associated with higher odds of somatic mutations in APC, KRAS, and PIK3CA and lower odds of BRAF mutations. Additionally, increased EAS ancestry was associated with lower odds of mutations in KRAS, EUR with higher odds in BRAF, and the Hispanic/Latino category with lower odds in BRAF. Greater AFR ancestry and the NH Black category were associated with higher rates of CMS3, while a higher proportion of Hispanic/Latino patients exhibited indeterminate CMS classifications. CONCLUSIONS: Molecular differences in CRC tumor mutation frequencies and gene expression that may underlie observed differences by race and ethnicity were identified.
The association of AFR ancestry with increased KRAS mutations aligns with higher CMS3 subtype rates in NH Black patients. The increase of indeterminate CMS in Hispanic/Latino patients suggests that subtype classification methods could benefit from enhanced patient diversity.


Subjects
Colorectal Neoplasms , Mutation , Humans , Colorectal Neoplasms/genetics , Male , Female , Middle Aged , Proto-Oncogene Proteins B-raf/genetics , Aged , Class I Phosphatidylinositol 3-Kinases/genetics , Proto-Oncogene Proteins p21(ras)/genetics , Biomarkers, Tumor/genetics , Adenomatous Polyposis Coli Protein/genetics
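The abstract above reports associations between a continuous ancestry proportion and the odds of a somatic mutation, but does not give the model. A minimal sketch, assuming a univariable logistic regression (no covariates) fitted by Newton-Raphson in plain Python; `fit_logistic_1d` and the toy data are hypothetical. Under this model, `exp(b1)` is the odds ratio for going from 0 to 1 in ancestry proportion, and `exp(0.1 * b1)` the odds ratio per 10-percentage-point increase.

```python
import math


def fit_logistic_1d(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson (assumed model, toy use only)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
            w = p * (1.0 - p)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g, inverting the 2x2 matrix in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


# Hypothetical toy data: ancestry proportion (0-1) and mutation present (1) / absent (0).
ancestry = [0.0, 0.0, 0.2, 0.2, 0.5, 0.5, 0.8, 0.8, 1.0, 1.0]
mutated = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
b0, b1 = fit_logistic_1d(ancestry, mutated)
print(f"odds ratio per 10-point ancestry increase: {math.exp(0.1 * b1):.2f}")
```

A real analysis would typically adjust for covariates such as age and sex and would use an established statistics library; the hand-rolled fit here only makes the odds-ratio interpretation concrete.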
4.
Bioinform Adv ; 3(1): vbad062, 2023.
Article in English | MEDLINE | ID: mdl-37416509

ABSTRACT

Summary: RNA sequencing (RNA-seq) can be applied to diverse tasks including quantifying gene expression, discovering quantitative trait loci and identifying gene fusion events. Although RNA-seq can detect germline variants, the complexities of variable transcript abundance, target capture and amplification introduce challenging sources of error. Here, we extend DeepVariant, a deep-learning-based variant caller, to learn and account for the unique challenges presented by RNA-seq data. Our DeepVariant RNA-seq model produces highly accurate variant calls from RNA-sequencing data, and outperforms existing approaches such as Platypus and GATK. We examine factors that influence accuracy, how our model addresses RNA editing events and how additional thresholding can be used to facilitate our model's use in a production pipeline. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
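The abstract above mentions additional thresholding for production use without specifying the thresholds. This sketch assumes single-sample VCF output and filters calls on the standard `GQ` (genotype quality) and `DP` (read depth) FORMAT fields; `passes_threshold` and its default cutoffs (`min_gq=20`, `min_dp=10`) are hypothetical, not the authors' values.

```python
def passes_threshold(vcf_line, min_gq=20, min_dp=10):
    """Keep a single-sample VCF call only if FILTER is PASS/'.' and GQ, DP meet the cutoffs.

    The default cutoffs are hypothetical, chosen for illustration only.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected a VCF data line with FORMAT and one sample column")
    if fields[6] not in ("PASS", "."):
        return False  # e.g. candidates flagged RefCall by DeepVariant
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    gq, dp = sample.get("GQ", "."), sample.get("DP", ".")
    if gq == "." or dp == ".":
        return False  # missing values: be conservative and drop the call
    return int(gq) >= min_gq and int(dp) >= min_dp


calls = [
    "chr1\t100\t.\tA\tG\t30\tPASS\t.\tGT:GQ:DP\t0/1:35:20",
    "chr1\t200\t.\tC\tT\t30\tPASS\t.\tGT:GQ:DP\t0/1:35:5",
]
kept = [c for c in calls if passes_threshold(c)]
```

Raising the cutoffs trades recall for precision; in a production setting the values would be tuned on a truth set rather than fixed in advance.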

5.
NPJ Genom Med ; 3: 2, 2018.
Article in English | MEDLINE | ID: mdl-29354287

ABSTRACT

Next-generation deep sequencing of gene panels is being adopted as a diagnostic test to identify actionable mutations in cancer patient samples. However, clinical samples, such as formalin-fixed, paraffin-embedded specimens, frequently provide low quantities of degraded, poor quality DNA. To overcome these issues, many sequencing assays rely on extensive PCR amplification leading to an accumulation of bias and artifacts. Thus, there is a need for a targeted sequencing assay that performs well with DNA of low quality and quantity without relying on extensive PCR amplification. We evaluate the performance of a targeted sequencing assay based on Oligonucleotide Selective Sequencing, which permits the enrichment of genes and regions of interest and the identification of sequence variants from low amounts of damaged DNA. This assay utilizes a repair process adapted to clinical FFPE samples, followed by adaptor ligation to single stranded DNA and a primer-based capture technique. Our approach generates sequence libraries of high fidelity with reduced reliance on extensive PCR amplification; this facilitates the accurate assessment of copy number alterations in addition to delivering accurate single nucleotide variant and insertion/deletion detection. We apply this method to capture and sequence the exons of a panel of 130 cancer-related genes, from which we obtain high read coverage uniformity across the targeted regions at starting input DNA amounts as low as 10 ng per sample. We demonstrate the performance using a series of reference DNA samples, and by identifying sequence variants in DNA from matched clinical samples originating from different tissue types.
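Coverage uniformity has several definitions, and the abstract above does not say which one was used. The sketch below assumes a common one: the fraction of targeted positions whose depth is at least 20% of the mean target depth. The function name, the `fraction_of_mean` parameter, and the example depths are assumptions.

```python
def coverage_uniformity(depths, fraction_of_mean=0.2):
    """Fraction of target positions with depth >= fraction_of_mean * mean depth (assumed metric)."""
    if not depths:
        raise ValueError("depths must be non-empty")
    mean_depth = sum(depths) / len(depths)
    cutoff = fraction_of_mean * mean_depth
    return sum(1 for d in depths if d >= cutoff) / len(depths)


# Hypothetical per-position depths across a target region.
print(coverage_uniformity([100, 100, 100, 10]))  # 0.75
```

Because the cutoff is relative to the mean, the metric compares libraries sequenced to different total depths, which matters when input DNA amounts vary.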

6.
BMC Bioinformatics ; 8: 244, 2007 Jul 10.
Article in English | MEDLINE | ID: mdl-17623104

ABSTRACT

BACKGROUND: Using computational database searches, we have demonstrated previously that no gene sequences could be found for at least 36% of enzyme activities that have been assigned an Enzyme Commission number. Here we present a follow-up literature-based survey involving a statistically significant sample of such "orphan" activities. The survey was intended to determine whether sequences for these enzyme activities are truly unknown, or whether these sequences are absent from the public sequence databases but can be found in the literature. RESULTS: We demonstrate that for ~80% of sampled orphans, the absence of sequence data is bona fide. Our analyses further substantiate the notion that many of these enzyme activities play biologically important roles. CONCLUSION: This survey points toward significant scientific cost of having such a large fraction of characterized enzyme activities disconnected from sequence data. It also suggests that a larger effort, beginning with a comprehensive survey of all putative orphan activities, would resolve nearly 300 artifactual orphans and reconnect a wealth of enzyme research with modern genomics. For these reasons, we propose that a systematic effort to identify the cognate genes of orphan enzymes be undertaken.


Subjects
Computational Biology/methods , Data Collection , Enzymes/classification , Enzymes/genetics , Databases, Factual , Databases, Genetic , Databases, Protein , Enzymes/metabolism , Genomics , Proteomics , Reproducibility of Results , Species Specificity
7.
BMC Bioinformatics ; 7: 170, 2006 Mar 23.
Article in English | MEDLINE | ID: mdl-16556315

ABSTRACT

BACKGROUND: This article addresses the problem of interoperation of heterogeneous bioinformatics databases. RESULTS: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus not only enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and Java languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research.
CONCLUSION: BioWarehouse embodies significant progress on the database integration problem for bioinformatics.


Subjects
Computational Biology/methods , Database Management Systems , Databases, Factual , Databases, Genetic , Databases, Protein , Protein Engineering/methods , Proteins/chemistry , Proteins/genetics , Semantics , Signal Transduction/genetics , Software
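The orphan-enzyme question in the abstract above is the kind of multi-database query a warehouse makes expressible in a single SQL statement. The sketch below uses Python's built-in `sqlite3` with a hypothetical three-table toy schema (`enzyme_activity`, `protein_ec`, `protein_sequence`); it is not BioWarehouse's actual schema, and the EC identifiers are placeholders.

```python
import sqlite3


def build_toy_warehouse():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY);
        CREATE TABLE protein_ec (protein_id TEXT, ec_number TEXT);
        CREATE TABLE protein_sequence (protein_id TEXT PRIMARY KEY, residues TEXT);
        INSERT INTO enzyme_activity VALUES ('EC_A'), ('EC_B'), ('EC_C'), ('EC_D');
        INSERT INTO protein_ec VALUES ('P1', 'EC_A'), ('P2', 'EC_B');
        INSERT INTO protein_sequence VALUES ('P1', 'MKT');
    """)
    return conn


# An activity is an "orphan" if no protein annotated with it has a stored sequence.
ORPHAN_FRACTION_SQL = """
SELECT AVG(CASE WHEN EXISTS (
           SELECT 1
           FROM protein_ec pe
           JOIN protein_sequence ps ON ps.protein_id = pe.protein_id
           WHERE pe.ec_number = ea.ec_number)
       THEN 0.0 ELSE 1.0 END)
FROM enzyme_activity ea
"""


def orphan_fraction(conn):
    (fraction,) = conn.execute(ORPHAN_FRACTION_SQL).fetchone()
    return fraction


print(orphan_fraction(build_toy_warehouse()))  # 0.75 in this toy database
```

`EC_B` counts as an orphan here even though a protein is annotated with it, because that protein has no stored sequence; the correlated `EXISTS` subquery captures that distinction directly.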
9.
Arthritis Res Ther ; 17: 262, 2015 Sep 21.
Article in English | MEDLINE | ID: mdl-26387933

ABSTRACT

INTRODUCTION: In the present study, we sought to identify markers in patients with anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (AAV) that distinguish those achieving remission at 6 months following rituximab or cyclophosphamide treatment from those for whom treatment failed in the Rituximab in ANCA-Associated Vasculitis (RAVE) trial. METHODS: Clinical and flow cytometry data from the RAVE trial were downloaded from the Immunology Database and Analysis Portal and Immune Tolerance Network TrialShare public repositories. Flow cytometry data were analyzed using validated automated gating and joined with clinical data. Lymphocyte and granulocyte populations were measured in patients who achieved or failed to achieve remission. RESULTS: Lymphocyte subsets did not differ by treatment outcome with either treatment. We defined a Granularity Index (GI) that measures the difference between the percentage of hypergranular and hypogranular granulocytes. We found that rituximab-treated patients who achieved remission had a significantly higher GI at baseline than those who did not (p = 0.0085) and that this pattern was reversed in cyclophosphamide-treated patients (p = 0.037). We defined optimal cutoff values of the GI using the Youden index. Cyclophosphamide was superior to rituximab in inducing remission in patients with GI below -9.25% (67% vs. 30%, respectively; p = 0.033), whereas rituximab was superior to cyclophosphamide for patients with GI greater than 47.6% (83% vs. 33%, respectively; p = 0.0002). CONCLUSIONS: We identified distinct subsets of granulocytes found at baseline in patients with AAV that predicted whether they were more likely to achieve remission with cyclophosphamide or rituximab. Profiling patients on the basis of the GI may lead to more successful trials and therapeutic courses in AAV. TRIAL REGISTRATION: ClinicalTrials.gov identifier (for original study from which data were obtained): NCT00104299.
Date of registration: 24 February 2005.


Subjects
Anti-Neutrophil Cytoplasmic Antibody-Associated Vasculitis/drug therapy , Anti-Neutrophil Cytoplasmic Antibody-Associated Vasculitis/immunology , Biomarkers/blood , Granulocytes/immunology , Immunologic Factors/therapeutic use , Rituximab/therapeutic use , Female , Flow Cytometry , Humans , Male , Middle Aged , Remission Induction , Treatment Outcome
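The Granularity Index and the Youden-index cutoff described in the abstract above can be sketched as follows, assuming per-patient percentages of hypergranular and hypogranular granulocytes and a binary remission outcome. The function names and toy values are hypothetical. Scores at or above the cutoff are treated as predicting remission, matching the rituximab direction reported in the abstract; for cyclophosphamide, where the pattern was reversed, the GI would be negated before the search.

```python
def granularity_index(pct_hypergranular, pct_hypogranular):
    """GI = % hypergranular granulocytes - % hypogranular granulocytes."""
    return pct_hypergranular - pct_hypogranular


def youden_cutoff(scores, outcomes):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A score >= cutoff is classified as predicted remission (outcome 1).
    """
    positives = sum(1 for o in outcomes if o)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for s, o in zip(scores, outcomes) if s >= c and o)
        tn = sum(1 for s, o in zip(scores, outcomes) if s < c and not o)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical baseline GI values and remission outcomes (1 = remission).
gi = [-20.0, -10.0, 30.0, 50.0]
remission = [0, 0, 1, 1]
print(youden_cutoff(gi, remission))  # (30.0, 1.0)
```

Candidate cutoffs are restricted to observed GI values, since J can only change at those points.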
10.
J Am Med Inform Assoc ; 22(6): 1148-52, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26112029

ABSTRACT

The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments.


Subjects
Biomedical Research , Data Mining , Datasets as Topic , Biological Ontologies , Humans , Information Storage and Retrieval , United States
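The abstract above describes mining a metadata repository for patterns that drive predictive data entry, but it does not describe the algorithm. A minimal frequency-based sketch, assuming metadata records are flat key/value dictionaries: suggest the most common values of a target field among records that agree with the fields already filled in. `suggest_values`, the field names, and the values are hypothetical.

```python
from collections import Counter


def suggest_values(records, target_field, known, top_n=3):
    """Rank values of `target_field` among records matching all `known` field values."""
    counts = Counter()
    for rec in records:
        if target_field not in rec:
            continue
        if all(rec.get(k) == v for k, v in known.items()):
            counts[rec[target_field]] += 1
    return [value for value, _ in counts.most_common(top_n)]


# Hypothetical previously submitted metadata records.
repository = [
    {"organism": "Homo sapiens", "tissue": "blood"},
    {"organism": "Homo sapiens", "tissue": "blood"},
    {"organism": "Homo sapiens", "tissue": "liver"},
    {"organism": "Mus musculus", "tissue": "spleen"},
]
print(suggest_values(repository, "tissue", {"organism": "Homo sapiens"}))
```

Conditioning on already-entered fields is what makes the suggestions context-sensitive; a production system would likely weight patterns by template type and recency as well.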
11.
Genome Med ; 2(8): 51, 2010 Aug 06.
Article in English | MEDLINE | ID: mdl-20691073

ABSTRACT

With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably with a local computational cluster in both performance and cost, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine.
