Results 1 - 18 of 18
1.
J Comput Soc Sci ; 6(1): 165-190, 2023 Apr.
Article in English | MEDLINE | ID: mdl-38249661

ABSTRACT

The Flint Water Crisis (FWC) was an avoidable public health disaster that has profoundly affected the city's residents, a majority of whom are Black. Although many scholars and journalists have called attention to the role of racism in the water crisis, little is known about the extent to which the public attributed the FWC to racism as it was unfolding. In this study, we used natural language processing to analyze nearly six million Flint-related tweets posted between April 1, 2014, and June 1, 2016. We found that key developments in the FWC corresponded to increases in the number and percentage of tweets that mentioned terms related to race and racism. Similar patterns were found for other topics hypothesized to be related to the water crisis, including water and politics. Using sentiment analysis, we found that tweets with a negative polarity score were more common in the subset of tweets that mentioned terms related to race and racism when compared to the full set of tweets. Next, we found that word pairs that included terms related to race and racism first appeared after the January 2016 state and federal emergency declarations and a corresponding increase in media coverage of the FWC. We conclude that many Twitter users connected the events of the water crisis to race and racism in real-time. Given growing evidence of negative health effects of second-hand exposure to racism, this may have implications for understanding minority health and health disparities in the US.
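
The counting step described in this abstract (the number and percentage of tweets mentioning race-related terms over time) can be illustrated with a minimal sketch. The term list, the sample tweets, and the function name `monthly_term_share` are hypothetical; the study's actual lexicon and pipeline are not given in the abstract.

```python
from collections import defaultdict

# Hypothetical term list; the study's actual lexicon is not given in the abstract.
RACE_TERMS = {"race", "racism", "racist"}

def monthly_term_share(tweets, terms=RACE_TERMS):
    """Return {month: (matching_count, percent_of_month)} for (date, text) pairs.

    `date` is an ISO string such as "2016-01-16". A tweet matches when any
    lowercased, punctuation-stripped token is in `terms`.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for date, text in tweets:
        month = date[:7]  # "YYYY-MM"
        totals[month] += 1
        tokens = {tok.strip(".,!?#@\"'").lower() for tok in text.split()}
        if tokens & terms:
            hits[month] += 1
    return {m: (hits[m], 100.0 * hits[m] / totals[m]) for m in totals}

# Invented sample data, for illustration only.
sample = [
    ("2015-09-30", "Flint water tests show lead"),
    ("2016-01-16", "Emergency declared in Flint. This is environmental racism!"),
    ("2016-01-20", "Flint residents need clean water"),
]
shares = monthly_term_share(sample)
```

Reporting both the count and the percentage matters here because overall tweet volume also rose after key events, so a raw count alone could not show that race-related discussion grew as a share of the conversation.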

2.
Front Artif Intell ; 5: 952424, 2022.
Article in English | MEDLINE | ID: mdl-36034596

ABSTRACT

Food samples are routinely screened for food-contaminating beetles (i.e., pantry beetles) due to their adverse impact on the economy, environment, public health and safety. If found, their remains are subsequently analyzed to identify the species responsible for the contamination; each species poses different levels of risk, requiring different regulatory and management steps. At present, this identification is done through manual microscopic examination since each species of beetle has a unique pattern on its elytra (hardened forewing). Our study sought to automate the pattern recognition process through machine learning. Such automation will enable more efficient identification of pantry beetle species and could potentially be scaled up and implemented across various analysis centers in a consistent manner. In our earlier studies, we demonstrated that automated species identification of pantry beetles is feasible through elytral pattern recognition. Due to poor image quality, however, we failed to achieve prediction accuracies of more than 80%. Subsequently, we modified the traditional imaging technique, allowing us to acquire high-quality elytral images. In this study, we explored whether high-quality elytral images can truly achieve near-perfect prediction accuracies for 27 different species of pantry beetles. To test this hypothesis, we developed a convolutional neural network (CNN) model and compared performance between two different image sets for various pantry beetles. Our study indicates improved image quality indeed leads to better prediction accuracy; however, it was not the only requirement for achieving good accuracy. Also required are many high-quality images, especially for species with a high number of variations in their elytral patterns. The current study provided a direction toward achieving our ultimate goal of automated species identification through elytral pattern recognition.

3.
Genome Biol ; 22(1): 109, 2021 04 16.
Article in English | MEDLINE | ID: mdl-33863344

ABSTRACT

BACKGROUND: Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight Pan-Cancer panels to assess best practices for oncopanel sequencing. RESULTS: All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5-20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. CONCLUSION: This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.
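
The trade-off reported above (a VAF reporting threshold lowers sensitivity but removes false positives) can be sketched as follows. The variant identifiers, VAF values, and the function `evaluate_calls` are hypothetical and only illustrate the idea; they are not the study's evaluation pipeline.

```python
def evaluate_calls(calls, truth, vaf_threshold=0.0):
    """Score variant calls against a truth set after applying a VAF cutoff.

    calls: dict mapping variant id -> measured VAF (fraction between 0 and 1)
    truth: set of variant ids known to be real
    Returns (sensitivity, false_positive_count).
    """
    reported = {v for v, vaf in calls.items() if vaf >= vaf_threshold}
    true_pos = len(reported & truth)
    false_pos = len(reported - truth)
    sensitivity = true_pos / len(truth) if truth else 0.0
    return sensitivity, false_pos

# Invented example: "v3" is a real variant whose measured VAF (4%) falls
# just under a 5% cutoff because of measurement variability.
truth = {"v1", "v2", "v3", "v4"}
calls = {"v1": 0.12, "v2": 0.06, "v3": 0.04, "v4": 0.18, "fp1": 0.02, "fp2": 0.03}

no_cut = evaluate_calls(calls, truth)
with_cut = evaluate_calls(calls, truth, vaf_threshold=0.05)
```

In this toy data the 5% cutoff removes both false positives but also drops one true variant, mirroring the direction of the effects described in the abstract.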


Subjects
Biomarkers, Tumor , Genetic Testing/methods , Genomics/methods , Neoplasms/genetics , Oncogenes , DNA Copy Number Variations , Genetic Testing/standards , Genomics/standards , Humans , Molecular Diagnostic Techniques/methods , Molecular Diagnostic Techniques/standards , Mutation , Neoplasms/diagnosis , Polymorphism, Single Nucleotide , Reproducibility of Results , Sensitivity and Specificity

4.
Genome Biol ; 22(1): 111, 2021 04 16.
Article in English | MEDLINE | ID: mdl-33863366

ABSTRACT

BACKGROUND: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. RESULTS: In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency, with more than 25,000 variants having less than 20% allele frequency and 1653 variants in COSMIC-related genes. This is 5-100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. CONCLUSION: These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.
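
As a rough worked example of the admixture arithmetic, the sketch below shows how diluting Sample A lowers the expected allele frequency of a variant. It assumes that Sample B carries no variant allele at the locus and that each sample contributes DNA in proportion to its mixing fraction. The 1:49 ratio and the function name `admixed_vaf` are illustrative and not taken from the abstract.

```python
import math

def admixed_vaf(vaf_in_a, fraction_a):
    """Expected variant allele frequency after diluting Sample A.

    Assumes the diluent (Sample B) has no variant allele at this locus,
    so the variant fraction scales linearly with the share of Sample A.
    """
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    return vaf_in_a * fraction_a

# Hypothetical mix: 1 part Sample A to 49 parts Sample B (2% Sample A).
diluted = admixed_vaf(0.01, 1 / 50)  # variant present at 1% VAF in Sample A
```

Under these assumptions, a variant at 1% in Sample A lands at about 0.02% in the mixture, the order of magnitude the abstract cites for liquid biopsy assessment.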


Subjects
Alleles , Biomarkers, Tumor , Gene Frequency , Genetic Testing/methods , Genetic Variation , Genomics/methods , Neoplasms/genetics , Cell Line, Tumor , DNA Copy Number Variations , Genetic Heterogeneity , Genetic Testing/standards , Genomics/standards , Humans , Neoplasms/diagnosis , Workflow

5.
PLoS One ; 16(3): e0248375, 2021.
Article in English | MEDLINE | ID: mdl-33788842

ABSTRACT

We evaluated the utility of leucocyte epigenomic biomarkers for Alzheimer's Disease (AD) detection and for elucidating its molecular pathogenesis. Genome-wide DNA methylation analysis was performed using the Infinium MethylationEPIC BeadChip array in 24 late-onset AD (LOAD) and 24 cognitively healthy subjects. Data were analyzed using six Artificial Intelligence (AI) methodologies, including Deep Learning (DL), for AD prediction, followed by Ingenuity Pathway Analysis (IPA). We identified 152 significantly (FDR p<0.05) differentially methylated intragenic CpGs in 171 distinct genes in AD patients compared to controls. All AI platforms accurately predicted AD with AUCs ≥0.93 using 283,143 intragenic and 244,246 intergenic/extragenic CpGs. DL had an AUC = 0.99 using intragenic CpGs, with both sensitivity and specificity being 97%. High AD prediction was also achieved using intergenic/extragenic CpG sites (DL achieved AUC = 0.99 with 97% sensitivity and specificity). Epigenetically altered genes included CR1L & CTSV (abnormal morphology of cerebral cortex), S1PR1 (CNS inflammation), and LTB4R (inflammatory response). These genes have been previously linked with AD and dementia. The differentially methylated genes CTSV & PRMT5 (ventricular hypertrophy and dilation) are linked to cardiovascular disease and are of interest given the known association between impaired cerebral blood flow, cardiovascular disease, and AD. We report a novel, minimally invasive approach using peripheral blood leucocyte epigenomics and AI analysis to detect AD and elucidate its pathogenesis.


Subjects
Alzheimer Disease/blood , Alzheimer Disease/genetics , Deep Learning , Epigenesis, Genetic , Epigenomics/methods , Late Onset Disorders/genetics , Leukocytes/metabolism , Aged , Aged, 80 and over , Biomarkers/blood , Case-Control Studies , CpG Islands/genetics , DNA Methylation/genetics , Female , Genome-Wide Association Study , Humans , Male , Prognosis , Sensitivity and Specificity , Signal Transduction/genetics

6.
Metabolites ; 10(9), 2020 Aug 31.
Article in English | MEDLINE | ID: mdl-32878308

ABSTRACT

The lack of sensitive and specific biomarkers for the early detection of mild cognitive impairment (MCI) and Alzheimer's disease (AD) is a major hurdle to improving patient management. A targeted, quantitative metabolomics approach using both 1H NMR and mass spectrometry was employed to investigate the performance of urine metabolites as potential biomarkers for MCI and AD. Correlation-based feature selection (CFS) and least absolute shrinkage and selection operator (LASSO) methods were used to develop biomarker panels, which were tested using support vector machine (SVM) and logistic regression models for diagnosis of each disease state. Metabolic changes were investigated to identify which biochemical pathways were perturbed as a direct result of MCI and AD in urine. Using SVM, we developed a model with 94% sensitivity, 78% specificity, and 78% AUC to distinguish healthy controls from AD sufferers. Using logistic regression, we developed a model with 85% sensitivity, 86% specificity, and an AUC of 82% for AD diagnosis as compared to cognitively healthy controls. Further, we identified 11 urinary metabolites that were significantly altered in AD patients, including glucose, guanidinoacetate, urocanate, hippuric acid, cytosine, 2- and 3-hydroxyisovalerate, 2-ketoisovalerate, tryptophan, trimethylamine N-oxide, and malonate, which are also capable of diagnosing MCI, with a sensitivity of 76%, specificity of 75%, and accuracy of 81% as compared to healthy controls. This pilot study suggests that urine metabolomics may be useful for developing a test capable of diagnosing and distinguishing MCI and AD from cognitively healthy controls.
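
For readers less familiar with the metrics quoted in this abstract, the sketch below shows how sensitivity, specificity, and accuracy follow from predicted and true labels. The labels are invented and the helper names are hypothetical; this is not the study's evaluation code.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fn, tn, fp) for binary labels where 1 = case, 0 = control."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp, fn, tn, fp

def diagnostic_metrics(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP); accuracy = correct/total."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }

# Invented labels: four cases and four controls, one error in each group.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(y_true, y_pred)
```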

7.
Metabolites ; 10(6), 2020 Jun 23.
Article in English | MEDLINE | ID: mdl-32585915

ABSTRACT

Epilepsy not-otherwise-specified (ENOS) is one of the most common causes of chronic disorders impacting human health, with complex multifactorial etiology and clinical presentation. Understanding the metabolic processes associated with the disorder may aid in the discovery of preventive and therapeutic measures. Post-mortem brain samples were harvested from the frontal cortex (BA8/46) of people diagnosed with ENOS (n = 15) and age- and sex-matched control subjects (n = 15). We employed a targeted metabolomics approach using a combination of proton nuclear magnetic resonance (1H-NMR) and direct injection/liquid chromatography tandem mass spectrometry (DI/LC-MS/MS). We accurately identified and quantified 72 metabolites using 1H-NMR and 159 using DI/LC-MS/MS. Among the 212 detected metabolites, 14 showed significant concentration changes between ENOS cases and controls (p < 0.05; q < 0.05). Of these, adenosine monophosphate and O-acetylcholine were the most commonly selected metabolites used to develop predictive models capable of discriminating between ENOS and unaffected controls. Metabolomic set enrichment analysis identified ethanol degradation, butyrate metabolism and the mitochondrial beta-oxidation of fatty acids as the top three significantly perturbed metabolic pathways. We report, for the first time, the metabolomic profiling of post-mortem brain tissue from patients who died from epilepsy. These findings can potentially expand upon the complex etiopathogenesis and help identify key predictive biomarkers of ENOS.

8.
PLoS One ; 14(4): e0214121, 2019.
Article in English | MEDLINE | ID: mdl-30998683

ABSTRACT

OBJECTIVE: To interrogate the pathogenesis of intrauterine growth restriction (IUGR) and apply Artificial Intelligence (AI) techniques to multi-platform, i.e., nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) based, metabolomic analysis for the prediction of IUGR. MATERIALS AND METHODS: MS- and NMR-based metabolomic analyses were performed on cord blood serum from 40 IUGR (birth weight < 10th percentile) cases and 40 controls. Three variable selection algorithms, namely correlation-based feature selection (CFS), partial least squares regression (PLS) and learning vector quantization (LVQ), were tested for their diagnostic performance. For each selected set of metabolites, and for the panel consisting of metabolites common to all three selection algorithms (the so-called overlapping set, OL), support vector machine (SVM) models were developed, for which parameter selection was performed using 10-fold cross-validation. Area under the receiver operating characteristic curve (AUC), sensitivity and specificity values were calculated for IUGR diagnosis. Metabolite set enrichment analysis (MSEA) was performed to identify which metabolic pathways were perturbed as a direct result of IUGR in cord blood serum. RESULTS: All selected metabolites and their overlapping set achieved statistically significant accuracies in the range of 0.78-0.82 for their optimized SVM models. The model utilizing all metabolites in the dataset had an AUC = 0.91 with a sensitivity of 0.83 and specificity equal to 0.80. CFS and OL (Creatinine, C2, C4, lysoPC.a.C16.1, lysoPC.a.C20.3, lysoPC.a.C28.1, PC.aa.C24.0) showed the highest performance with sensitivity (0.87) and specificity (0.87), respectively. MSEA revealed significantly altered metabolic pathways in IUGR cases. Dysregulated pathways include: beta oxidation of very long fatty acids, oxidation of branched chain fatty acids, phospholipid biosynthesis, lysine degradation, urea cycle and fatty acid metabolism.
CONCLUSION: A systematically selected panel of metabolites was shown to accurately detect IUGR in newborn cord blood serum. Significant disturbance of hepatic function and energy generating pathways were found in IUGR cases.
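
The 10-fold cross-validation mentioned in the methods can be sketched as an index split. This is a minimal illustration using only the standard library, assuming 80 samples (40 cases plus 40 controls); the function name `k_fold_indices` and the shuffling seed are hypothetical, and the study's own software and stratification choices are not stated in the abstract.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) lists for k-fold cross-validation.

    Sample indices are shuffled once, dealt into k folds, and each fold
    serves as the test set exactly once.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test

# 80 samples, matching the 40 cases and 40 controls in the abstract.
splits = list(k_fold_indices(80, k=10))
```

Each candidate parameter setting would be scored on the ten held-out folds and the setting with the best average score kept, so no sample is used to tune a model that is then scored on that same sample.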


Subjects
Birth Weight/physiology , Fetal Blood/metabolism , Fetal Growth Retardation/metabolism , Metabolomics/methods , Artificial Intelligence , Female , Fetal Growth Retardation/diagnosis , Fetal Growth Retardation/physiopathology , Gestational Age , Humans , Infant, Newborn , Infant, Small for Gestational Age , Least-Squares Analysis , Magnetic Resonance Spectroscopy , Mass Spectrometry , ROC Curve

9.
Sci Rep ; 8(1): 6532, 2018 04 25.
Article in English | MEDLINE | ID: mdl-29695741

ABSTRACT

Insect pests, such as pantry beetles, are often associated with food contamination and public health risks. Machine learning has the potential to provide a more accurate and efficient solution for detecting their presence in food products, which is currently done manually. In our previous research, we demonstrated the feasibility of implementing Artificial Neural Network (ANN) based pattern recognition techniques for species identification in the context of food safety. In this study, we present a Support Vector Machine (SVM) model which improved the average accuracy up to 85%. In contrast, the ANN method yielded ~80% accuracy after extensive parameter optimization. Both methods showed excellent genus level identification, but SVM showed slightly better accuracy for most species. Highly accurate species level identification remains a challenge, especially in distinguishing between species from the same genus, which may require improvements in both imaging and machine learning techniques. In summary, our work illustrates a new SVM based technique and provides a good comparison with the ANN model in our context. We believe such insights will pave a better way forward for the application of machine learning towards species identification and food safety.


Subjects
Coleoptera/growth & development , Food Contamination/prevention & control , Food Safety/methods , Algorithms , Animals , Artificial Intelligence , Machine Learning , Neural Networks, Computer , Support Vector Machine

10.
Front Genet ; 9: 22, 2018.
Article in English | MEDLINE | ID: mdl-29467792

ABSTRACT

MicroRNAs (miRNAs) are key post-transcriptional regulators that affect protein translation by targeting mRNAs. Their roles in disease etiology and toxicity are well recognized. Given the rapid advancement of next-generation sequencing techniques, miRNA profiling has been increasingly conducted with RNA-seq, namely miRNA-seq. Analysis of miRNA-seq data requires several steps: (1) mapping the reads to miRBase, (2) considering mismatches during the hairpin alignment (windowing), and (3) counting the reads (quantification). The choice made in each step with respect to the parameter settings could affect miRNA quantification, detection of differentially expressed miRNAs (DEMs) and novel miRNA identification. Furthermore, these parameters do not act in isolation and their joint effects impact miRNA-seq results and interpretation. In toxicogenomics, the variation associated with parameter setting should not overpower the treatment effect (such as the dose/time-dependent effect). In this study, four commonly used miRNA-seq analysis tools (i.e., miRDeep2, miRExpress, miRNAkey, sRNAbench) were comparatively evaluated with a standard toxicogenomics study design. We tested 30 different parameter settings on miRNA-seq data generated from thioacetamide-treated rat liver samples for three dose levels across four time points, followed by four normalization options. Because both miRExpress and miRNAkey yielded larger variation than that of the treatment effects across multiple parameter settings, our analyses mainly focused on the side-by-side comparison between miRDeep2 and sRNAbench. While the miRNAs detected by miRDeep2 were almost a subset of those detected by sRNAbench, the number of DEMs identified by both tools was comparable under the same parameter settings and normalization method. Change in the number of nucleotides out of the mature sequence in the hairpin alignment (window option) yielded the largest variation for miRNA quantification and DEM detection. However, such variation is relatively small compared to the treatment effect when the study focuses on DEMs, which are more critical for interpreting the toxicological effect. While the normalization methods introduced a large variation in DEMs, the toxic behavior of thioacetamide showed consistency in the trend of time-dose responses. Overall, miRDeep2 was found to be preferable over other choices when the window option allowed up to three nucleotides from both ends.

11.
Sci Rep ; 7(1): 3054, 2017 06 08.
Article in English | MEDLINE | ID: mdl-28596526

ABSTRACT

Exposure to environmental chemicals is one of the primary factors for liver toxicity and hepatocarcinoma. Thioacetamide (TAA) is a well-known hepatotoxicant and could be a liver carcinogen in humans. The discovery of early and sensitive microRNA (miRNA) biomarkers in liver injury and tumor progression could improve cancer diagnosis, prognosis, and management. To study this, we performed next generation sequencing of the livers of Sprague-Dawley rats treated with TAA at three doses (4.5, 15 and 45 mg/kg) and four time points (3, 7, 14 and 28 days). Overall, 330 unique differentially expressed miRNAs (DEMs) were identified in the entire TAA-treatment course. Of these, 129 DEMs were found significantly enriched for the "liver cancer" annotation. These results were further complemented by pathway analysis (Molecular Mechanisms of Cancer, p53-, TGF-β-, MAPK- and Wnt-signaling). Two miRNAs (rno-miR-34a-5p and rno-miR-455-3p) out of 48 overlapping DEMs were identified to be early and sensitive biomarkers for TAA-induced hepatocarcinogenicity. We have shown significant regulatory associations between DEMs and TAA-induced liver carcinogenesis at an earlier stage than histopathological features. Most importantly, miR-34a-5p is the most suitable early and sensitive biomarker for TAA-induced hepatocarcinogenesis due to its consistent elevation during the entire treatment course.


Subjects
Carcinogenesis/genetics , Liver Neoplasms/genetics , MicroRNAs/genetics , Animals , Liver/metabolism , Liver Neoplasms/etiology , Liver Neoplasms/pathology , MAP Kinase Signaling System , Male , Rats , Rats, Sprague-Dawley , Thioacetamide/toxicity , Wnt Signaling Pathway

12.
13.
PLoS One ; 11(6): e0157940, 2016.
Article in English | MEDLINE | ID: mdl-27341524

ABSTRACT

A crucial step of food contamination inspection is identifying the species of beetle fragments found in the sample, since the presence of some storage beetles is a good indicator of insanitation or potential food safety hazards. The current practice, visual examination by human analysts, is time consuming and requires several years of experience. Here we developed a species identification algorithm which utilizes images of microscopic elytra fragments. The elytra, or hardened forewings, occupy a large portion of the body, and contain distinctive patterns. In addition, elytra fragments are more commonly recovered from processed food products than other body parts due to their hardness. As a preliminary effort, we chose 15 storage product beetle species frequently detected in food inspection. The elytra were then separated from the specimens and imaged under a microscope. Both global and local characteristics were quantified and used as feature inputs to artificial neural networks for species classification. With leave-one-out cross validation, we achieved overall accuracy of 80% through the proposed global and local features, which indicates that our proposed features could differentiate these species. Through examining the overall and per species accuracies, we further demonstrated that the local features are better suited than the global features for species identification. Future work will include robust testing with more beetle species and algorithm refinement for a higher accuracy.
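
The leave-one-out cross validation named in this abstract can be sketched as follows. To keep the example dependency-free, a 1-nearest-neighbor classifier stands in for the artificial neural networks the authors used; the feature vectors, the species labels "A" and "B", and the function name `loocv_accuracy` are all hypothetical.

```python
import math

def loocv_accuracy(features, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier.

    Each sample is held out in turn and predicted from its nearest
    remaining sample (Euclidean distance).
    """
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        best_label, best_dist = None, math.inf
        for j, (other, other_label) in enumerate(zip(features, labels)):
            if j == i:
                continue  # hold out sample i
            d = math.dist(x, other)
            if d < best_dist:
                best_label, best_dist = other_label, d
        correct += best_label == y
    return correct / len(features)

# Invented two-feature data; the last "B" sample sits closer to the "A" cluster.
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (1.0, 1.1), (1.2, 0.9), (0.5, 0.4)]
labels = ["A", "A", "A", "B", "B", "B"]
acc = loocv_accuracy(features, labels)
```

Leave-one-out is a natural fit when specimens are scarce, since every sample is used for testing once while all the others remain available for training.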

14.
BMC Bioinformatics ; 15: 267, 2014 Aug 08.
Article in English | MEDLINE | ID: mdl-25103881

ABSTRACT

BACKGROUND: The phenome represents a distinct set of information in the human population. It has been explored particularly in its relationship with the genome to identify correlations for diseases. The phenome has also been explored for drug repositioning, with efforts focusing on the search space for the most similar candidate drugs. For a comprehensive analysis of the phenome, we assumed that all phenotypes (indications and side effects) were inter-connected with a probabilistic distribution and that this characteristic may offer an opportunity to identify new therapeutic indications for a given drug. Correspondingly, we employed Latent Dirichlet Allocation (LDA), which introduces latent variables (topics) to govern the phenome distribution. RESULTS: We developed our model on the phenome information in the Side Effect Resource (SIDER). We first developed an LDA model optimized based on its recovery potential through perturbing the drug-phenotype matrix for each of the drug-indication pairs, where each drug-indication relationship was switched to "unknown" one at a time and then recovered based on the remaining drug-phenotype pairs. Of the probabilistically significant pairs, 70% were successfully recovered. Next, we applied the model to the whole phenome to narrow down repositioning candidates and suggest alternative indications. We were able to retrieve approved indications of 6 drugs whose indications were not listed in SIDER. For 908 drugs that were present with their indication information, our model suggested alternative treatment options for further investigations. Several of the suggested new uses can be supported with information from the scientific literature. CONCLUSIONS: The results demonstrated that the phenome can be further analyzed by a generative model, which can discover probabilistic associations between drugs and therapeutic uses. In this regard, LDA serves as an enrichment tool to explore new uses of existing drugs by narrowing down the search space.


Subjects
Computational Biology/methods , Drug Repositioning/methods , Models, Statistical , Phenotype , Data Mining , Databases, Pharmaceutical , Drug-Related Side Effects and Adverse Reactions

15.
Biomark Med ; 8(2): 201-13, 2014.
Article in English | MEDLINE | ID: mdl-24521015

ABSTRACT

Drug-induced liver injury (DILI) is a frequent cause of the termination of drug development programs and a leading reason for drug withdrawal from the marketplace. Unfortunately, the current preclinical testing strategies, including the regulatory-required animal toxicity studies or simple in vitro tests, are insufficiently powered to predict DILI in patients reliably. Notably, the limited predictive power of such testing strategies is mostly attributed to the complex nature of DILI, a poor understanding of its mechanism, a scarcity of human hepatotoxicity data and inadequate bioinformatics capabilities. With the advent of high-content screening assays, toxicogenomics and bioinformatics, multiple end points can be studied simultaneously to improve prediction of clinically relevant DILIs. This review focuses on the current state of efforts in developing predictive models from diverse data sources for potential use in detecting human hepatotoxicity, and also aims to provide perspectives on how to further improve DILI prediction.


Subjects
Chemical and Drug Induced Liver Injury/pathology , Models, Biological , Animals , Biomarkers/metabolism , Chemical and Drug Induced Liver Injury/metabolism , Computational Biology , Drug-Related Side Effects and Adverse Reactions , Humans , Pharmaceutical Preparations/classification , Pharmaceutical Preparations/metabolism , Quantitative Structure-Activity Relationship , Toxicogenetics/trends

16.
BMC Bioinformatics ; 14 Suppl 14: S11, 2013.
Article in English | MEDLINE | ID: mdl-24267543

ABSTRACT

BACKGROUND: High Content Screening (HCS) has become an important tool for toxicity assessment, partly due to its advantage of handling multiple measurements simultaneously. This approach has provided insight and contributed to the understanding of systems biology at the cellular level. To fully realize this potential, the simultaneously measured multiple endpoints from a live cell should be considered in a probabilistic relationship to assess the cell's condition in response to stress from a treatment, which poses a great challenge in extracting hidden knowledge and relationships from these measurements. METHOD: In this work, we applied a text mining method, Latent Dirichlet Allocation (LDA), to analyze cellular endpoints from in vitro HCS assays and related the findings to in vivo histopathological observations. We measured multiple HCS assay endpoints for 122 drugs. Since LDA requires the data to be represented in document-term format, we first converted the continuous value of the measurements to a word frequency that can be processed by the text mining tool. For each of the drugs, we generated a document for each of the 4 time points. Thus, we ended up with 488 documents (drug-hour), each having different values for the 10 endpoints, which are treated as words. We extracted three topics using LDA and examined these to identify diagnostic topics for 45 drugs in common with in vivo experiments from the Japanese Toxicogenomics Project (TGP), observing their necrosis findings at 6 and 24 hours after treatment. RESULTS: We found that assay endpoints assigned to particular topics were in concordance with the histopathology observed. Drugs showing necrosis at 6 hours were linked to severe damage events such as Steatosis, DNA Fragmentation, Mitochondrial Potential, and Lysosome Mass. DNA Damage and Apoptosis were associated with drugs causing necrosis at 24 hours, suggesting an interplay of the two pathways in these drugs. Drugs with no sign of necrosis were related to the Cell Loss and Nuclear Size assays, which is suggestive of hepatocyte regeneration. CONCLUSIONS: The evidence from this study suggests that topic modeling with LDA can enable us to interpret relationships of endpoints of in vitro assays along with an in vivo histological finding, necrosis. Effectiveness of this approach may add substantially to our understanding of systems biology.
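
The conversion of continuous endpoint measurements into a document-term representation can be sketched as below. This assumes measurements already scaled to the range 0 to 1 and a simple linear scaling to integer counts; the abstract does not specify the actual conversion scheme, and the drug name, time points, endpoint names, and function names here are hypothetical.

```python
def to_word_counts(endpoints, scale=10):
    """Map {endpoint_name: value in [0, 1]} to {endpoint_name: integer count}.

    Each endpoint acts as a "word" whose frequency grows with the measurement.
    """
    return {name: round(value * scale) for name, value in endpoints.items()}

def build_corpus(measurements):
    """Turn {(drug, hour): {endpoint: value}} into {"drug-hour": word counts}."""
    return {
        f"{drug}-{hour}h": to_word_counts(values)
        for (drug, hour), values in measurements.items()
    }

# Invented measurements for one hypothetical drug at two time points.
measurements = {
    ("drugX", 1): {"CellLoss": 0.10, "DNADamage": 0.82},
    ("drugX", 24): {"CellLoss": 0.47, "DNADamage": 0.31},
}
corpus = build_corpus(measurements)
```

With one document per drug and time point, 122 drugs measured at 4 time points give the 488 documents mentioned in the abstract, each holding counts for the 10 endpoint "words".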


Subjects
Data Mining , Toxicogenetics/methods , Animals , Apoptosis/drug effects , Cells, Cultured , DNA Damage , Databases, Genetic , Hepatocytes/drug effects , Hepatocytes/metabolism , High-Throughput Screening Assays , Lysosomes/metabolism , Male , Mitochondria/drug effects , Mitochondria/genetics , Mitochondria/metabolism , Necrosis/genetics , Necrosis/metabolism , Rats , Rats, Sprague-Dawley

17.
BMC Bioinformatics ; 13 Suppl 15: S6, 2012.
Article in English | MEDLINE | ID: mdl-23046522

ABSTRACT

BACKGROUND: Drug repositioning offers an opportunity to revitalize the slowing drug discovery pipeline by finding new uses for existing drugs. Our hypothesis is that drugs sharing similar side effect profiles are likely to be effective for the same disease, and thus repositioning opportunities can be identified by finding drug pairs with similar side effects documented in U.S. Food and Drug Administration (FDA) approved drug labels. The safety information in drug labels is usually obtained in clinical trials and augmented with observations from post-market use of the drug. Therefore, our drug repositioning approach can take advantage of more comprehensive safety information than the conventional de novo approach. METHOD: A probabilistic topic model was constructed based on the terms in the Medical Dictionary for Regulatory Activities (MedDRA) that appeared in the Boxed Warning, Warnings and Precautions, and Adverse Reactions sections of the labels of 870 drugs. Fifty-two unique topics, each containing a set of terms, were identified by topic modeling. The resulting probabilistic topic associations were used to measure the distance (similarity) between drugs. The success of the proposed model was evaluated by comparing a drug and its nearest neighbor (i.e., a drug pair) for common indications found in the Indications and Usage section of the drug labels. RESULTS: Given a drug with more than three indications, the model yielded a 75% recall, meaning 75% of drug pairs shared one or more common indications. This is significantly higher than the 22% recall rate achieved by random selection. Additionally, the recall rate grows rapidly as the number of drug indications increases and reaches 84% for drugs with 11 indications. The analysis also demonstrated that 65 drugs with a Boxed Warning, which indicates significant risk of serious and possibly life-threatening adverse effects, might be replaced with safer alternatives that do not have a Boxed Warning. In addition, we identified two therapeutic groups of drugs (Musculo-skeletal system and Anti-infectives for systemic use) where over 80% of the drugs have a potential replacement with high significance. CONCLUSION: Topic modeling can be a powerful tool for identifying repositioning opportunities by examining the adverse event terms in FDA-approved drug labels. The proposed framework not only suggests drugs that can be repurposed, but also provides insight into the safety of repositioned drugs.
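
The nearest-neighbor evaluation described in this abstract can be illustrated with a minimal sketch. The abstract does not state which distance was used, so the Hellinger distance here is an assumption, and the drug names, topic probabilities, and indication sets are hypothetical toy data, not values from the study.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

def nearest_neighbor(drug, topics):
    """Return the other drug whose topic distribution is closest to `drug`."""
    return min((d for d in topics if d != drug),
               key=lambda d: hellinger(topics[drug], topics[d]))

def pair_recall(topics, indications):
    """Fraction of drugs whose nearest neighbor shares at least one indication."""
    hits = sum(1 for d in topics
               if indications[d] & indications[nearest_neighbor(d, topics)])
    return hits / len(topics)

# Hypothetical toy data: per-drug topic probabilities and indication sets.
topics = {
    "drug_a": [0.8, 0.1, 0.1],
    "drug_b": [0.7, 0.2, 0.1],
    "drug_c": [0.1, 0.1, 0.8],
    "drug_d": [0.2, 0.1, 0.7],
}
indications = {
    "drug_a": {"hypertension"},
    "drug_b": {"hypertension", "angina"},
    "drug_c": {"epilepsy"},
    "drug_d": {"migraine"},
}

recall = pair_recall(topics, indications)
print(recall)
```

In this toy set, two of the four drug pairs share an indication, so the recall is 0.5. Any other distance between probability vectors could be substituted for `hellinger` without changing the rest of the sketch.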


Subjects
Drug Repositioning , Drug-Related Side Effects and Adverse Reactions , Theoretical Models , Drug Labeling , United States , United States Food and Drug Administration
18.
BMC Bioinformatics ; 12 Suppl 10: S11, 2011 Oct 18.
Article in English | MEDLINE | ID: mdl-22166012

ABSTRACT

BACKGROUND: Food and Drug Administration (FDA) approved drug labels contain a broad array of information, ranging from adverse drug reactions (ADRs) to drug efficacy, risk-benefit considerations, and more. However, the labeling language used to describe this information is free text that often contains ambiguous semantic descriptions, which poses a great challenge in retrieving useful information from the labeling text in a consistent and accurate fashion for comparative analysis across drugs. Consequently, this task has largely relied on manual reading of the full text by experts, which is time-consuming and labor-intensive. METHOD: In this study, a novel unsupervised text mining method called topic modeling was applied to drug labeling with the goal of discovering "topics" that group together drugs with similar safety concerns and/or therapeutic uses. A total of 794 FDA-approved drug labels were used in this study. First, the three labeling sections (i.e., Boxed Warning, Warnings and Precautions, Adverse Reactions) of each drug label were processed with the Medical Dictionary for Regulatory Activities (MedDRA) to convert the free text of each label to standard ADR terms. Next, topic modeling with latent Dirichlet allocation (LDA) was applied to generate 100 topics, each associated with a set of drugs grouped together based on probability analysis. Lastly, the efficacy of the topic modeling was evaluated based on known information about the therapeutic uses and safety data of drugs. RESULTS: The results demonstrate that drugs grouped by topics are associated with the same safety concerns and/or therapeutic uses with statistical significance (P<0.05). The identified topics have distinct contexts that can be directly linked to specific adverse events (e.g., liver injury or kidney injury) or therapeutic applications (e.g., anti-infectives for systemic use). We were also able to identify potential adverse events that might arise from specific medications via topics. CONCLUSIONS: The successful application of topic modeling to FDA drug labeling demonstrates its potential utility as a means of hypothesis generation for inferring hidden relationships between concepts in biomedical documents, such as, in this study, drug safety and therapeutic use.
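
As an illustration of the LDA step named in this abstract, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. The inference procedure, the hyperparameters (`alpha`, `beta`, `n_iter`), and the toy "labels" (lists of ADR-like terms) are all assumptions made for illustration; the abstract does not say how the 100-topic model was fitted.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]        # count of topic k in doc d
    topic_word = [[0] * V for _ in range(n_topics)]   # count of word v in topic k
    topic_total = [0] * n_topics                      # total words in topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assign[d][i], w_id[w]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed topic proportions for each document.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]

# Hypothetical toy "labels": each is a list of standardized ADR-like terms.
labels = [
    ["hepatotoxicity", "jaundice", "hepatotoxicity", "nausea"],
    ["jaundice", "hepatotoxicity", "liver_failure"],
    ["renal_failure", "nephrotoxicity", "renal_failure"],
    ["nephrotoxicity", "renal_failure", "nausea"],
]
theta = lda_gibbs(labels, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` is one label's topic distribution. Grouping drugs by their highest-probability topic is one simple way to form the topic-based drug groups the abstract describes.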


Subjects
Data Mining/methods , Drug Labeling , Drug-Related Side Effects and Adverse Reactions , Food Labeling , Humans , Semantics , United States , United States Food and Drug Administration