Results 1 - 20 of 58
1.
Drug Discov Today ; 29(6): 104018, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38723763

ABSTRACT

Text summarization is crucial in scientific research, drug discovery and development, regulatory review, and more. This task demands domain expertise, language proficiency, semantic prowess, and conceptual skill. The recent advent of large language models (LLMs), such as ChatGPT, offers unprecedented opportunities to automate this process. We compared ChatGPT-generated summaries with those produced by human experts using FDA drug labeling documents. The labeling contains summaries of key labeling sections, making these summaries an ideal human benchmark for evaluating ChatGPT's summarization capabilities. Analyzing >14000 summaries, we observed that ChatGPT-generated summaries closely resembled those generated by human experts. Importantly, ChatGPT exhibited even greater similarity when summarizing drug safety information. These findings highlight ChatGPT's potential to accelerate work in critical areas, including drug safety.


Subject(s)
Drug Labeling , United States Food and Drug Administration , Humans , United States , Natural Language Processing , Drug-Related Side Effects and Adverse Reactions
2.
Article in English | MEDLINE | ID: mdl-38619534

ABSTRACT

In the rapidly evolving field of artificial intelligence (AI), explainability has traditionally been assessed in a post-modeling process and is often subjective. In contrast, many quantitative metrics have been routinely used to assess a model's performance. We proposed a unified formula named PERForm, which incorporates explainability as a weight into the existing statistical metrics to provide an integrated and quantitative measure of both predictivity and explainability to guide model selection, application, and evaluation. PERForm was designed as a generic formula and can be applied to any data type. We applied PERForm to a range of diverse datasets, including DILIst, Tox21, and three MAQC-II benchmark datasets, using various modeling algorithms to predict a total of 73 distinct endpoints. For example, AdaBoost algorithms exhibited superior performance in DILIst prediction (PERForm AUC of 0.129 for AdaBoost versus 0 for linear regression), whereas linear regression outperformed other models in the majority of Tox21 endpoints (PERForm AUC of 0.301 for linear regression versus 0.283 for AdaBoost on average). This research marks a significant step toward comprehensively evaluating the utility of an AI model to advance transparency and interpretability, where the tradeoff between a model's performance and its interpretability can have profound implications.

3.
Regul Toxicol Pharmacol ; 149: 105613, 2024 May.
Article in English | MEDLINE | ID: mdl-38570021

ABSTRACT

Regulatory agencies consistently deal with extensive document reviews, ranging from product submissions to both internal and external communications. Large language models (LLMs) like ChatGPT can be invaluable tools for these tasks, but they present several challenges, particularly the handling of proprietary information, the combination of customized functions with specific review needs, and the transparency and explainability of the model's output. Hence, a localized and customized solution is imperative. To tackle these challenges, we formulated a framework named AskFDALabel on FDA drug labeling documents, which are a crucial resource in the FDA drug review process. AskFDALabel operates within a secure IT environment and comprises two key modules: a semantic search module (Module S) and a Q&A/text-generation module (Module T). Module S is built on word embeddings to enable comprehensive semantic queries within labeling documents. Module T utilizes a tuned LLM to generate responses based on references from Module S. As a result, our framework enabled small LLMs to perform comparably to ChatGPT as a computationally inexpensive solution for regulatory application. To conclude, through AskFDALabel, we have showcased a pathway that harnesses LLMs to support agency operations within a secure environment, offering tailored functions for the needs of regulatory research.


Subject(s)
Drug Labeling , United States Food and Drug Administration , Drug Labeling/standards , Drug Labeling/legislation & jurisprudence , United States Food and Drug Administration/standards , United States , Humans
4.
Clin Pharmacol Ther ; 115(4): 687-697, 2024 04.
Article in English | MEDLINE | ID: mdl-38018360

ABSTRACT

Artificial intelligence (AI) is increasingly being used in decision making across various industries, including the public health arena. Bias in any decision-making process can significantly skew outcomes, and AI systems have been shown to exhibit biases at times. The potential for AI systems to perpetuate and even amplify biases is a growing concern. Bias, as used in this paper, refers to the tendency toward a particular characteristic or behavior, and thus, a biased AI system is one that shows biased associations between entities. In this literature review, we examine the current state of research on AI bias, including its sources, as well as the methods for measuring, benchmarking, and mitigating it. We also examine the biases and methods of mitigation specifically relevant to the healthcare field and offer a perspective on bias measurement and mitigation in regulatory science decision making.


Subject(s)
Artificial Intelligence , Benchmarking , Humans , Bias , Public Health
5.
Chem Res Toxicol ; 36(8): 1321-1331, 2023 08 21.
Article in English | MEDLINE | ID: mdl-37540590

ABSTRACT

The pathology of animal studies is crucial for toxicity evaluations and regulatory assessments, but the manual examination of slides by pathologists remains time-consuming and requires extensive training. One inherent challenge in this process is the interobserver variability, which can compromise the consistency and accuracy of a study. Artificial intelligence (AI) has demonstrated its ability to automate similar examinations in clinical applications with enhanced efficiency, consistency, and accuracy. However, training AI models typically relies on costly pixel-level annotation of injured regions, which is often not available for animal pathology. To address this, we developed the PathologAI system, a "weakly" supervised approach for whole-slide image (WSI) classification in rat images without explicit lesion annotation at the pixel level. Using rat liver imaging data from the Open TG-GATEs system, PathologAI was applied to predict necrosis of n = 816 WSIs (377 controls). TG-GATEs studied 170 compounds at three dose levels (low, middle, and high) for four time points (3, 7, 14, and 28 days). PathologAI first preprocessed WSIs at the tile level to generate a high-level representation with a Generative Adversarial Network architecture. The prediction of liver necrosis relied on an ensemble model of 5 CNN classifiers trained on 335 WSIs. The ensemble model achieved notable classification accuracy on the holdout test set: 87% among 87 control slides free of findings, 83% among 120 controls with spontaneous necrosis, 67% among 147 treated animals with spontaneous minimal or slight necrosis, and 59% among 127 treated animals with minimal or slight necrosis caused by the treatment. Importantly, PathologAI was able to discriminate WSIs with spontaneous necrosis from those with treatment-related necrosis, and it distinguished between mild lesion levels (slight vs minimal) and between treatment dose levels. PathologAI could provide an inexpensive and rapid screening tool to assist the digital pathology analysis in preclinical applications and general toxicological studies.
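
The ensemble step in this abstract can be illustrated with a minimal sketch. The abstract does not say how the outputs of the 5 CNN classifiers were combined, so averaging their probabilities (soft voting) and thresholding at 0.5 are assumptions here, and the function name and example values are hypothetical.

```python
from statistics import mean

def ensemble_predict(probabilities, threshold=0.5):
    """Average per-classifier necrosis probabilities for one slide.

    Soft voting and the 0.5 threshold are assumptions; the abstract only
    states that an ensemble of 5 CNN classifiers was used.
    """
    if not probabilities:
        raise ValueError("need at least one classifier output")
    score = mean(probabilities)
    return score, score >= threshold

# Hypothetical outputs of five classifiers for a single whole-slide image.
score, is_necrosis = ensemble_predict([0.9, 0.7, 0.4, 0.8, 0.6])
```

Averaging lets one dissenting classifier (0.4 in this example) be outvoted without discarding its information, which is one common reason to prefer soft voting over a hard majority vote.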


Subject(s)
Artificial Intelligence , Deep Learning , Animals , Rats , Necrosis
6.
Chem Res Toxicol ; 36(8): 1290-1299, 2023 08 21.
Article in English | MEDLINE | ID: mdl-37487037

ABSTRACT

The US Food and Drug Administration (FDA) regulatory process often involves several reviewers who focus on sets of information related to their respective areas of review. Accordingly, manufacturers that provide submission packages to regulatory agencies are instructed to organize the contents using a structure that enables the information to be easily allocated, retrieved, and reviewed. However, this practice is not always followed correctly; as such, some documents are not well structured, with similar information spread across different sections, hindering the efficient access and review of all of the relevant data as a whole. To improve this common situation, we evaluated an artificial intelligence (AI)-based natural language processing (NLP) methodology, called Bidirectional Encoder Representations from Transformers (BERT), to automatically classify free-text information into standardized sections, supporting a holistic review of drug safety and efficacy. Specifically, FDA labeling documents were used in this study as a proof of concept, where the labeling section structure defined by the Physician Labeling Rule (PLR) was used to classify labels in the development of the model. The model was subsequently evaluated on texts from both well-structured labeling documents (i.e., PLR-based labeling) and less- or differently structured documents (i.e., non-PLR and Summary of Product Characteristics [SmPC] labeling). In the training process, the model yielded 96% and 88% accuracy for binary and multiclass tasks, respectively. The testing accuracies observed for the PLR, non-PLR, and SmPC testing data sets for the binary model were 95%, 88%, and 88%, and for the multiclass model were 82%, 73%, and 68%, respectively. Our study demonstrated that automatically classifying free texts into standardized sections with AI language models could be an advanced regulatory science approach for supporting the review process by effectively processing unformatted documents.


Subject(s)
Artificial Intelligence , Drug Labeling , United States , Electric Power Supplies , Product Labeling , United States Food and Drug Administration
7.
Foot Ankle Int ; 44(1): 13-20, 2023 01.
Article in English | MEDLINE | ID: mdl-36461676

ABSTRACT

BACKGROUND: There are 2 general types of total ankle replacement (TAR) designs with respect to the polyethylene insert: mobile-bearing (MB) and fixed-bearing (FB) TARs. The aim of this study is to compare polyethylene-related adverse events (AEs), particularly revisions, reported for MB TARs and FB TARs using the US Food and Drug Administration's (FDA's) Manufacturer and User Facility Device Experience (MAUDE) database. METHODS: A text mining method was applied to the medical device reports (MDRs) in the MAUDE database from 1991 to 2020, followed by manual reviews to identify, characterize, and describe all polyethylene-related AEs, including revisions, reported for MB and FB TARs. RESULTS: We found 1841 MDRs for MB (STAR Ankle only) and 1273 MDRs for 40+ FB TARs approved/cleared by the FDA. For the MB design, 33% (606/1841) of the AEs reported related to the polyethylene component, compared to 24% (291/1273) of the AEs reported for FB designs. Polyethylene fractures were reported in 11.3% (208/1841) for the MB designs compared to 0.2% (2/1273) for the FB designs. Half of the polyethylene-related revisions occurred within an average of 4.1 years after implantation for the MB design, compared with an average of 5.2 years for FB designs. CONCLUSION: Analysis of this database revealed a higher proportion of reported polyethylene fractures and greater need for earlier revisions for polyethylene-related issues with use of the primary MB design in the database as compared with FB TAR designs. Further study of device-related complications with more recent designs for both MB and FB ankle replacement components is needed to improve the outcomes of total ankle replacement. LEVEL OF EVIDENCE: Level III, retrospective comparative study.


Subject(s)
Arthroplasty, Replacement, Ankle , United States , Humans , Arthroplasty, Replacement, Ankle/adverse effects , Polyethylene , United States Food and Drug Administration , Retrospective Studies , Ankle Joint/surgery , Databases, Factual
8.
Regul Toxicol Pharmacol ; 137: 105287, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36372266

ABSTRACT

In the field of regulatory science, reviewing literature is an essential and important step, which most of the time is conducted by manually reading hundreds of articles. Although this process is highly time-consuming and labor-intensive, most output of this process is not well transformed into machine-readable format. The limited availability of data has largely constrained the development of artificial intelligence (AI) systems to facilitate literature review in the regulatory process. In the past decade, AI has revolutionized the area of text mining as many deep learning approaches have been developed to search, annotate, and classify relevant documents. After the great advancement of AI algorithms, a lack of high-quality data, rather than the algorithms, has recently become the bottleneck of AI system development. Herein, we constructed two large benchmark datasets, Chlorine Efficacy dataset (CHE) and Chlorine Safety dataset (CHS), under a regulatory scenario that sought to assess the antiseptic efficacy and toxicity of chlorine. For each dataset, ∼10,000 scientific articles were initially collected and manually reviewed, and their relevance to the review task was labeled. To ensure high data quality, each paper was labeled by a consensus among multiple experienced reviewers. The overall relevance rate was 27.21% (2,663 of 9,788) for CHE and 7.50% (761 of 10,153) for CHS. Furthermore, the relevant articles were categorized into five subgroups based on the focus of their content. Next, we developed an attention-based classification language model using these two datasets. The proposed classification model yielded an area under the curve (AUC) of 0.857 and 0.908 for the CHE and CHS datasets, respectively. This performance was significantly better than that obtained in a permutation test (p < 10E-9), demonstrating that the labeling processes were valid. To conclude, our datasets can be used as benchmarks to develop AI systems, which can further facilitate the literature review process in regulatory science.
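
The permutation test mentioned in this abstract can be sketched as follows. The abstract gives no implementation details, so this is a generic label-permutation test on AUC using only the Python standard library; the function names, the toy labels and scores, and the add-one correction are assumptions made for illustration.

```python
import random

def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(labels, scores, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels reach the observed AUC."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between labels and scores
        if auc(shuffled, scores) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)

# Hypothetical toy data: 1 = relevant article, 0 = not relevant.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.35, 0.4, 0.3, 0.2, 0.2, 0.1, 0.05]
observed_auc, p_value = permutation_p_value(labels, scores)
```

With this estimator the smallest attainable p-value is 1 / (n_permutations + 1), so a very small threshold such as the one reported would need far more permutations or a parametric approximation.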


Subject(s)
Artificial Intelligence , Machine Learning , Benchmarking , Sentiment Analysis , Chlorine , Data Mining
9.
Exp Biol Med (Maywood) ; 248(21): 1937-1943, 2023 11.
Article in English | MEDLINE | ID: mdl-38166420

ABSTRACT

The US drug labeling document contains essential information on drug efficacy and safety, making it a crucial regulatory resource for Food and Drug Administration (FDA) drug reviewers. Due to its extensive volume and the presence of free text, conventional text mining analyses have encountered challenges in processing these data. Recent advances in artificial intelligence (AI) for natural language processing (NLP) have provided an unprecedented opportunity to identify key information from drug labeling, thereby enhancing safety reviews and support for regulatory decisions. We developed RxBERT, a Bidirectional Encoder Representations from Transformers (BERT) model pretrained on FDA human prescription drug labeling documents for an enhanced application of drug labeling documents in both research and drug review. RxBERT was derived from BioBERT with further training on human prescription drug labeling documents. RxBERT was demonstrated in several tasks using regulatory datasets, including the National Institute of Standards and Technology Text Analysis Conference dataset (NIST TAC dataset), the FDA Adverse Drug Event Evaluation Dataset (ADE Eval dataset), and the classification of texts from submission packages into labeling sections (US Drug Labeling dataset). RxBERT reached an F1-score of 86.5 in both TAC and ADE Eval classification and a prediction accuracy of 87% for the US Drug Labeling dataset. Overall, RxBERT was shown to be competitive with or better than other NLP approaches such as BERT and BioBERT. In summary, we developed RxBERT, a transformer-based model specific to drug labeling that outperformed the original BERT model. RxBERT has the potential to be used to assist research scientists and FDA reviewers to better process and utilize drug labeling information toward the advancement of drug effectiveness and safety for public health. This proof-of-concept study also demonstrated a potential pathway to customized large language models (LLMs) tailored to sensitive regulatory documents for internal application.


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Prescription Drugs , United States , Humans , Artificial Intelligence , Drug Labeling , Data Mining
10.
Microbiol Resour Announc ; 11(10): e0084722, 2022 Oct 20.
Article in English | MEDLINE | ID: mdl-36047780

ABSTRACT

Campylobacter coli is a leading bacterial cause of human gastroenteritis. We reported the circularized 1.8-Mbp complete genome of MLST type 1055 C. coli strain P4581 isolated from a rhesus monkey, Macaca mulatta, hybridizing Illumina short- and Nanopore long-reads.

11.
Article in English | MEDLINE | ID: mdl-36011614

ABSTRACT

COVID-19 can lead to multiple severe outcomes including neurological and psychological impacts. However, it is challenging to manually scan hundreds of thousands of COVID-19 articles on a regular basis. To update our knowledge, provide sound science to the public, and communicate effectively, it is critical to have an efficient means of following the most current published data. In this study, we developed a language model to search abstracts using the most advanced artificial intelligence (AI) to accurately retrieve articles on COVID-19-associated neurological disorders. We applied this NeuroCORD model to the largest benchmark dataset of COVID-19, CORD-19. We found that the model developed on the training set yielded 94% prediction accuracy on the test set. This result was subsequently verified by two experts in the field. In addition, when applied to 96,000 non-labeled articles that were published after 2020, the NeuroCORD model accurately identified approximately 3% of them to be relevant for the study of COVID-19-associated neurological disorders, while only 0.5% were retrieved using conventional keyword searching. In conclusion, NeuroCORD provides an opportunity to profile neurological disorders resulting from COVID-19 in a rapid and efficient fashion, and its general framework could be used to study other COVID-19-related emerging health issues.


Subject(s)
COVID-19 , Nervous System Diseases , Artificial Intelligence , Humans , Language , Nervous System Diseases/epidemiology , Nervous System Diseases/etiology
12.
Front Artif Intell ; 5: 952424, 2022.
Article in English | MEDLINE | ID: mdl-36034596

ABSTRACT

Food samples are routinely screened for food-contaminating beetles (i.e., pantry beetles) due to their adverse impact on the economy, environment, public health and safety. If found, their remains are subsequently analyzed to identify the species responsible for the contamination; each species poses different levels of risk, requiring different regulatory and management steps. At present, this identification is done through manual microscopic examination since each species of beetle has a unique pattern on its elytra (hardened forewing). Our study sought to automate the pattern recognition process through machine learning. Such automation will enable more efficient identification of pantry beetle species and could potentially be scaled up and implemented across various analysis centers in a consistent manner. In our earlier studies, we demonstrated that automated species identification of pantry beetles is feasible through elytral pattern recognition. Due to poor image quality, however, we failed to achieve prediction accuracies of more than 80%. Subsequently, we modified the traditional imaging technique, allowing us to acquire high-quality elytral images. In this study, we explored whether high-quality elytral images can truly achieve near-perfect prediction accuracies for 27 different species of pantry beetles. To test this hypothesis, we developed a convolutional neural network (CNN) model and compared performance between two different image sets for various pantry beetles. Our study indicates improved image quality indeed leads to better prediction accuracy; however, it was not the only requirement for achieving good accuracy. Also required are many high-quality images, especially for species with a high number of variations in their elytral patterns. The current study provided a direction toward achieving our ultimate goal of automated species identification through elytral pattern recognition.

13.
Front Artif Intell ; 4: 729834, 2021.
Article in English | MEDLINE | ID: mdl-34939028

ABSTRACT

Background & Aims: The United States Food and Drug Administration (FDA) regulates a broad range of consumer products, which account for about 25% of the United States market. The FDA regulatory activities often involve producing and reading a large number of documents, which is time-consuming and labor-intensive. To support regulatory science at FDA, we evaluated artificial intelligence (AI)-based natural language processing (NLP) of regulatory documents for text classification and compared deep learning-based models with a conventional keyword-based model. Methods: FDA drug labeling documents were used as a representative regulatory data source to classify drug-induced liver injury (DILI) risk by employing the state-of-the-art language model BERT. The resulting NLP-DILI classification model was statistically validated with both internal and external validation procedures and applied to the labeling data from the European Medicines Agency (EMA) for cross-agency application. Results: The NLP-DILI model developed using FDA labeling documents and evaluated by cross-validations in this study showed remarkable performance in DILI classification with a recall of 1 and a precision of 0.78. When cross-agency data were used to validate the model, the performance remained comparable, demonstrating that the model was portable across agencies. Results also suggested that the model was able to capture the semantic meanings of sentences in drug labeling. Conclusion: Deep learning-based NLP models performed well in DILI classification of drug labeling documents and learned the meanings of complex text in drug labeling. This proof-of-concept work demonstrated that using AI technologies to assist regulatory activities is a promising approach to modernize and advance regulatory science.
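
As a worked illustration of the reported metrics, the recall and precision can be combined into a single F1 score (their harmonic mean). The abstract does not report an F1 value, so the number computed here is derived arithmetic only, and the function name is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values reported in the abstract: recall of 1 and precision of 0.78.
f1 = f1_score(precision=0.78, recall=1.0)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.876"
```

A recall of 1 means no DILI-positive label was missed, while a precision of 0.78 means roughly one in five positive calls was a false alarm.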

14.
Sci Rep ; 11(1): 7957, 2021 04 12.
Article in English | MEDLINE | ID: mdl-33846381

ABSTRACT

Identifying the exact species of pantry beetle responsible for food contamination is imperative in assessing the risks associated with contamination scenarios. Each beetle species is known to have unique patterns on its hardened forewings (known as elytra) through which it can be identified. Currently, this is done through manual microanalysis of the insect or its fragments in contaminated food samples. We envision that the use of automated pattern analysis would expedite and scale up the identification process. However, such automation would require images to be captured in a consistent manner, thereby enabling the creation of large repositories of high-quality images. Presently, there is no standard imaging technique for capturing images of beetle elytra, which consequently means there is no standard method of beetle species identification through elytral pattern analysis. This deficiency inspired us to optimize and standardize imaging methods, especially for food-contaminating beetles. For this endeavor, we chose multiple species of beetles belonging to different families or genera that have near-identical elytral patterns, and thus are difficult to identify correctly at the species level. Our optimized imaging method provides enhanced images such that the elytral patterns of individual species could easily be distinguished from each other through visual observation. We believe such standardization is critical in developing automated species identification of pantry beetles and/or other insects. This eventually may lead to improved taxonomical classification, allowing for better management of food contamination and ecological conservation.


Subject(s)
Coleoptera/classification , Food Contamination , Imaging, Three-Dimensional , Animals , Pattern Recognition, Automated , Species Specificity
15.
Arch Toxicol ; 95(5): 1763-1778, 2021 05.
Article in English | MEDLINE | ID: mdl-33704509

ABSTRACT

Exposure to cigarette smoke (CS) is strongly associated with impaired mucociliary clearance (MCC), which has been implicated in the pathogenesis of CS-induced respiratory diseases, such as chronic obstructive pulmonary disease (COPD). In this study, we aimed to identify microRNAs (miRNAs) that are associated with impaired MCC caused by CS in an in vitro human air-liquid-interface (ALI) airway tissue model. ALI cultures were exposed to CS (diluted with 0.5 L/min, 1.0 L/min, and 4.0 L/min of clean air) from smoking five 3R4F University of Kentucky reference cigarettes under the International Organization for Standardization (ISO) machine smoking regimen, every other day for 1 week (a total of 3 days, 40 min/day). Transcriptome analyses of ALI cultures exposed to the high concentration of CS identified 5090 differentially expressed genes and 551 differentially expressed miRNAs after the third exposure. Genes involved in ciliary function and ciliogenesis were significantly perturbed by repeated CS exposures, leading to changes in cilia beating frequency and ciliary protein expression. In particular, a time-dependent decrease in the expression of miR-449a, a conserved miRNA highly enriched in ciliated airway epithelia and implicated in motile ciliogenesis, was observed in CS-exposed cultures. Similar alterations in miR-449a have been reported in smokers with COPD. Network analysis further indicates that downregulation of miR-449a by CS may derepress cell-cycle proteins, which, in turn, interferes with ciliogenesis. Investigating the effects of CS on transcriptome profile in human ALI cultures may provide not only mechanistic insights but also potential early biomarkers for CS exposure and harm.


Subject(s)
Nicotiana/toxicity , Smoke , Bronchi , Cells, Cultured , Cigarette Smoking , Cilia , Down-Regulation , Epithelial Cells , Gene Expression Profiling , Humans , Lung , MicroRNAs , Mucociliary Clearance , Pulmonary Disease, Chronic Obstructive , Smoking , Tobacco Products , Transcriptome
16.
Chem Res Toxicol ; 34(2): 541-549, 2021 02 15.
Article in English | MEDLINE | ID: mdl-33513003

ABSTRACT

Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice model performance to gain explainability, or vice versa? Here we present a comprehensive study to assess algorithm and feature influences on model performance in chemical toxicity research. We built over 5000 models for a Tox21 bioassay data set of 65 assays and ∼7600 compounds. Seven molecular representations as features and 12 modeling approaches varying in complexity and explainability were employed to systematically investigate the impact of various factors on model performance and explainability. We demonstrated that end points dictated a model's performance, regardless of the chosen modeling approach, including deep learning, and chemical features. Overall, more complex models such as (LS-)SVM and Random Forest performed marginally better than simpler models such as linear regression and KNN in the presented Tox21 data analysis. Since a simpler model with acceptable performance is often also easy to interpret for the Tox21 data set, it clearly was the preferred choice due to its better explainability. Given that each data set had its own error structure for both dependent and independent variables, we strongly recommend conducting a systematic study with a broad range of model complexity and feature explainability to identify a model that balances predictivity and explainability.


Subject(s)
Chemical and Drug Induced Liver Injury , Machine Learning , Pharmaceutical Preparations/chemistry , Databases, Factual , Humans , Models, Molecular , Quantitative Structure-Activity Relationship
17.
Cell Rep Methods ; 1(7): 100106, 2021 11 22.
Article in English | MEDLINE | ID: mdl-35475002

ABSTRACT

The primary objective of the FDA-led Sequencing and Quality Control Phase 2 (SEQC2) project is to develop standard analysis protocols and quality control metrics for use in DNA testing to enhance scientific research and precision medicine. This study reports a targeted next-generation sequencing (NGS) method that will enable more accurate detection of actionable mutations in circulating tumor DNA (ctDNA) clinical specimens. To accomplish this, a synthetic internal standard spike-in was designed for each actionable mutation target, suitable for use in NGS following hybrid capture enrichment and unique molecular index (UMI) or non-UMI library preparation. When mixed with contrived ctDNA reference samples, internal standards enabled calculation of technical error rate, limit of blank, and limit of detection for each variant at each nucleotide position in each sample. True-positive mutations with variant allele fraction too low for detection by current practice were detected with this method, thereby increasing sensitivity.


Subject(s)
Circulating Tumor DNA , Humans , Circulating Tumor DNA/genetics , Mutation/genetics , High-Throughput Nucleotide Sequencing/methods , Precision Medicine/methods , Quality Control
18.
Chem Res Toxicol ; 34(2): 412-421, 2021 02 15.
Article in English | MEDLINE | ID: mdl-33251791

ABSTRACT

The mechanisms leading to organ level toxicities are poorly understood. In this study, we applied an integrated approach to deduce the molecular targets and biological pathways involved in chemically induced toxicity for eight common human organ level toxicity end points (carcinogenicity, cardiotoxicity, developmental toxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, reproductive toxicity, and skin toxicity). Integrated analysis of in vitro assay data, molecular targets and pathway annotations from the literature, and toxicity-molecular target associations derived from text mining, combined with machine learning techniques, were used to generate molecular targets for each of the organ level toxicity end points. A total of 1516 toxicity-related genes were identified and subsequently analyzed for biological pathway coverage, resulting in 206 significant pathways (p-value <0.05), ranging from 3 (e.g., developmental toxicity) to 101 (e.g., skin toxicity) for each toxicity end point. This study presents a systematic and comprehensive analysis of molecular targets and pathways related to various in vivo toxicity end points. These molecular targets and pathways could aid in understanding the biological mechanisms of toxicity and serve as a guide for the design of suitable in vitro assays for more efficient toxicity testing. In addition, these results are complementary to the existing adverse outcome pathway (AOP) framework and can be used to aid in the development of novel AOPs. Our results provide abundant testable hypotheses for further experimental validation.


Subject(s)
Environmental Pollutants/analysis , Machine Learning , Toxicity Tests , Environmental Pollutants/adverse effects , Humans
19.
BMC Med Inform Decis Mak ; 20(1): 68, 2020 04 15.
Article in English | MEDLINE | ID: mdl-32293428

ABSTRACT

BACKGROUND: Drug labels, or package inserts, play a significant role in all operations from production through drug distribution channels to the end consumer. An image of the label, also called the Display Panel, could be used to identify illegal, illicit, unapproved, and potentially dangerous drugs. Due to the time-consuming process and high labor cost of investigation, an artificial intelligence-based deep learning model is necessary for fast and accurate identification of the drugs. METHODS: In addition to image-based identification technology, we take advantage of the rich text information on the pharmaceutical package insert in drug label images. In this study, we developed the Drug Label Identification through Image and Text embedding model (DLI-IT) to model text-based patterns of historical data for detection of suspicious drugs. In DLI-IT, we first trained a Connectionist Text Proposal Network (CTPN) to crop the raw image into sub-images based on the text. The texts from the cropped sub-images are recognized independently through the Tesseract OCR Engine and combined as one document for each raw image. Finally, we applied universal sentence embedding to transform these documents into vectors and find the reference images most similar to the test image through cosine similarity. RESULTS: We trained the DLI-IT model on 1749 opioid and 2365 non-opioid drug label images. The model was then tested on 300 external opioid drug label images; the results demonstrated that our model achieves up to 88% precision in drug label identification, outperforming previous image-based or text-based identification methods by up to 35%. CONCLUSION: To conclude, by combining image and text embedding analysis under a deep learning framework, our DLI-IT approach achieved competitive performance in advancing drug label identification.
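
The final retrieval step described in the METHODS above (ranking reference documents by cosine similarity to a test document) can be sketched as below. The three-dimensional vectors are hypothetical stand-ins for sentence embeddings, and the function and label names are assumptions; the text detection, OCR, and embedding stages are not reproduced here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query_vec, reference_vecs, top_k=1):
    """Return the top_k (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in reference_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical document vectors for three reference labels and one test label.
references = {
    "label_A": [0.9, 0.1, 0.0],
    "label_B": [0.1, 0.8, 0.3],
    "label_C": [0.0, 0.2, 0.9],
}
query = [0.2, 0.7, 0.4]
best = most_similar(query, references, top_k=2)
```

Cosine similarity ignores vector length, so a long label document and a short one with the same wording profile still score as close matches.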


Subject(s)
Deep Learning , Pharmaceutical Preparations , Artificial Intelligence
20.
Front Immunol ; 11: 224, 2020.
Article in English | MEDLINE | ID: mdl-32265897

ABSTRACT

To evaluate the expression of immune checkpoint (ICP) genes and their concordance with expression of IFNγ, and to identify potential novel ICP-related genes (ICPRG) in colorectal cancer (CRC), the biological connectivity of six well-documented ("classical") ICPs (CTLA4, PD1, PDL1, Tim3, IDO1, and LAG3) with IFNγ and its co-expressed genes was examined by NGS in 79 CRC/healthy colon tissue pairs. Identification of novel IFNγ-induced molecules with potential ICP activity was also sought. In our study, the six classical ICPs were statistically upregulated and correlated with IFNγ, CD8A, CD8B, CD4, and 180 additional immunologically related genes in IFNγ-positive (FPKM > 1) tumors. By ICP co-expression analysis, we also identified three IFNγ-induced genes [IFNγ-inducible lysosomal thiol reductase (IFI30), guanylate binding protein 1 (GBP1), and guanylate binding protein 4 (GBP4)] as potential novel ICPRGs. These three genes were upregulated in tumor compared to normal tissues in IFNγ-positive tumors, were co-expressed with CD8A, and had relatively high abundance (average FPKM = 362, 51, and 25, respectively) compared to the abundance of the 5 well-defined ICPs (Tim3, LAG3, PDL1, CTLA4, PD1; average FPKM = 10, 9, 6, 6, and 2, respectively), although IDO1 is expressed at comparably high levels (FPKM = 39). We extended our evaluation by querying the TCGA database, which revealed the commonality of IFNγ-dependent expression of the three potential ICPRGs in 638 CRCs, 103 skin cutaneous melanomas (SKCM), 1105 breast cancers (BC), 184 esophageal cancers (ESC), 416 stomach cancers (STC), and 501 lung squamous carcinomas (LUSC). In terms of prognosis, based on Pathology Atlas data, correlation of GBP1 and GBP4, but not IFI30, with 5-year survival rate was favorable in CRC, BC, SKCM, and STC. Thus, further studies defining the role of IFI30, GBP1, and GBP4 in CRC are warranted.


Subject(s)
Breast Neoplasms/genetics , Colon/physiology , Colorectal Neoplasms/genetics , Interferon-gamma/metabolism , Melanoma/genetics , Skin Neoplasms/genetics , Stomach Neoplasms/genetics , Breast Neoplasms/immunology , Breast Neoplasms/mortality , Colorectal Neoplasms/immunology , Colorectal Neoplasms/mortality , Female , GTP-Binding Proteins/genetics , GTP-Binding Proteins/metabolism , High-Throughput Nucleotide Sequencing , Humans , Immune Checkpoint Proteins/genetics , Male , Melanoma/immunology , Oxidoreductases Acting on Sulfur Group Donors/genetics , Oxidoreductases Acting on Sulfur Group Donors/metabolism , Prognosis , Skin Neoplasms/immunology , Stomach Neoplasms/immunology , Stomach Neoplasms/mortality , Survival Analysis , Melanoma, Cutaneous Malignant