ABSTRACT
BACKGROUND: The recent global pandemic has placed a high priority on identifying drugs to prevent or lessen clinical infection by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019 (COVID-19). METHODS: We applied two computational approaches to identify potential therapeutics. First, we sought to identify existing FDA-approved drugs that could block coronaviruses from entering cells by binding to ACE2 or TMPRSS2, using a high-throughput AI-based binding affinity prediction platform. Second, we sought to identify FDA-approved drugs that could attenuate the gene expression patterns induced by coronaviruses, using our Disease Cancelling Technology (DCT) platform. RESULTS: Top results for ACE2 binding included several ACE inhibitors, a beta-lactam antibiotic, two antiviral agents (Fosamprenavir and Emricasan), and glutathione. The platform also assessed specificity for ACE2 over ACE1, which is important for avoiding counterregulatory effects. Further studies are needed to weigh the benefit of blocking virus entry against potential counterregulatory effects and possible protective effects of ACE2. However, the data herein suggest readily available drugs that warrant experimental evaluation to assess potential benefit. DCT was run on an animal model of SARS-CoV and ranked compounds by their ability to induce gene expression signals that counteract disease-associated signals. Top hits included Vitamin E, ruxolitinib, and glutamine. Glutathione and its precursor glutamine were highly ranked by two independent methods, suggesting that both warrant further investigation for potential benefit against SARS-CoV-2. CONCLUSIONS: While these findings are not yet ready for clinical translation, this report highlights the potential of two bioinformatics technologies to rapidly discover existing therapeutic agents that warrant further investigation for established and emerging disease processes.
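The abstract does not describe how DCT scores compounds. As a minimal sketch of the general idea of ranking compounds by how strongly they counteract a disease signature, the code below scores each drug by the negated Pearson correlation between its gene-level fold changes and those of the disease, so that a drug pushing genes in the opposite direction ranks highest. The scoring rule, gene names, and values are hypothetical; this is not the DCT algorithm.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either profile is constant


def reversal_score(disease, drug):
    """Negated correlation over shared genes: higher means the drug signature
    points in the opposite direction to the disease signature."""
    genes = sorted(set(disease) & set(drug))
    if len(genes) < 3:
        raise ValueError("need at least three shared genes")
    return -pearson([disease[g] for g in genes], [drug[g] for g in genes])


def rank_compounds(disease, drugs):
    """Return compound names ordered from strongest to weakest reversal."""
    scores = {name: reversal_score(disease, sig) for name, sig in drugs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy log fold-change signatures (hypothetical values).
disease = {"A": 2.0, "B": 1.0, "C": -1.0, "D": -2.0}
drugs = {
    "mimic": {"A": 2.0, "B": 1.0, "C": -1.0, "D": -2.0},
    "noise": {"A": 1.0, "B": -1.0, "C": -1.0, "D": 1.0},
    "reverser": {"A": -2.0, "B": -1.0, "C": 1.0, "D": 2.0},
}
print(rank_compounds(disease, drugs))
```

A rank-based (Spearman) correlation or an enrichment-style score would be equally plausible choices; Pearson is used here only because it is short to write with the standard library.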
Subjects
Betacoronavirus/physiology , Computational Biology , Coronavirus Infections/genetics , Coronavirus Infections/therapy , Pneumonia, Viral/genetics , Pneumonia, Viral/therapy , Angiotensin-Converting Enzyme 2 , Animals , Betacoronavirus/genetics , COVID-19 , Gene Expression Regulation , Glutamine/metabolism , Humans , Mice , Pandemics , Peptidyl-Dipeptidase A/metabolism , SARS-CoV-2 , Serine Endopeptidases/metabolism
ABSTRACT
BACKGROUND: Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed. METHODS: We compared the SNVs identified based on HG19 versus HG38, leveraging whole genome sequencing (WGS) data from the Genome in a Bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and HG38. Lastly, we calculated conversion rates, analyzed discordance rates between SNVs called with HG19 or HG38, and characterized the discordant SNVs. RESULTS: The conversion rates from HG38 to HG19 (average 95%) were lower than those from HG19 to HG38 (average 99%). The conversion rates varied slightly among the calling pipelines. Around 1.5% of SNVs were discordantly converted between HG19 and HG38. The conversion from HG38 to HG19 produced more SNVs that failed conversion and more discordant SNVs than the opposite conversion (HG19 to HG38). Most of the discordant SNVs had low read depth, were low-confidence SNVs as defined by GIAB, and/or were dominated by G/C alleles (52% observed versus 42% expected). CONCLUSION: A significant number of SNVs could not be converted between HG19 and HG38. Based on careful review of our comparisons, we recommend HG38 (the newer version) for NGS SNV analysis. In summary, our findings suggest caution when translating identified SNVs between different versions of the human reference genome.
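The conversion and discordance rates summarized above can be illustrated with a small sketch. A plain dictionary stands in for the coordinate-conversion tools (the abstract does not name them), variants are keyed as (chrom, pos, ref, alt) tuples, and discordance is taken as the share of the union of the converted and natively called sets that appears in only one of them. This definition and all coordinates are hypothetical, not the study's exact procedure.

```python
def convert(calls, liftover):
    """Map calls through a coordinate table; return (conversion rate, converted set).

    Calls missing from the table are treated as failed conversions."""
    mapped = [liftover[c] for c in calls if c in liftover]
    return len(mapped) / len(calls), set(mapped)


def discordance(converted, native):
    """Share of the union of two call sets that is found in only one of them."""
    union = converted | native
    return len(converted ^ native) / len(union) if union else 0.0


def gc_allele_fraction(calls):
    """Fraction of REF and ALT alleles that are G or C."""
    alleles = [a for _, _, ref, alt in calls for a in (ref, alt)]
    return sum(a in ("G", "C") for a in alleles) / len(alleles)


# Hypothetical HG19 calls and a toy HG19 -> HG38 coordinate table.
hg19_calls = [
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr2", 80, "T", "C"),  # absent from the table: fails conversion
]
LIFTOVER = {
    ("chr1", 100, "A", "G"): ("chr1", 1100, "A", "G"),
    ("chr1", 200, "C", "T"): ("chr1", 1200, "C", "T"),
    ("chr2", 50, "G", "A"): ("chr2", 1050, "G", "A"),
}
hg38_calls = {("chr1", 1100, "A", "G"), ("chr1", 1200, "C", "T"), ("chr2", 9999, "A", "C")}

rate, converted = convert(hg19_calls, LIFTOVER)
print(rate, discordance(converted, hg38_calls), gc_allele_fraction(hg19_calls))
```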
Subjects
Genome, Human/genetics , High-Throughput Nucleotide Sequencing/methods , Humans
ABSTRACT
After publication of this supplement article.
ABSTRACT
Pridopidine has demonstrated improvement in Huntington Disease (HD) motor symptoms as measured by secondary endpoints in clinical trials. Pridopidine was originally described as a dopamine stabilizer, but this mechanism is insufficient to explain its clinical and preclinical effects. This study therefore explored pridopidine's potential mechanisms of action. The effect of pridopidine versus sham treatment on genome-wide expression profiles in the rat striatum was analysed and compared with the pathological expression profile of Q175 knock-in (Q175 KI) versus Q25 WT mouse models. A broad, unbiased pathway analysis was conducted, followed by testing for enrichment of relevant pathways. Pridopidine upregulated the BDNF pathway (P = 1.73E-10), and its effect on BDNF secretion was sigma-1 receptor (S1R) dependent. Many of the same genes were independently found to be downregulated in Q175 KI mice compared to WT (5.2e-7 < P < 0.04). In addition, pridopidine treatment upregulated the glucocorticoid receptor (GR) response, D1R-associated genes, and the AKT/PI3K pathway (P = 1E-10, P = 0.001, and P = 0.004, respectively). Pridopidine upregulates the BDNF, D1R, GR, and AKT/PI3K pathways, which are known to promote neuronal plasticity and survival and have been reported to show therapeutic benefit in HD animal models. Activation of S1R, which is necessary for pridopidine's effect on the BDNF pathway, represents a core component of its mode of action. Since the newly identified pathways are downregulated in neurodegenerative diseases, including HD, these findings suggest that pridopidine may exert neuroprotective effects beyond alleviating some symptoms of HD.
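The abstract reports pathway enrichment P values without naming the test used. A one-sided hypergeometric test is one common way to ask whether a pathway's genes overlap a list of regulated genes more often than expected by chance; the sketch below computes it with the standard library only. The choice of test and the example counts are assumptions, not the study's method.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_hits, n_overlap):
    """One-sided P(X >= n_overlap) for X ~ Hypergeometric.

    n_universe: genes measured; n_pathway: genes in the pathway;
    n_hits: regulated genes; n_overlap: regulated genes inside the pathway."""
    total = comb(n_universe, n_hits)
    upper = min(n_pathway, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 genes, a 100-gene pathway, 500 regulated genes, 15 shared.
print(hypergeom_enrichment_p(20_000, 100, 500, 15))
```

Exact integer arithmetic via `math.comb` avoids overflow in the combinatorial terms; only the final ratio is converted to a float.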
Subjects
Brain-Derived Neurotrophic Factor/biosynthesis , Corpus Striatum/metabolism , Huntington Disease/drug therapy , Neuroprotective Agents/administration & dosage , Piperidines/administration & dosage , Animals , Brain-Derived Neurotrophic Factor/genetics , Corpus Striatum/drug effects , Corpus Striatum/pathology , Disease Models, Animal , Gene Expression Regulation/genetics , Genome , Humans , Huntington Disease/genetics , Huntington Disease/pathology , Mice , Neuroprotective Agents/metabolism , Rats , Receptors, Dopamine D5/biosynthesis , Receptors, Dopamine D5/genetics , Receptors, Glucocorticoid/biosynthesis , Receptors, Glucocorticoid/genetics , Signal Transduction/drug effects
ABSTRACT
Endocrine disrupting chemicals (EDCs) mimic natural hormones and disrupt endocrine function. Humans and wildlife are exposed to EDCs, which might alter endocrine function through various mechanisms and lead to adverse effects. Identifying EDCs is therefore important to protect ecosystems and promote public health. Using in vitro and in vivo experiments to identify potential EDCs is time-consuming and expensive, so quantitative structure-activity relationship (QSAR) modeling is applied to screen for potential EDCs. Here, we summarize the predictive models developed using various algorithms to forecast the binding activity of chemicals to the estrogen and androgen receptors, alpha-fetoprotein, and sex hormone binding globulin.
Subjects
Computer Simulation , Endocrine Disruptors/toxicity , Environmental Pollutants/toxicity , Toxicity Tests/methods , Algorithms , Animals , Estrogens , Humans , Quantitative Structure-Activity Relationship , Receptors, Androgen , Receptors, Estrogen , Sex Hormone-Binding Globulin , alpha-Fetoproteins
ABSTRACT
The trinucleotide repeat expansion underlying Huntington disease (HD) results in corticostriatal synaptic dysfunction and subsequent neurodegeneration of striatal medium spiny neurons (MSNs). HD is a devastating autosomal dominant disease with no disease-modifying treatments. Pridopidine, a postulated "dopamine stabilizer", has been shown to improve motor symptoms in clinical trials of HD. However, the target(s) and mechanism of action of pridopidine remain to be fully elucidated. As binding studies identified the sigma-1 receptor (S1R) as a high-affinity receptor for pridopidine, we evaluated the relevance of S1R as a therapeutic target of pridopidine in HD. S1R is an endoplasmic reticulum (ER)-resident transmembrane protein that is regulated by ER calcium homeostasis, which is perturbed in HD. Consistent with ER calcium dysregulation, we observed striatal upregulation of S1R in aged YAC128 transgenic HD mice and in HD patients. We previously demonstrated that dendritic MSN spines are lost in aged corticostriatal co-cultures from YAC128 mice. We report here that pridopidine and the chemically similar S1R agonist 3-PPP prevent MSN spine loss in aging YAC128 co-cultures. Spine protection was blocked by neuronal deletion of S1R. Pridopidine treatment suppressed supranormal ER Ca2+ release, restored ER calcium levels, and reduced excessive store-operated calcium (SOC) entry in spines, which may account for its synaptoprotective effects. Normalization of ER Ca2+ levels by pridopidine was prevented by S1R deletion. To evaluate long-term effects of pridopidine, we analyzed expression profiles of calcium signaling genes. Pridopidine elevated striatal expression of calbindin and homer1a, whereas their striatal expression was reduced in aged Q175KI and YAC128 HD mouse models compared to WT. Pridopidine and 3-PPP are proposed to prevent calcium dysregulation and synaptic loss in a YAC128 corticostriatal co-culture model of HD.
The actions of pridopidine were mediated by S1R and led to normalization of ER Ca2+ release, ER Ca2+ levels and spine SOC entry in YAC128 MSNs. This is a new potential mechanism of action for pridopidine, highlighting S1R as a potential target for HD therapy. Upregulation of striatal proteins that regulate calcium, including calbindin and homer1a, upon chronic therapy with pridopidine, may further contribute to long-term beneficial effects of pridopidine in HD.
Subjects
Huntington Disease/drug therapy , Huntington Disease/metabolism , Neuroprotective Agents/pharmacology , Piperidines/pharmacology , Receptors, sigma/metabolism , Aging/drug effects , Aging/metabolism , Animals , Calbindins/metabolism , Calcium/metabolism , Cations, Divalent/metabolism , Coculture Techniques , Corpus Striatum/drug effects , Corpus Striatum/metabolism , Dendritic Spines/drug effects , Dendritic Spines/metabolism , Disease Models, Animal , Endoplasmic Reticulum/drug effects , Endoplasmic Reticulum/metabolism , Humans , Mice , Mice, Transgenic , Neuroprotective Agents/chemistry , Piperidines/chemistry , Rats, Inbred SHR , Receptors, sigma/genetics , Synapses/drug effects , Synapses/metabolism , Sigma-1 Receptor
ABSTRACT
RATIONALE: Despite shared environmental exposures, idiopathic pulmonary fibrosis (IPF) and chronic obstructive pulmonary disease are usually studied in isolation, and whether they share molecular mechanisms is unknown. OBJECTIVES: We applied an integrative genomic approach to identify convergent transcriptomic pathways in emphysema and IPF. METHODS: We defined the transcriptional repertoire of lungs with chronic obstructive pulmonary disease, IPF, or normal histology using RNA-seq (n = 87). MEASUREMENTS AND MAIN RESULTS: Genes increased in both emphysema and IPF relative to control were enriched for the p53/hypoxia pathway, a finding confirmed in an independent cohort using both gene expression arrays and the nCounter Analysis System (n = 193). Immunohistochemistry confirmed overexpression of HIF1A, MDM2, and NFKBIB, members of this pathway, in tissues from patients with emphysema or IPF. Using reads aligned across splice junctions, we determined that alternative splicing of the p53/hypoxia pathway-associated molecules NUMB and PDGFA occurred more frequently in IPF or emphysema than in control, and validated these findings by quantitative polymerase chain reaction and the nCounter Analysis System on an independent sample set (n = 193). Finally, by integrating parallel microRNA and mRNA-Seq data on the same samples, we identified MIR96 as a key novel regulatory hub in the p53/hypoxia gene-expression network and confirmed that modulation of MIR96 in vitro recapitulates the disease-associated gene-expression network. CONCLUSIONS: Our results suggest convergent transcriptional regulatory hubs in diseases as phenotypically varied as chronic obstructive pulmonary disease and IPF, and suggest that these hubs may represent shared key responses of the lung to environmental stresses.
Subjects
Gene Regulatory Networks/genetics , Idiopathic Pulmonary Fibrosis/genetics , Pulmonary Disease, Chronic Obstructive/genetics , Adult , Emphysema/genetics , Female , Humans , Hypoxia-Inducible Factor 1, alpha Subunit/metabolism , I-kappa B Proteins/metabolism , Male , Membrane Proteins/metabolism , Middle Aged , Nerve Tissue Proteins/metabolism , Oligonucleotide Array Sequence Analysis , Platelet-Derived Growth Factor/metabolism , Proto-Oncogene Proteins c-mdm2/metabolism
ABSTRACT
Text summarization is crucial in scientific research, drug discovery and development, regulatory review, and more. This task demands domain expertise, language proficiency, semantic prowess, and conceptual skill. The recent advent of large language models (LLMs), such as ChatGPT, offers unprecedented opportunities to automate this process. We compared ChatGPT-generated summaries with those produced by human experts using FDA drug labeling documents. Drug labeling includes summaries of key labeling sections, which makes it an ideal human benchmark for evaluating ChatGPT's summarization capabilities. Analyzing more than 14,000 summaries, we observed that ChatGPT-generated summaries closely resembled those written by human experts. Importantly, ChatGPT exhibited even greater similarity when summarizing drug safety information. These findings highlight ChatGPT's potential to accelerate work in critical areas, including drug safety.
Subjects
Drug Labeling , United States Food and Drug Administration , Humans , United States , Natural Language Processing , Drug-Related Side Effects and Adverse Reactions
ABSTRACT
Accurately calling indels from next-generation sequencing (NGS) data is critical for clinical applications. The precisionFDA team collaborated with the U.S. Food and Drug Administration's (FDA's) National Center for Toxicological Research (NCTR) to complete the NCTR Indel Calling from Oncopanel Sequencing Data Challenge, which evaluated the performance of indel calling pipelines. Top performers were selected based on precision, recall, and F1-score. Many other pipelines performed close to the top performers, together forming a top cluster. Performance was significantly higher in high confidence regions and coding regions, and significantly lower in low complexity regions. Oncopanel capture and other issues may have affected the recall rate. Indels with higher variant allele frequency (VAF) may generally be called with higher confidence. Many of the indel calling pipelines performed well. Some performed well across all three oncopanels, while others were better suited to a specific oncopanel. Indel calling performance can be further improved by restricting calls to high confidence intervals (HCIs) and coding regions, and by excluding low complexity regions (LCRs). VAF cut-offs could also be applied depending on the application.
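The precision, recall, and F1-score used to rank pipelines, and the effect of restricting evaluation to high confidence intervals, can be sketched as follows. Variants are assumed to be (chrom, pos, ref, alt) tuples matched exactly, and intervals are half-open (chrom, start, end) triples; real benchmarking tools also normalize indel representation before matching, which this hypothetical sketch skips.

```python
def score_calls(called, truth, regions=None):
    """Precision, recall and F1 for exact-match variant keys.

    If regions is given, both call sets are first restricted to variants whose
    position falls inside one of the half-open (chrom, start, end) intervals."""
    def inside(v):
        return regions is None or any(c == v[0] and s <= v[1] < e for c, s, e in regions)

    called = {v for v in called if inside(v)}
    truth = {v for v in truth if inside(v)}
    tp = len(called & truth)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical truth set, pipeline output, and a single high-confidence interval.
truth = {("chr1", 10, "A", "AT"), ("chr1", 20, "GC", "G"), ("chr1", 500, "T", "TA")}
called = {("chr1", 10, "A", "AT"), ("chr1", 20, "GC", "G"),
          ("chr1", 30, "C", "CG"), ("chr1", 600, "AT", "A")}
hci = [("chr1", 0, 100)]

print(score_calls(called, truth))       # whole panel
print(score_calls(called, truth, hci))  # restricted to the interval
```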
Subjects
High-Throughput Nucleotide Sequencing , INDEL Mutation , Polymorphism, Single Nucleotide
ABSTRACT
Next-generation sequencing (NGS) has revolutionized genomic research by enabling high-throughput, cost-effective genome and transcriptome sequencing, accelerating personalized medicine for complex diseases, including cancer. Whole genome/transcriptome sequencing (WGS/WTS) provides comprehensive insights, while targeted sequencing is more cost-effective and sensitive. Compared with short-read sequencing, which still dominates the field owing to its high speed and cost-effectiveness, long-read sequencing can overcome alignment limitations and better discriminate similar sequences from alternative transcripts or repetitive regions. Hybrid sequencing combines the strengths of different technologies for a more comprehensive view of genomic/transcriptomic variation. Understanding each technology's strengths and limitations is critical for translating cutting-edge technologies into clinical applications. In this study, we sequenced DNA and RNA libraries of reference samples using various targeted DNA and RNA panels and the whole transcriptome on both short-read and long-read platforms. This study design enables a comprehensive analysis of sequencing technologies, targeting protocols, and library preparation methods. Our expanded profiling landscape establishes a reference point for assessing current sequencing technologies, facilitating informed decision-making in genomic research and precision medicine.
Subjects
High-Throughput Nucleotide Sequencing , Humans , RNA-Seq , Sequence Analysis, DNA/methods , Transcriptome , Sequence Analysis, RNA , Precision Medicine
ABSTRACT
Accurate indel calling plays an important role in precision medicine. A benchmarking indel set is essential for thoroughly evaluating the indel calling performance of bioinformatics pipelines. A reference sample with a set of known-positive variants was developed in the FDA-led Sequencing Quality Control Phase 2 (SEQC2) project, but the known-positive set contained few indels. This project sought to provide an enriched set of known indels that would be more translationally relevant by focusing on additional cancer-related regions. A thorough manual review process completed by 42 reviewers, two advisors, and a judging panel of three researchers significantly enriched the known indel set with an additional 516 indels. The extended benchmarking indel set spans a wide range of variant allele frequencies (VAFs), with 87% of the indels having a VAF below 20% in reference Sample A. Reference Sample A and the indel set can therefore be used for comprehensive benchmarking of indel calling across a wider range of VAFs, particularly at the low end. Indel length also varied, but the majority were under 10 base pairs (bp). Most of the indels were within coding regions, with the remainder in gene regulatory regions. Although high confidence can be derived from the robust study design and meticulous human review, this extensive indel set has not undergone orthogonal validation. The extended benchmarking indel set, together with the indels in the previously published known-positive set, served as the truth set used to benchmark indel calling pipelines in a community challenge hosted on the precisionFDA platform. This benchmarking indel set and the reference samples can be used for a comprehensive evaluation of indel calling pipelines. Additionally, the insights and solutions obtained during the manual review process can aid in improving the performance of these pipelines.
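The two summary statistics quoted above (share of indels below 20% VAF, share shorter than 10 bp) reduce to simple counts once each indel has a VAF and a length. The sketch below assumes VAF is estimated as alt-supporting reads over total depth and that indels are written VCF-style with a shared anchor base, so length is the difference in allele lengths; the records are invented for illustration and are not from the SEQC2 set.

```python
def vaf(alt_reads, depth):
    """Variant allele frequency as the fraction of reads supporting the alt allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def indel_length(ref, alt):
    """Net inserted or deleted bases for a VCF-style (anchored) indel."""
    return abs(len(alt) - len(ref))


def summarize(indels, vaf_cutoff=0.20, length_cutoff=10):
    """Fractions of indels below a VAF cut-off and shorter than a length cut-off."""
    n = len(indels)
    low_vaf = sum(vaf(a, d) < vaf_cutoff for _, _, a, d in indels)
    short = sum(indel_length(r, x) < length_cutoff for r, x, _, _ in indels)
    return low_vaf / n, short / n


# (ref, alt, alt_reads, depth): hypothetical records.
indels = [
    ("A", "AT", 5, 100),
    ("GCC", "G", 30, 100),
    ("T", "TACGTACGTAC", 10, 200),
    ("CA", "C", 50, 100),
]
print(summarize(indels))
```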
Subjects
Benchmarking , High-Throughput Nucleotide Sequencing , Humans , Computational Biology , Quality Control , INDEL Mutation , Polymorphism, Single Nucleotide
ABSTRACT
The lack of suitable reference genomic material to enable a transparent cross-lab study of oncopanels inspired the SEQC2 Oncopanel Sequencing Working Group to develop four reference samples, which were sequenced with eight oncopanels at independent test laboratories with ultra-deep sequencing depth. This rich, publicly available dataset enabled assessment of the clinical applicability of oncopanels. In addition, the dataset presents ample opportunities for developing specific and robust bioinformatics pipelines, fine-tuning parameters for more accurate variant calling, investigating the ideal sequencing depth for calling variants of a given minimum VAF and variant type, and recommending best use cases for Unique Molecular Identifier (UMI) technology.
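The question of the ideal sequencing depth for a given minimum VAF can be framed with a simple binomial model: at depth d, the number of reads carrying a variant present at fraction f is roughly Binomial(d, f), and a caller requiring at least k supporting reads detects it with probability P(X >= k). The sketch below searches for the smallest depth reaching a target detection probability. It deliberately ignores sequencing error, PCR duplicates, and UMI consensus, so it is an idealized lower bound under stated assumptions, not the working group's analysis.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads alt reads) when alt reads ~ Binomial(depth, vaf)."""
    below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - below


def min_depth(vaf, min_alt_reads, target=0.95, max_depth=100_000):
    """Smallest depth whose detection probability reaches the target, or None."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, vaf, min_alt_reads) >= target:
            return depth
    return None


# Example: depth needed to see >= 5 alt reads with 95% probability at 1% VAF.
print(min_depth(0.01, 5))
```

The linear search is slow for very low VAFs but keeps the logic transparent; a bisection over depth would work because detection probability increases with depth for a fixed k.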
Subjects
DNA, Neoplasm , High-Throughput Nucleotide Sequencing , Neoplasms , Benchmarking , Gene Frequency , Humans , Neoplasms/genetics , Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: The cancer genome is commonly altered by thousands of structural rearrangements, including insertions, deletions, translocations, inversions, duplications, and copy number variations. Thus, structural variant (SV) characterization plays a paramount role in cancer target identification, oncology diagnostics, and personalized medicine. As part of the SEQC2 Consortium effort, the present study established and evaluated a consensus SV call set using a breast cancer reference cell line and a matched normal control derived from the same donor, which were used as reference samples in our companion benchmarking studies. RESULTS: We systematically investigated somatic SVs in the reference cancer cell line by comparing it to a matched normal cell line using multiple NGS platforms, including Illumina short reads, 10X Genomics linked reads, PacBio long reads, Oxford Nanopore long reads, and high-throughput chromosome conformation capture (Hi-C). We established a consensus SV call set of 1788 SVs for the reference cancer cell line, comprising 717 deletions, 230 duplications, 551 insertions, 133 inversions, 146 translocations, and 11 breakends. To independently evaluate and cross-validate the accuracy of our consensus SV call set, we used orthogonal methods including PCR-based validation, Affymetrix arrays, Bionano optical mapping, and identification of fusion genes detected from RNA-seq. We evaluated the strengths and weaknesses of each NGS technology for SV determination, and our findings provide an actionable guide to improve the sensitivity and accuracy of cancer genome SV detection. CONCLUSIONS: A high-confidence consensus SV call set was established for the reference cancer cell line. A large subset of the variants identified was validated by multiple orthogonal methods.
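The abstract does not detail how calls from the five platforms were merged into a consensus. As a minimal sketch, assuming each call is reduced to a platform, an SV type, and start and end coordinates on one chromosome, the code below greedily groups same-type calls whose breakpoints both fall within a fixed tolerance and keeps groups supported by at least two platforms. The record layout, the 500 bp tolerance, and the two-platform threshold are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    platform: str
    svtype: str
    chrom: str
    start: int
    end: int


def consensus(calls, tolerance=500, min_platforms=2):
    """Greedily cluster same-type SVs whose start and end lie within tolerance of
    the cluster's first member; keep clusters seen by enough distinct platforms."""
    clusters = []
    for sv in sorted(calls, key=lambda s: (s.chrom, s.svtype, s.start)):
        for cluster in clusters:
            rep = cluster[0]
            if (rep.chrom == sv.chrom and rep.svtype == sv.svtype
                    and abs(rep.start - sv.start) <= tolerance
                    and abs(rep.end - sv.end) <= tolerance):
                cluster.append(sv)
                break
        else:  # no existing cluster matched: start a new one
            clusters.append([sv])
    return [c for c in clusters if len({s.platform for s in c}) >= min_platforms]


# Hypothetical calls from several platforms.
calls = [
    SV("Illumina", "DEL", "chr1", 1000, 5000),
    SV("PacBio", "DEL", "chr1", 1100, 5050),
    SV("Nanopore", "DEL", "chr1", 1200, 4900),
    SV("Illumina", "INV", "chr2", 100, 900),  # single platform: dropped
    SV("10X", "DEL", "chr1", 20000, 30000),
    SV("Illumina", "DEL", "chr1", 20300, 30200),
]
for cluster in consensus(calls):
    print(cluster[0].svtype, cluster[0].chrom, sorted({s.platform for s in cluster}))
```

Translocations and breakends join two chromosomes and would need a second-chromosome field, so the single-chromosome record here is a simplification.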
Subjects
DNA Copy Number Variations , Neoplasms , Humans , Sequence Analysis, DNA/methods , Genomic Structural Variation , Technology , Cell Line , High-Throughput Nucleotide Sequencing , Genome, Human , Neoplasms/genetics
ABSTRACT
BACKGROUND: Clinical laboratories routinely use formalin-fixed paraffin-embedded (FFPE) tissue or cell block cytology samples in oncology panel sequencing to identify mutations that can predict patient response to targeted therapy. To understand the technical error due to FFPE processing, a robustly characterized diploid cell line was used to create FFPE samples with four different pre-tissue processing formalin fixation times. A total of 96 FFPE sections were then distributed to different laboratories for targeted sequencing analysis with four oncopanels, and variants resulting from technical error were identified. RESULTS: Tissue sections that failed more frequently showed low cellularity, lower-than-recommended DNA input for library preparation, or lower-than-target sequencing depth. Importantly, sections from block surfaces were more likely to show FFPE-specific errors, akin to the "edge effects" seen in histology, whereas inner sections displayed no quality degradation related to fixation time. CONCLUSIONS: To ensure reliable results, we recommend avoiding the block surface and restricting mutation detection to genomic regions of high confidence.
Subjects
Formaldehyde , High-Throughput Nucleotide Sequencing , Humans , Paraffin Embedding , Sequence Analysis, DNA , Tissue Fixation
ABSTRACT
BACKGROUND: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. RESULTS: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. CONCLUSIONS: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
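The abstract compares reproducibility of SNVs and indels across replicate runs without stating the metric. One simple illustration, assumed here rather than taken from the study, is the mean pairwise Jaccard index between replicate call sets, computed separately for SNVs and indels; variants are (chrom, pos, ref, alt) tuples and all values are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared calls divided by all calls in either set (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(sets):
    """Average Jaccard index over every pair of replicate call sets."""
    pairs = list(combinations(sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def is_snv(v):
    """Single-base REF and ALT alleles; everything else is treated as an indel here."""
    return len(v[2]) == 1 and len(v[3]) == 1


def reproducibility(replicates, keep):
    """Mean pairwise Jaccard after filtering each replicate with `keep`."""
    return mean_pairwise_jaccard([{v for v in rep if keep(v)} for rep in replicates])


# Three hypothetical replicates: identical SNVs, one indel placed differently in rep2.
rep1 = {("chr1", 10, "A", "G"), ("chr1", 20, "C", "T"), ("chr1", 30, "A", "AT")}
rep2 = {("chr1", 10, "A", "G"), ("chr1", 20, "C", "T"), ("chr1", 31, "A", "AT")}
rep3 = {("chr1", 10, "A", "G"), ("chr1", 20, "C", "T"), ("chr1", 30, "A", "AT")}
reps = [rep1, rep2, rep3]

print(reproducibility(reps, is_snv), reproducibility(reps, lambda v: not is_snv(v)))
```

The differently placed indel in `rep2` mimics a common real-world cause of indel discordance (ambiguous placement in repeats), which exact-match comparison counts as a mismatch.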
Subjects
Genome, Human , Polymorphism, Single Nucleotide , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Reproducibility of Results , Whole Genome Sequencing
ABSTRACT
INTRODUCTION: Clitoral priapism due to venous outflow obstruction is a rare event and a medical emergency. Androgen-induced clitoromegaly in transgender men has not previously been identified as a risk factor. AIMS: To advance current knowledge on the identification and treatment of clitoral priapism in the transgender male. METHODS: A 32-year-old presurgical transgender male underwent gender-affirming laparoscopic total hysterectomy and bilateral salpingo-oöphorectomy without incident. Seven days postoperatively, he developed progressive, persistent, and painful clitoral engorgement. Examination and imaging were consistent with clitoral priapism. RESULTS: Clitoral priapism was treated with adrenergic drugs (imipramine and pseudoephedrine), with rapid resolution of symptoms. CONCLUSION: Clitoral priapism is a rare phenomenon usually associated with the use of a psychotropic medication. Whether clitoromegaly secondary to androgen administration in transgender men is a risk factor for this rare medical emergency is unknown. Prompt recognition and treatment are paramount. Kusko RE, Singhal E, Kauffman RP. Clitoral Priapism in a Transgender Male. Sex Med 2021;9:100431.
ABSTRACT
BACKGROUND: The concurrent growth of large-scale oncology data and of the computational methods with which to analyze and model it has created a promising environment for revolutionizing cancer diagnosis, treatment, prevention, and drug discovery. Computational methods applied to large datasets have accelerated the drug discovery process by reducing bottlenecks and widening the search space beyond what is experimentally tractable. As the research community gains understanding of the myriad genetic underpinnings of cancer via sequencing, imaging, screens, and more, with these data ingested, transformed, and modeled by readily available top open-source machine learning and artificial intelligence tools, the next big drug candidate might seem merely an "Enter" key away. Of course, the reality is more convoluted, but still promising. SCOPE OF REVIEW: We present methods to approach the process of building an AI model, with strong emphasis on aspects of model development that we believe are crucial to success but are not commonly discussed: diligence in posing questions, identifying and curating suitable datasets, and collaborating closely with biology and oncology experts while designing and evaluating the model. Digital pathology, electronic health records, and other data types outside of high-throughput molecular data are reviewed well by others and are outside the scope of this review. This review emphasizes the importance of considering the limitations of the datasets, computational methods, and our minds when designing AI models. For example, datasets can be biased towards areas of research interest, funding, and particular patient populations. Neural networks may learn representations and correlations within the data that are grounded not in biological phenomena but in statistical anomalies erroneously extracted from the training data.
Researchers may misinterpret or overinterpret the output, or design and evaluate the training process such that the resultant model generalizes poorly. Fortunately, awareness of the strengths and limitations of applying data analytics and AI to drug discovery enables us to leverage them carefully and insightfully while maximizing their utility. When these applications are performed in close collaboration with domain experts, together with continuous critical evaluation, generation of new data to minimize known blind spots as they are found, and rigorous experimental validation, the success rate of the study increases. We will discuss applications including AI-assisted target identification, drug repurposing, patient stratification, and gene prioritization. MAJOR CONCLUSIONS: Data analytics and AI have demonstrated the capability to revolutionize cancer research, prevention, and treatment by maximizing our understanding and use of the expanding panoply of experimental data. However, to separate promise from true utility, computational tools must be carefully designed, critically evaluated, and constantly improved. Once that is achieved, a human-computer hybrid discovery process will outperform one driven by either alone. GENERAL SIGNIFICANCE: This review highlights the challenges and promise of synergizing predictive AI models with human expertise towards a greater understanding of cancer.
Subjects
Artificial Intelligence , Biomedical Research , Data Mining , Databases, Factual , Medical Oncology , Animals , Data Accuracy , Humans , Machine Learning
ABSTRACT
Cancer is the second leading cause of mortality worldwide despite tremendous advances in treatment. The promise of precision oncology depends on accurate characterization of tumor mutations and subsequent therapy selection. The lack of tumor reference samples, along with the associated next generation sequencing (NGS) technical assessments, has hindered the development of NGS assays and the realization of benefits for precision oncology. Here we report the summarized results and recommendations of several seminal SEQC2 studies, along with a vision of the changing landscape of precision oncology and the next steps anticipated by the SEQC2 consortium. Importantly, these studies used a new, robust reference sample material that was developed and constructed to support multiple DNA- and RNA-based NGS assay studies. These studies covered a wide variety of precision oncology assay scenarios and provided guidelines for standardized analyses and best-practice recommendations. The evolving landscape of precision oncology requires insight into the critical factors supporting the sensitivity and reproducibility of clinical NGS assays, for continued improvement in patient outcomes. Persistent development of robust reference materials, quantitative performance metrics, and actionable data analysis recommendations is needed. This series of SEQC2 studies serves to advance NGS-based assays for precision oncology and to support regulatory science endeavors.
ABSTRACT
Reproducibility is essential to open science, as findings that cannot be reproduced by independent research groups have limited relevance, regardless of their validity. It is therefore crucial for scientists to describe their experiments in sufficient detail that they can be reproduced, scrutinized, challenged, and built upon. However, the intrinsic complexity and continuous growth of biomedical data make it increasingly difficult to process, analyze, and share such data with the community in a FAIR (findable, accessible, interoperable, and reusable) manner. To overcome these issues, we created a cloud-based platform called ORCESTRA (orcestra.ca), which provides a flexible framework for the reproducible processing of multimodal biomedical data. It enables processing of clinical, genomic, and perturbation profiles of cancer samples through automated, user-customizable processing pipelines. ORCESTRA creates integrated and fully documented data objects with persistent identifiers (DOIs) and manages multiple dataset versions, which can be shared for future studies.
ABSTRACT
The primary objective of the FDA-led Sequencing Quality Control Phase 2 (SEQC2) project is to develop standard analysis protocols and quality control metrics for use in DNA testing to enhance scientific research and precision medicine. This study reports a targeted next-generation sequencing (NGS) method that enables more accurate detection of actionable mutations in circulating tumor DNA (ctDNA) clinical specimens. To accomplish this, a synthetic internal standard spike-in was designed for each actionable mutation target, suitable for use in NGS following hybrid capture enrichment and unique molecular index (UMI) or non-UMI library preparation. When mixed with contrived ctDNA reference samples, the internal standards enabled calculation of the technical error rate, limit of blank, and limit of detection for each variant at each nucleotide position in each sample. True-positive mutations with variant allele fractions too low for detection by current practice were detected with this method, thereby increasing sensitivity.
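The abstract does not give the formulas behind the per-position limit of blank (LoB) and limit of detection (LoD). One widely used convention, the parametric approach of the CLSI EP17 guideline, defines LoB as the mean plus 1.645 standard deviations of blank measurements and LoD as LoB plus 1.645 standard deviations of low-level measurements. The sketch below applies that convention at a single nucleotide position, treating the allele fraction observed where no variant is present (for example, as estimated from the internal standards) as the blank signal. Both the convention and the example numbers are assumptions, not the study's method.

```python
from statistics import mean, stdev

Z_95 = 1.645  # one-sided 95th percentile of the standard normal distribution


def limit_of_blank(blank_vafs):
    """Parametric LoB: mean + 1.645 * SD of allele fractions in variant-free replicates."""
    return mean(blank_vafs) + Z_95 * stdev(blank_vafs)


def limit_of_detection(blank_vafs, low_level_vafs):
    """Parametric LoD: LoB + 1.645 * SD of replicates carrying a low-level variant."""
    return limit_of_blank(blank_vafs) + Z_95 * stdev(low_level_vafs)


def is_detected(observed_vaf, blank_vafs):
    """Call a variant only when its allele fraction exceeds the position's LoB."""
    return observed_vaf > limit_of_blank(blank_vafs)


# Hypothetical per-position measurements (allele fractions).
blank = [0.001, 0.002, 0.003]
low = [0.008, 0.010, 0.012]
print(limit_of_blank(blank), limit_of_detection(blank, low))
```

Because the thresholds are computed per position, a noisy position gets a higher LoB than a clean one, which is the kind of position-specific calibration the internal standards are described as enabling.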