Search | Nursing VHL Search Portal

1.

Introducing the Bacterial and Viral Bioinformatics Resource Center (BV-BRC): a resource combining PATRIC, IRD and ViPR.

Olson, Robert D; Assaf, Rida; Brettin, Thomas; Conrad, Neal; Cucinell, Clark; Davis, James J; Dempsey, Donald M; Dickerman, Allan; Dietrich, Emily M; Kenyon, Ronald W; Kuscuoglu, Mehmet; Lefkowitz, Elliot J; Lu, Jian; Machi, Dustin; Macken, Catherine; Mao, Chunhong; Niewiadomska, Anna; Nguyen, Marcus; Olsen, Gary J; Overbeek, Jamie C; Parrello, Bruce; Parrello, Victoria; Porter, Jacob S; Pusch, Gordon D; Shukla, Maulik; Singh, Indresh; Stewart, Lucy; Tan, Gene; Thomas, Chris; VanOeffelen, Margo; Vonstein, Veronika; Wallace, Zachary S; Warren, Andrew S; Wattam, Alice R; Xia, Fangfang; Yoo, Hyunseung; Zhang, Yun; Zmasek, Christian M; Scheuermann, Richard H; Stevens, Rick L.

Nucleic Acids Res ; 51(D1): D678-D689, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36350631

ABSTRACT

The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Center (BRC) program to assist researchers with analyzing the growing body of genome sequence and other omics-related data. In this report, we describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) https://www.bv-brc.org/. The combined BV-BRC leverages the functionality of the bacterial and viral resources to provide a unified data model, enhanced web-based visualization and analysis tools, bioinformatics services, and a powerful suite of command line tools that benefit the bacterial and viral research communities.

Subject(s)

Genomics , Software , Viruses , Humans , Bacteria/genetics , Computational Biology , Databases, Genetic , Influenza, Human , Viruses/genetics

2.

A cross-study analysis of drug response prediction in cancer cell lines.

Xia, Fangfang; Allen, Jonathan; Balaprakash, Prasanna; Brettin, Thomas; Garcia-Cardona, Cristina; Clyde, Austin; Cohn, Judith; Doroshow, James; Duan, Xiaotian; Dubinkina, Veronika; Evrard, Yvonne; Fan, Ya Ju; Gans, Jason; He, Stewart; Lu, Pinyi; Maslov, Sergei; Partin, Alexander; Shukla, Maulik; Stahlberg, Eric; Wozniak, Justin M; Yoo, Hyunseung; Zaki, George; Zhu, Yitan; Stevens, Rick.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34524425

ABSTRACT

To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: National Cancer Institute 60, Cancer Therapeutics Response Portal (CTRP), Genomics of Drug Sensitivity in Cancer, Cancer Cell Line Encyclopedia and Genentech Cell Line Screening Initiative (gCSI). Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.
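
The cross-study evaluation described in this abstract (train on one study, test on each of the others) can be sketched as a small score matrix. This is a minimal illustration, not the authors' pipeline: the function names, the mean-response "model", the mean absolute error metric and the toy data are all assumptions introduced here.

```python
from statistics import mean


def cross_study_scores(studies, fit, score):
    """Train on each study in turn and score the model on every study.

    studies: dict mapping a study name to a (features, responses) pair.
    fit:     callable (features, responses) -> model.
    score:   callable (model, features, responses) -> float.
    Returns a nested dict: scores[train_name][test_name].
    Diagonal entries are scored on the training data itself, so they are
    optimistic; the off-diagonal entries are the cross-study results.
    """
    scores = {}
    for train_name, (x_train, y_train) in studies.items():
        model = fit(x_train, y_train)
        scores[train_name] = {
            test_name: score(model, x_test, y_test)
            for test_name, (x_test, y_test) in studies.items()
        }
    return scores


# Hypothetical stand-ins: the "model" is just the mean training response.
def fit_mean(features, responses):
    return mean(responses)


def mean_abs_error(model, features, responses):
    return mean(abs(model - y) for y in responses)


# Toy data only; these are not values from any of the five studies.
toy_studies = {
    "study_a": ([[0.1], [0.2], [0.3]], [0.5, 0.6, 0.7]),
    "study_b": ([[0.4], [0.5], [0.6]], [0.1, 0.2, 0.3]),
}
toy_scores = cross_study_scores(toy_studies, fit_mean, mean_abs_error)
```

Reading the matrix row by row shows how well a model trained on one study transfers to the others, which is the comparison the abstract uses to rank training and testing data sets.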

Subject(s)

Neoplasms , Algorithms , Cell Line , Humans , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics , Neural Networks, Computer

3.

A genomic data resource for predicting antimicrobial resistance from laboratory-derived antimicrobial susceptibility phenotypes.

VanOeffelen, Margo; Nguyen, Marcus; Aytan-Aktug, Derya; Brettin, Thomas; Dietrich, Emily M; Kenyon, Ronald W; Machi, Dustin; Mao, Chunhong; Olson, Robert; Pusch, Gordon D; Shukla, Maulik; Stevens, Rick; Vonstein, Veronika; Warren, Andrew S; Wattam, Alice R; Yoo, Hyunseung; Davis, James J.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34379107

ABSTRACT

Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences that are paired with laboratory-derived AST data. This collection currently contains AST data for over 67 000 genomes encompassing approximately 40 genera and over 100 species. In this paper, we describe the characteristics of this collection, highlighting areas where sampling is comparatively deep or shallow, and showing areas where attention is needed from the research community to improve sampling and tracking efforts. In addition to supporting efforts to track the evolution and spread of AMR, the data also serve as a useful starting point for building machine learning models for predicting AMR phenotypes. We demonstrate this by describing two machine learning models that are built from the entire dataset to show where the predictive power is comparatively high or low. This AMR metadata collection is freely available and maintained on the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) FTP site ftp://ftp.bvbrc.org/RELEASE_NOTES/PATRIC_genomes_AMR.txt.

Subject(s)

Computational Biology/methods , Databases, Genetic , Drug Resistance, Microbial , Genomics/methods , Microbial Sensitivity Tests , Artificial Intelligence , Bacteria/drug effects , Bacteria/genetics , Genome, Bacterial , Humans , Laboratories , Machine Learning , Phenotype

4.

A communal catalogue reveals Earth's multiscale microbial diversity.

Thompson, Luke R; Sanders, Jon G; McDonald, Daniel; Amir, Amnon; Ladau, Joshua; Locey, Kenneth J; Prill, Robert J; Tripathi, Anupriya; Gibbons, Sean M; Ackermann, Gail; Navas-Molina, Jose A; Janssen, Stefan; Kopylova, Evguenia; Vázquez-Baeza, Yoshiki; González, Antonio; Morton, James T; Mirarab, Siavash; Zech Xu, Zhenjiang; Jiang, Lingjing; Haroon, Mohamed F; Kanbar, Jad; Zhu, Qiyun; Jin Song, Se; Kosciolek, Tomasz; Bokulich, Nicholas A; Lefler, Joshua; Brislawn, Colin J; Humphrey, Gregory; Owens, Sarah M; Hampton-Marcell, Jarrad; Berg-Lyons, Donna; McKenzie, Valerie; Fierer, Noah; Fuhrman, Jed A; Clauset, Aaron; Stevens, Rick L; Shade, Ashley; Pollard, Katherine S; Goodwin, Kelly D; Jansson, Janet K; Gilbert, Jack A; Knight, Rob.

Nature ; 551(7681): 457-463, 2017 11 23.

Article in English | MEDLINE | ID: mdl-29088705

ABSTRACT

Our growing awareness of the microbial world's importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth's microbial diversity.

Subject(s)

Biodiversity , Earth, Planet , Microbiota/genetics , Animals , Archaea/genetics , Archaea/isolation & purification , Bacteria/genetics , Bacteria/isolation & purification , Ecology/methods , Gene Dosage , Geographic Mapping , Humans , Plants/microbiology , RNA, Ribosomal, 16S/analysis , RNA, Ribosomal, 16S/genetics

5.

High-Throughput Virtual Screening and Validation of a SARS-CoV-2 Main Protease Noncovalent Inhibitor.

Clyde, Austin; Galanie, Stephanie; Kneller, Daniel W; Ma, Heng; Babuji, Yadu; Blaiszik, Ben; Brace, Alexander; Brettin, Thomas; Chard, Kyle; Chard, Ryan; Coates, Leighton; Foster, Ian; Hauner, Darin; Kertesz, Vilmos; Kumar, Neeraj; Lee, Hyungro; Li, Zhuozhao; Merzky, Andre; Schmidt, Jurgen G; Tan, Li; Titov, Mikhail; Trifan, Anda; Turilli, Matteo; Van Dam, Hubertus; Chennubhotla, Srinivas C; Jha, Shantenu; Kovalevsky, Andrey; Ramanathan, Arvind; Head, Martha S; Stevens, Rick.

J Chem Inf Model ; 62(1): 116-128, 2022 01 10.

Article in English | MEDLINE | ID: mdl-34793155

ABSTRACT

Despite the recent availability of vaccines against the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that binds to and inhibits the SARS-CoV-2 main protease (Mpro) by employing a scalable high-throughput virtual screening (HTVS) framework and a targeted compound library of over 6.5 million molecules that could be readily ordered and purchased. Our HTVS framework leverages the U.S. supercomputing infrastructure achieving nearly 91% resource utilization and nearly 126 million docking calculations per hour. Downstream biochemical assays validate this Mpro inhibitor with an inhibition constant (Ki) of 2.9 µM (95% CI 2.2, 4.0). Furthermore, using room-temperature X-ray crystallography, we show that MCULE-5948770040 binds to a cleft in the primary binding site of Mpro forming stable hydrogen bond and hydrophobic interactions. We then used multiple µs-time scale molecular dynamics (MD) simulations and machine learning (ML) techniques to elucidate how the bound ligand alters the conformational states accessed by Mpro, involving motions both proximal and distal to the binding site. Together, our results demonstrate how MCULE-5948770040 inhibits Mpro and offer a springboard for further therapeutic design.

Subject(s)

COVID-19 , Protease Inhibitors , Antiviral Agents , Coronavirus 3C Proteases , Humans , Molecular Docking Simulation , Molecular Dynamics Simulation , Orotic Acid/analogs & derivatives , Piperazines , SARS-CoV-2

6.

The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities.

Davis, James J; Wattam, Alice R; Aziz, Ramy K; Brettin, Thomas; Butler, Ralph; Butler, Rory M; Chlenski, Philippe; Conrad, Neal; Dickerman, Allan; Dietrich, Emily M; Gabbard, Joseph L; Gerdes, Svetlana; Guard, Andrew; Kenyon, Ronald W; Machi, Dustin; Mao, Chunhong; Murphy-Olson, Dan; Nguyen, Marcus; Nordberg, Eric K; Olsen, Gary J; Olson, Robert D; Overbeek, Jamie C; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Thomas, Chris; VanOeffelen, Margo; Vonstein, Veronika; Warren, Andrew S; Xia, Fangfang; Xie, Dawen; Yoo, Hyunseung; Stevens, Rick.

Nucleic Acids Res ; 48(D1): D606-D612, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31667520

ABSTRACT

The PathoSystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center funded by the National Institute of Allergy and Infectious Diseases (https://www.patricbrc.org). PATRIC supports bioinformatic analyses of all bacteria with a special emphasis on pathogens, offering a rich comparative analysis environment that provides users with access to over 250 000 uniformly annotated and publicly available genomes with curated metadata. PATRIC offers web-based visualization and comparative analysis tools, a private workspace in which users can analyze their own data in the context of the public collections, services that streamline complex bioinformatic workflows and command-line tools for bulk data analysis. Over the past several years, as genomic and other omics-related experiments have become more cost-effective and widespread, we have observed considerable growth in the usage of and demand for easy-to-use, publicly available bioinformatic tools and services. Here we report the recent updates to the PATRIC resource, including new web-based comparative analysis tools, eight new services and the release of a command-line interface to access, query and analyze data.

Subject(s)

Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Algorithms , Animals , Caenorhabditis elegans/genetics , Chickens/genetics , Drosophila melanogaster/genetics , Host-Pathogen Interactions/genetics , Humans , Internet , Macaca mulatta/genetics , Metagenomics , Mice , National Institute of Allergy and Infectious Diseases (U.S.) , Phenotype , Phylogeny , Rats , Swine/genetics , United States , Zebrafish/genetics

7.

Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action.

Trifan, Anda; Gorgun, Defne; Salim, Michael; Li, Zongyi; Brace, Alexander; Zvyagin, Maxim; Ma, Heng; Clyde, Austin; Clark, David; Hardy, David J; Burnley, Tom; Huang, Lei; McCalpin, John; Emani, Murali; Yoo, Hyenseung; Yin, Junqi; Tsaris, Aristeidis; Subbiah, Vishal; Raza, Tanveer; Liu, Jessica; Trebesch, Noah; Wells, Geoffrey; Mysore, Venkatesh; Gibbs, Thomas; Phillips, James; Chennubhotla, S Chakra; Foster, Ian; Stevens, Rick; Anandkumar, Anima; Vishwanath, Venkatram; Stone, John E; Tajkhorshid, Emad; A Harris, Sarah; Ramanathan, Arvind.

Int J High Perform Comput Appl ; 36(5-6): 603-623, 2022 Nov.

Article in English | MEDLINE | ID: mdl-38464362

ABSTRACT

The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) replication transcription complex (RTC) is a multi-domain protein responsible for replicating and transcribing the viral mRNA inside a human cell. Attacking RTC function with pharmaceutical compounds is a pathway to treating COVID-19. Conventional tools, e.g., cryo-electron microscopy and all-atom molecular dynamics (AAMD), do not provide sufficiently high resolution or timescale to capture important dynamics of this molecular machine. Consequently, we develop an innovative workflow that bridges the gap between these resolutions, using mesoscale fluctuating finite element analysis (FFEA) continuum simulations and a hierarchy of AI-methods that continually learn and infer features for maintaining consistency between AAMD and FFEA simulations. We leverage a multi-site distributed workflow manager to orchestrate AI, FFEA, and AAMD jobs, providing optimal resource utilization across HPC centers. Our study provides unprecedented access to study the SARS-CoV-2 RTC machinery, while providing general capability for AI-enabled multi-resolution simulations at scale.

8.

Learning curves for drug response prediction in cancer cell lines.

Partin, Alexander; Brettin, Thomas; Evrard, Yvonne A; Zhu, Yitan; Yoo, Hyunseung; Xia, Fangfang; Jiang, Songhao; Clyde, Austin; Shukla, Maulik; Fonstein, Michael; Doroshow, James H; Stevens, Rick L.

BMC Bioinformatics ; 22(1): 252, 2021 May 17.

Article in English | MEDLINE | ID: mdl-34001007

ABSTRACT

BACKGROUND: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. METHODS: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models. RESULTS: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics. 
CONCLUSIONS: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
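
As a rough illustration of fitting a learning curve to a power law, the sketch below fits error ≈ a · m^b (with m the training set size) by linear least squares in log-log space. The two-parameter form without an offset term, the function names and the synthetic data are assumptions made here for illustration; the abstract does not state the exact parameterization or fitting procedure used in the paper.

```python
import numpy as np


def fit_power_law(train_sizes, errors):
    """Fit error ~ a * m**b by least squares on log(error) vs log(m).

    Returns (a, b). A negative b means error falls as training size grows.
    """
    log_m = np.log(np.asarray(train_sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    slope, intercept = np.polyfit(log_m, log_e, 1)
    return float(np.exp(intercept)), float(slope)


def predict_error(a, b, train_size):
    """Evaluate the fitted curve at a (possibly larger) training size."""
    return a * float(train_size) ** b


# Synthetic, noise-free points generated from a known curve (a=2.0, b=-0.3).
sizes = np.array([1_000, 2_000, 4_000, 8_000, 16_000], dtype=float)
synthetic_errors = 2.0 * sizes ** -0.3

a, b = fit_power_law(sizes, synthetic_errors)
extrapolated = predict_error(a, b, 32_000)
```

Evaluating the fitted curve beyond the largest observed training size gives the kind of forward-looking estimate the conclusions describe, with the usual caveat that extrapolation is only as good as the assumed functional form.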

Subject(s)

Neoplasms , Pharmaceutical Preparations , Cell Line , Learning Curve , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics , Prospective Studies

9.

Identifying genomic islands with deep neural networks.

Assaf, Rida; Xia, Fangfang; Stevens, Rick.

BMC Genomics ; 22(Suppl 3): 281, 2021 Jun 02.

Article in English | MEDLINE | ID: mdl-34078279

ABSTRACT

BACKGROUND: Horizontal gene transfer is the main source of adaptability for bacteria, through which genes are obtained from different sources including bacteria, archaea, viruses, and eukaryotes. This process promotes the rapid spread of genetic information across lineages, typically in the form of clusters of genes referred to as genomic islands (GIs). Different types of GIs exist, and are often classified by the content of their cargo genes or their means of integration and mobility. While various computational methods have been devised to detect different types of GIs, no single method is capable of detecting all types. RESULTS: We propose a method, which we call Shutter Island, that uses a deep learning model (Inception V3, widely used in computer vision) to detect genomic islands. The intrinsic value of deep learning methods lies in their ability to generalize. Via a technique called transfer learning, the model is pre-trained on a large generic dataset and then re-trained on images that we generate to represent genomic fragments. We demonstrate that this image-based approach generalizes better than the existing tools. CONCLUSIONS: We used a deep neural network and an image-based approach to detect the most out of the correct GI predictions made by other tools, in addition to making novel GI predictions. The fact that the deep neural network was re-trained on only a limited number of GI datasets and then successfully generalized indicates that this approach could be applied to other problems in the field where data is still lacking or hard to curate.

Subject(s)

Genomic Islands , Neural Networks, Computer , Eukaryota/genetics , Gene Transfer, Horizontal , Genomics

10.

PATRIC as a unique resource for studying antimicrobial resistance.

Antonopoulos, Dionysios A; Assaf, Rida; Aziz, Ramy Karam; Brettin, Thomas; Bun, Christopher; Conrad, Neal; Davis, James J; Dietrich, Emily M; Disz, Terry; Gerdes, Svetlana; Kenyon, Ronald W; Machi, Dustin; Mao, Chunhong; Murphy-Olson, Daniel E; Nordberg, Eric K; Olsen, Gary J; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Santerre, John; Shukla, Maulik; Stevens, Rick L; VanOeffelen, Margo; Vonstein, Veronika; Warren, Andrew S; Wattam, Alice R; Xia, Fangfang; Yoo, Hyunseung.

Brief Bioinform ; 20(4): 1094-1102, 2019 07 19.

Article in English | MEDLINE | ID: mdl-28968762

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) is designed to provide researchers with the tools and services that they need to perform genomic and other 'omic' data analyses. In response to mounting concern over antimicrobial resistance (AMR), the PATRIC team has been developing new tools that help researchers understand AMR and its genetic determinants. To support comparative analyses, we have added AMR phenotype data to over 15 000 genomes in the PATRIC database, often assembling genomes from reads in public archives and collecting their associated AMR panel data from the literature to augment the collection. We have also been using this collection of AMR metadata to build machine learning-based classifiers that can predict the AMR phenotypes and the genomic regions associated with resistance for genomes being submitted to the annotation service. Likewise, we have undertaken a large AMR protein annotation effort by manually curating data from the literature and public repositories. This collection of 7370 AMR reference proteins, which contains many protein annotations (functional roles) that are unique to PATRIC and RAST, has been manually curated so that it projects stably across genomes. The collection currently projects to 1 610 744 proteins in the PATRIC database. Finally, the PATRIC Web site has been expanded to enable AMR-based custom page views so that researchers can easily explore AMR data and design experiments based on whole genomes or individual genes.

Subject(s)

Computational Biology/methods , Databases, Genetic , Drug Resistance, Microbial/genetics , Systems Integration , Computational Biology/trends , Databases, Genetic/statistics & numerical data , Genome, Microbial , Humans , Internet , Molecular Sequence Annotation

11.

Exascale applications: skin in the game.

Alexander, Francis; Almgren, Ann; Bell, John; Bhattacharjee, Amitava; Chen, Jacqueline; Colella, Phil; Daniel, David; DeSlippe, Jack; Diachin, Lori; Draeger, Erik; Dubey, Anshu; Dunning, Thom; Evans, Thomas; Foster, Ian; Francois, Marianne; Germann, Tim; Gordon, Mark; Habib, Salman; Halappanavar, Mahantesh; Hamilton, Steven; Hart, William; Henry Huang, Zhenyu; Hungerford, Aimee; Kasen, Daniel; Kent, Paul R C; Kolev, Tzanio; Kothe, Douglas B; Kronfeld, Andreas; Luo, Ye; Mackenzie, Paul; McCallen, David; Messer, Bronson; Mniszewski, Sue; Oehmen, Chris; Perazzo, Amedeo; Perez, Danny; Richards, David; Rider, William J; Rieben, Rob; Roche, Kenneth; Siegel, Andrew; Sprague, Michael; Steefel, Carl; Stevens, Rick; Syamlal, Madhava; Taylor, Mark; Turner, John; Vay, Jean-Luc; Voter, Artur F; Windus, Theresa L.

Philos Trans A Math Phys Eng Sci ; 378(2166): 20190056, 2020 Mar 06.

Article in English | MEDLINE | ID: mdl-31955678

ABSTRACT

As noted in Wikipedia, skin in the game refers to having 'incurred risk by being involved in achieving a goal', where 'skin is a synecdoche for the person involved, and game is the metaphor for actions on the field of play under discussion'. For exascale applications under development in the US Department of Energy Exascale Computing Project, nothing could be more apt, with the skin being exascale applications and the game being delivering comprehensive science-based computational applications that effectively exploit exascale high-performance computing technologies to provide breakthrough modelling and simulation and data science solutions. These solutions will yield high-confidence insights and answers to the most critical problems and challenges for the USA in scientific discovery, national security, energy assurance, economic competitiveness and advanced healthcare. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.

12.

Applying Artificial Intelligence to Address the Knowledge Gaps in Cancer Care.

Simon, George; DiNardo, Courtney D; Takahashi, Koichi; Cascone, Tina; Powers, Cynthia; Stevens, Rick; Allen, Joshua; Antonoff, Mara B; Gomez, Daniel; Keane, Pat; Suarez Saiz, Fernando; Nguyen, Quynh; Roarty, Emily; Pierce, Sherry; Zhang, Jianjun; Hardeman Barnhill, Emily; Lakhani, Kate; Shaw, Kenna; Smith, Brett; Swisher, Stephen; High, Rob; Futreal, P Andrew; Heymach, John; Chin, Lynda.

Oncologist ; 24(6): 772-782, 2019 06.

Article in English | MEDLINE | ID: mdl-30446581

ABSTRACT

BACKGROUND: Rapid advances in science challenge the timely adoption of evidence-based care in community settings. To bridge the gap between what is possible and what is practiced, we researched approaches to developing an artificial intelligence (AI) application that can provide real-time patient-specific decision support. MATERIALS AND METHODS: The Oncology Expert Advisor (OEA) was designed to simulate peer-to-peer consultation with three core functions: patient history summarization, treatment options recommendation, and management advisory. Machine-learning algorithms were trained to construct a dynamic summary of patients' cancer history and to suggest approved therapy or investigative trial options. All patient data used were retrospectively accrued. Ground truth was established for approximately 1,000 unique patients. The full Medline database of more than 23 million published abstracts was used as the literature corpus. RESULTS: OEA's accuracies of searching disparate sources within electronic medical records to extract complex clinical concepts from unstructured text documents varied, with F1 scores of 90%-96% for non-time-dependent concepts (e.g., diagnosis) and F1 scores of 63%-65% for time-dependent concepts (e.g., therapy history timeline). Based on constructed patient profiles, OEA suggests approved therapy options linked to supporting evidence (99.9% recall; 88% precision), and screens for eligible clinical trials on ClinicalTrials.gov (97.9% recall; 96.9% precision). CONCLUSION: Our results demonstrated technical feasibility of an AI-powered application to construct longitudinal patient profiles in context and to suggest evidence-based treatment and trial options. Our experience highlighted the necessity of collaboration across clinical and AI domains, and the requirement of clinical expertise throughout the process, from design to training to testing.
IMPLICATIONS FOR PRACTICE: Artificial intelligence (AI)-powered digital advisors such as the Oncology Expert Advisor have the potential to augment the capacity and update the knowledge base of practicing oncologists. By constructing dynamic patient profiles from disparate data sources and organizing and vetting vast literature for relevance to a specific patient, such AI applications could empower oncologists to consider all therapy options based on the latest scientific evidence for their patients, and help them spend less time on information "hunting and gathering" and more time with the patients. However, realization of this will require not only AI technology maturation but also active participation and leadership by clinical experts.
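
The results above mix two kinds of metrics: F1 scores for concept extraction, and separate recall and precision figures for therapy and trial suggestions. Since F1 is the harmonic mean of precision and recall, the two can be put on a common scale. The sketch below does that arithmetic; the derived F1 values for the therapy and trial steps are not reported in the abstract and are computed here only for comparison.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recall and precision as reported in the abstract, converted to fractions.
therapy_f1 = f1_score(precision=0.88, recall=0.999)
trial_f1 = f1_score(precision=0.969, recall=0.979)
```

Because the harmonic mean is pulled toward the lower of its two inputs, the therapy-suggestion figure lands closer to its 88% precision than to its 99.9% recall.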

Subject(s)

Artificial Intelligence , Decision Support Systems, Clinical , Evidence-Based Medicine/methods , Medical Oncology/methods , Neoplasms/diagnosis , Clinical Decision-Making/methods , Clinical Trials as Topic , Electronic Health Records/statistics & numerical data , Evidence-Based Medicine/statistics & numerical data , Feasibility Studies , Humans , Medical Oncology/statistics & numerical data , Neoplasms/therapy , Patient Selection

13.

Using Machine Learning To Predict Antimicrobial MICs and Associated Genomic Features for Nontyphoidal Salmonella.

Nguyen, Marcus; Long, S Wesley; McDermott, Patrick F; Olsen, Randall J; Olson, Robert; Stevens, Rick L; Tyson, Gregory H; Zhao, Shaohua; Davis, James J.

J Clin Microbiol ; 57(2)2019 02.

Article in English | MEDLINE | ID: mdl-30333126

ABSTRACT

Nontyphoidal Salmonella species are the leading bacterial cause of foodborne disease in the United States. Whole-genome sequences and paired antimicrobial susceptibility data are available for Salmonella strains because of surveillance efforts from public health agencies. In this study, a collection of 5,278 nontyphoidal Salmonella genomes, collected over 15 years in the United States, was used to generate extreme gradient boosting (XGBoost)-based machine learning models for predicting MICs for 15 antibiotics. The MIC prediction models had an overall average accuracy of 95% within ±1 2-fold dilution step (confidence interval, 95% to 95%), an average very major error rate of 2.7% (confidence interval, 2.4% to 3.0%), and an average major error rate of 0.1% (confidence interval, 0.1% to 0.2%). The model predicted MICs with no a priori information about the underlying gene content or resistance phenotypes of the strains. By selecting diverse genomes for the training sets, we show that highly accurate MIC prediction models can be generated with less than 500 genomes. We also show that our approach for predicting MICs is stable over time, despite annual fluctuations in antimicrobial resistance gene content in the sampled genomes. Finally, using feature selection, we explore the important genomic regions identified by the models for predicting MICs. To date, this is one of the largest MIC modeling studies to be published. Our strategy for developing whole-genome sequence-based models for surveillance and clinical diagnostics can be readily applied to other important human pathogens.
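
Accuracy "within ±1 2-fold dilution step" counts a prediction as correct when it differs from the laboratory MIC by at most one doubling or halving. A minimal sketch of that scoring rule follows, assuming MICs are given as positive numbers in the same units; the function names and example values are hypothetical, and the paper's exact scoring (for example, its handling of censored MIC values) may differ.

```python
import math


def within_one_dilution(predicted_mic, measured_mic):
    """True if the two MICs differ by at most one two-fold dilution step."""
    return abs(math.log2(predicted_mic) - math.log2(measured_mic)) <= 1.0


def dilution_accuracy(predicted, measured):
    """Fraction of predictions within one two-fold dilution of the measurement."""
    hits = sum(within_one_dilution(p, m) for p, m in zip(predicted, measured))
    return hits / len(measured)


# Hypothetical MIC values; not data from the study.
measured_mics = [0.5, 1.0, 2.0, 8.0, 16.0]
predicted_mics = [0.5, 2.0, 8.0, 4.0, 16.0]
accuracy = dilution_accuracy(predicted_mics, measured_mics)
```

Working on a log2 scale matches how MIC panels are laid out, since successive wells differ by a factor of two.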

Subject(s)

Drug Resistance, Bacterial , Genotyping Techniques/methods , Machine Learning , Microbial Sensitivity Tests/methods , Salmonella Infections/microbiology , Salmonella/drug effects , Salmonella/genetics , Foodborne Diseases/microbiology , Genome, Bacterial , Humans , Salmonella/isolation & purification , United States

14.

Artificial intelligence for drug response prediction in disease models.

Ballester, Pedro J; Stevens, Rick; Haibe-Kains, Benjamin; Huang, R Stephanie; Aittokallio, Tero.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34655289

Subject(s)

Artificial Intelligence

15.

Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center.

Wattam, Alice R; Davis, James J; Assaf, Rida; Boisvert, Sébastien; Brettin, Thomas; Bun, Christopher; Conrad, Neal; Dietrich, Emily M; Disz, Terry; Gabbard, Joseph L; Gerdes, Svetlana; Henry, Christopher S; Kenyon, Ronald W; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K; Olsen, Gary J; Murphy-Olson, Daniel E; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Vonstein, Veronika; Warren, Andrew; Xia, Fangfang; Yoo, Hyunseung; Stevens, Rick L.

Nucleic Acids Res ; 45(D1): D535-D542, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899627

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by 'virtual integration' to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.

Subject(s)

Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Genome, Bacterial , Genomics/methods , Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Bacteria/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Drug Resistance, Bacterial , Molecular Sequence Annotation , Proteome , Proteomics/methods , Software , Web Browser

16.

CANDLE/Supervisor: a workflow framework for machine learning applied to cancer research.

Wozniak, Justin M; Jain, Rajeev; Balaprakash, Prasanna; Ozik, Jonathan; Collier, Nicholson T; Bauer, John; Xia, Fangfang; Brettin, Thomas; Stevens, Rick; Mohd-Yusof, Jamaludin; Cardona, Cristina Garcia; Essen, Brian Van; Baughman, Matthew.

BMC Bioinformatics ; 19(Suppl 18): 491, 2018 Dec 21.

Article in English | MEDLINE | ID: mdl-30577736

ABSTRACT

BACKGROUND: Current multi-petaflop supercomputers are powerful systems, but present challenges when faced with problems requiring large machine learning workflows. Complex algorithms running at system scale, often with different patterns that require disparate software packages and complex data flows cause difficulties in assembling and managing large experiments on these machines. RESULTS: This paper presents a workflow system that makes progress on scaling machine learning ensembles, specifically in this first release, ensembles of deep neural networks that address problems in cancer research across the atomistic, molecular and population scales. The initial release of the application framework that we call CANDLE/Supervisor addresses the problem of hyper-parameter exploration of deep neural networks. CONCLUSIONS: Initial results demonstrating CANDLE on DOE systems at ORNL, ANL and NERSC (Titan, Theta and Cori, respectively) demonstrate both scaling and multi-platform execution.

Subject(s)

Early Detection of Cancer/methods , Machine Learning/trends , Neoplasms/diagnosis , Humans , Neoplasms/pathology , Neural Networks, Computer , Workflow

17.

Predicting tumor cell line response to drug pairs with deep learning.

Xia, Fangfang; Shukla, Maulik; Brettin, Thomas; Garcia-Cardona, Cristina; Cohn, Judith; Allen, Jonathan E; Maslov, Sergei; Holbeck, Susan L; Doroshow, James H; Evrard, Yvonne A; Stahlberg, Eric A; Stevens, Rick L.

BMC Bioinformatics ; 19(Suppl 18): 486, 2018 Dec 21.

Article in English | MEDLINE | ID: mdl-30577754

ABSTRACT

BACKGROUND: The National Cancer Institute drug pair screening effort against 60 well-characterized human tumor cell lines (NCI-60) presents an unprecedented resource for modeling combinational drug activity. RESULTS: We present a computational model for predicting cell line response to a subset of drug pairs in the NCI-ALMANAC database. Based on residual neural networks for encoding features as well as predicting tumor growth, our model explains 94% of the response variance. While our best result is achieved with a combination of molecular feature types (gene expression, microRNA and proteome), we show that most of the predictive power comes from drug descriptors. To further demonstrate value in detecting anticancer therapy, we rank the drug pairs for each cell line based on model predicted combination effect and recover 80% of the top pairs with enhanced activity. CONCLUSIONS: We present promising results in applying deep learning to predicting combinational drug response. Our feature analysis indicates screening data involving more cell lines are needed for the models to make better use of molecular features.

Subject(s)

Deep Learning/trends , Drug Evaluation, Preclinical/methods , Cell Line, Tumor , Humans , National Cancer Institute (U.S.) , Neural Networks, Computer , United States

18.

Mutation in an Unannotated Protein Confers Carbapenem Resistance in Mycobacterium tuberculosis.

Kumar, Pankaj; Kaushik, Amit; Bell, Drew T; Chauhan, Varsha; Xia, Fangfang; Stevens, Rick L; Lamichhane, Gyanu.

Antimicrob Agents Chemother ; 61(3), 2017 03.

Article in English | MEDLINE | ID: mdl-28069655

ABSTRACT

β-Lactams are the most widely used antibacterials. Among β-lactams, carbapenems are considered the last line of defense against recalcitrant infections. As recent developments have prompted consideration of carbapenems for treatment of drug-resistant tuberculosis, it is only a matter of time before Mycobacterium tuberculosis strains resistant to these drugs will emerge. In the present study, we investigated the genetic basis that confers such resistance. To our surprise, instead of mutations in the known β-lactam targets, a single nucleotide polymorphism in the Rv2421c-Rv2422 intergenic region was common among M. tuberculosis mutants selected with meropenem or biapenem. We present data supporting the hypothesis that this locus harbors a previously unidentified gene that encodes a protein. This protein binds to β-lactams, slowly hydrolyzes the chromogenic β-lactam nitrocefin, and is inhibited by select penicillins and carbapenems and the β-lactamase inhibitor clavulanate. The mutation results in a W62R substitution that reduces the protein's nitrocefin-hydrolyzing activity and binding affinities for carbapenems.

Subject(s)

Bacterial Proteins/genetics , DNA, Intergenic , Mutation , Mycobacterium tuberculosis/genetics , beta-Lactam Resistance/genetics , Amino Acid Sequence , Amino Acid Substitution , Anti-Bacterial Agents/pharmacology , Bacterial Proteins/metabolism , Base Sequence , Cephalosporins/metabolism , Cephalosporins/pharmacology , Clavulanic Acid/metabolism , Clavulanic Acid/pharmacology , Gene Expression , Genetic Loci , Humans , Meropenem , Microbial Sensitivity Tests , Mycobacterium tuberculosis/drug effects , Mycobacterium tuberculosis/isolation & purification , Mycobacterium tuberculosis/metabolism , Open Reading Frames , Protein Binding , Thienamycins/pharmacology , Tuberculosis, Multidrug-Resistant/microbiology

19.

Genomic encyclopedia of bacteria and archaea: sequencing a myriad of type strains.

Kyrpides, Nikos C; Hugenholtz, Philip; Eisen, Jonathan A; Woyke, Tanja; Göker, Markus; Parker, Charles T; Amann, Rudolf; Beck, Brian J; Chain, Patrick S G; Chun, Jongsik; Colwell, Rita R; Danchin, Antoine; Dawyndt, Peter; Dedeurwaerdere, Tom; DeLong, Edward F; Detter, John C; De Vos, Paul; Donohue, Timothy J; Dong, Xiu-Zhu; Ehrlich, Dusko S; Fraser, Claire; Gibbs, Richard; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Jansson, Janet K; Keasling, Jay D; Knight, Rob; Labeda, David; Lapidus, Alla; Lee, Jung-Sook; Li, Wen-Jun; Ma, Juncai; Markowitz, Victor; Moore, Edward R B; Morrison, Mark; Meyer, Folker; Nelson, Karen E; Ohkuma, Moriya; Ouzounis, Christos A; Pace, Norman; Parkhill, Julian; Qin, Nan; Rossello-Mora, Ramon; Sikorski, Johannes; Smith, David; Sogin, Mitch; Stevens, Rick; Stingl, Uli; Suzuki, Ken-Ichiro.

PLoS Biol ; 12(8): e1001920, 2014 Aug.

Article in English | MEDLINE | ID: mdl-25093819

ABSTRACT

Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ~11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.

Subject(s)

Genome, Archaeal/genetics , Genome, Bacterial/genetics , Genomics , Sequence Analysis, DNA , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , Databases, Genetic , Phylogeny

20.

High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource.

Seaver, Samuel M D; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M T; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D; Henry, Christopher S.

Proc Natl Acad Sci U S A ; 111(26): 9645-50, 2014 Jul 01.

Article in English | MEDLINE | ID: mdl-24927599

ABSTRACT

The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.

Subject(s)

Computational Biology/methods , Databases, Genetic , Genome, Plant/genetics , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation/methods , Plants/genetics , Software , Metabolic Networks and Pathways/genetics , Models, Biological , Plants/metabolism , Systems Biology/methods
