Results 1 - 20 of 932
1.
Cell ; 185(1): 1-3, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34995512

ABSTRACT

Psychiatric disease is one of the greatest health challenges of our time. The pipeline for conceptually novel therapeutics remains thin, in part because uncovering the biological mechanisms of psychiatric disease has been difficult. We asked experts researching different aspects of psychiatric disease: what do you see as the major urgent questions that need to be addressed? Where are the next frontiers, and what are the current hurdles to understanding the biological basis of psychiatric disease?


Subject(s)
Antidepressive Agents/therapeutic use , Data Science/methods , Depression/drug therapy , Depression/metabolism , Depressive Disorder/drug therapy , Depressive Disorder/metabolism , Genomics/methods , Precision Medicine/methods , Translational Research, Biomedical/methods , Animals , Depression/genetics , Depressive Disorder/genetics , Humans , Neurons/metabolism , Prefrontal Cortex/metabolism , Treatment Outcome
2.
Cell ; 181(6): 1189-1193, 2020 06 11.
Article in English | MEDLINE | ID: mdl-32442404

ABSTRACT

Researchers around the globe have been mounting, accelerating, and redeploying efforts across disciplines and organizations to tackle the SARS-CoV-2 outbreak. However, humankind continues to be afflicted by numerous other devastating diseases in increasing numbers. Here, we outline considerations and opportunities toward striking a good balance between maintaining and redefining research priorities.


Subject(s)
Biomedical Research , Coronavirus Infections , Pandemics , Pneumonia, Viral , Biomedical Research/economics , COVID-19 , Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/drug therapy , Cardiovascular Diseases/prevention & control , Coronavirus Infections/diagnosis , Coronavirus Infections/drug therapy , Coronavirus Infections/prevention & control , Data Science/instrumentation , Data Science/methods , Delivery of Health Care , Humans , Inventions , Metabolic Diseases/diagnosis , Metabolic Diseases/drug therapy , Metabolic Diseases/prevention & control , Neoplasms/diagnosis , Neoplasms/drug therapy , Neoplasms/prevention & control , Pandemics/prevention & control , Pneumonia, Viral/diagnosis , Pneumonia, Viral/drug therapy , Pneumonia, Viral/prevention & control , Research
3.
Nature ; 604(7907): 635-642, 2022 04.
Article in English | MEDLINE | ID: mdl-35478233

ABSTRACT

The prosperity and lifestyle of our society are very much governed by achievements in condensed matter physics, chemistry and materials science, because new products for sectors such as energy, the environment, health, mobility and information technology (IT) rely largely on improved or even new materials. Examples include solid-state lighting, touchscreens, batteries, implants, drug delivery and many more. The enormous amount of research data produced every day in these fields represents a gold mine of the twenty-first century. This gold mine is, however, of little value if these data are not comprehensively characterized and made available. How can we refine this feedstock; that is, turn data into knowledge and value? For this, a FAIR (findable, accessible, interoperable and reusable) data infrastructure is a must. Only then can data be readily shared and explored using data analytics and artificial intelligence (AI) methods. Making data 'findable and AI ready' (a forward-looking interpretation of the acronym) will change the way in which science is carried out today. In this Perspective, we discuss how we can prepare to make this happen for the field of materials science.


Subject(s)
Artificial Intelligence , Data Science
4.
Nature ; 595(7866): 181-188, 2021 07.
Article in English | MEDLINE | ID: mdl-34194044

ABSTRACT

Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ from one another and to propose how they might be productively integrated. Towards this end we make two contributions. The first is a schema for thinking about research activities along two dimensions (the extent to which work is explanatory, focusing on identifying and estimating causal effects, and the degree of consideration given to testing predictions of outcomes) and how these two priorities can complement, rather than compete with, one another. Our second contribution is to advocate that computational social scientists devote more attention to combining prediction and explanation, which we call integrative modelling, and to outline some practical suggestions for realizing this goal.


Subject(s)
Computer Simulation , Data Science/methods , Forecasting/methods , Models, Theoretical , Social Sciences/methods , Goals , Humans
5.
Trends Genet ; 39(11): 803-807, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37714735

ABSTRACT

To accelerate the impact of African genomics on human health, data science skills and awareness of Africa's rich genetic diversity must be strengthened globally. We describe the first African genomics data science workshop, implemented by the African Society of Human Genetics (AfSHG) and international partners, providing a framework for future workshops.


Subject(s)
Data Science , Genomics , Humans , Human Genetics
6.
Am J Hum Genet ; 110(9): 1522-1533, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37607538

ABSTRACT

Population-scale biobanks linked to electronic health record data provide vast opportunities to extend our knowledge of human genetics and discover new phenotype-genotype associations. Given their dense phenotype data, biobanks can also facilitate replication studies on a phenome-wide scale. Here, we introduce the phenotype-genotype reference map (PGRM), a set of 5,879 genetic associations from 523 GWAS publications that can be used for high-throughput replication experiments. PGRM phenotypes are standardized as phecodes, ensuring interoperability between biobanks. We applied the PGRM to five ancestry-specific cohorts from four independent biobanks and found evidence of robust replications across a wide array of phenotypes. We show how the PGRM can be used to detect data corruption and to empirically assess parameters for phenome-wide studies. Finally, we use the PGRM to explore factors associated with replicability of GWAS results.


Subject(s)
Biological Specimen Banks , Data Science , Humans , Phenomics , Phenotype , Genotype
7.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493340

ABSTRACT

Translational bioinformatics and data science play a crucial role in biomarker discovery, as they enable translational research and help to bridge the gap between bench research and bedside clinical applications. Thanks to newer and faster molecular profiling technologies and falling costs, there are many opportunities for researchers to explore the molecular and physiological mechanisms of diseases. Biomarker discovery enables researchers to better characterize patients, enables early detection and intervention/prevention, and predicts treatment responses. Due to increasing prevalence and rising treatment costs, mental health (MH) disorders have become an important venue for biomarker discovery, with the goal of improved patient diagnostics, treatment, and care. Exploration of underlying biological mechanisms is the key to understanding the pathogenesis and pathophysiology of MH disorders. In an effort to better understand the underlying mechanisms of MH disorders, we reviewed the major accomplishments in the MH space from a bioinformatics and data science perspective, summarized existing knowledge derived from molecular and cellular data, and described challenges and areas of opportunity in this space.


Subject(s)
Biomedical Research , Mental Health , Humans , Data Science , Computational Biology , Biomarkers
8.
Nature ; 582(7810): 84-88, 2020 06.
Article in English | MEDLINE | ID: mdl-32483374

ABSTRACT

Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same 9 ex-ante hypotheses [1]. The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset [2-5]. Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches that could be used to mitigate issues related to analytical variability are discussed.
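The meta-analytical aggregation described in this abstract can be illustrated with Stouffer's method, a standard way to combine independent z-scores into one consensus statistic (the abstract does not specify the exact aggregation used; the per-team numbers below are invented for illustration):

```python
import math

def stouffer_combine(z_scores):
    """Combine independent z-scores into a single consensus z (Stouffer's method)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

# Hypothetical z-scores from five analysis teams for one region and hypothesis:
team_z = [1.1, 0.8, 1.4, 0.9, 1.2]
consensus = stouffer_combine(team_z)
# Individually modest effects can reach a clearer consensus once aggregated.
print(round(consensus, 3))
```

This is why team-level disagreement and a significant pooled consensus can coexist: aggregation shrinks the noise contributed by any single workflow.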


Subject(s)
Data Analysis , Data Science/methods , Data Science/standards , Datasets as Topic , Functional Neuroimaging , Magnetic Resonance Imaging , Research Personnel/organization & administration , Brain/diagnostic imaging , Brain/physiology , Datasets as Topic/statistics & numerical data , Female , Humans , Logistic Models , Male , Meta-Analysis as Topic , Models, Neurological , Reproducibility of Results , Research Personnel/standards , Software
9.
Bioinformatics ; 40(2)2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38402507

ABSTRACT

MOTIVATION: Genomic intervals are one of the most prevalent data structures in computational genome biology, and used to represent features ranging from genes, to DNA binding sites, to disease variants. Operations on genomic intervals provide a language for asking questions about relationships between features. While there are excellent interval arithmetic tools for the command line, they are not smoothly integrated into Python, one of the most popular general-purpose computational and visualization environments. RESULTS: Bioframe is a library to enable flexible and performant operations on genomic interval dataframes in Python. Bioframe extends the Python data science stack to use cases for computational genome biology by building directly on top of two of the most commonly-used Python libraries, NumPy and Pandas. The bioframe API enables flexible name and column orders, and decouples operations from data formats to avoid unnecessary conversions, a common scourge for bioinformaticians. Bioframe achieves these goals while maintaining high performance and a rich set of features. AVAILABILITY AND IMPLEMENTATION: Bioframe is open-source under MIT license, cross-platform, and can be installed from the Python Package Index. The source code is maintained by Open2C on GitHub at https://github.com/open2c/bioframe.
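To make the "interval arithmetic" idea concrete, here is a stdlib-only sketch of an overlap join, the kind of operation bioframe performs on genomic interval dataframes. This is not bioframe's actual API or implementation (see the GitHub repository for that); intervals here are plain (chrom, start, end) tuples with half-open coordinates:

```python
# Minimal sketch of a genomic-interval overlap join. Two half-open intervals
# on the same chromosome overlap when each starts before the other ends.

def overlap(intervals_a, intervals_b):
    """Return all pairs (a, b) that share a chromosome and intersect."""
    pairs = []
    for a in intervals_a:
        for b in intervals_b:
            if a[0] == b[0] and a[1] < b[2] and b[1] < a[2]:
                pairs.append((a, b))
    return pairs

genes = [("chr1", 100, 200), ("chr2", 50, 150)]
peaks = [("chr1", 150, 160), ("chr1", 300, 400), ("chr2", 140, 220)]
hits = overlap(genes, peaks)
# Each gene is paired with every peak it intersects.
print(hits)
```

Bioframe expresses the same logic over Pandas dataframes with chrom/start/end columns, which is what lets it plug into the rest of the Python data science stack.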


Subject(s)
Computational Biology , Genomics , Gene Library , Binding Sites , Data Science
10.
J Am Chem Soc ; 146(12): 8536-8546, 2024 03 27.
Article in English | MEDLINE | ID: mdl-38480482

ABSTRACT

Methods to access chiral sulfur(VI) pharmacophores are of interest in medicinal and synthetic chemistry. We report the desymmetrization of unprotected sulfonimidamides via asymmetric acylation with a cinchona-phosphinate catalyst. The desired products are formed in excellent yield and enantioselectivity with no observed bis-acylation. A data-science-driven approach to substrate scope evaluation was coupled to high throughput experimentation (HTE) to facilitate statistical modeling in order to inform mechanistic studies. Reaction kinetics, catalyst structural studies, and density functional theory (DFT) transition state analysis elucidated the turnover-limiting step to be the collapse of the tetrahedral intermediate and provided key insights into the catalyst-substrate structure-activity relationships responsible for the origin of the enantioselectivity. This study offers a reliable method for accessing enantioenriched sulfonimidamides to propel their application as pharmacophores and serves as an example of the mechanistic insight that can be gleaned from integrating data science and traditional physical organic techniques.


Subject(s)
Cinchona Alkaloids , Data Science , Molecular Structure , Stereoisomerism , Cinchona Alkaloids/chemistry , Catalysis , Acylation
11.
Bioinformatics ; 39(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37669147

ABSTRACT

SUMMARY: We present PyDESeq2, a Python implementation of the DESeq2 workflow for differential expression analysis on bulk RNA-seq data. This re-implementation yields similar, but not identical, results: it achieves higher model likelihood, allows speed improvements on large datasets, as shown in experiments on TCGA data, and can be more easily interfaced with modern Python-based data science tools. AVAILABILITY AND IMPLEMENTATION: PyDESeq2 is released as open-source software under the MIT license. The source code is available on GitHub at https://github.com/owkin/PyDESeq2 and documented at https://pydeseq2.readthedocs.io. PyDESeq2 is part of the scverse ecosystem.


Subject(s)
Data Science , Ecosystem , RNA-Seq , Probability , Software
12.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-38070155

ABSTRACT

MOTIVATION: Target discovery and drug evaluation for diseases with complex mechanisms call for a streamlined chemical systems analysis platform. Currently available tools lack an emphasis on reaction kinetics, access to relevant databases, and algorithms to visualize perturbations on a chemical scale, providing quantitative details as well as streamlined visual data analytics functionality. RESULTS: CytoCopasi, a Maven-based application for Cytoscape that combines the chemical systems analysis features of COPASI with the visualization and database access tools of Cytoscape and its plugin applications, has been developed. The diverse functionality of CytoCopasi, through ab initio model construction and model construction via the pathway and parameter databases KEGG and BRENDA, is presented. The comparative systems biology visualization analysis toolset is illustrated through a drug competence study on the cancerous RAF/MEK/ERK pathway. AVAILABILITY AND IMPLEMENTATION: The COPASI files, simulation data, native libraries, and the manual are available on https://github.com/scientificomputing/CytoCopasi.


Subject(s)
Data Science , Software , Algorithms , Computer Simulation , Systems Biology
13.
Cytotherapy ; 26(9): 967-979, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38842968

ABSTRACT

Although several cell-based therapies have received FDA approval, and others are showing promising results, scalable, quality-driven, and reproducible manufacturing of therapeutic cells at lower cost remains challenging. Challenges include starting material and patient variability, limited understanding of manufacturing process parameter effects on quality, complex supply chain logistics, and a lack of predictive, well-understood product quality attributes. These issues can manifest as increased production costs, longer production times, greater batch-to-batch variability, and lower overall yield of viable, high-quality cells. The lack of data-driven insights and decision-making in cell manufacturing and delivery is an underlying commonality behind all these problems. Data collection and analytics from discovery, preclinical and clinical research, process development, and product manufacturing have not been sufficiently utilized to develop a "systems" understanding and identify actionable controls. Experience from other industries shows that data science and analytics can drive technological innovations and manufacturing optimization, leading to improved consistency, reduced risk, and lower cost. The cell therapy manufacturing industry will benefit from implementing data science tools, such as data-driven modeling, data management and mining, AI, and machine learning. The integration of data-driven predictive capabilities into cell therapy manufacturing, such as predicting product quality and clinical outcomes based on manufacturing data, or ensuring robustness and reliability using data-driven supply-chain modeling, could enable more precise and efficient production processes and lead to better patient access and outcomes. In this review, we introduce some of the relevant computational and data science tools and how they are being or can be implemented in the cell therapy manufacturing workflow.
We also identify areas where innovative approaches are required to address challenges and opportunities specific to the cell therapy industry. We conclude that interfacing data science throughout a cell therapy product lifecycle, developing data-driven manufacturing workflow, designing better data collection tools and algorithms, using data analytics and AI-based methods to better understand critical quality attributes and critical-process parameters, and training the appropriate workforce will be critical for overcoming current industry and regulatory barriers and accelerating clinical translation.


Subject(s)
Cell- and Tissue-Based Therapy , Data Science , Humans , Cell- and Tissue-Based Therapy/methods , Data Science/methods
14.
Am J Med Genet A ; 194(5): e63505, 2024 05.
Article in English | MEDLINE | ID: mdl-38168469

ABSTRACT

Data science methodologies can be utilized to ascertain and analyze clinical genetic data that is often unstructured and rarely used outside of patient encounters. Genetic variants from all genetic testing resulting to a large pediatric healthcare system for a 5-year period were obtained and reinterpreted utilizing the previously validated Franklin© Artificial Intelligence (AI). Using PowerBI©, the data were further matched to patients in the electronic healthcare record to associate with demographic data to generate a variant data table and mapped by ZIP codes. Three thousand and sixty-five variants were identified and 98% were matched to patients with geographic data. Franklin© changed the interpretation for 24% of variants. One hundred and fifty-six clinically actionable variant reinterpretations were made. A total of 739 Mendelian genetic disorders were identified with disorder prevalence estimation. Mapping of variants demonstrated hot-spots for pathogenic genetic variation such as PEX6-associated Zellweger Spectrum Disorder. Seven patients were identified with Bardet-Biedl syndrome and seven patients with Rett syndrome amenable to newly FDA-approved therapeutics. Utilizing readily available software we developed a database and Exploratory Data Analysis (EDA) methodology enabling us to systematically reinterpret variants, estimate variant prevalence, identify conditions amenable to new treatments, and localize geographies enriched for pathogenic variants.


Subject(s)
Artificial Intelligence , Data Science , Humans , Child , Prevalence , Genetic Testing/methods , ATPases Associated with Diverse Cellular Activities
15.
PLoS Biol ; 19(9): e3001398, 2021 09.
Article in English | MEDLINE | ID: mdl-34555021

ABSTRACT

Hypothesis generation in observational, biomedical data science often starts with computing an association or identifying the statistical relationship between a dependent and an independent variable. However, the outcome of this process depends fundamentally on modeling strategy, with differing strategies generating what can be called "vibration of effects" (VoE). VoE is defined by variation in associations that often lead to contradictory results. Here, we present a computational tool capable of modeling VoE in biomedical data by fitting millions of different models and comparing their output. We execute a VoE analysis on a series of widely reported associations (e.g., carrot intake associated with eyesight) with an extended additional focus on lifestyle exposures (e.g., physical activity) and components of the Framingham Risk Score for cardiovascular health (e.g., blood pressure). We leveraged our tool for potential confounder identification, investigating what adjusting variables are responsible for conflicting models. We propose modeling VoE as a critical step in navigating discovery in observational data, discerning robust associations, and cataloging adjusting variables that impact model output.
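The scale of the model space behind "vibration of effects" can be sketched in a few lines: with k candidate adjusting variables there are 2**k possible adjustment sets, each defining a distinct model whose estimated association may differ, which is why the tool described above fits models by the millions. The covariate names below are illustrative, not taken from the study:

```python
from itertools import combinations

def adjustment_sets(candidates):
    """Enumerate every subset of candidate adjusting variables (one model spec each)."""
    sets = []
    for k in range(len(candidates) + 1):
        sets.extend(combinations(candidates, k))
    return sets

# Hypothetical covariates for a diet-eyesight association:
covariates = ["age", "sex", "smoking", "bmi", "activity", "income"]
specs = adjustment_sets(covariates)
print(len(specs))  # 2**6 = 64 distinct models to fit and compare
```

With 20 candidate adjusters the count exceeds a million, so exhaustively fitting and comparing specifications quickly becomes a genuinely computational problem.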


Subject(s)
Data Science/methods , Models, Statistical , Observational Studies as Topic/statistics & numerical data , Epidemiologic Methods , Humans
16.
PLoS Biol ; 19(3): e3001165, 2021 03.
Article in English | MEDLINE | ID: mdl-33735179

ABSTRACT

Why would a computational biologist with 40 years of research experience say bioinformatics is dead? The short answer is, in being the Founding Dean of a new School of Data Science, what we do suddenly looks different.


Subject(s)
Computational Biology/methods , Computational Biology/trends , Data Science/trends , Computational Biology/education , Curriculum , Data Science/methods , Humans , Information Dissemination/methods , Schools , Students
17.
Curr HIV/AIDS Rep ; 21(4): 208-219, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38916675

ABSTRACT

PURPOSE OF REVIEW: Big Data Science can be used to pragmatically guide the allocation of resources within the context of national HIV programs and inform priorities for intervention. In this review, we discuss the importance of grounding Big Data Science in the principles of equity and social justice to optimize the efficiency and effectiveness of the global HIV response. RECENT FINDINGS: Social, ethical, and legal considerations of Big Data Science have been identified in the context of HIV research. However, efforts to mitigate these challenges have been limited. Consequences include disciplinary silos within the field of HIV, a lack of meaningful engagement and ownership with and by communities, and potential misinterpretation or misappropriation of analyses that could further exacerbate health inequities. Big Data Science can support the HIV response by helping to identify gaps in previously undiscovered or understudied pathways to HIV acquisition and onward transmission, including the consequences for health outcomes and associated comorbidities. However, in the absence of a guiding framework for equity, alongside meaningful collaboration with communities through balanced partnerships, a reliance on big data could continue to reinforce inequities within and across marginalized populations.


Subject(s)
Big Data , Data Science , HIV Infections , Humans , HIV Infections/epidemiology , HIV Infections/prevention & control , Health Inequities , Social Justice
18.
PLoS Comput Biol ; 19(6): e1011160, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37289659

ABSTRACT

Much guidance on statistical training in STEM fields has focused largely on the undergraduate cohort, with graduate education often absent from the equation. Training in quantitative methods and reasoning is critical for graduate students in biomedical and science programs to foster reproducible and responsible research practices. We argue that graduate education should center more on fundamental reasoning and integration skills rather than mainly on listing one statistical test after another without conveying the bigger picture or the critical argumentation skills that will enable students to improve research integrity through rigorous practice. Herein, we describe the approach we take in a quantitative reasoning course in the R3 program at the Johns Hopkins Bloomberg School of Public Health, with an error-focused lens, based on visualization and communication competencies. Specifically, we take this perspective, stemming from the discussed causes of irreproducibility, and apply it to the many aspects of good statistical practice in science, ranging from experimental design to data collection and analysis, and the conclusions drawn from the data. We also provide tips and guidelines for the implementation and adaptation of our course material to various graduate biomedical and STEM science programs.


Subject(s)
Data Science , Students , Humans , Curriculum , Philosophy , Communication , Teaching
19.
Chem Rev ; 122(16): 13478-13515, 2022 08 24.
Article in English | MEDLINE | ID: mdl-35862246

ABSTRACT

Electrocatalysts and photocatalysts are key to a sustainable future, generating clean fuels, reducing the impact of global warming, and providing solutions to environmental pollution. Improved processes for catalyst design and a better understanding of electro/photocatalytic processes are essential for improving catalyst effectiveness. Recent advances in data science and artificial intelligence have great potential to accelerate electrocatalysis and photocatalysis research, particularly the rapid exploration of large materials chemistry spaces through machine learning. Here a comprehensive introduction to, and critical review of, machine learning techniques used in electrocatalysis and photocatalysis research are provided. Sources of electro/photocatalyst data and current approaches to representing these materials by mathematical features are described, the most commonly used machine learning methods summarized, and the quality and utility of electro/photocatalyst models evaluated. Illustrations of how machine learning models are applied to novel electro/photocatalyst discovery and used to elucidate electrocatalytic or photocatalytic reaction mechanisms are provided. The review offers a guide for materials scientists on the selection of machine learning methods for electrocatalysis and photocatalysis research. The application of machine learning to catalysis science represents a paradigm shift in the way advanced, next-generation catalysts will be designed and synthesized.


Subject(s)
Artificial Intelligence , Machine Learning , Catalysis , Data Science
20.
Environ Sci Technol ; 58(15): 6457-6474, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38568682

ABSTRACT

The circular economy (CE) aims to decouple economic growth from the consumption of finite resources through strategies such as eliminating waste, circulating materials in use, and regenerating natural systems. Due to the rapid development of data science (DS), promising progress has been made in the transition toward a CE in the past decade. DS offers various methods to achieve accurate predictions, accelerate sustainable product design, prolong asset life, optimize the infrastructure needed to circulate materials, and provide evidence-based insights. Despite the exciting scientific advances in this field, a comprehensive review that summarizes past achievements, synthesizes the knowledge gained, and navigates future research directions is still lacking. In this paper, we summarize how DS has accelerated the transition to a CE. We conducted a critical review of where and how DS has helped the CE transition, focusing on four areas: (1) characterizing socioeconomic metabolism, (2) reducing unnecessary waste generation by enhancing material efficiency and optimizing product design, (3) extending product lifetime through repair, and (4) facilitating waste reuse and recycling. We also introduce the limitations and challenges of current applications and discuss opportunities, providing a clear roadmap for future research in this field.


Subject(s)
Data Science , Waste Management , Recycling