ABSTRACT
Psychiatric disease is one of the greatest health challenges of our time. The pipeline of conceptually novel therapeutics remains thin, in part because uncovering the biological mechanisms of psychiatric disease has been difficult. We asked experts researching different aspects of psychiatric disease: what do you see as the major urgent questions that need to be addressed? Where are the next frontiers, and what are the current hurdles to understanding the biological basis of psychiatric disease?
Subjects
Antidepressive Agents/therapeutic use, Data Science/methods, Depression/drug therapy, Depression/metabolism, Depressive Disorder/drug therapy, Depressive Disorder/metabolism, Genomics/methods, Precision Medicine/methods, Biomedical Translational Research/methods, Animals, Depression/genetics, Depressive Disorder/genetics, Humans, Neurons/metabolism, Prefrontal Cortex/metabolism, Treatment Outcome
ABSTRACT
Researchers around the globe have been mounting, accelerating, and redeploying efforts across disciplines and organizations to tackle the SARS-CoV-2 outbreak. However, humankind continues to be afflicted by numerous other devastating diseases in increasing numbers. Here, we outline considerations and opportunities toward striking a good balance between maintaining and redefining research priorities.
Subjects
Biomedical Research, Coronavirus Infections, Pandemics, Viral Pneumonia, Biomedical Research/economics, COVID-19, Cardiovascular Diseases/diagnosis, Cardiovascular Diseases/drug therapy, Cardiovascular Diseases/prevention & control, Coronavirus Infections/diagnosis, Coronavirus Infections/drug therapy, Coronavirus Infections/prevention & control, Data Science/instrumentation, Data Science/methods, Delivery of Health Care, Humans, Inventions, Metabolic Diseases/diagnosis, Metabolic Diseases/drug therapy, Metabolic Diseases/prevention & control, Neoplasms/diagnosis, Neoplasms/drug therapy, Neoplasms/prevention & control, Pandemics/prevention & control, Viral Pneumonia/diagnosis, Viral Pneumonia/drug therapy, Viral Pneumonia/prevention & control, Research
ABSTRACT
Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ from one another and to propose how they might be productively integrated. Towards this end, we make two contributions. The first is a schema for thinking about research activities along two dimensions: the extent to which work is explanatory, focusing on identifying and estimating causal effects, and the degree of consideration given to testing predictions of outcomes. We consider how these two priorities can complement, rather than compete with, one another. Our second contribution is to advocate that computational social scientists devote more attention to combining prediction and explanation, which we call integrative modelling, and to outline some practical suggestions for realizing this goal.
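As an illustration of what integrative modelling can look like in practice, the sketch below (our own, on synthetic data; not from the Perspective itself) scores one and the same regression specification both by its coefficient estimates and by its out-of-sample predictive accuracy.

```python
# Illustrative only: evaluate one regression specification both by its
# coefficient estimates (explanation) and by out-of-sample error (prediction).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)          # hypothesized causal driver
x2 = rng.normal(size=n)          # additional covariate
y = 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([x1, x2])
model = LinearRegression().fit(X, y)
print("explanatory view - coefficients:", model.coef_)

# Predictive view: 5-fold cross-validated R^2 of the same specification.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("predictive view - mean out-of-sample R^2:", scores.mean())
```

The point of the exercise is that neither number alone settles whether the model is good; a specification can estimate a coefficient precisely yet predict poorly, or vice versa.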
Subjects
Computer Simulation, Data Science/methods, Forecasting/methods, Theoretical Models, Social Sciences/methods, Goals, Humans
ABSTRACT
Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same 9 ex-ante hypotheses [1]. The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset [2-5]. Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches that could be used to mitigate issues related to analytical variability are discussed.
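The consensus step can be illustrated with a Stouffer-style combination of per-voxel z-statistics across teams; the toy sketch below is our own illustration on synthetic maps, not the NARPS pipeline.

```python
# A minimal sketch (not the NARPS pipeline) of one common consensus approach:
# combining per-voxel z-statistics across analysis teams with Stouffer's method.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_teams, n_voxels = 70, 1000
z_maps = rng.normal(loc=0.3, scale=1.0, size=(n_teams, n_voxels))  # toy team maps

z_combined = z_maps.sum(axis=0) / np.sqrt(n_teams)  # Stouffer's Z
p_combined = norm.sf(z_combined)                    # one-sided p-values
print("voxels significant at p < 0.001:", int((p_combined < 1e-3).sum()))
```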
Subjects
Data Analysis, Data Science/methods, Data Science/standards, Datasets as Topic, Functional Neuroimaging, Magnetic Resonance Imaging, Research Personnel/organization & administration, Brain/diagnostic imaging, Brain/physiology, Datasets as Topic/statistics & numerical data, Female, Humans, Logistic Models, Meta-Analysis as Topic, Neurological Models, Reproducibility of Results, Research Personnel/standards, Software
ABSTRACT
Neuroscience research has evolved to generate increasingly large and complex experimental data sets, and advanced data science tools are taking on central roles in neuroscience research. Neurodata Without Borders (NWB), a standard language for neurophysiology data, has recently emerged as a powerful solution for data management, analysis, and sharing. Here we discuss our labs' efforts to implement NWB data science pipelines. We describe general principles and specific use cases that illustrate successes, challenges, and non-trivial decisions in software engineering. We hope that our experience can provide guidance for the neuroscience community and help bridge the gap between experimental neuroscience and data science. Key takeaways from this article are that (1) standardization with NWB requires non-trivial design choices; (2) the general practice of standardization in the lab promotes data awareness and literacy, and improves transparency, rigor, and reproducibility in our science; and (3) we offer several feature suggestions to ease extensibility, publishing/sharing, and usability for the NWB standard and for users of NWB data.
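For readers new to the standard, a minimal pynwb sketch (our illustration; the file name, identifier, and data are placeholders) shows the basic write workflow:

```python
# A minimal pynwb sketch: package one acquisition trace in an NWB file.
# Illustrative of the standard's basic workflow, not our labs' full pipelines.
from datetime import datetime, timezone
import numpy as np
from pynwb import NWBFile, NWBHDF5IO, TimeSeries

nwbfile = NWBFile(
    session_description="example recording session",
    identifier="demo-session-001",               # placeholder identifier
    session_start_time=datetime.now(timezone.utc),
)

# Raw voltage trace sampled at 30 kHz (synthetic data for illustration).
trace = TimeSeries(
    name="raw_voltage",
    data=np.random.randn(30000),
    unit="volts",
    starting_time=0.0,
    rate=30000.0,
)
nwbfile.add_acquisition(trace)

with NWBHDF5IO("demo_session.nwb", "w") as io:
    io.write(nwbfile)
```

Even this small example surfaces the design choices the article describes: what counts as acquisition versus processed data, how streams are named, and which metadata are mandatory.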
Subjects
Neurosciences, Animals, Humans, Data Science/methods, Data Science/standards, Information Dissemination/methods, Neurosciences/standards, Neurosciences/methods, Software/standards
ABSTRACT
Although several cell-based therapies have received FDA approval and others are showing promising results, scalable, quality-driven, reproducible manufacturing of therapeutic cells at lower cost remains challenging. Challenges include starting material and patient variability, limited understanding of how manufacturing process parameters affect quality, complex supply chain logistics, and a lack of predictive, well-understood product quality attributes. These issues can manifest as increased production costs, longer production times, greater batch-to-batch variability, and lower overall yield of viable, high-quality cells. The lack of data-driven insights and decision-making in cell manufacturing and delivery is an underlying commonality behind all these problems. Data collection and analytics from discovery, preclinical and clinical research, process development, and product manufacturing have not been sufficiently utilized to develop a "systems" understanding and identify actionable controls. Experience from other industries shows that data science and analytics can drive technological innovations and manufacturing optimization, leading to improved consistency, reduced risk, and lower cost. The cell therapy manufacturing industry will benefit from implementing data science tools such as data-driven modeling, data management and mining, AI, and machine learning. Integrating data-driven predictive capabilities into cell therapy manufacturing, such as predicting product quality and clinical outcomes from manufacturing data or ensuring robustness and reliability through data-driven supply-chain modeling, could enable more precise and efficient production processes and lead to better patient access and outcomes. In this review, we introduce some of the relevant computational and data science tools and how they are being, or can be, implemented in the cell therapy manufacturing workflow. We also identify areas where innovative approaches are required to address challenges and opportunities specific to the cell therapy industry. We conclude that interfacing data science throughout a cell therapy product lifecycle, developing data-driven manufacturing workflows, designing better data collection tools and algorithms, using data analytics and AI-based methods to better understand critical quality attributes and critical process parameters, and training the appropriate workforce will be critical for overcoming current industry and regulatory barriers and accelerating clinical translation.
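As a hedged illustration of such predictive capabilities, the sketch below fits a model of a quality attribute on synthetic process parameters; the parameters, ranges, and outcome are hypothetical stand-ins for validated historical batch records.

```python
# Hypothetical sketch: predicting a product quality attribute (percent viable
# cells) from manufacturing process parameters. Synthetic data only; a real
# deployment would train on validated batch records under a quality system.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_batches = 300
X = np.column_stack([
    rng.uniform(36.0, 38.0, n_batches),   # culture temperature (deg C)
    rng.uniform(5.0, 9.0, n_batches),     # culture duration (days)
    rng.uniform(0.1, 1.0, n_batches),     # seeding density (1e6 cells/mL)
])
# Toy ground truth: viability peaks near 37 C and declines with duration.
viability = 90 - 3 * np.abs(X[:, 0] - 37) - 0.5 * X[:, 1] + rng.normal(0, 1, n_batches)

X_tr, X_te, y_tr, y_te = train_test_split(X, viability, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))
print("parameter importances:", model.feature_importances_)
```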
Subjects
Cell- and Tissue-Based Therapy, Data Science, Humans, Cell- and Tissue-Based Therapy/methods, Data Science/methods
ABSTRACT
PURPOSE OF REVIEW: Health data sciences can help mitigate the high burden of cardiovascular disease (CVD) management in South Asia by increasing the availability and affordability of healthcare services. This review explores the current landscape, challenges, and strategies for leveraging digital health technologies to improve CVD outcomes in the region. RECENT FINDINGS: Several South Asian countries are implementing national digital health strategies that aim to provide unique health account numbers for patients, creating longitudinal digital health records, while others aim to digitize healthcare services and improve health outcomes. Significant challenges impede progress, including lack of interoperability, inadequate training of healthcare workers, cultural barriers, and data privacy concerns. Leveraging digital health for CVD management involves using big data for early detection, employing artificial intelligence for diagnostics, and integrating multiomics data for health insights. Addressing these challenges through policy frameworks, capacity building, and international cooperation is crucial for improving CVD outcomes in the region.
Subjects
Cardiovascular Diseases, Humans, Cardiovascular Diseases/therapy, Cardiovascular Diseases/epidemiology, Asia/epidemiology, Data Science/methods, Telemedicine, Big Data, Digital Health, South Asia
ABSTRACT
Hypothesis generation in observational, biomedical data science often starts with computing an association or identifying the statistical relationship between a dependent and an independent variable. However, the outcome of this process depends fundamentally on modeling strategy, with differing strategies generating what can be called "vibration of effects" (VoE). VoE is defined by variation in associations that often leads to contradictory results. Here, we present a computational tool capable of modeling VoE in biomedical data by fitting millions of different models and comparing their output. We execute a VoE analysis on a series of widely reported associations (e.g., carrot intake associated with eyesight) with an extended additional focus on lifestyle exposures (e.g., physical activity) and components of the Framingham Risk Score for cardiovascular health (e.g., blood pressure). We leveraged our tool for potential confounder identification, investigating which adjusting variables are responsible for conflicting models. We propose modeling VoE as a critical step in navigating discovery in observational data, discerning robust associations, and cataloging adjusting variables that impact model output.
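A toy version of such an analysis (not the authors' tool) refits the same exposure-outcome model under every subset of candidate adjusting variables and inspects how the exposure estimate vibrates:

```python
# Toy vibration-of-effects sketch: enumerate all adjustment-variable subsets,
# refit the exposure-outcome regression, and collect the exposure coefficient.
from itertools import chain, combinations
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 1000
exposure = rng.normal(size=n)
c1, c2, c3 = rng.normal(size=(3, n))                 # candidate adjusters
outcome = 0.3 * exposure + 0.8 * c1 - 0.4 * c2 + rng.normal(size=n)

adjusters = {"c1": c1, "c2": c2, "c3": c3}
names = list(adjusters)
estimates = {}
for subset in chain.from_iterable(combinations(names, k) for k in range(len(names) + 1)):
    X = sm.add_constant(np.column_stack([exposure] + [adjusters[a] for a in subset]))
    fit = sm.OLS(outcome, X).fit()
    estimates[subset or ("none",)] = fit.params[1]   # coefficient on exposure

for spec, beta in estimates.items():
    print(sorted(spec), round(beta, 3))
```

The spread of the printed coefficients across specifications is the "vibration"; a robust association keeps roughly the same sign and magnitude in every row.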
Subjects
Data Science/methods, Statistical Models, Observational Studies as Topic/statistics & numerical data, Epidemiologic Methods, Humans
ABSTRACT
Why would a computational biologist with 40 years of research experience say bioinformatics is dead? The short answer is that, as the Founding Dean of a new School of Data Science, I find that what we do suddenly looks different.
Subjects
Computational Biology/methods, Computational Biology/trends, Data Science/trends, Computational Biology/education, Curriculum, Data Science/methods, Humans, Information Dissemination/methods, Schools, Students
ABSTRACT
BACKGROUND: For years, nurse researchers have been called upon to engage with "big data" in the electronic health record (EHR) by leading studies focusing on nurse-centric patient outcomes and providing clinical analysis of potential outcome indicators. However, the current gap in nurses' data science education and training poses a significant barrier. OBJECTIVES: We aimed to evaluate the viability of conducting nurse-led, big-data research projects within a custom-designed computational laboratory and examine the support required by a team of researchers with little to no big-data experience. METHODS: Four nurse-led research teams developed a research question reliant on existing EHR data. Each team was given its own virtual computational laboratory populated with raw data. A data science education team provided instruction in coding languages (primarily structured query language and R) and data science techniques to organize and analyze the data. RESULTS: Three research teams have completed studies, resulting in one manuscript currently undergoing peer review and two manuscripts in progress. The final team is performing data analysis. Five barriers and five facilitators to big-data projects were identified. DISCUSSION: As the data science learning curve is steep, organizations need to help bridge the gap between what is currently taught in doctoral nursing programs and what is required of clinical nurse researchers to successfully engage in big-data methods. In addition, clinical nurse researchers require protected research time and a data science infrastructure that supports novice efforts with education, mentorship, and computational laboratory resources.
Subjects
Data Science, Electronic Health Records, Nursing Research, Humans, Data Science/methods, Electronic Health Records/statistics & numerical data, Big Data, Research Personnel/statistics & numerical data
ABSTRACT
Data-informed decision making is a critical goal for many community-based public health research initiatives. However, community partners often encounter challenges when interacting with data. The Community-Engaged Data Science (CEDS) model offers a goal-oriented, iterative guide for communities to collaborate with research data scientists through data ambassadors. This study presents a case study of CEDS applied to research on the opioid epidemic in 18 counties in Ohio as part of the HEALing Communities Study (HCS). Data ambassadors played a pivotal role in empowering community coalitions to translate data into action using the key steps of CEDS, which included: data landscapes identifying available data in the community; data action plans derived from logic models based on community data needs and gaps; data collection/sharing agreements; and data systems, including portals and dashboards. Throughout the CEDS process, data ambassadors emphasized sustainable data workflows, supporting continued data engagement beyond the HCS. The implementation of CEDS in Ohio underscored the importance of relationship building, timing of implementation, understanding communities' data preferences, and flexibility when working with communities. Researchers should consider implementing CEDS and integrating a data ambassador in community-based research to enhance community data engagement and drive data-informed interventions to improve public health outcomes.
Subjects
Data Science, Humans, Ohio, Data Science/methods, Community-Based Participatory Research, Organizational Case Studies, Community Participation/methods, Opioid-Related Disorders/epidemiology
ABSTRACT
BACKGROUND: Monitoring free-living physical activity (PA) through wearable devices enables the real-time assessment of activity features associated with health outcomes and provision of treatment recommendations and adjustments. The conclusions of studies on PA and health depend crucially on reliable statistical analyses of digital data. Data analytics, however, are challenging due to the various metrics adopted for measuring PA, different aims of studies, and complex temporal variations within variables. The application, interpretation, and appropriateness of these analytical tools have yet to be summarized. OBJECTIVE: This research aimed to review studies that used analytical methods for analyzing PA monitored by accelerometers. Specifically, this review addressed three questions: (1) What metrics are used to describe an individual's free-living daily PA? (2) What are the current analytical tools for analyzing PA data, particularly under the aims of classification, association with health outcomes, and prediction of health events? and (3) What challenges exist in the analyses, and what recommendations for future research are suggested regarding the use of statistical methods in various research tasks? METHODS: This scoping review was conducted following an existing framework to map research studies by exploring the information about PA. Three databases, PubMed, IEEE Xplore, and the ACM Digital Library, were searched in February 2024 to identify related publications. Eligible articles were classification, association, or prediction studies involving human PA monitored through wearable accelerometers. RESULTS: After screening 1312 articles, 428 (32.62%) eligible studies were identified and categorized into at least 1 of the following 3 thematic categories: classification (75/428, 17.5%), association (342/428, 79.9%), and prediction (32/428, 7.5%). Most articles (414/428, 96.7%) derived PA variables from 3D acceleration, rather than 1D acceleration. All eligible articles (428/428, 100%) considered PA metrics represented in the time domain, while a small fraction (16/428, 3.7%) also considered PA metrics in the frequency domain. The number of studies evaluating the influence of PA on health conditions has increased greatly. Among the studies in our review, regression-type models were the most prevalent (373/428, 87.1%). The machine learning approach for classification research is also gaining popularity (32/75, 43%). In addition to summary statistics of PA, several recent studies used tools to incorporate PA trajectories and account for temporal patterns, including longitudinal data analysis with repeated PA measurements and functional data analysis with PA as a continuum for time-varying association (68/428, 15.9%). CONCLUSIONS: Summary metrics can quickly provide descriptions of the strength, frequency, and duration of individuals' overall PA. When the distribution and profile of PA need to be evaluated or detected, considering PA metrics as longitudinal or functional data can provide detailed information and improve the understanding of the role PA plays in health. Depending on the research goal, appropriate analytical tools can ensure the reliability of the scientific findings.
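To make the distinction between summary metrics and trajectory-level analysis concrete, the sketch below (our own, on synthetic data; the MVPA threshold is a placeholder, not a validated cut-point) derives the kind of time-domain daily summaries that most reviewed studies fed into regression-type models.

```python
# Illustrative sketch: derive daily summary PA metrics from minute-level
# accelerometer counts. Synthetic data; the 1000-count MVPA threshold is a
# placeholder, not a validated cut-point.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
minutes = pd.date_range("2024-01-01", periods=7 * 1440, freq="min")
counts = rng.gamma(shape=1.2, scale=300, size=minutes.size)
df = pd.DataFrame({"counts": counts}, index=minutes)

daily = df.resample("D").agg(
    mean_counts=("counts", "mean"),
    mvpa_minutes=("counts", lambda c: int((c > 1000).sum())),
)
print(daily)
```

Summaries like these capture overall strength, frequency, and duration; longitudinal or functional-data approaches would instead keep the full minute-level series as the unit of analysis.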
Subjects
Accelerometry, Exercise, Humans, Accelerometry/instrumentation, Wearable Electronic Devices, Data Science/methods
ABSTRACT
BACKGROUND: Artificial intelligence (AI) holds immense potential for enhancing clinical and administrative health care tasks. However, slow adoption and implementation challenges highlight the need to consider how humans can effectively collaborate with AI within broader sociotechnical systems in health care. OBJECTIVE: Using the example of intensive care units (ICUs), we compare data scientists' and clinicians' assessments of the optimal utilization of human and AI capabilities by determining suitable levels of human-AI teaming for safely and meaningfully augmenting or automating 6 core tasks. The goal is to provide actionable recommendations for policy makers and health care practitioners regarding AI design and implementation. METHODS: In this multimethod study, we combine a systematic task analysis across 6 ICUs with an international Delphi survey involving 19 health data scientists from industry and academia and 61 ICU clinicians (25 physicians and 36 nurses) to define and assess optimal levels of human-AI teaming (level 1=no performance benefits; level 2=AI augments human performance; level 3=humans augment AI performance; level 4=AI performs without human input). Stakeholder groups also considered ethical and social implications. RESULTS: Both stakeholder groups chose level 2 and 3 human-AI teaming for 4 of the 6 core tasks in the ICU. For one task (monitoring), level 4 was the preferred design choice. For the task of patient interactions, both data scientists and clinicians agreed that AI should not be used regardless of technological feasibility, due to the importance of the physician-patient and nurse-patient relationship and ethical concerns. Human-AI design choices rely on interpretability, predictability, and control over AI systems. If these conditions are not met and AI performs below human-level reliability, a reduction to level 1 or shifting accountability away from human end users is advised. If AI performs at or beyond human-level reliability and these conditions are not met, shifting to level 4 automation should be considered to ensure safe and efficient human-AI teaming. CONCLUSIONS: By considering the sociotechnical system and determining appropriate levels of human-AI teaming, our study showcases the potential for improving the safety and effectiveness of AI usage in ICUs and broader health care settings. Regulatory measures should prioritize interpretability, predictability, and control if clinicians hold full accountability. Ethical and social implications must be carefully evaluated to ensure effective collaboration between humans and AI, particularly considering the most recent advancements in generative AI.
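One way to make the concluding decision logic explicit is to encode it as a small function; the sketch below is our reading of the rules stated above, not an algorithm published by the study.

```python
# A hedged encoding of the decision logic described above (our interpretation,
# not the study's official algorithm): choose a human-AI teaming level from AI
# reliability and whether interpretability/predictability/control are met.
def teaming_level(ai_at_or_above_human: bool, conditions_met: bool,
                  preferred_level: int) -> int:
    """Return a teaming level in 1..4 (1 = no AI performance role, 4 = full automation)."""
    if conditions_met:
        return preferred_level   # e.g., level 2 or 3 for most ICU tasks
    if ai_at_or_above_human:
        return 4                 # conditions unmet but AI reliable: consider automation
    return 1                     # conditions unmet and AI below human level: fall back
                                 # (the study also mentions shifting accountability)

print(teaming_level(ai_at_or_above_human=False, conditions_met=False, preferred_level=2))  # -> 1
print(teaming_level(ai_at_or_above_human=True, conditions_met=False, preferred_level=3))   # -> 4
```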
Subjects
Artificial Intelligence, Critical Care, Humans, Critical Care/methods, Intensive Care Units, Automation, Delphi Technique, Data Science/methods, Male, Female
ABSTRACT
BACKGROUND: The rapid advancement of digital technologies, particularly in big data analytics (BDA), artificial intelligence (AI), machine learning (ML), and deep learning (DL), is reshaping the global health care system, including in Bangladesh. The increased adoption of these technologies in health care delivery within Bangladesh has sparked their integration into health care and public health research, resulting in a noticeable surge in related studies. However, a critical gap exists: there is a lack of comprehensive evidence regarding the research landscape; regulatory challenges; use cases; and the application and adoption of BDA, AI, ML, and DL in the health care system of Bangladesh. This gap impedes the attainment of optimal results. As Bangladesh is a leading implementer of digital technologies, bridging this gap is urgent for the effective use of these advancing technologies. OBJECTIVE: This scoping review aims to collate (1) the existing research on Bangladesh's health care system that uses the aforementioned technologies, synthesizing its findings, and (2) the limitations faced by researchers in integrating these technologies into health care research. METHODS: MEDLINE (via PubMed), IEEE Xplore, Scopus, and Embase databases were searched to identify published research articles between January 1, 2000, and September 10, 2023, meeting the following inclusion criteria: (1) any study using any of the BDA, AI, ML, and DL technologies and health care and public health datasets for predicting health issues and forecasting any kind of outbreak; (2) studies primarily focusing on health care and public health issues in Bangladesh; and (3) original research articles published in peer-reviewed journals and conference proceedings written in English. RESULTS: The initial search identified 1653 studies. After applying the inclusion and exclusion criteria and completing full-text review, 4.66% (77/1653) of the articles were included in this review. There was a substantial increase in studies over the last 5 years (2017-2023). Among the 77 studies, the majority (n=65, 84%) used ML models. A smaller proportion of studies incorporated AI (4/77, 5%), DL (7/77, 9%), and BDA (1/77, 1%) technologies. Among the reviewed articles, 52% (40/77) relied on primary data, while the remaining 48% (37/77) used secondary data. The primary research areas of focus were infectious diseases (15/77, 19%), noncommunicable diseases (23/77, 30%), child health (11/77, 14%), and mental health (9/77, 12%). CONCLUSIONS: This scoping review highlights remarkable progress in leveraging BDA, AI, ML, and DL within Bangladesh's health care system. The observed surge in studies over the last 5 years underscores the increasing significance of AI and related technologies in health care research. Notably, most (65/77, 84%) studies focused on ML models, unveiling opportunities for advancements in predictive modeling. This review encapsulates the current state of technological integration and propels us into a promising era for the future of digital Bangladesh.
Subjects
Artificial Intelligence, Big Data, Deep Learning, Delivery of Health Care, Machine Learning, Bangladesh, Humans, Delivery of Health Care/statistics & numerical data, Data Science/methods
ABSTRACT
Information generated via next-generation sequencing (NGS) technologies is often termed multi-omics data [...].
Subjects
High-Throughput Nucleotide Sequencing, Medical Oncology, Neoplasms, Precision Medicine, Precision Medicine/methods, Precision Medicine/trends, Humans, Medical Oncology/methods, Medical Oncology/trends, High-Throughput Nucleotide Sequencing/methods, Neoplasms/genetics, Neoplasms/therapy, Data Science/methods, Genomics/methods
ABSTRACT
The study of non-natural biocatalytic transformations relies heavily on empirical methods, such as directed evolution, for identifying improved variants. Although exceptionally effective, this approach provides limited insight into the molecular mechanisms behind the transformations and necessitates multiple protein engineering campaigns for new reactants. To address this limitation, we disclose a strategy to explore the biocatalytic reaction space and garner insight into the molecular mechanisms driving enzymatic transformations. Specifically, we explored the selectivity of an "ene"-reductase, GluER-T36A, to create a data-driven toolset that explores reaction space and rationalizes the observed and predicted selectivities of substrate/mutant combinations. The resultant statistical models related structural features of the enzyme and substrate to selectivity and were used to effectively predict selectivity in reactions with out-of-sample substrates and mutants. Our approach provided a deeper understanding of enantioinduction by GluER-T36A and holds the potential to enhance the virtual screening of enzyme mutants.
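The flavor of such descriptor-based statistical models can be sketched as follows (synthetic descriptors and selectivities; not the GluER-T36A dataset or the authors' workflow):

```python
# Illustrative only: regress a selectivity measure on substrate/mutant
# descriptors, as in descriptor-based statistical modeling of enzymatic
# reactions. Features and values are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_rxn = 120
# Hypothetical substrate/mutant descriptors (e.g., steric and electronic terms).
X = rng.normal(size=(n_rxn, 3))
# Toy selectivity response, e.g., a ddG-like quantity derived from e.r.
ddG = 1.1 * X[:, 0] - 0.6 * X[:, 1] + rng.normal(0, 0.2, n_rxn)

model = LinearRegression().fit(X, ddG)
r2 = cross_val_score(LinearRegression(), X, ddG, cv=5, scoring="r2").mean()
print("coefficients:", model.coef_, "| cross-validated R^2:", round(r2, 2))
```

The coefficients play the mechanistic-insight role (which structural features drive enantioinduction), while the cross-validated score plays the virtual-screening role (predicting out-of-sample substrate/mutant combinations).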
Subjects
Data Science, Data Science/methods, Biocatalysis, Stereoisomerism, Substrate Specificity, Ligands, Mutation, Molecular Models
ABSTRACT
Several challenges remain in data-independent acquisition (DIA) data analysis, such as confidently identifying peptides, defining integration boundaries, removing interferences, and controlling false discovery rates. In practice, a visual inspection of the signals is still required, which is impractical with large datasets. We present Avant-garde as a tool to refine DIA (and parallel reaction monitoring) data. Avant-garde uses a novel data-driven scoring strategy: signals are refined by learning from the dataset itself, using all measurements in all samples to achieve the best optimization. We evaluate the performance of Avant-garde using benchmark DIA datasets and show that it can determine the quantitative suitability of a peptide peak and reach the same levels of selectivity, accuracy, and reproducibility as manual validation. Avant-garde is complementary to existing DIA analysis engines and aims to establish a strong foundation for subsequent analysis of quantitative mass spectrometry data.
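A rough sketch of the underlying idea, though not Avant-garde's actual scoring, is to flag fragment traces that correlate poorly with their peptide's consensus elution profile, a common sign of interference:

```python
# Rough sketch (not Avant-garde's scoring): flag fragment-ion traces whose
# elution profile correlates poorly with the consensus profile of the peptide.
import numpy as np

rng = np.random.default_rng(9)
t = np.linspace(0, 1, 50)
peak = np.exp(-((t - 0.5) ** 2) / 0.01)             # shared elution profile

# Three co-eluting transitions at different intensities, plus one interfered trace.
traces = np.array([peak * s + rng.normal(0, 0.02, t.size) for s in (1.0, 0.7, 0.4)])
traces = np.vstack([traces, rng.normal(0.3, 0.1, t.size)])

consensus = np.median(traces, axis=0)
scores = np.array([np.corrcoef(tr, consensus)[0, 1] for tr in traces])
keep = scores > 0.9                                  # placeholder threshold
print("per-transition correlation:", scores.round(2), "| kept:", keep)
```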
Subjects
Data Analysis, Data Curation/methods, Data Science/methods, Proteome/analysis, Proteomics/methods, Cell Line, HEK293 Cells, Humans, Mass Spectrometry/methods, Peptides/analysis, Reproducibility of Results, Software
ABSTRACT
Data-independent acquisition modes isolate and concurrently fragment populations of different precursors by cycling through segments of a predefined precursor m/z range. Although these selection windows collectively cover the entire m/z range, overall, only a few per cent of all incoming ions are isolated for mass analysis. Here, we make use of the correlation of molecular weight and ion mobility in a trapped ion mobility device (timsTOF Pro) to devise a scan mode that samples up to 100% of the peptide precursor ion current in m/z and mobility windows. We extend an established targeted data extraction workflow by inclusion of the ion mobility dimension for both signal extraction and scoring and thereby increase the specificity for precursor identification. Data acquired from whole proteome digests and mixed organism samples demonstrate deep proteome coverage and a high degree of reproducibility as well as quantitative accuracy, even from 10 ng sample amounts.
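The m/z-mobility correlation that this scan mode exploits can be illustrated with a toy calculation (synthetic precursors; the trend line, noise level, and window half-width are made-up values, not instrument settings):

```python
# Illustrative of the underlying idea (not the instrument's method editor):
# peptide precursor m/z and ion mobility (1/K0) correlate, so isolation windows
# placed along that diagonal capture most of the precursor ion current.
import numpy as np

rng = np.random.default_rng(11)
mz = rng.uniform(400, 1200, 5000)
one_over_k0 = 0.6 + 0.0006 * mz + rng.normal(0, 0.05, mz.size)  # toy trend

slope, intercept = np.polyfit(mz, one_over_k0, 1)   # fit the diagonal
half_width = 0.1                                    # placeholder window half-width
inside = np.abs(one_over_k0 - (slope * mz + intercept)) < half_width
print(f"precursors inside diagonal windows: {inside.mean():.0%}")
```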
Subjects
Data Science/methods, High-Throughput Screening Assays/methods, Ion Channels/metabolism, Ion Transport/physiology, Proteome/metabolism, Tumor Cell Line, HeLa Cells, Humans, Ions/chemistry, Proteomics/methods, Reproducibility of Results, Tandem Mass Spectrometry/methods
ABSTRACT
Pluripotent stem cells have, in recent years, been demonstrated to mimic different aspects of metazoan embryonic development in vitro. This has led to the establishment of synthetic embryology: a field that makes use of in vitro stem cell models to investigate developmental processes that would otherwise be inaccessible in vivo. Currently, a plethora of engineering-inspired techniques, including microfluidic devices and bioreactors, exist to generate and culture organoids at high throughput. Similarly, data analysis and deep learning-based techniques that were established in in vivo models are now being used to extract quantitative information from synthetic systems. Finally, theory and data-driven in silico modeling are starting to provide a system-level understanding of organoids and to make predictions to be tested with further experiments. Here, we discuss our vision of how engineering, data science, and theoretical modeling will synergize to offer an unprecedented view of embryonic development. For each of these three scientific domains, we discuss examples from in vivo and in vitro systems that we think will pave the way to future developments of synthetic embryology.