Search | Nursing VHL Search Portal

1.

Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project.

Stenton, Sarah L; O'Leary, Melanie C; Lemire, Gabrielle; VanNoy, Grace E; DiTroia, Stephanie; Ganesh, Vijay S; Groopman, Emily; O'Heir, Emily; Mangilog, Brian; Osei-Owusu, Ikeoluwa; Pais, Lynn S; Serrano, Jillian; Singer-Berk, Moriel; Weisburd, Ben; Wilson, Michael W; Austin-Tse, Christina; Abdelhakim, Marwa; Althagafi, Azza; Babbi, Giulia; Bellazzi, Riccardo; Bovo, Samuele; Carta, Maria Giulia; Casadio, Rita; Coenen, Pieter-Jan; De Paoli, Federica; Floris, Matteo; Gajapathy, Manavalan; Hoehndorf, Robert; Jacobsen, Julius O B; Joseph, Thomas; Kamandula, Akash; Katsonis, Panagiotis; Kint, Cyrielle; Lichtarge, Olivier; Limongelli, Ivan; Lu, Yulan; Magni, Paolo; Mamidi, Tarun Karthik Kumar; Martelli, Pier Luigi; Mulargia, Marta; Nicora, Giovanna; Nykamp, Keith; Pejaver, Vikas; Peng, Yisu; Pham, Thi Hong Cam; Podda, Maurizio S; Rao, Aditya; Rizzo, Ettore; Saipradeep, Vangala G; Savojardo, Castrense.

Hum Genomics ; 18(1): 44, 2024 Apr 29.

Article in English | MEDLINE | ID: mdl-38685113

ABSTRACT

BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Knowing which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance was highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.

Subject(s)

Rare Diseases , Humans , Rare Diseases/genetics , Rare Diseases/diagnosis , Genome, Human/genetics , Genetic Variation/genetics , Computational Biology/methods , Phenotype

2.

MALDI mass spectrometry imaging shows a gradual change in the proteome landscape during mouse ovarian folliculogenesis.

Fiorentino, Giulia; Smith, Andrew; Nicora, Giovanna; Bellazzi, Riccardo; Magni, Fulvio; Garagna, Silvia; Zuccotti, Maurizio.

Mol Hum Reprod ; 29(4)2023 04 03.

Article in English | MEDLINE | ID: mdl-36734599

ABSTRACT

Our knowledge regarding the role proteins play in the mutual relationship among oocytes, surrounding follicle cells, stroma, and the vascular network inside the ovary is still poor and obtaining insights into this context would significantly aid our understanding of folliculogenesis. Here, we describe a spatial proteomics approach to characterize the proteome of individual follicles at different growth stages in a whole prepubertal 25-day-old mouse ovary. A total of 401 proteins were identified by nano-scale liquid chromatography-electrospray ionization-tandem mass spectrometry (nLC-ESI-MS/MS), 69 with a known function in ovary biology, as demonstrated by earlier proteomics studies. Enrichment analysis highlighted significant KEGG and Reactome pathways, with apoptosis, developmental biology, PI3K-Akt, epigenetic regulation of gene expression, and extracellular matrix organization being well represented. Then, correlating these data with the spatial information provided by matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI-MSI) on 276 follicles enabled the protein profiles of single follicle types to be mapped within their native context, highlighting 94 proteins that were detected throughout the secondary to the pre-ovulatory transition. Statistical analyses identified a group of 37 proteins that showed a gradual quantitative change during follicle differentiation, comprising 10 with a known role in follicle growth (NUMA1, TPM2), oocyte germinal vesicle-to-metaphase II transition (SFPQ, ACTBL, MARCS, NUCL), ovulation (GELS, CO1A2), and preimplantation development (TIF1B, KHDC3). The proteome landscape identified includes molecules of known function in the ovary, but also those whose specific role is emerging. Altogether, this work demonstrates the utility of performing spatial proteomics in the context of the ovary and offers sound bases for more in-depth investigations that aim to further unravel its spatial proteome.

Subject(s)

Proteome , Tandem Mass Spectrometry , Female , Animals , Mice , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Proteome/metabolism , Epigenesis, Genetic , Phosphatidylinositol 3-Kinases/metabolism

3.

Health informatics and EHR to support clinical research in the COVID-19 pandemic: an overview.

Dagliati, Arianna; Malovini, Alberto; Tibollo, Valentina; Bellazzi, Riccardo.

Brief Bioinform ; 22(2): 812-822, 2021 03 22.

Article in English | MEDLINE | ID: mdl-33454728

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic has clearly shown that major challenges and threats for humankind need to be addressed with global answers and shared decisions. Data and their analytics are crucial components of such decision-making activities. Rather interestingly, one of the most difficult aspects is reusing and sharing of accurate and detailed clinical data collected by Electronic Health Records (EHR), even if these data have a paramount importance. EHR data, in fact, are not only essential for supporting day-by-day activities, but also they can leverage research and support critical decisions about effectiveness of drugs and therapeutic strategies. In this paper, we will concentrate our attention on collaborative data infrastructures to support COVID-19 research and on the open issues of data sharing and data governance that COVID-19 had made emerge. Data interoperability, healthcare processes modelling and representation, shared procedures to deal with different data privacy regulations, and data stewardship and governance are seen as the most important aspects to boost collaborative research. Lessons learned from COVID-19 pandemic can be a strong element to improve international research and our future capability of dealing with fast developing emergencies and needs, which are likely to be more frequent in the future in our connected and intertwined world.

Subject(s)

COVID-19/epidemiology , Electronic Health Records , Medical Informatics , Pandemics , COVID-19/virology , Humans , SARS-CoV-2/isolation & purification

4.

Localizing in-domain adaptation of transformer-based biomedical language models.

Buonocore, Tommaso Mario; Crema, Claudio; Redolfi, Alberto; Bellazzi, Riccardo; Parimbelli, Enea.

J Biomed Inform ; 144: 104431, 2023 08.

Article in English | MEDLINE | ID: mdl-37385327

ABSTRACT

In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuning models stemming from broad-coverage checkpoints can largely benefit additional training rounds over large-scale in-domain resources. However, these resources are often unreachable for less-resourced languages like Italian, preventing local medical institutions to employ in-domain adaptation. In order to reduce this gap, our work investigates two accessible approaches to derive biomedical language models in languages other than English, taking Italian as a concrete use-case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the set of lessons learned from the study constitutes valuable insights towards a solution to build biomedical language models that are generalizable to other less-resourced languages and different domain settings.

Subject(s)

Language , Natural Language Processing , Humans , Records , Italy , Unified Medical Language System

5.

Advancing Italian biomedical information extraction with transformers-based models: Methodological insights and multicenter practical application.

Crema, Claudio; Buonocore, Tommaso Mario; Fostinelli, Silvia; Parimbelli, Enea; Verde, Federico; Fundarò, Cira; Manera, Marina; Ramusino, Matteo Cotta; Capelli, Marco; Costa, Alfredo; Binetti, Giuliano; Bellazzi, Riccardo; Redolfi, Alberto.

J Biomed Inform ; 148: 104557, 2023 Dec.

Article in English | MEDLINE | ID: mdl-38012982

ABSTRACT

The introduction of computerized medical records in hospitals has reduced burdensome activities like manual writing and information fetching. However, the data contained in medical records are still far underutilized, primarily because extracting data from unstructured textual medical records takes time and effort. Information Extraction, a subfield of Natural Language Processing, can help clinical practitioners overcome this limitation by using automated text-mining pipelines. In this work, we created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Transformers-based model. Moreover, we collected and leveraged three external independent datasets to implement an effective multicenter model, with overall F1-score 84.77 %, Precision 83.16 %, Recall 86.44 %. The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a "low-resource" approach. This allowed us to establish methodological guidelines that pave the way for Natural Language Processing studies in less-resourced languages.

Subject(s)

Data Mining , Language , Humans , Data Mining/methods , Electronic Health Records , Italy , Natural Language Processing , Multicenter Studies as Topic

6.

Evaluating pointwise reliability of machine learning prediction.

Nicora, Giovanna; Rios, Miguel; Abu-Hanna, Ameen; Bellazzi, Riccardo.

J Biomed Inform ; 127: 103996, 2022 03.

Article in English | MEDLINE | ID: mdl-35041981

ABSTRACT

Interest in Machine Learning applications to tackle clinical and biological problems is increasing. This is driven by promising results reported in many research papers, the increasing number of AI-based software products, and by the general interest in Artificial Intelligence to solve complex problems. It is therefore of importance to improve the quality of machine learning output and add safeguards to support their adoption. In addition to regulatory and logistical strategies, a crucial aspect is to detect when a Machine Learning model is not able to generalize to new unseen instances, which may originate from a population distant to that of the training population or from an under-represented subpopulation. As a result, the prediction of the machine learning model for these instances may be often wrong, given that the model is applied outside its "reliable" space of work, leading to a decreasing trust of the final users, such as clinicians. For this reason, when a model is deployed in practice, it would be important to advise users when the model's predictions may be unreliable, especially in high-stakes applications, including those in healthcare. Yet, reliability assessment of each machine learning prediction is still poorly addressed. Here, we review approaches that can support the identification of unreliable predictions, we harmonize the notation and terminology of relevant concepts, and we highlight and extend possible interrelationships and overlap among concepts. We then demonstrate, on simulated and real data for ICU in-hospital death prediction, a possible integrative framework for the identification of reliable and unreliable predictions. To do so, our proposed approach implements two complementary principles, namely the density principle and the local fit principle. The density principle verifies that the instance we want to evaluate is similar to the training set. The local fit principle verifies that the trained model performs well on training subsets that are more similar to the instance under evaluation. Our work can contribute to consolidating work in machine learning especially in medicine.

Subject(s)

Artificial Intelligence , Machine Learning , Hospital Mortality , Reproducibility of Results , Software

7.

SurvMaximin: Robust federated approach to transporting survival risk prediction models.

Wang, Xuan; Zhang, Harrison G; Xiong, Xin; Hong, Chuan; Weber, Griffin M; Brat, Gabriel A; Bonzel, Clara-Lea; Luo, Yuan; Duan, Rui; Palmer, Nathan P; Hutch, Meghan R; Gutiérrez-Sacristán, Alba; Bellazzi, Riccardo; Chiovato, Luca; Cho, Kelly; Dagliati, Arianna; Estiri, Hossein; García-Barrio, Noelia; Griffier, Romain; Hanauer, David A; Ho, Yuk-Lam; Holmes, John H; Keller, Mark S; Klann MEng, Jeffrey G; L'Yi, Sehi; Lozano-Zahonero, Sara; Maidlow, Sarah E; Makoudjou, Adeline; Malovini, Alberto; Moal, Bertrand; Moore, Jason H; Morris, Michele; Mowery, Danielle L; Murphy, Shawn N; Neuraz, Antoine; Yuan Ngiam, Kee; Omenn, Gilbert S; Patel, Lav P; Pedrera-Jiménez, Miguel; Prunotto, Andrea; Jebathilagam Samayamuthu, Malarkodi; Sanz Vidorreta, Fernando J; Schriver, Emily R; Schubert, Petra; Serrano-Balazote, Pablo; South, Andrew M; Tan, Amelia L M; Tan, Byorn W L; Tibollo, Valentina; Tippmann, Patric.

J Biomed Inform ; 134: 104176, 2022 10.

Article in English | MEDLINE | ID: mdl-36007785

ABSTRACT

OBJECTIVE: For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information. MATERIALS AND METHODS: For each of the centers from which we want to borrow information to improve the prediction performance for the target population, a penalized Cox model is fitted to estimate feature coefficients for the center. Using estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimated set of feature coefficients for the target population. The target population can be an entire cohort comprised of all centers, corresponding to federated learning, or a single center, corresponding to transfer learning. RESULTS: Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves comparable or higher accuracy compared with the estimator using only the information of the target site and other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and estimated feature coefficients between centers, which amounts to significantly improved estimates for target sites with fewer observations. CONCLUSIONS: The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin only requires a one-time summary information exchange from participating centers. Estimated regression vectors can be very heterogeneous. SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.

Subject(s)

Algorithms , Electronic Health Records , Humans , Privacy , Proportional Hazards Models , Survival Analysis

8.

Cytoplasmic movements of the early human embryo: imaging and artificial intelligence to predict blastocyst development.

Coticchio, Giovanni; Fiorentino, Giulia; Nicora, Giovanna; Sciajno, Raffaella; Cavalera, Federica; Bellazzi, Riccardo; Garagna, Silvia; Borini, Andrea; Zuccotti, Maurizio.

Reprod Biomed Online ; 42(3): 521-528, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33558172

ABSTRACT

RESEARCH QUESTION: Can artificial intelligence and advanced image analysis extract and harness novel information derived from cytoplasmic movements of the early human embryo to predict development to blastocyst? DESIGN: In a proof-of-principle study, 230 human preimplantation embryos were retrospectively assessed using an artificial neural network. After intracytoplasmic sperm injection, embryos underwent time-lapse monitoring for 44 h. For comparison, standard embryo assessment of each embryo by a single embryologist was carried out to predict development to blastocyst stage based on a single picture frame taken at 42 h of development. In the experimental approach, in embryos that developed to blastocyst or destined to arrest, cytoplasm movement velocity was recorded by time-lapse monitoring during the first 44 h of culture and analysed with a Particle Image Velocimetry algorithm to extract quantitative information. Three main artificial intelligence approaches, the k-Nearest Neighbour, the Long-Short Term Memory Neural Network and the hybrid ensemble classifier were used to classify the embryos. RESULTS: Blind operator assessment classified each embryo in terms of ability to develop to blastocyst, with 75.4% accuracy, 76.5% sensitivity, 74.3% specificity, 74.3% precision and 75.4% F1 score. Integration of results from artificial intelligence models with the blind operator classification, resulted in 82.6% accuracy, 79.4% sensitivity, 85.7% specificity, 84.4% precision and 81.8% F1 score. CONCLUSIONS: The present study suggests the possibility of predicting human blastocyst development at early cleavage stages by detection of cytoplasm movement velocity and artificial intelligence analysis. This indicates the importance of the dynamics of the cytoplasm as a novel and valuable source of data to assess embryo viability.

Subject(s)

Blastocyst/physiology , Cytoplasm/physiology , Embryonic Development , Neural Networks, Computer , Time-Lapse Imaging , Humans , Proof of Concept Study , Retrospective Studies

9.

What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask.

Kohane, Isaac S; Aronow, Bruce J; Avillach, Paul; Beaulieu-Jones, Brett K; Bellazzi, Riccardo; Bradford, Robert L; Brat, Gabriel A; Cannataro, Mario; Cimino, James J; García-Barrio, Noelia; Gehlenborg, Nils; Ghassemi, Marzyeh; Gutiérrez-Sacristán, Alba; Hanauer, David A; Holmes, John H; Hong, Chuan; Klann, Jeffrey G; Loh, Ne Hooi Will; Luo, Yuan; Mandl, Kenneth D; Daniar, Mohamad; Moore, Jason H; Murphy, Shawn N; Neuraz, Antoine; Ngiam, Kee Yuan; Omenn, Gilbert S; Palmer, Nathan; Patel, Lav P; Pedrera-Jiménez, Miguel; Sliz, Piotr; South, Andrew M; Tan, Amelia Li Min; Taylor, Deanne M; Taylor, Bradley W; Torti, Carlo; Vallejos, Andrew K; Wagholikar, Kavishwar B; Weber, Griffin M; Cai, Tianxi.

J Med Internet Res ; 23(3): e22219, 2021 03 02.

Article in English | MEDLINE | ID: mdl-33600347

ABSTRACT

Coincident with the tsunami of COVID-19-related publications, there has been a surge of studies using real-world data, including those obtained from the electronic health record (EHR). Unfortunately, several of these high-profile publications were retracted because of concerns regarding the soundness and quality of the studies and the EHR data they purported to analyze. These retractions highlight that although a small community of EHR informatics experts can readily identify strengths and flaws in EHR-derived studies, many medical editorial teams and otherwise sophisticated medical readers lack the framework to fully critically appraise these studies. In addition, conventional statistical analyses cannot overcome the need for an understanding of the opportunities and limitations of EHR-derived studies. We distill here from the broader informatics literature six key considerations that are crucial for appraising studies utilizing EHR data: data completeness, data collection and handling (eg, transformation), data type (ie, codified, textual), robustness of methods against EHR variability (within and across institutions, countries, and time), transparency of data and analytic code, and the multidisciplinary approach. These considerations will inform researchers, clinicians, and other stakeholders as to the recommended best practices in reviewing manuscripts, grants, and other outputs from EHR-data derived studies, and thereby promote and foster rigor, quality, and reliability of this rapidly growing field.

Subject(s)

COVID-19/epidemiology , Data Collection/methods , Electronic Health Records , Data Collection/standards , Humans , Peer Review, Research/standards , Publishing/standards , Reproducibility of Results , SARS-CoV-2/isolation & purification

10.

International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study.

Weber, Griffin M; Zhang, Harrison G; L'Yi, Sehi; Bonzel, Clara-Lea; Hong, Chuan; Avillach, Paul; Gutiérrez-Sacristán, Alba; Palmer, Nathan P; Tan, Amelia Li Min; Wang, Xuan; Yuan, William; Gehlenborg, Nils; Alloni, Anna; Amendola, Danilo F; Bellasi, Antonio; Bellazzi, Riccardo; Beraghi, Michele; Bucalo, Mauro; Chiovato, Luca; Cho, Kelly; Dagliati, Arianna; Estiri, Hossein; Follett, Robert W; García Barrio, Noelia; Hanauer, David A; Henderson, Darren W; Ho, Yuk-Lam; Holmes, John H; Hutch, Meghan R; Kavuluru, Ramakanth; Kirchoff, Katie; Klann, Jeffrey G; Krishnamurthy, Ashok K; Le, Trang T; Liu, Molei; Loh, Ne Hooi Will; Lozano-Zahonero, Sara; Luo, Yuan; Maidlow, Sarah; Makoudjou, Adeline; Malovini, Alberto; Martins, Marcelo Roberto; Moal, Bertrand; Morris, Michele; Mowery, Danielle L; Murphy, Shawn N; Neuraz, Antoine; Ngiam, Kee Yuan; Okoshi, Marina P; Omenn, Gilbert S.

J Med Internet Res ; 23(10): e31400, 2021 10 11.

Article in English | MEDLINE | ID: mdl-34533459

ABSTRACT

BACKGROUND: Many countries have experienced 2 predominant waves of COVID-19-related hospitalizations. Comparing the clinical trajectories of patients hospitalized in separate waves of the pandemic enables further understanding of the evolving epidemiology, pathophysiology, and health care dynamics of the COVID-19 pandemic. OBJECTIVE: In this retrospective cohort study, we analyzed electronic health record (EHR) data from patients with SARS-CoV-2 infections hospitalized in participating health care systems representing 315 hospitals across 6 countries. We compared hospitalization rates, severe COVID-19 risk, and mean laboratory values between patients hospitalized during the first and second waves of the pandemic. METHODS: Using a federated approach, each participating health care system extracted patient-level clinical data on their first and second wave cohorts and submitted aggregated data to the central site. Data quality control steps were adopted at the central site to correct for implausible values and harmonize units. Statistical analyses were performed by computing individual health care system effect sizes and synthesizing these using random effect meta-analyses to account for heterogeneity. We focused the laboratory analysis on C-reactive protein (CRP), ferritin, fibrinogen, procalcitonin, D-dimer, and creatinine based on their reported associations with severe COVID-19. RESULTS: Data were available for 79,613 patients, of which 32,467 were hospitalized in the first wave and 47,146 in the second wave. The prevalence of male patients and patients aged 50 to 69 years decreased significantly between the first and second waves. Patients hospitalized in the second wave had a 9.9% reduction in the risk of severe COVID-19 compared to patients hospitalized in the first wave (95% CI 8.5%-11.3%). Demographic subgroup analyses indicated that patients aged 26 to 49 years and 50 to 69 years; male and female patients; and black patients had significantly lower risk for severe disease in the second wave than in the first wave. At admission, the mean values of CRP were significantly lower in the second wave than in the first wave. On the seventh hospital day, the mean values of CRP, ferritin, fibrinogen, and procalcitonin were significantly lower in the second wave than in the first wave. In general, countries exhibited variable changes in laboratory testing rates from the first to the second wave. At admission, there was a significantly higher testing rate for D-dimer in France, Germany, and Spain. CONCLUSIONS: Patients hospitalized in the second wave were at significantly lower risk for severe COVID-19. This corresponded to mean laboratory values in the second wave that were more likely to be in typical physiological ranges on the seventh hospital day compared to the first wave. Our federated approach demonstrated the feasibility and power of harmonizing heterogeneous EHR data from multiple international health care systems to rapidly conduct large-scale studies to characterize how COVID-19 clinical trajectories evolve.

Subject(s)

COVID-19 , Pandemics , Adult , Aged , Female , Hospitalization , Hospitals , Humans , Male , Middle Aged , Retrospective Studies , SARS-CoV-2

11.

Authorship Correction: International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study.

Weber, Griffin M; Zhang, Harrison G; L'Yi, Sehi; Bonzel, Clara-Lea; Hong, Chuan; Avillach, Paul; Gutiérrez-Sacristán, Alba; Palmer, Nathan P; Tan, Amelia Li Min; Wang, Xuan; Yuan, William; Gehlenborg, Nils; Alloni, Anna; Amendola, Danilo F; Bellasi, Antonio; Bellazzi, Riccardo; Beraghi, Michele; Bucalo, Mauro; Chiovato, Luca; Cho, Kelly; Dagliati, Arianna; Estiri, Hossein; Follett, Robert W; García Barrio, Noelia; Hanauer, David A; Henderson, Darren W; Ho, Yuk-Lam; Holmes, John H; Hutch, Meghan R; Kavuluru, Ramakanth; Kirchoff, Katie; Klann, Jeffrey G; Krishnamurthy, Ashok K; Le, Trang T; Liu, Molei; Loh, Ne Hooi Will; Lozano-Zahonero, Sara; Luo, Yuan; Maidlow, Sarah; Makoudjou, Adeline; Malovini, Alberto; Martins, Marcelo Roberto; Moal, Bertrand; Morris, Michele; Mowery, Danielle L; Murphy, Shawn N; Neuraz, Antoine; Ngiam, Kee Yuan; Okoshi, Marina P; Omenn, Gilbert S.

J Med Internet Res ; 23(11): e34625, 2021 Nov 30.

Article in English | MEDLINE | ID: mdl-34889759

ABSTRACT

[This corrects the article DOI: 10.2196/31400.].

12.

A Bayesian data fusion based approach for learning genome-wide transcriptional regulatory networks.

Sauta, Elisabetta; Demartini, Andrea; Vitali, Francesca; Riva, Alberto; Bellazzi, Riccardo.

BMC Bioinformatics ; 21(1): 219, 2020 May 29.

Article in English | MEDLINE | ID: mdl-32471360

ABSTRACT

BACKGROUND: Reverse engineering of transcriptional regulatory networks (TRN) from genomics data has always represented a computational challenge in System Biology. The major issue is modeling the complex crosstalk among transcription factors (TFs) and their target genes, with a method able to handle both the high number of interacting variables and the noise in the available heterogeneous experimental sources of information. RESULTS: In this work, we propose a data fusion approach that exploits the integration of complementary omics-data as prior knowledge within a Bayesian framework, in order to learn and model large-scale transcriptional networks. We develop a hybrid structure-learning algorithm able to jointly combine TFs ChIP-Sequencing data and gene expression compendia to reconstruct TRNs in a genome-wide perspective. Applying our method to high-throughput data, we verified its ability to deal with the complexity of a genomic TRN, providing a snapshot of the synergistic TFs regulatory activity. Given the noisy nature of data-driven prior knowledge, which potentially contains incorrect information, we also tested the method's robustness to false priors on a benchmark dataset, comparing the proposed approach to other regulatory network reconstruction algorithms. We demonstrated the effectiveness of our framework by evaluating structural commonalities of our learned genomic network with other existing networks inferred by different DNA binding information-based methods. CONCLUSIONS: This Bayesian omics-data fusion based methodology allows to gain a genome-wide picture of the transcriptional interplay, helping to unravel key hierarchical transcriptional interactions, which could be subsequently investigated, and it represents a promising learning approach suitable for multi-layered genomic data integration, given its robustness to noisy sources and its tailored framework for handling high dimensional data.

Subject(s)

Gene Regulatory Networks , Algorithms , Bayes Theorem , Chromatin Immunoprecipitation Sequencing , Genomics/methods , Transcription Factors/metabolism

13.

A survey on single and multi omics data mining methods in cancer data classification.

Momeni, Zahra; Hassanzadeh, Esmail; Saniee Abadeh, Mohammad; Bellazzi, Riccardo.

J Biomed Inform ; 107: 103466, 2020 07.

Article in English | MEDLINE | ID: mdl-32525020

ABSTRACT

Data analytics is routinely used to support biomedical research in all areas, with particular focus on the most relevant clinical conditions, such as cancer. Bioinformatics approaches, in particular, have been used to characterize the molecular aspects of diseases. In recent years, numerous studies have been performed on cancer based upon single and multi-omics data. For example, Single-omics-based studies have employed a diverse set of data, such as gene expression, DNA methylation, or miRNA, to name only a few instances. Despite that, a significant part of literature reports studies on gene expression with microarray datasets. Single-omics data have high numbers of attributes and very low sample counts. This characteristic makes them paradigmatic of an under-sampled, small-n large-p machine learning problem. An important goal of single-omics data analysis is to find the most relevant genes, in terms of their potential use in clinics and research, in the batch of available data. This problem has been addressed in gene selection as one of the pre-processing steps in data mining. An analysis that use only one type of data (single-omics) often miss the complexity of the landscape of molecular phenomena underlying the disease. As a result, they provide limited and sometimes poorly reliable information about the disease mechanisms. Therefore, in recent years, researchers have been eager to build models that are more complex, obtaining more reliable results using multi-omics data. However, to achieve this, the most important challenge is data integration. In this paper, we provide a comprehensive overview of the challenges in single and multi-omics data analysis of cancer data, focusing on gene selection and data integration methods.

Subject(s)

Genomics , Neoplasms , Computational Biology , Data Mining , Humans , Machine Learning , Neoplasms/genetics

14.

Deep Learning to Unveil Correlations between Urban Landscape and Population Health.

Pala, Daniele; Caldarone, Alessandro Aldo; Franzini, Marica; Malovini, Alberto; Larizza, Cristiana; Casella, Vittorio; Bellazzi, Riccardo.

Sensors (Basel) ; 20(7)2020 Apr 08.

Article in English | MEDLINE | ID: mdl-32276488

ABSTRACT

The global healthcare landscape is continuously changing throughout the world as technology advances, leading to a gradual change in lifestyle. Several diseases such as asthma and cardiovascular conditions are becoming more diffuse, due to a rise in pollution exposure and a more sedentary lifestyle. Healthcare providers deal with increasing new challenges, and thanks to fast-developing big data technologies, they can be faced with systems that provide direct support to citizens. In this context, within the EU-funded Participatory Urban Living for Sustainable Environments (PULSE) project, we are implementing a data analytic platform designed to provide public health decision makers with advanced approaches, to jointly analyze maps and geospatial information with healthcare and air pollution data. In this paper we describe a component of such platforms, which couples deep learning analysis of urban geospatial images with healthcare indexes collected by the 500 Cities project. By applying a pre-learned deep Neural Network architecture, satellite images of New York City are analyzed and latent feature variables are extracted. These features are used to derive clusters, which are correlated with healthcare indicators by means of a multivariate classification model. Thanks to this pipeline, it is possible to show that, in New York City, health care indexes are significantly correlated to the urban landscape. This pipeline can serve as a basis to ease urban planning, since the same interventions can be organized on similar areas, even if geographically distant.

Subject(s)

Deep Learning , Urban Health , Air Pollution/analysis , Cities , Cluster Analysis , Databases, Factual , Delivery of Health Care , Humans , Satellite Imagery

15.

Supervised methods to extract clinical events from cardiology reports in Italian.

Viani, Natalia; Miller, Timothy A; Napolitano, Carlo; Priori, Silvia G; Savova, Guergana K; Bellazzi, Riccardo; Sacchi, Lucia.

J Biomed Inform ; 95: 103219, 2019 07.

Article in English | MEDLINE | ID: mdl-31150777

ABSTRACT

Clinical narratives are a valuable source of information for both patient care and biomedical research. Given the unstructured nature of medical reports, specific automatic techniques are required to extract relevant entities from such texts. In the natural language processing (NLP) community, this task is often addressed by using supervised methods. To develop such methods, both reliably-annotated corpora and elaborately designed features are needed. Despite the recent advances on corpora collection and annotation, research on multiple domains and languages is still limited. In addition, to compute the features required for supervised classification, suitable language- and domain-specific tools are needed. In this work, we propose a novel application of recurrent neural networks (RNNs) for event extraction from medical reports written in Italian. To train and evaluate the proposed approach, we annotated a corpus of 75 cardiology reports for a total of 4365 mentions of relevant events and their attributes (e.g., the polarity). For the annotation task, we developed specific annotation guidelines, which are provided together with this paper. The RNN-based classifier was trained on a training set including 3335 events (60 documents). The resulting model was integrated into an NLP pipeline that uses a dictionary lookup approach to search for relevant concepts inside the text. A test set of 1030 events (15 documents) was used to evaluate and compare different pipeline configurations. As a main result, using the RNN-based classifier instead of the dictionary lookup approach allowed increasing recall from 52.4% to 88.9%, and precision from 81.1% to 88.2%. Further, using the two methods in combination, we obtained final recall, precision, and F1 score of 91.7%, 88.6%, and 90.1%, respectively. These experiments indicate that integrating a well-performing RNN-based classifier with a standard knowledge-based approach can be a good strategy to extract information from clinical text in non-English languages.

Subject(s)

Data Mining/methods , Electronic Health Records , Natural Language Processing , Heart Diseases , Humans , Italy , Neural Networks, Computer , Semantics

16.

Taste receptors, innate immunity and longevity: the case of TAS2R16 gene.

Malovini, Alberto; Accardi, Giulia; Aiello, Anna; Bellazzi, Riccardo; Candore, Giuseppina; Caruso, Calogero; Ligotti, Mattia Emanuela; Maciag, Anna; Villa, Francesco; Puca, Annibale A.

Immun Ageing ; 16: 5, 2019.

Article in English | MEDLINE | ID: mdl-30833980

ABSTRACT

BACKGROUND: Innate immunity utilizes components of sensory signal transduction such as bitter and sweet taste receptors. In fact, empirical evidence has shown bitter and sweet taste receptors to be an integral component of antimicrobial immune response in upper respiratory tract infections. Since an efficient immune response plays a key role in the attainment of longevity, it is not surprising that the rs978739 polymorphism of the bitter taste receptor TAS2R16 gene has been shown to be associated with longevity in a population of 941 individuals ranging in age from 20 to 106 years from Calabria (Italy). There are many possible candidate genes for human longevity, however of the many genes tested, only APOE and FOXO3 survived to association in replication studies. So, it is necessary to validate in other studies genes proposed to be associated with longevity. Thus, we analysed the association of the quoted polymorphism in a population of long lived individuals (LLIs) and controls from another Italian population from Cilento. METHODS: The analysis has been performed on data previously obtained with genome-wide association study on a population of LLIs (age range 90-109 years) and young controls (age range 18-45 years) from Cilento (Italy). RESULTS: Statistical power calculations showed that the analysed cohort represented by 410 LLIs and 553 young controls was sufficiently powered to replicate the association between rs978739 and the longevity phenotype according to the effect size and frequencies described in the previous paper, under a dominant and additive genetic model. However, no evidence of association between rs978739 and the longevity phenotype was observed according to the additive or dominant model. CONCLUSION: There are several reasons for the failure of the confirmation of a previous study. However, the differences between the two studies in terms of environment of the population adopted and of the criteria of inclusion have made difficult the replication of the findings.

17.

What do healthcare professionals need to turn risk models for type 2 diabetes into usable computerized clinical decision support systems? Lessons learned from the MOSAIC project.

Fico, Giuseppe; Hernanzez, Liss; Cancela, Jorge; Dagliati, Arianna; Sacchi, Lucia; Martinez-Millana, Antonio; Posada, Jorge; Manero, Lidia; Verdú, Jose; Facchinetti, Andrea; Ottaviano, Manuel; Zarkogianni, Konstantia; Nikita, Konstantina; Groop, Leif; Gabriel-Sanchez, Rafael; Chiovato, Luca; Traver, Vicente; Merino-Torres, Juan Francisco; Cobelli, Claudio; Bellazzi, Riccardo; Arredondo, Maria Teresa.

BMC Med Inform Decis Mak ; 19(1): 163, 2019 08 16.

Article in English | MEDLINE | ID: mdl-31419982

ABSTRACT

BACKGROUND: To understand user needs, system requirements and organizational conditions towards successful design and adoption of Clinical Decision Support Systems for Type 2 Diabetes (T2D) care built on top of computerized risk models. METHODS: The holistic and evidence-based CEHRES Roadmap, used to create eHealth solutions through participatory development approach, persuasive design techniques and business modelling, was adopted in the MOSAIC project to define the sequence of multidisciplinary methods organized in three phases, user needs, implementation and evaluation. The research was qualitative, the total number of participants was ninety, about five-seventeen involved in each round of experiment. RESULTS: Prediction models for the onset of T2D are built on clinical studies, while for T2D care are derived from healthcare registries. Accordingly, two set of DSSs were defined: the first, T2D Screening, introduces a novel routine; in the second case, T2D Care, DSSs can support managers at population level, and daily practitioners at individual level. In the user needs phase, T2D Screening and solution T2D Care at population level share similar priorities, as both deal with risk-stratification. End-users of T2D Screening and solution T2D Care at individual level prioritize easiness of use and satisfaction, while managers prefer the tools to be available every time and everywhere. In the implementation phase, three Use Cases were defined for T2D Screening, adapting the tool to different settings and granularity of information. Two Use Cases were defined around solutions T2D Care at population and T2D Care at individual, to be used in primary or secondary care. Suitable filtering options were equipped with "attractive" visual analytics to focus the attention of end-users on specific parameters and events. In the evaluation phase, good levels of user experience versus bad level of usability suggest that end-users of T2D Screening perceived the potential, but they are worried about complexity. Usability and user experience were above acceptable thresholds for T2D Care at population and T2D Care at individual. CONCLUSIONS: By using a holistic approach, we have been able to understand user needs, behaviours and interactions and give new insights in the definition of effective Decision Support Systems to deal with the complexity of T2D care.

Subject(s)

Decision Support Systems, Clinical , Diabetes Mellitus, Type 2/diagnosis , Diabetes Mellitus, Type 2/etiology , Adult , Aged , Computer Simulation , Female , Humans , Male , Mass Screening , Middle Aged , Risk Assessment , Software , Telemedicine

18.

Patient-Generated Health Data Integration and Advanced Analytics for Diabetes Management: The AID-GM Platform.

Salvi, Elisa; Bosoni, Pietro; Tibollo, Valentina; Kruijver, Lisanne; Calcaterra, Valeria; Sacchi, Lucia; Bellazzi, Riccardo; Larizza, Cristiana.

Sensors (Basel) ; 20(1)2019 Dec 24.

Article in English | MEDLINE | ID: mdl-31878195

ABSTRACT

Diabetes is a high-prevalence disease that leads to an alteration in the patient's blood glucose (BG) values. Several factors influence the subject's BG profile over the day, including meals, physical activity, and sleep. Wearable devices are available for monitoring the patient's BG value around the clock, while activity trackers can be used to record his/her sleep and physical activity. However, few tools are available to jointly analyze the collected data, and only a minority of them provide functionalities for performing advanced and personalized analyses. In this paper, we present AID-GM, a web application that enables the patient to share with his/her diabetologist both the raw BG data collected by a flash glucose monitoring device, and the information collected by activity trackers, including physical activity, heart rate, and sleep. AID-GM provides several data views for summarizing the subject's metabolic control over time, and for complementing the BG profile with the information given by the activity tracker. AID-GM also allows the identification of complex temporal patterns in the collected heterogeneous data. In this paper, we also present the results of a real-world pilot study aimed to assess the usability of the proposed system. The study involved 30 pediatric patients receiving care at the Fondazione IRCCS Policlinico San Matteo Hospital in Pavia, Italy.

Subject(s)

Hyperglycemia/therapy , User-Computer Interface , Adolescent , Algorithms , Blood Glucose/analysis , Child , Disease Management , Female , Humans , Hyperglycemia/blood , Hyperglycemia/pathology , Male , Patients/psychology , Physicians/psychology , Telemedicine , Young Adult

19.

CardioVAI: An automatic implementation of ACMG-AMP variant interpretation guidelines in the diagnosis of cardiovascular diseases.

Nicora, Giovanna; Limongelli, Ivan; Gambelli, Patrick; Memmi, Mirella; Malovini, Alberto; Mazzanti, Andrea; Napolitano, Carlo; Priori, Silvia; Bellazzi, Riccardo.

Hum Mutat ; 39(12): 1835-1846, 2018 12.

Article in English | MEDLINE | ID: mdl-30298955

ABSTRACT

Variant interpretation for the diagnosis of genetic diseases is a complex process. The American College of Medical Genetics and Genomics, with the Association for Molecular Pathology, have proposed a set of evidence-based guidelines to support variant pathogenicity assessment and reporting in Mendelian diseases. Cardiovascular disorders are a field of application of these guidelines, but practical implementation is challenging due to the genetic disease heterogeneity and the complexity of information sources that need to be integrated. Decision support systems able to automate variant interpretation in the light of specific disease domains are demanded. We implemented CardioVAI (Cardio Variant Interpreter), an automated system for guidelines based variant classification in cardiovascular-related genes. Different omics-resources were integrated to assess pathogenicity of every genomic variant in 72 cardiovascular diseases related genes. We validated our method on benchmark datasets of high-confident assessed variants, reaching pathogenicity and benignity concordance up to 83 and 97.08%, respectively. We compared CardioVAI to similar methods and analyzed the main differences in terms of guidelines implementation. We finally made available CardioVAI as a web resource (http://cardiovai.engenome.com/) that allows users to further specialize guidelines recommendations.

Subject(s)

Cardiovascular Diseases/genetics , Genetic Variation , Societies, Medical/organization & administration , Evidence-Based Practice , Genetic Testing , Humans , Practice Guidelines as Topic , Software

20.

Incorporating repeating temporal association rules in Naïve Bayes classifiers for coronary heart disease diagnosis.

Orphanou, Kalia; Dagliati, Arianna; Sacchi, Lucia; Stassopoulou, Athena; Keravnou, Elpida; Bellazzi, Riccardo.

J Biomed Inform ; 81: 74-82, 2018 05.

Article in English | MEDLINE | ID: mdl-29555443

ABSTRACT

In this paper, we develop a Naïve Bayes classification model integrated with temporal association rules (TARs). A temporal pattern mining algorithm is used to detect TARs by identifying the most frequent temporal relationships among the derived basic temporal abstractions (TA). We develop and compare three classifiers that use as features the most frequent TARs as follows: (i) representing the most frequent TARs detected within the target class ('Diseaseâ¯=â¯Present'), (ii) representing the most frequent TARs from both classes ('Diseaseâ¯=â¯Present', 'Diseaseâ¯=â¯Absent'), (iii) representing the most frequent TARs, after removing the ones that are low-risk predictors for the disease. These classifiers incorporate the horizontal support of TARs, which defines the number of times that a particular temporal pattern is found in some patient's record, as their features. All of the developed classifiers are applied for diagnosis of coronary heart disease (CHD) using a longitudinal dataset. We compare two ways of feature representation, using horizontal support or the mean duration of each TAR, on a single patient. The results obtained from this comparison show that the horizontal support representation outperforms the mean duration. The main effort of our research is to demonstrate that where long time periods are of significance in some medical domain, such as the CHD domain, the detection of the repeated occurrences of the most frequent TARs can yield better performances. We compared the classifier that uses the horizontal support representation and has the best performance with a Baseline Classifier which uses the binary representation of the most frequent TARs. The results obtained illustrate the comparatively high performance of the classifier representing the horizontal support, over the Baseline Classifier.

Subject(s)

Coronary Disease/diagnosis , Medical Informatics/methods , Adult , Algorithms , Bayes Theorem , Data Mining , Databases, Factual , Decision Trees , Humans , Middle Aged , Neural Networks, Computer , Pattern Recognition, Automated , Reproducibility of Results , Time Factors

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL