Results 1 - 20 of 123
2.
Heliyon ; 10(5): e26973, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38455555

ABSTRACT

The COVID-19 pandemic presented an unparalleled challenge to global healthcare systems. A central issue revolves around the urgent need to swiftly amass critical biological and medical knowledge concerning the disease, its treatment, and containment. Remarkably, text data remains an underutilized resource in this context. In this paper, we delve into the extraction of COVID-related relations using transformer-based language models, including Bidirectional Encoder Representations from Transformers (BERT) and DistilBERT. Our analysis scrutinizes the performance of five language models, comparing information from both PubMed and Reddit, and assessing their ability to make novel predictions, including the detection of "misinformation." Key findings reveal that, despite inherent differences, both PubMed and Reddit data contain remarkably similar information, suggesting that Reddit can serve as a valuable resource for rapidly acquiring information during times of crisis. Furthermore, our results demonstrate that language models can unveil previously unseen entities and relations, a crucial aspect in identifying instances of misinformation.

3.
Eur J Hum Genet ; 32(4): 371-372, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37407734
4.
Eur J Hum Genet ; 32(4): 377-378, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37582903
7.
Int J Mol Sci ; 23(21)2022 Oct 29.
Article in English | MEDLINE | ID: mdl-36361936

ABSTRACT

The idea of a digital twin has recently gained widespread attention. While, so far, it has been used predominantly for problems in engineering and manufacturing, it is believed that a digital twin also holds great promise for applications in medicine and health. However, a problem that severely hampers progress in these fields is the lack of a solid definition of the concept behind a digital twin that would be directly amenable to such big data-driven fields requiring statistical data analysis. In this paper, we address this problem. We will see that the term 'digital twin', as used in the literature, is like a Matryoshka doll. For this reason, we unstack the concept via a data-centric machine learning perspective, allowing us to define its main components. As a consequence, we suggest using the term Digital Twin System instead of digital twin because this highlights its complex interconnected substructure. In addition, we address ethical concerns that result from treatment suggestions for patients based on simulated data and a possible lack of explainability of the underlying models.


Subject(s)
Machine Learning , Research Design , Humans , Big Data
8.
NPJ Syst Biol Appl ; 8(1): 40, 2022 10 21.
Article in English | MEDLINE | ID: mdl-36271093

ABSTRACT

High-throughput omics experiments provide a wealth of data for exploring biomedical questions and for advancing translational research. However, despite this great potential, results that enter clinical practice are scarce even twenty years after the completion of the Human Genome Project. For this reason, in this paper, we revisit problems with scientific discovery commonly summarized under the term reproducibility crisis. We will argue that the major problem that hampers progress in translational research is threefold. First, in order to establish biological foundations of disorders or general complex phenotypes, one needs to embrace emergence. Second, there seems to be confusion about the underlying hypotheses tested by omics studies. Third, most contemporary omics studies are designed to perform what can be seen as incremental corroborations of a hypothesis. In order to improve upon these shortcomings, we define a severe testing framework (STF) that can be applied to a large number of omics studies for enhancing scientific discovery in the biomedical sciences. Briefly, STF provides systematic means to trim wild-grown omics studies in a constructive way.


Subject(s)
Translational Research, Biomedical , Humans , Reproducibility of Results , Phenotype
10.
Sci Rep ; 12(1): 8529, 2022 05 20.
Article in English | MEDLINE | ID: mdl-35595821

ABSTRACT

In recent years, there has been a surge of industrial and business data. This poses opportunities and challenges at the same time, because the wealth of information is usually buried in complex and frequently disconnected data sets. Predictive maintenance uses such data to develop prognostic and diagnostic models that allow the optimization of the life cycle of machine components. In this paper, we address the modeling of the prognostics of machine components from mobile work equipment. Specifically, we estimate survival curves and hazard rates using parametric and non-parametric models to characterize time-dependent failure probabilities of machine components. As a result, we find that different types of censoring mask the presence of different populations, which can cause severe problems for statistical estimators and the interpretation of results. Furthermore, we show that the obtained hazard functions for different machine components are complex and versatile and are best modeled via non-parametric estimators. However, there are notable exceptions: some individual machine components are amenable to a generalized gamma or Weibull model.


Subject(s)
Models, Statistical , Probability , Prognosis , Survival Analysis
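The predictive-maintenance study above contrasts non-parametric survival estimates with parametric models such as the Weibull. A minimal sketch of both ideas, assuming right-censored failure times given as plain Python lists: the Kaplan-Meier product-limit estimator for S(t) and the closed-form Weibull hazard. The function names and toy data are hypothetical and this is not the authors' code; the generalized gamma model is not shown.

```python
def kaplan_meier(times, events):
    """Non-parametric Kaplan-Meier estimate of the survival function S(t).

    times:  observed time for each component (failure or censoring time)
    events: 1 if a failure was observed, 0 if the record is right-censored
    Returns (t, S(t)) pairs at each distinct failure time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = removed = 0
        # Group all records that share the same time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - failures / n_at_risk
            curve.append((t, surv))
        # Failed and censored units both leave the risk set after t.
        n_at_risk -= removed
    return curve


def weibull_hazard(t, shape, scale):
    """Parametric Weibull hazard h(t) = (k / lam) * (t / lam) ** (k - 1), for t > 0.

    shape k < 1: decreasing hazard; k == 1: constant; k > 1: increasing (wear-out).
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical toy data: five components, two of them censored.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
print(weibull_hazard(4.0, shape=1.5, scale=10.0))
```

For the toy data, S(t) drops to roughly 0.8, 0.6 and 0.3 at t = 2, 3 and 5; the censored records at t = 3 and t = 8 shrink the risk set without being counted as failures.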
11.
Cancers (Basel) ; 13(20)2021 Oct 12.
Article in English | MEDLINE | ID: mdl-34680236

ABSTRACT

Prognostic biomarkers can play an important role in clinical practice because they allow the stratification of patients by predicting the outcome of a disorder. Obstacles to developing such markers include a lack of robustness across different data sets and limited concordance among similar signatures. In this paper, we highlight a new problem that relates to the biological meaning of already established prognostic gene expression signatures. Specifically, it is commonly assumed that prognostic markers provide sensible biological information and molecular explanations about the underlying disorder. However, recent studies on prognostic biomarkers investigating 80 established signatures of breast and prostate cancer demonstrated that this is not the case. We will show that this surprising result is related to the distinction between causal models and predictive models and the obfuscating usage of these models in the biomedical literature. Furthermore, we suggest a falsification procedure for studies aiming to establish a prognostic signature to safeguard against false expectations with respect to biological utility.

12.
Front Genet ; 12: 649429, 2021.
Article in English | MEDLINE | ID: mdl-34367234

ABSTRACT

High-throughput technologies provide novel means not only for basic biological research but also for clinical applications in hospitals. For instance, the usage of gene expression profiles as prognostic biomarkers for predicting cancer progression has found widespread interest. Aside from predicting the progression of patients, it is generally believed that such prognostic biomarkers also provide valuable information about disease mechanisms and the underlying molecular processes that are causal for a disorder. However, the latter assumption has been challenged. In this paper, we study this problem for prostate cancer. Specifically, we investigate a large number of previously published prognostic signatures of prostate cancer based on gene expression profiles and show that none of these can provide unique information about the underlying disease etiology of prostate cancer. Hence, our analysis reveals that none of the studied signatures has a sensible biological meaning. Overall, this shows that all studied prognostic signatures are merely black-box models that allow sensible predictions of prostate cancer outcome but cannot provide causal explanations that would enhance the understanding of prostate cancer.

14.
Front Artif Intell ; 4: 576892, 2021.
Article in English | MEDLINE | ID: mdl-34195608

ABSTRACT

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely accepted framework in production and manufacturing. This data-driven knowledge discovery framework provides an orderly partition of the often complex data mining processes to ensure a practical implementation of data analytics and machine learning models. However, the practical application of robust industry-specific data-driven knowledge discovery models faces multiple data- and model development-related issues. These issues need to be carefully addressed through a flexible, customized and industry-specific knowledge discovery framework. For this reason, extensions of CRISP-DM are needed. In this paper, we provide a detailed review of CRISP-DM and summarize extensions of this model into a novel framework we call Generalized Cross-Industry Standard Process for Data Science (GCRISP-DS). This framework is designed to allow dynamic interactions between different phases to adequately address data- and model-related issues for achieving robustness. Furthermore, it also emphasizes the need for a detailed business understanding and the interdependencies with the developed models and data quality for fulfilling higher business objectives. Overall, such a customizable GCRISP-DS framework supports model improvement and reusability by minimizing robustness issues.

15.
Front Big Data ; 4: 591749, 2021.
Article in English | MEDLINE | ID: mdl-33969290

ABSTRACT

The ultimate goal of the social sciences is to find a general social theory encompassing all aspects of social and collective phenomena. The traditional approach to this is very stringent in that it seeks causal explanations and models. However, this approach has recently been criticized for hindering progress by neglecting the predictive abilities of models that support more problem-oriented approaches. The latter models would be enabled by the surge of big Web data currently available. Interestingly, this problem cannot be overcome with methods from computational social science (CSS) alone because this field is dominated by simulation-based approaches and descriptive models. In this article, we address this issue and argue that the combination of big social data with social networks is needed for creating prediction models. We will argue that this alliance has the potential for gradually establishing a causal social theory. In order to emphasize the importance of integrating big social data with social networks, we call this approach data-driven computational social network science (DD-CSNS).

16.
PLoS One ; 16(3): e0245728, 2021.
Article in English | MEDLINE | ID: mdl-33735225

ABSTRACT

At the beginning of 2020, the COVID-19 pandemic was able to spread quickly in Wuhan and in the province of Hubei due to a lack of experience with this novel virus. Additionally, authorities had no proven experience with applying insufficient medical, communication and crisis management tools. For a considerable period of time, the actual number of infected people was unknown. There were great uncertainties regarding the dynamics and spread of SARS-CoV-2 infection. In this paper, we develop a system dynamics model for three connected regions (Wuhan, Hubei excl. Wuhan, China excl. Hubei) to understand the infection and spread dynamics of the virus, to provide a more accurate estimate of the number of infected people in Wuhan, and to discuss the necessity and effectiveness of protective measures against this epidemic, such as the quarantines imposed throughout China. We use the statistics of confirmed cases in China excl. Hubei. Daily data on travel activity within China were also used to determine the actual numerical development of infected people in Wuhan City and Hubei Province. We used a multivariate Monte Carlo optimization to parameterize the model to match the official statistics. In particular, we used the model to calculate infections that had already occurred but had not been diagnosed for various reasons.


Subject(s)
COVID-19/epidemiology , Algorithms , COVID-19/prevention & control , COVID-19/transmission , China/epidemiology , Humans , Models, Statistical , Monte Carlo Method , Pandemics , Quarantine , SARS-CoV-2/isolation & purification , Travel
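The Wuhan/Hubei study above parameterizes a system dynamics model by Monte Carlo optimization. The sketch below shows the idea in a much simpler, assumed setting: a single region with a textbook SIR stock-and-flow structure, integrated with Euler steps, and one free parameter (the transmission rate `beta`) found by random sampling against cumulative case counts. The paper's three connected regions, travel flows and multivariate parameter set are not reproduced; all function names and numbers here are hypothetical.

```python
import random


def simulate_sir(beta, gamma, population, infected0, days, dt=1.0):
    """Euler-integrate a single-region SIR stock-and-flow model.

    Returns the cumulative number of infections at the end of each day.
    """
    s = population - infected0
    i = float(infected0)
    cumulative = []
    steps_per_day = int(round(1.0 / dt))
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / population * dt  # flow S -> I
            new_rec = gamma * i * dt                   # flow I -> R
            s -= new_inf
            i += new_inf - new_rec
        # Everyone who has left S has been infected at some point.
        cumulative.append(population - s)
    return cumulative


def monte_carlo_fit(observed, gamma, population, infected0, n_samples=2000, seed=0):
    """Random-search calibration of beta by least squares on cumulative cases."""
    rng = random.Random(seed)
    best_beta, best_err = None, float("inf")
    for _ in range(n_samples):
        beta = rng.uniform(0.05, 0.8)
        sim = simulate_sir(beta, gamma, population, infected0, len(observed))
        err = sum((a - b) ** 2 for a, b in zip(sim, observed))
        if err < best_err:
            best_beta, best_err = beta, err
    return best_beta, best_err


# Synthetic "observations" from a known beta, used only to check the fit.
truth = simulate_sir(0.3, 0.1, 10_000, 10, 60)
beta_hat, _ = monte_carlo_fit(truth, 0.1, 10_000, 10)
print(f"recovered beta = {beta_hat:.3f}")
```

Random sampling is used here only because it is the simplest Monte Carlo search; with several parameters and three coupled regions, the search space grows quickly and more samples or a smarter sampler would be needed.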
17.
Sci Rep ; 11(1): 156, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33420139

ABSTRACT

The identification of prognostic biomarkers for predicting cancer progression is an important problem for two reasons. First, such biomarkers find practical application in a clinical context for the treatment of patients. Second, interrogation of the biomarkers themselves is assumed to lead to novel insights into disease mechanisms and the underlying molecular processes that cause the pathological behavior. For breast cancer, many signatures based on gene expression values have been reported to be associated with overall survival. Consequently, such signatures have been used for suggesting biological explanations of breast cancer and drug mechanisms. In this paper, we demonstrate for a large number of breast cancer signatures that such an implication is not justified. Our approach systematically eliminates all traces of the biological meaning of signature genes and shows that, among the remaining genes, surrogate gene sets can be formed with indistinguishable prognostic prediction capabilities and opposite biological meaning. Hence, our results demonstrate that none of the studied signatures has a sensible biological interpretation or meaning with respect to disease etiology. Overall, this shows that prognostic signatures are black-box models with sensible predictions of breast cancer outcome but no value for revealing causal connections. Furthermore, we show that the number of such surrogate gene sets is not small but very large.


Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Biomarkers, Tumor/metabolism , Breast Neoplasms/diagnosis , Breast Neoplasms/metabolism , Breast Neoplasms/mortality , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Prognosis , Transcriptome
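The surrogate-gene-set argument in the breast cancer study above rests on forming alternative gene sets of the same size that avoid the original signature. A minimal sketch, assuming a plain list of gene identifiers: it only excludes the signature genes themselves, whereas the paper additionally eliminates genes carrying biological meaning, and it omits the survival model used to compare prognostic performance. Gene names and set sizes are hypothetical.

```python
import math
import random


def surrogate_gene_sets(all_genes, signature, n_sets, seed=0):
    """Draw n_sets random gene sets of the signature's size, disjoint from it."""
    sig = set(signature)
    # Candidate pool: every gene that is not part of the original signature.
    pool = sorted(g for g in all_genes if g not in sig)
    rng = random.Random(seed)
    return [rng.sample(pool, len(sig)) for _ in range(n_sets)]


def n_possible_sets(n_genes, signature_size):
    """Number of distinct same-size gene sets that avoid the signature.

    Assumes the signature genes are part of the n_genes background.
    """
    return math.comb(n_genes - signature_size, signature_size)


# Hypothetical background of 20,000 genes and a 50-gene signature.
genes = [f"GENE{i}" for i in range(20_000)]
signature = genes[:50]
candidates = surrogate_gene_sets(genes, signature, n_sets=3)
print([c[:3] for c in candidates])
print(f"about 10^{len(str(n_possible_sets(20_000, 50))) - 1} possible surrogate sets")
```

Each candidate set would then be scored with the same prognostic model as the original signature; the combinatorial count indicates why even a small fraction of well-performing candidates can amount to a very large number of sets.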
19.
Sci Rep ; 10(1): 16672, 2020 10 07.
Article in English | MEDLINE | ID: mdl-33028846

ABSTRACT

Gene ontology (GO) is an eminent knowledge base frequently used for providing biological interpretations for the analysis of genes or gene sets from biological, medical and clinical problems. Unfortunately, the interpretation of such results is challenging due to the large number of GO terms, their hierarchical and connected organization as directed acyclic graphs (DAGs) and the lack of tools that allow this structural information to be exploited explicitly. For this reason, we developed the R package GOxploreR. The main features of GOxploreR are (I) easy and direct access to structural features of GO, (II) structure-based ranking of GO-terms, (III) mapping to reduced GO-DAGs including visualization capabilities and (IV) prioritizing of GO-terms. The underlying idea of GOxploreR is to exploit a graph-theoretical perspective of GO, as manifested by its DAG structure and its hierarchy levels, for cumulating semantic information. All these features enhance the utilization of structural information of GO and complement existing analysis tools. Overall, GOxploreR provides exploratory as well as confirmatory tools for complementing any kind of analysis resulting in a list of GO-terms, e.g., from differentially expressed genes or gene sets, GWAS or biomarkers. Our R package GOxploreR is freely available from CRAN.


Subject(s)
Databases, Genetic , Gene Ontology , Software , Humans
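GOxploreR builds on the DAG structure and hierarchy levels of GO. The sketch below shows one way to assign levels to terms of a DAG, assuming the level of a term is the length of the longest path from the root; whether GOxploreR uses exactly this definition is an assumption here, and other tools use the shortest path instead. The toy ontology and its term names are hypothetical; real GO data would be loaded from an ontology file.

```python
from collections import deque


def hierarchy_levels(parents):
    """Level of each term in a DAG = longest path length from a root.

    parents maps every term to the list of its parent terms (roots map to []).
    Uses Kahn's topological order so a term is finalized only after all its parents.
    """
    children = {t: [] for t in parents}
    remaining = {t: len(ps) for t, ps in parents.items()}
    for term, ps in parents.items():
        for p in ps:
            children[p].append(term)
    # Roots (no parents) start at level 0.
    level = {t: 0 for t, n in remaining.items() if n == 0}
    queue = deque(level)
    while queue:
        term = queue.popleft()
        for child in children[term]:
            # Keep the longest path seen so far from any parent.
            level[child] = max(level.get(child, 0), level[term] + 1)
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    return level


# Hypothetical toy ontology: D is reachable from the root in 1 step or in 3 steps.
toy = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"], "D": ["C", "root"]}
print(hierarchy_levels(toy))
```

In the toy example, term D receives level 3 because of its longest path root, A, C, D, whereas a shortest-path definition would place it at level 1; this is one reason the choice of level definition matters when ranking or reducing GO-DAGs.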
20.
Front Cell Dev Biol ; 8: 673, 2020.
Article in English | MEDLINE | ID: mdl-32984300

ABSTRACT

The number of scientific publications containing our knowledge in the biomedical, health, and clinical sciences is steadily growing. Since there is currently no automatic archiving of the obtained results, much of this information remains buried in textual details that are not readily available for further usage or analysis. For this reason, natural language processing (NLP) and text mining methods are used to extract information from such publications. In this paper, we review practices for Named Entity Recognition (NER) and Relation Detection (RD), which allow, e.g., the identification of interactions between proteins and drugs or between genes and diseases. This information can be integrated into networks to summarize large-scale details on a particular biomedical or clinical problem, which are then amenable to easy data management and further analysis. Furthermore, we survey novel deep learning methods that have recently been introduced for such tasks.
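The NER/RD review above describes turning extracted entities and relations into networks. As a baseline illustration only, and not one of the deep learning methods surveyed, the sketch below uses dictionary lookup for NER and sentence-level co-occurrence as a crude relation detector, then aggregates the pairs into a weighted edge list. The lexicon, entity names and sentences are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def extract_network(text, lexicon):
    """Dictionary-based NER plus sentence-level co-occurrence relations.

    lexicon maps lower-case surface strings to (entity_id, entity_type).
    Returns a Counter of undirected edges weighted by co-occurrence count.
    """
    edges = Counter()
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        found = set()
        for term, entity in lexicon.items():
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.add(entity)
        # Every pair of entities in the same sentence becomes a candidate relation.
        for a, b in combinations(sorted(found), 2):
            edges[(a, b)] += 1
    return edges


lexicon = {"druga": ("drugA", "Drug"), "geneb": ("geneB", "Gene"), "genec": ("geneC", "Gene")}
text = "DrugA inhibits GeneB. DrugA does not bind GeneC. GeneB is expressed in liver."
net = extract_network(text, lexicon)
for (a, b), weight in net.items():
    print(a[0], "-", b[0], weight)
```

The second sentence still yields a drugA-geneC edge despite the negation, which illustrates why learned relation-detection models are preferred over simple co-occurrence when building such networks.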
