Results 1 - 20 of 196
1.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37248747

ABSTRACT

Human Phenotype Ontology (HPO)-based approaches have gained popularity in recent times as a tool for genomic diagnostics of rare diseases. However, these approaches do not make full use of the available information on disease and patient phenotypes. We present a new method called Phen2Disease, which utilizes the bidirectional maximum matching semantic similarity between two phenotype sets of patients and diseases to prioritize diseases and genes. Our comprehensive experiments have been conducted on six real data cohorts with 2051 cases (Cohort 1, n = 384; Cohort 2, n = 281; Cohort 3, n = 185; Cohort 4, n = 784; Cohort 5, n = 208; and Cohort 6, n = 209) and two simulated data cohorts with 1000 cases. The results of the experiments showed that Phen2Disease outperforms the three state-of-the-art methods when only phenotype information and HPO knowledge base are used, particularly in cohorts with fewer average numbers of HPO terms. We also observed that patients with higher information content scores have more specific information, leading to more accurate predictions. Moreover, Phen2Disease provides high interpretability with ranked diseases and patient HPO terms presented. Our method provides a novel approach to utilizing phenotype data for genomic diagnostics of rare diseases, with potential for clinical impact. Phen2Disease is freely available on GitHub at https://github.com/ZhuLab-Fudan/Phen2Disease.


Subject(s)
Biological Ontologies , Rare Diseases , Humans , Semantics , Genomics , Phenotype
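
The bidirectional maximum matching idea named in the abstract above can be illustrated with a minimal sketch. This is not the Phen2Disease implementation: the toy ontology `ANCESTORS`, the Jaccard-based `term_similarity`, and the equal weighting of the two directions are all assumptions made for illustration, since the abstract does not specify the term-level measure or the weighting.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical toy ontology: each term maps to its ancestor set (itself included).
ANCESTORS: Dict[str, Set[str]] = {
    "HP:A": {"HP:A", "HP:ROOT"},
    "HP:B": {"HP:B", "HP:A", "HP:ROOT"},
    "HP:C": {"HP:C", "HP:ROOT"},
}


def term_similarity(t1: str, t2: str) -> float:
    """Jaccard overlap of ancestor sets (a stand-in for an IC-based measure)."""
    a, b = ANCESTORS[t1], ANCESTORS[t2]
    return len(a & b) / len(a | b)


def directed_best_match(
    source: Iterable[str],
    target: Iterable[str],
    sim: Callable[[str, str], float],
) -> float:
    """Average, over source terms, of the best similarity to any target term."""
    source, target = list(source), list(target)
    if not source or not target:
        return 0.0
    return sum(max(sim(s, t) for t in target) for s in source) / len(source)


def bidirectional_similarity(
    patient: Iterable[str],
    disease: Iterable[str],
    sim: Callable[[str, str], float] = term_similarity,
) -> float:
    """Mean of the patient-to-disease and disease-to-patient best-match scores."""
    patient, disease = list(patient), list(disease)
    forward = directed_best_match(patient, disease, sim)
    backward = directed_best_match(disease, patient, sim)
    return 0.5 * (forward + backward)
```

Ranking candidate diseases then amounts to sorting them by `bidirectional_similarity` against the patient's term set. Scoring both directions penalizes diseases that match only a subset of the patient's terms as well as patients who match only a subset of the disease's terms.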
2.
BMC Genomics ; 25(1): 869, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39285315

ABSTRACT

BACKGROUND: Bio-ontologies are keys in structuring complex biological information for effective data integration and knowledge representation. Semantic similarity analysis on bio-ontologies quantitatively assesses the degree of similarity between biological concepts based on the semantics encoded in ontologies. It plays an important role in structured and meaningful interpretations and integration of complex data from multiple biological domains. RESULTS: We present simona, a novel R package for semantic similarity analysis on general bio-ontologies. Simona implements infrastructures for ontology analysis by offering efficient data structures, fast ontology traversal methods, and elegant visualizations. Moreover, it provides a robust toolbox supporting over 70 methods for semantic similarity analysis. With simona, we conducted a benchmark against current semantic similarity methods. The results demonstrate methods are clustered based on their mathematical methodologies, thus guiding researchers in the selection of appropriate methods. Additionally, we explored annotation-based versus topology-based methods, revealing that semantic similarities solely based on ontology topology can efficiently reveal semantic similarity structures, facilitating analysis on less-studied organisms and other ontologies. CONCLUSIONS: Simona offers a versatile interface and efficient implementation for processing, visualization, and semantic similarity analysis on bio-ontologies. We believe that simona will serve as a robust tool for uncovering relationships and enhancing the interoperability of biological knowledge systems.


Subject(s)
Biological Ontologies , Semantics , Software , Computational Biology/methods
3.
Hum Brain Mapp ; 45(2): e26603, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38339900

ABSTRACT

Reading, naming, and repetition are classical neuropsychological tasks widely used in the clinic and psycholinguistic research. While reading and repetition can be accomplished by following a direct or an indirect route, pictures can be named only by means of semantic mediation. By means of fMRI multivariate pattern analysis, we evaluated whether this well-established fundamental difference at the cognitive level is associated at the brain level with a difference in the degree to which semantic representations are activated during these tasks. Semantic similarity between words was estimated based on a word association model. Twenty subjects participated in an event-related fMRI study where the three tasks were presented in pseudo-random order. Linear discriminant analysis of fMRI patterns identified a set of regions that allow to discriminate between words at a high level of word-specificity across tasks. Representational similarity analysis was used to determine whether semantic similarity was represented in these regions and whether this depended on the task performed. The similarity between neural patterns of the left Brodmann area 45 (BA45) and of the superior portion of the left supramarginal gyrus correlated with the similarity in meaning between entities during picture naming. In both regions, no significant effects were seen for repetition or reading. The semantic similarity effect during picture naming was significantly larger than the similarity effect during the two other tasks. In contrast, several regions including left anterior superior temporal gyrus and left ventral BA44/frontal operculum, among others, coded for semantic similarity in a task-independent manner. These findings provide new evidence for the dynamic, task-dependent nature of semantic representations in the left BA45 and a more task-independent nature of the representational activation in the lateral temporal cortex and ventral BA44/frontal operculum.


Subject(s)
Reading , Semantics , Humans , Brain Mapping , Temporal Lobe/diagnostic imaging , Temporal Lobe/physiology , Brain , Magnetic Resonance Imaging
4.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35901452

ABSTRACT

Measuring the semantic similarity between Gene Ontology (GO) terms is a fundamental step in numerous functional bioinformatics applications. To fully exploit the metadata of GO terms, word embedding-based methods have been proposed recently to map GO terms to low-dimensional feature vectors. However, these representation methods commonly overlook the key information hidden in the whole GO structure and the relationship between GO terms. In this paper, we propose a novel representation model for GO terms, named GT2Vec, which jointly considers the GO graph structure obtained by graph contrastive learning and the semantic description of GO terms based on BERT encoders. Our method is evaluated on a protein similarity task on a collection of benchmark datasets. The experimental results demonstrate the effectiveness of using a joint encoding graph structure and textual node descriptors to learn vector representations for GO terms.


Subject(s)
Computational Biology , Semantics , Computational Biology/methods , Gene Ontology , Metadata
5.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35136916

ABSTRACT

The gene ontology (GO) provides a hierarchical structure with a controlled vocabulary composed of terms describing functions and localization of gene products. Recent works propose vector representations, also known as embeddings, of GO terms that capture meaningful information about them. Significant performance improvements have been observed when these representations are used on diverse downstream tasks, such as the measurement of semantic similarity between GO terms and functional similarity between proteins. Despite the success shown by these approaches, existing embeddings of GO terms still fail to capture crucial structural features of the GO. Here, we present anc2vec, a novel protocol based on neural networks for constructing vector representations of GO terms by preserving three important ontological features: its ontological uniqueness, ancestors hierarchy and sub-ontology membership. The advantages of using anc2vec are demonstrated by systematic experiments on diverse tasks: visualization, sub-ontology prediction, inference of structurally related terms, retrieval of terms from aggregated embeddings, and prediction of protein-protein interactions. In these tasks, experimental results show that the performance of anc2vec representations is better than those of recent approaches. This demonstrates that higher performances on diverse tasks can be achieved by embeddings when the structure of the GO is better represented. Full source code and data are available at https://github.com/sinc-lab/anc2vec.


Subject(s)
Semantics , Software , Computational Biology/methods , Gene Ontology , Neural Networks, Computer , Proteins/genetics
6.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35731990

ABSTRACT

BACKGROUND: Angiogenesis is regulated by multiple genes whose variants can lead to different disorders. Among them, rare diseases are a heterogeneous group of pathologies, most of them genetic, whose information may be of interest to determine the still unknown genetic and molecular causes of other diseases. In this work, we use the information on rare diseases dependent on angiogenesis to investigate the genes that are associated with this biological process and to determine if there are interactions between the genes involved in its deregulation. RESULTS: We propose a systemic approach supported by the use of pathological phenotypes to group diseases by semantic similarity. We grouped 158 angiogenesis-related rare diseases in 18 clusters based on their phenotypes. Of them, 16 clusters had traceable gene connections in a high-quality interaction network. These disease clusters are associated with 130 different genes. We searched for genes associated with angiogenesis through ClinVar pathogenic variants. Of the seven retrieved genes, our system confirms six of them. Furthermore, it allowed us to identify common affected functions among these disease clusters. AVAILABILITY: https://github.com/ElenaRojano/angio_cluster. CONTACT: seoanezonjic@uma.es and elenarojano@uma.es.


Subject(s)
Computational Biology , Rare Diseases , Algorithms , Cluster Analysis , Humans , Phenotype , Rare Diseases/genetics , Semantics
7.
BMC Bioinformatics ; 24(1): 171, 2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37101154

ABSTRACT

BACKGROUND: Complex diseases such as neurodevelopmental disorders (NDDs) exhibit multiple etiologies. The multi-etiological nature of complex diseases emerges from distinct but functionally similar groups of genes. Different diseases sharing genes of such groups show related clinical outcomes that further restrict our understanding of disease mechanisms, thus limiting the applications of personalized medicine approaches to complex genetic disorders. RESULTS: Here, we present an interactive and user-friendly application, called DGH-GO. DGH-GO allows biologists to dissect the genetic heterogeneity of complex diseases by stratifying the putative disease-causing genes into clusters that may contribute to distinct disease outcome development. It can also be used to study the shared etiology of complex diseases. DGH-GO creates a semantic similarity matrix for the input genes by using Gene Ontology (GO). The resultant matrix can be visualized in 2D plots using different dimension reduction methods (t-SNE, principal component analysis, UMAP and principal coordinate analysis). In the next step, clusters of functionally similar genes are identified from the genes' functional similarities assessed through GO. This is achieved by employing four different clustering methods (K-means, hierarchical, fuzzy and PAM). The user may change the clustering parameters and explore their effect on stratification immediately. DGH-GO was applied to genes disrupted by rare genetic variants in Autism Spectrum Disorder (ASD) patients. The analysis confirmed the multi-etiological nature of ASD by identifying four clusters of genes that were enriched for distinct biological mechanisms and clinical outcomes. In the second case study, the analysis of genes shared by different NDDs showed that genes causing multiple disorders tend to aggregate in similar clusters, indicating a possible shared etiology.
CONCLUSION: DGH-GO is a user-friendly application that allows biologists to study the multi-etiological nature of complex diseases by dissecting their genetic heterogeneity. In summary, functional similarities, dimension reduction and clustering methods, coupled with interactive visualization and control over analysis, allow biologists to explore and analyze their datasets without requiring expert knowledge of these methods. The source code of the proposed application is available at https://github.com/Muh-Asif/DGH-GO.


Subject(s)
Autism Spectrum Disorder , Genetic Heterogeneity , Humans , Gene Ontology , Autism Spectrum Disorder/genetics , Software
8.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33049044

ABSTRACT

Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge and they are employed in almost every major biological database. Recently, ontologies are increasingly being used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview over the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.


Subject(s)
Biological Ontologies , Machine Learning , Models, Biological , Semantics
9.
Cogn Neuropsychol ; 40(3-4): 167-185, 2023.
Article in English | MEDLINE | ID: mdl-38006205

ABSTRACT

Feature generation tasks and feature databases are important for understanding how knowledge is organized in semantic memory, as they reflect not only the kinds of information that individuals hold about objects but also how objects are conceptually represented. Traditionally, semantic norms focus on a variety of object categories and, as a result, have a small number of concepts per semantic category. Here, our main goal is to create a more fine-grained feature database exclusively for one category of objects-manipulable objects. This database contributes to the understanding of within-category, content-specific processing. To achieve this, we asked 130 participants to freely generate features for 80 manipulable objects and another group of 32 participants to generate action features for the same objects. We then compared our databases with other published semantic norms and found high similarity between them. In our databases, we calculated the similarity between objects in terms of visual, functional, encyclopaedic, and action feature types using Spearman correlation, Baker's gamma index, and cophenetic correlation. We discovered that objects were grouped in a distinctive and meaningful way according to feature type. Finally, we tested the validity of our databases by asking three groups of participants to perform a feature verification experiment while manipulating production frequency. Our results demonstrate that participants can recognize and associate the features of our databases with specific manipulable objects. Participants were faster to verify high-frequency features than low-frequency features. Overall, our data provide important insights into how we process manipulable objects and can be used to further inform cognitive and neural theories of object processing and identification.


Subject(s)
Memory , Semantics , Humans
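
As a rough illustration of the Spearman-based comparison mentioned in the abstract above, the following sketch computes a rank correlation between two feature profiles. The abstract does not describe how the feature vectors were built, so treating each object as a list of per-feature production frequencies is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

Applying `spearman` to every pair of objects yields a similarity matrix, which is the kind of input that hierarchical clustering and dendrogram comparisons operate on.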
10.
Memory ; 31(10): 1306-1319, 2023 11.
Article in English | MEDLINE | ID: mdl-37743561

ABSTRACT

It is widely assumed that autobiographical memory relies on an integration of episodic memory with the self-model. We hypothesise that self-memory integration depends critically on self-congruence. More specifically, self-incongruent experiences such as those that elicit shame or guilt may be more difficult to integrate. Self-incongruence may affect both the semantic reports of memories and their phenomenological characteristics, in particular their visual perspective (1PP or 3PP, i.e., field or observer perspective), their affective valence, and their perceived centrality. Diary based memories were assigned to 4 categories (shame, guilt, negative, neutral) and were rated for the different phenomenological dimensions. We used a deep neural network, univariate and multilevel models to assess differences and relationships between different variables. We found that memories that elicited shame (but not guilt) showed more pronounced 3PP as compared to other experiences. Shameful episodes also elicited the most pronounced negative affect. A multilevel analysis revealed that the amount of shame that an episode elicited, and its semantic similarity with shame episodes, predicted higher 3PP, while affective valence did not. Our results show that self-incongruence affects memories both at the level of their semantic reports and their phenomenology, and thus contributes to a mechanistic understanding of self-memory integration.


Subject(s)
Memory, Episodic , Humans , Emotions , Mental Recall
11.
Behav Res Methods ; 55(7): 3416-3432, 2023 10.
Article in English | MEDLINE | ID: mdl-36131199

ABSTRACT

Experimental design and computational modelling across the cognitive sciences often rely on measures of semantic similarity between concepts. Traditional measures of semantic similarity are typically derived from distance in taxonomic databases (e.g. WordNet), databases of participant-produced semantic features, or corpus-derived linguistic distributional similarity (e.g. CBOW), all of which are theoretically problematic in their lack of grounding in sensorimotor experience. We present a new measure of sensorimotor distance between concepts, based on multidimensional comparisons of their experiential strength across 11 perceptual and action-effector dimensions in the Lancaster Sensorimotor Norms. We demonstrate that, in modelling human similarity judgements, sensorimotor distance has comparable explanatory power to other measures of semantic similarity, explains variance in human judgements which is missed by other measures, and does so with the advantages of remaining both grounded and computationally efficient. Moreover, sensorimotor distance is equally effective for both concrete and abstract concepts. We further introduce a web-based tool ( https://lancaster.ac.uk/psychology/smdistance ) for easily calculating and visualising sensorimotor distance between words, featuring coverage of nearly 800 million word pairs. Supplementary materials are available at https://osf.io/d42q6/ .


Subject(s)
Linguistics , Semantics , Humans , Concept Formation , Cognitive Science , Data Management
12.
BMC Bioinformatics ; 23(1): 56, 2022 Feb 01.
Article in English | MEDLINE | ID: mdl-35105306

ABSTRACT

BACKGROUND: Besides Boolean retrieval with medical subject headings (MeSH), PubMed provides users with an alternative way called "Related Articles" to access and collect relevant documents based on semantic similarity. To explore this functionality more efficiently and more accurately, we proposed an improved algorithm that measures the semantic similarity of PubMed citations based on the MeSH-concept network model. RESULTS: Three article similarity networks are obtained using MeSH-concept random walk with restart (MCRWR), MeSH random walk with restart (MRWR) and PubMed related article (PMRA) respectively. The area under the receiver operating characteristic (ROC) curve of MCRWR, MRWR and PMRA is 0.93, 0.90, and 0.67 respectively. Precisions of MCRWR and MRWR under various similarity thresholds are higher than that of PMRA. The mean value of P5 of MCRWR is 0.742, which is much higher than those of MRWR (0.692) and PMRA (0.223). In the article semantic similarity network of "Genes & Function of organ & Disease" based on the MCRWR algorithm, four topics are identified according to gold standards. CONCLUSION: The MeSH-concept random walk with restart algorithm performs better in constructing article semantic similarity networks, which can reveal the implicit semantic associations between documents. The efficiency and accuracy of retrieving semantically related documents are substantially improved.


Subject(s)
Medical Subject Headings , Semantic Web , Algorithms , PubMed , Semantics
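
For readers unfamiliar with random walk with restart, the technique named in the abstract above, here is a generic sketch on a small unweighted graph. It is not the MCRWR or MRWR algorithm: the graph `TOY_GRAPH`, the restart probability of 0.3, and the handling of isolated nodes are assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical undirected toy graph, stored as adjacency lists.
TOY_GRAPH: Dict[str, List[str]] = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
}


def random_walk_with_restart(
    adjacency: Dict[str, List[str]],
    seed: str,
    restart: float = 0.3,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate p <- restart * e_seed + (1 - restart) * W p until p stabilizes.

    W moves probability from each node uniformly to its neighbours.
    """
    nodes = list(adjacency)
    p = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(max_iter):
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Assumption: a node without neighbours sends its mass to the seed.
                nxt[seed] += (1.0 - restart) * p[u]
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p
```

The resulting scores can be read as the proximity of every node to the seed; running the walk once per node and comparing the score vectors is one common way to derive a node-to-node similarity from a network.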
13.
BMC Bioinformatics ; 23(Suppl 2): 433, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36510133

ABSTRACT

BACKGROUND: Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases, for example in UniProtKB, poses challenges in manual functional annotation. Manual annotation requires expert human curators to search and read related research articles, interpret the results, and assign the annotations to the proteins. Thus, it is a time-consuming and expensive process. Therefore, designing computational tools to perform automatic annotation leveraging the high quality manual annotations that already exist in UniProtKB/SwissProt is an important research problem. RESULTS: In this paper, we extend and adapt the GrAPFI (graph-based automatic protein function inference) (Sarker et al. in BMC Bioinform 21, 2020; Sarker et al., in: Proceedings of 7th international conference on complex networks and their applications, Cambridge, 2018) method for automatic annotation of proteins with gene ontology (GO) terms, renaming it as GrAPFI-GO. The original GrAPFI method uses label propagation in a similarity graph where proteins are linked through the domains, families, and superfamilies that they share. Here, we also explore various types of similarity measures based on common neighbors in the graph. Moreover, GO terms are arranged in a hierarchical manner according to semantic parent-child relations. Therefore, we propose an efficient pruning and post-processing technique that integrates both semantic similarity and hierarchical relations between the GO terms. We produce experimental results comparing the GrAPFI-GO method with and without considering common neighbors similarity. We also test the performance of GrAPFI-GO and other annotation tools for GO annotation on a benchmark of proteins with and without the proposed pruning and post-processing procedure.
CONCLUSION: Our results show that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and of other annotation tools as well. Thus, GrAPFI-GO exposes an original efficient and reusable procedure, to exploit the semantic relations among the GO terms in order to improve the automatic annotation of protein functions.


Subject(s)
Computational Biology , Semantics , Humans , Gene Ontology , Molecular Sequence Annotation , Computational Biology/methods , Databases, Protein , Proteins/chemistry
14.
BMC Bioinformatics ; 23(1): 23, 2022 Jan 06.
Article in English | MEDLINE | ID: mdl-34991460

ABSTRACT

BACKGROUND: Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic measures libraries based on the aforementioned ontologies. However, current state-of-the-art semantic measures libraries have some performance and scalability drawbacks derived from their ontology representations based on relational databases, or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure which integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for their real-time computation prevents both their practical use in any application and the use of any other path-based semantic similarity measure. RESULTS: To bridge the two aforementioned gaps, this work introduces for the first time an updated version of the HESML Java software library especially designed for the biomedical domain, which implements the most efficient and scalable ontology representation reported in the literature, together with a new method for the approximation of the Dijkstra's algorithm for taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure. CONCLUSIONS: We introduce a set of reproducible benchmarks showing that HESML outperforms by several orders of magnitude the current state-of-the-art libraries in the three aforementioned biomedical ontologies, as well as the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL linearly scales regarding the dimension of the common ancestor subgraph regardless of the ontology size. 
Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.


Subject(s)
Biological Ontologies , Semantics , Medical Subject Headings , Reproducibility of Results , Systematized Nomenclature of Medicine
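
The general idea of restricting a shortest-path search to ancestors, as described in the abstract above, can be sketched as follows. This is an illustration only, not the AncSPL algorithm as implemented in HESML: the toy taxonomy `PARENTS`, the use of unweighted breadth-first search, and the choice to search the union of both concepts' ancestor sets are assumptions.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy: each concept maps to its direct parents.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],
    "e": ["c"],
}


def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All ancestors of a node, including the node itself."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ancestor_subgraph_path_length(
    u: str, v: str, parents: Dict[str, List[str]]
) -> Optional[int]:
    """Edge count of the shortest u-v path that uses only ancestors of u or v."""
    allowed = ancestors(u, parents) | ancestors(v, parents)
    # Undirected adjacency restricted to the allowed nodes. Every parent of an
    # allowed node is itself an ancestor, so it is always present in `allowed`.
    adjacency: Dict[str, Set[str]] = {n: set() for n in allowed}
    for child in allowed:
        for parent in parents[child]:
            adjacency[child].add(parent)
            adjacency[parent].add(child)
    distance = {u: 0}
    queue = deque([u])
    while queue:
        current = queue.popleft()
        if current == v:
            return distance[current]
        for neighbour in adjacency[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return None  # no path inside the ancestor subgraph
```

Because the search never leaves the ancestor subgraph, its cost depends on the number of ancestors of the two concepts rather than on the size of the whole ontology. The price is that paths passing through shared descendants are ignored, so the result is an approximation of the true shortest path.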
15.
Brief Bioinform ; 21(1): 355-367, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30452543

ABSTRACT

Coeliac disease (CD) is a complex, multifactorial pathology caused by different factors, such as nutrition, immunological response and genetic factors. Many autoimmune diseases are comorbidities for CD, and a comprehensive and integrated analysis with bioinformatics approaches can help in evaluating the interconnections among all the selected pathologies. We first performed a detailed survey of gene expression data available in public repositories on CD and less commonly considered comorbidities. Then we developed an innovative pipeline that integrates gene expression, cell-type data and online resources (e.g. a list of comorbidities from the literature), using bioinformatics methods such as gene set enrichment analysis and semantic similarity. Our pipeline is written in R language, available at the following link: http://bioinformatica.isa.cnr.it/COELIAC_DISEASE/SCRIPTS/. We found a list of common differential expressed genes, gene ontology terms and pathways among CD and comorbidities and the closeness among the selected pathologies by means of disease ontology terms. Physicians and other researchers, such as molecular biologists, systems biologists and pharmacologists can use it to analyze pathology in detail, from differential expressed genes to ontologies, performing a comparison with the pathology comorbidities or with other diseases.

16.
J Biomed Inform ; 132: 104109, 2022 08.
Article in English | MEDLINE | ID: mdl-35660521

ABSTRACT

OBJECTIVE: Accurately assigning phenotype information to individual patients via computational phenotyping using Electronic Health Records (EHRs) has been seen as the first step towards enabling EHRs for precision medicine research. Chart review labels annotated by clinical experts, also known as "gold standard" labels, are essential for the development and validation of computational phenotyping algorithms. However, given the complexity of EHR systems, the process of chart review is both labor intensive and time consuming. We propose a fully automated algorithm, referred to as pGUESS, to rank EHR notes according to their relevance to a given phenotype. By identifying the most relevant notes, pGUESS can greatly improve the efficiency and accuracy of chart reviews. METHOD: pGUESS uses prior guided semantic similarity to measure the informativeness of a clinical note to a given phenotype. We first select candidate clinical concepts from a pool of comprehensive medical concepts using public knowledge sources and then derive the semantic embedding vector (SEV) for a reference article (SEVref) and each note (SEVnote). The algorithm scores the relevance of a note as the cosine similarity between SEVnote and SEVref. RESULTS: The algorithm was validated against four sets of 200 notes that were manually annotated by clinical experts to assess their informativeness to one of three disease phenotypes. pGUESS algorithm substantially outperforms existing unsupervised approaches for classifying the relevance status with respect to both accuracy and scalability across phenotypes. Averaging over the three phenotypes, the rank correlation between the algorithm ranking and gold standard label was 0.64 for pGUESS, but only 0.47 and 0.35 for the next two best performing algorithms. pGUESS is also much more computationally scalable compared to existing algorithms. 
CONCLUSION: pGUESS algorithm can substantially reduce the burden of chart review and holds potential in improving the efficiency and accuracy of human annotation.


Subject(s)
Algorithms , Semantics , Electronic Health Records , Humans , Natural Language Processing , Phenotype , Precision Medicine
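
The scoring step described in the abstract above, cosine similarity between a note's embedding vector and a reference vector, can be sketched in a few lines. How pGUESS derives the embedding vectors is not reproduced here; the two-dimensional vectors in `NOTES` and `REFERENCE` are made-up placeholders, and `rank_notes` is a hypothetical helper name.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_notes(
    note_vectors: Dict[str, Sequence[float]],
    reference: Sequence[float],
) -> List[Tuple[str, float]]:
    """Return (note id, score) pairs sorted from most to least relevant."""
    scored = [
        (note_id, cosine_similarity(vector, reference))
        for note_id, vector in note_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder embeddings, for illustration only.
REFERENCE = [1.0, 0.0]
NOTES = {
    "note_1": [0.0, 1.0],
    "note_2": [1.0, 1.0],
    "note_3": [2.0, 0.0],
}
RANKED = rank_notes(NOTES, REFERENCE)
```

Cosine similarity ignores vector length, so a long note and a short note pointing in the same direction in embedding space receive the same score.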
17.
BMC Med Inform Decis Mak ; 22(1): 33, 2022 02 05.
Article in English | MEDLINE | ID: mdl-35123470

ABSTRACT

BACKGROUND: Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance 'patient-like me' analyses, automated coding, differential diagnosis, and outcome prediction. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or better methods in the area. METHODS: We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from all text narrative associated with admissions in the medical information mart for intensive care (MIMIC-III). RESULTS: 300 semantic similarity configurations were evaluated, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better; however, the best-performing configurations when measured by area under receiver operating characteristic curve and Top Ten Accuracy used term-specificity and annotation-frequency measures. CONCLUSION: We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.


Subject(s)
Rare Diseases , Semantics , Humans , Phenotype , ROC Curve
18.
Behav Res Methods ; 54(5): 2364-2380, 2022 10.
Article in English | MEDLINE | ID: mdl-35088365

ABSTRACT

We collected visual and semantic similarity norms for a set of photographic images comprising 120 recognizable objects/animals and 120 indoor/outdoor scenes. Human observers rated the similarity of pairs of images within four categories of stimuli-inanimate objects, animals, indoor scenes and outdoor scenes-via Amazon's Mechanical Turk. We performed multidimensional scaling (MDS) on the collected similarity ratings to visualize the perceived similarity for each image category, for both visual and semantic ratings. The MDS solutions revealed the expected similarity relationships between images within each category, along with intuitively sensible differences between visual and semantic similarity relationships for each category. Stress tests performed on the MDS solutions indicated that the MDS analyses captured meaningful levels of variance in the similarity data. These stimuli, associated norms and naming data are made available to all researchers, and should provide a useful resource for researchers of vision, memory and conceptual knowledge wishing to run experiments using well-parameterized stimulus sets.


Subject(s)
Pattern Recognition, Visual , Semantics , Humans , Animals
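
The image-norms abstract above mentions stress tests on MDS solutions without saying which stress statistic was used. As an illustration only, the sketch below computes Kruskal's stress-1, one common formulation, for a given low-dimensional configuration. It assumes the similarity ratings have already been converted to dissimilarities (for example, maximum rating minus rating); the function name `kruskal_stress1` and the toy values are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def kruskal_stress1(dissimilarity: Sequence[Sequence[float]],
                    points: Sequence[Sequence[float]]) -> float:
    """Kruskal's stress-1 between target dissimilarities and the Euclidean
    distances of an MDS configuration (0 means a perfect fit)."""
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])  # distance in the MDS solution
        num += (dissimilarity[i][j] - d) ** 2
        den += d ** 2
    return math.sqrt(num / den) if den else 0.0


# Toy configuration: three hypothetical images placed on a line.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
perfect = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]   # matches the layout exactly
noisy = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]     # one pair is off by 1

print(kruskal_stress1(perfect, coords))
print(kruskal_stress1(noisy, coords))
```

Lower values indicate that the configuration reproduces the rated dissimilarities more faithfully, which is the sense in which a stress statistic can show that an MDS solution captures the structure of the data.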
19.
Multimed Syst ; 28(3): 1039-1058, 2022.
Article in English | MEDLINE | ID: mdl-35153387

ABSTRACT

Multimedia big data have grown exponentially in diverse applications such as social networks, transportation, health, and e-commerce. Accessing preferred data in large-scale datasets requires efficient and sophisticated retrieval approaches. Multimedia big data consists of the most significant features with different types of data. Even though multimedia supports various data formats with corresponding storage frameworks, similar semantic information is expressed by the multimedia. The overlap of semantic features is most efficient for theory and research related to semantic memory. Correspondingly, in recent years, deep multimodal hashing has attracted more attention owing to its efficient performance in large-scale multimedia retrieval applications. On the other hand, deep multimodal hashing has made limited efforts to explore the complex multilevel semantic structure. The main intention of this proposal is to develop enhanced deep multimedia big data retrieval with the Adaptive Semantic Similarity Function (A-SSF). The proposed model covers several phases: (a) data collection, (b) deep feature extraction, (c) semantic feature selection, and (d) adaptive similarity function for retrieval. The two main processes of multimedia big data retrieval are training and testing. After the dataset comprising video, text, images, and audio is collected, the training phase starts. Here, deep semantic feature extraction is performed by a Convolutional Neural Network (CNN), and the extracted features are then subjected to semantic feature selection by a new hybrid algorithm termed Spider Monkey-Deer Hunting Optimization Algorithm (SM-DHOA). The final optimal semantic features are stored in the feature library. During testing, selected semantic features are added to the map-reduce framework in the Hadoop environment for handling the big data, thus ensuring the proper big data distribution. Here, the main contribution termed A-SSF is introduced to compute the correlation between the multimedia semantics of the testing data and training data, thus retrieving the data with minimum similarity. Extensive experiments on benchmark multimodal datasets demonstrate that the proposed method can outperform state-of-the-art methods for all types of data.

20.
BMC Bioinformatics ; 22(1): 357, 2021 Jun 30.
Article in English | MEDLINE | ID: mdl-34193046

ABSTRACT

BACKGROUND: An increasing number of studies have shown that lncRNAs are crucial for the control of hormones and the regulation of various physiological processes in the human body, and deletion mutations in RNA are related to many human diseases. LncRNA-disease association prediction is very useful for understanding pathogenesis, diagnosis, and prevention of diseases, and is helpful for labelling relevant biological information. RESULTS: In this manuscript, we propose a computational model named bidirectional generative adversarial network (BiGAN), which consists of an encoder, a generator, and a discriminator to predict new lncRNA-disease associations. We construct features between lncRNA and disease pairs by utilizing the disease semantic similarity, lncRNA sequence similarity, and Gaussian interaction profile kernel similarities of lncRNAs and diseases. The BiGAN maps the latent features of similarity features to predict unverified associations between lncRNAs and diseases. The computational results have proved that the BiGAN performs significantly better than other state-of-the-art approaches in cross-validation. We employed the proposed model to predict candidate lncRNAs for renal cancer and colon cancer. The results are promising. Case studies show that almost 70% of lncRNAs in the top 10 prediction lists are verified by recent biological research. CONCLUSION: The experimental results indicated that our proposed model had an accurate predictive ability for the association of lncRNA-disease pairs.


Subject(s)
Neoplasms , RNA, Long Noncoding , Algorithms , Computational Biology , Humans , Neoplasms/genetics , RNA, Long Noncoding/genetics , Semantics
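
The BiGAN abstract above uses Gaussian interaction profile (GIP) kernel similarities as input features but does not give the formula. The sketch below follows the commonly used definition, K(i, j) = exp(-γ · ||IP(i) - IP(j)||²) with γ = γ' divided by the mean squared norm of the interaction profiles. The default γ' = 1, the function name `gip_kernel`, and the toy association matrix are assumptions for illustration and may differ from the authors' settings.

```python
import math
from typing import List, Sequence


def gip_kernel(profiles: Sequence[Sequence[int]],
               gamma_prime: float = 1.0) -> List[List[float]]:
    """Gaussian interaction profile kernel between the rows of a binary
    association matrix (one row per lncRNA, one column per disease)."""
    n = len(profiles)
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq_norm = sum(sq_norms) / n if n else 0.0
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    # Bandwidth normalised by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2
                          for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy matrix: 3 hypothetical lncRNAs x 3 hypothetical diseases.
assoc = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
lnc_sim = gip_kernel(assoc)
print(lnc_sim[0][1], lnc_sim[0][2])
```

Applying the same function to the transposed matrix would give the disease-side kernel. Identical interaction profiles receive a similarity of 1, and the similarity decays as the profiles diverge.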