Results 1 - 16 of 16
1.
Entropy (Basel) ; 26(8), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39202097

ABSTRACT

When a camera lens directly faces a strong light source, image flare commonly occurs, significantly reducing the clarity and texture of the photo and interfering with image processing tasks that rely on visual sensors, such as image segmentation and feature extraction. A novel flare removal network, the Sparse-UFormer neural network, has been developed. The network integrates two core components into the UFormer architecture, the mixed-scale feed-forward network (MSFN) and top-k sparse attention (TKSA), creating the sparse-transformer module. The MSFN module captures rich multi-scale information, enabling flare interference in images to be addressed more effectively. The TKSA module, designed with a sparsity strategy, focuses on key features within the image, thereby significantly enhancing the precision and efficiency of flare removal. Furthermore, in the design of the loss function, besides the conventional flare, background, and reconstruction losses, a structural similarity index loss has been incorporated to ensure the preservation of image details and structure while removing the flare. Ensuring the minimal loss of image information is a fundamental premise for effective image restoration. The proposed method has been demonstrated to achieve state-of-the-art performance on the Flare7K++ test dataset and in challenging real-world scenarios, proving its effectiveness in removing flare artefacts from images.
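
The general idea behind top-k sparse attention can be sketched as follows. This is a minimal, generic illustration and not the authors' TKSA module: the function name `topk_sparse_attention`, the plain-list tensor layout, and the choice to apply the mask per query are all assumptions made for the example. Each query keeps only its k largest scaled dot-product scores, and the softmax is taken over those alone.

```python
import math

def topk_sparse_attention(queries, keys, values, k):
    """Scaled dot-product attention that keeps only the k largest scores per query."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Scaled dot-product score of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d) for key in keys]
        # Indices of the k highest scores; every other position is masked out.
        keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        # Softmax over the kept scores only (max subtracted for numerical stability).
        m = max(scores[i] for i in keep)
        weights = {i: math.exp(scores[i] - m) for i in keep}
        total = sum(weights.values())
        # Weighted sum of the value vectors at the kept positions.
        out = [sum(weights[i] / total * values[i][j] for i in keep)
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs
```

With k equal to the number of keys this reduces to ordinary dense attention; smaller k discards low-scoring positions entirely instead of giving them small weights.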

2.
Res Sq ; 2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37502870

ABSTRACT

In conversational query answering systems, context plays a significant role in carrying a conversation forward accurately and meaningfully. In many chatbots, such as Expedia's, the discussion quickly degenerates into restarting the conversation or inviting a live agent to intervene because the bot could not grasp the context. Contexts shorten interactions by supplying implied query constraints that narrow the search and need not be repeated in subsequent queries. In this paper, we introduce a novel way of viewing contexts as a distance function via the concept of query relaxation. We demonstrate that a typed domain distance function is sufficient to model context in a conversation. Our approach is based on the idea of non-monotonic constraint inheritance in a context hierarchy.

3.
PeerJ Comput Sci ; 9: e1725, 2023.
Article in English | MEDLINE | ID: mdl-38192467

ABSTRACT

Answer sorting and filtering are two closely related steps for determining the answer to a question. Answer sorting is designed to produce an ordered list of scores based on Top-k and contextual criteria. Answer filtering optimizes the selection according to other criteria, such as the range of time constraints the user expects. However, the unclear number of answers and time constraints, as well as the high scores of false-positive results, indicate that the traditional sorting and selection methods cannot guarantee the quality of answers to multi-answer questions. Therefore, this study proposes MATQA, a component based on multi-answer temporal question reasoning, which uses a re-validation framework to convert the Top-k answer list output by the QA system into a clear number of answer combinations; a new multi-answer-based evaluation index is also proposed for this output form. First, the highly correlated subgraph is selected by calculating the scores of the boot node and the related fact node. Second, the subgraph attention inference module is introduced to determine the initial answer with the highest probability. Finally, the alternative answers are clustered at the semantic level and the time constraint level. Meanwhile, candidate answers that have similar types and high scores but do not satisfy the semantic constraints or the time constraints are eliminated to ensure the number and accuracy of final answers. Experiments on the multi-answer TimeQuestions dataset demonstrate the effectiveness of the answer combinations output by MATQA.

4.
J Biomed Semantics ; 13(1): 3, 2022 01 24.
Article in English | MEDLINE | ID: mdl-35073996

ABSTRACT

BACKGROUND: Drug repurposing can improve the return on investment as it finds new uses for existing drugs. Literature-based analyses exploit factual knowledge on drugs and diseases, e.g. from databases, and combine it with information from scholarly publications. Here we report the use of the Open Discovery Process on scientific literature to identify non-explicit ties between a disease, namely epilepsy, and known drugs, making full use of available epilepsy-specific ontologies. RESULTS: We identified characteristics of epilepsy-specific ontologies to create subsets of documents from the literature; from these subsets we generated ranked lists of co-occurring neurological drug names with varying specificity. From these ranked lists, we observed a high overlap with reference lists of pharmaceutical compounds recommended for the treatment of epilepsy. Furthermore, we performed a drug set enrichment analysis, i.e. a novel scoring function that uses an adaptive tuning parameter and compares top-k ranked lists, taking into account the varying length and the current position in the list. We also provide an overview of the pharmaceutical space in the context of epilepsy, including a final combined ranked list of more than 70 drug names. CONCLUSIONS: Biomedical ontologies are a rich resource that can be combined with text mining for the identification of drug names for drug repurposing in the domain of epilepsy. The ranking of the drug names related to epilepsy provides benefits to patients and to researchers as it enables a quick evaluation of statistical evidence hidden in the scientific literature, useful to validate approaches in the drug discovery process.


Subject(s)
Biological Ontologies; Epilepsy; Pharmaceutical Preparations; Data Mining; Drug Repositioning; Epilepsy/drug therapy; Humans
5.
Sensors (Basel) ; 21(16), 2021 Aug 15.
Article in English | MEDLINE | ID: mdl-34450930

ABSTRACT

This paper presents a novel approach for anomaly detection in industrial processes. The system relies solely on unlabeled data and employs a 1D-convolutional neural network-based deep autoencoder architecture. As a core novelty, we split the autoencoder latent space into discriminative and reconstructive latent features and introduce an auxiliary loss based on k-means clustering for the discriminative latent variables. We employ a Top-K clustering objective for separating the latent space, selecting the most discriminative features from the latent space. We apply the approach to the benchmark Tennessee Eastman data set to demonstrate its applicability. We provide different ablation studies and analyze the method with respect to various downstream tasks, including anomaly detection and binary and multi-class classification. The obtained results show the potential of the approach to improve downstream tasks compared to standard autoencoder architectures.


Subject(s)
Image Processing, Computer-Assisted; Neural Networks, Computer; Cluster Analysis
6.
PeerJ Comput Sci ; 7: e385, 2021.
Article in English | MEDLINE | ID: mdl-33817031

ABSTRACT

Frequent itemset mining is a significant subject of data mining studies. In the last ten years, due to innovative development, the quantity of data has grown exponentially, which imposes new challenges on frequent itemset (FI) mining applications. Misconceived information may be found in recent algorithms, including both threshold-based and size-based algorithms. The threshold value plays a central role in generating frequent itemsets from a given dataset. Selecting a support threshold value is very complicated for those unaware of the dataset's characteristics. The performance of algorithms for finding FIs without the support threshold is, however, deficient due to heavy computation. Therefore, we propose a method to discover FIs without the support threshold, called Top-k frequent itemsets mining (TKFIM). It uses equivalence-class and set-theory concepts for mining FIs. The proposed procedure does not miss any FIs; thus, accurate frequent patterns are mined. Furthermore, the results are compared with state-of-the-art techniques such as Top-k miner and Build Once and Mine Once (BOMO). The proposed TKFIM outperforms these approaches in terms of execution and performance, achieving gains of 92.70, 35.87, 28.53, and 81.27 percent over Top-k miner on the Chess, Mushroom, Connect, and T10I4D100K datasets, respectively. Similarly, it achieves performance gains of 97.14, 100, 78.10, and 99.70 percent over BOMO on the Chess, Mushroom, Connect, and T10I4D100K datasets, respectively. Therefore, it is argued that the proposed procedure may be adopted on large datasets for better performance.
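
To make the problem statement concrete, the sketch below mines the k most frequent itemsets without a support threshold. It is a naive brute-force baseline written for illustration only, not the TKFIM procedure described in the abstract; the function name `topk_frequent_itemsets`, the `max_size` cap, and the tie-breaking rule are assumptions of this example.

```python
from collections import Counter
from itertools import combinations

def topk_frequent_itemsets(transactions, k, max_size=3):
    """Return the k itemsets with the highest support (number of containing transactions)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        # Enumerate every itemset of up to max_size items contained in this transaction.
        for size in range(1, min(max_size, len(items)) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # Highest support first; ties go to smaller, then lexicographically earlier itemsets.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0]))
    return ranked[:k]
```

The user supplies only k, so no knowledge of the dataset's support distribution is needed. The price is the exhaustive enumeration, which is the "heavy computation" that dedicated top-k miners aim to avoid.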

7.
J Mol Graph Model ; 100: 107693, 2020 11.
Article in English | MEDLINE | ID: mdl-32805559

ABSTRACT

DNA sequence similarity analysis is an essential task in computational biology and bioinformatics. In nearly all research that explores evolutionary relationships, gene function analysis, protein structure prediction and sequence retrieval, it is necessary to perform similarity calculations. As an alternative to alignment-based sequence comparison methods, which incur high computational cost, alignment-free methods have emerged that calculate similarity by digitizing the sequence in a different space. In this paper, we propose an alignment-free DNA sequence similarity analysis method based on top-k n-gram matches, on the hypothesis that common repeating DNA subsections indicate high similarity between DNA sequences. In our method, we determine DNA sequence similarities by measuring similarity among feature vectors created according to top-k n-gram match-up scores, without the use of similarity functions. We applied the similarity calculation to three DNA data sets of different lengths. The phylogenetic relationships revealed by our method show that our trees coincide almost completely with the results of the MEGA software, which is based on sequence alignment. Our findings show that a certain number of frequently recurring common sequence patterns have the power to characterize DNA sequences.
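
A simplified sketch of the top-k n-gram idea follows. It is an illustration under stated assumptions and not the paper's scoring scheme: the helper names `topk_ngrams` and `shared_topk`, the overlapping sliding window, and the use of a plain shared-count as the match signal are all hypothetical choices for this example.

```python
from collections import Counter

def topk_ngrams(seq, n, k):
    """Return the k most frequent overlapping n-grams of seq, most frequent first."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:k]]

def shared_topk(seq_a, seq_b, n, k):
    """Count how many top-k n-grams the two sequences have in common."""
    return len(set(topk_ngrams(seq_a, n, k)) & set(topk_ngrams(seq_b, n, k)))
```

For example, `topk_ngrams("ACGACGACG", 3, 2)` yields `["ACG", "CGA"]`, and two sequences dominated by the same repeats share most of their top-k n-grams without any alignment being computed.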


Subject(s)
Algorithms; Software; Base Sequence; Computational Biology; Phylogeny; Sequence Alignment; Sequence Analysis, DNA
8.
Sensors (Basel) ; 20(3), 2020 Feb 01.
Article in English | MEDLINE | ID: mdl-32024087

ABSTRACT

The rapid growth of GPS-enabled mobile devices has popularized many location-based applications. Spatial keyword search, which finds objects of interest by considering both spatial locations and textual descriptions, has become very useful in these applications. The recent integration of social data with spatial keyword search opens a new service horizon for users. A few previous studies have proposed methods to combine spatial keyword queries with social data in Euclidean space. However, most real-world applications constrain the distance between the query location and data objects by a road network, where the distance between two points is defined by the shortest connecting path. This paper proposes geo-social top-k keyword queries and geo-social skyline keyword queries on road networks. Both queries enrich traditional spatial keyword query semantics by incorporating a social relevance component. We formalize the proposed query types and present appropriate indexing frameworks and algorithms to process them efficiently. The effectiveness and efficiency of the proposed approaches are evaluated using real datasets.

9.
J Appl Stat ; 47(7): 1191-1207, 2020.
Article in English | MEDLINE | ID: mdl-35707028

ABSTRACT

In a multivariate framework, ranking a data set can be done by using an aggregation function to obtain a global score for each individual, and then by using these scores to rank the individuals. The choice of the aggregation function (e.g. a weighted sum) and the choice of the parameters of the function (e.g. the weights) may have a great influence on the obtained ranking. In this communication we introduce a ratio index that quantifies the sensitivity of the data set ranking to a change of weights. This index is investigated in the general case and in the restricted case of top-k rankings. We also illustrate the usefulness of such an index for analysing ranked data sets.

10.
Ann Stat ; 47(4): 2204-2235, 2019.
Article in English | MEDLINE | ID: mdl-31598016

ABSTRACT

This paper is concerned with the problem of top-K ranking from pairwise comparisons. Given a collection of n items and a few pairwise comparisons across them, one wishes to identify the set of K items that receive the highest ranks. To tackle this problem, we adopt a logistic parametric model, the Bradley-Terry-Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress towards characterizing the performance (e.g. the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-K ranking remains unsettled. We demonstrate that under a natural random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity (the number of paired comparisons needed to ensure exact top-K identification) in the fixed dynamic range regime. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established via a novel leave-one-out trick, which proves effective for analyzing both iterative and non-iterative procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis-Kahan sin Θ theorem for symmetric matrices. This also allows us to close the gap between the ℓ2 error upper bound for the spectral method and the minimax lower limit.
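
The spectral approach to this problem can be sketched as below: build a Markov chain whose transitions favour items that win comparisons, take its stationary distribution as the score estimate, and report the K highest-scoring items. This is a minimal illustration in the style of Rank Centrality, not the exact estimator analysed in the paper; the names `spectral_scores` and `top_k`, the `wins` dictionary layout, the normalizer `n`, and the fixed number of power iterations are assumptions of this example.

```python
def spectral_scores(n, wins, iters=2000):
    """Estimate item scores as the stationary distribution of a comparison Markov chain.

    wins[(i, j)] is the number of times item i beat item j.
    """
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            total = wins.get((i, j), 0) + wins.get((j, i), 0)
            if total:
                # From i, move to j in proportion to how often j beat i.
                # Dividing by n keeps every row sum at most 1 (an assumed normalizer).
                P[i][j] = wins.get((j, i), 0) / total / n
        # Self-loop absorbs the remaining probability mass.
        P[i][i] = 1.0 - sum(P[i])
    # Power iteration: repeatedly apply pi <- pi P, starting from the uniform distribution.
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def top_k(scores, k):
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
```

Items that win often accumulate stationary mass, so sorting the stationary distribution gives the estimated ranking. The sketch assumes the comparison graph is connected; otherwise the stationary distribution is not unique.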

11.
Sensors (Basel) ; 18(9), 2018 Sep 12.
Article in English | MEDLINE | ID: mdl-30213034

ABSTRACT

This paper proposes a multi-keyword ciphertext search method based on improved-quality hierarchical clustering (MCS-IQHC). MCS-IQHC is a novel technique tailored to work with encrypted data. It has improved search accuracy and can self-adapt when performing multi-keyword ciphertext searches on privacy-protected sensor network cloud platforms. Document vectors are first generated by combining the term frequency-inverse document frequency (TF-IDF) weight factor and the vector space model (VSM). The improved quality hierarchical clustering (IQHC) algorithm then generates document vectors, document indices, and cluster indices, which are encrypted via the k-nearest neighbor algorithm (KNN). MCS-IQHC then returns the top-k search results. A series of experiments showed that the proposed method had better search efficiency and accuracy in high-privacy sensor cloud network environments than other state-of-the-art methods.

12.
Sensors (Basel) ; 18(3), 2018 Mar 15.
Article in English | MEDLINE | ID: mdl-29543745

ABSTRACT

TMWSNs (two-tiered mobile wireless sensor networks), a novel network paradigm of mobile edge computing, have been proposed in recent years for their high scalability and robustness. However, only a few works have considered the security of TMWSNs. In fact, the storage nodes, which are located at the upper layer of TMWSNs, are prone to attack by adversaries because they play a key role in bridging the sensor nodes and the sink; a successful attack may lead to the disclosure of all data stored on them as well as other potentially devastating results. In this paper, we make a comparative study of two typical schemes, EVTopk and VTMSN, which have been proposed recently for securing Top-k queries in TMWSNs, through both theoretical analysis and extensive simulations, aiming to identify their disadvantages and advancements. We find that both schemes raise communication costs to an unsatisfactory degree. Specifically, the extra communication cost of transmitting the proof information accounts for more than 40% of the total communication cost between the sensor nodes and the storage nodes, and 80% of that between the storage nodes and the sink. We discuss the corresponding reasons and present our suggestions, hoping that they will inspire researchers working on this subject.

13.
IEEE Trans Knowl Data Eng ; 28(5): 1160-1174, 2016 May.
Article in English | MEDLINE | ID: mdl-30867621

ABSTRACT

Top-k proximity query in large graphs is a fundamental problem with a wide range of applications. Various random-walk-based measures have been proposed to measure the proximity between different nodes. Although these measures are effective, efficiently computing them on large graphs is a challenging task. In this paper, we develop an efficient and exact local search method, FLoS (Fast Local Search), for top-k proximity query in large graphs. FLoS guarantees the exactness of the solution. Moreover, it can be applied to a variety of commonly used proximity measures. FLoS is based on the no-local-optimum property of proximity measures. We show that many measures have no local optimum. Utilizing this property, we introduce several operations to manipulate transition probabilities and develop tight lower and upper bounds on the proximity values. The lower and upper bounds monotonically converge to the exact proximity value as more nodes are visited. We further extend FLoS to measures that have local optima by utilizing the relationships among different measures. We perform comprehensive experiments on real and synthetic large graphs to evaluate the efficiency and effectiveness of the proposed method.
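
For orientation, the sketch below shows what a top-k proximity query computes when solved the straightforward, global way, using random walk with restart as one example of a random-walk-based measure. It is not FLoS: it visits every node on each iteration, which is exactly the cost a local search method tries to avoid. The function names, the adjacency-list input, the restart probability of 0.15, and the iteration count are assumptions of this example, and every node is assumed to have at least one neighbour.

```python
def rwr_scores(adj, query, restart=0.15, iters=200):
    """Random-walk-with-restart proximity of every node to the query node.

    adj[u] is the list of neighbours of node u.
    """
    n = len(adj)
    scores = [1.0 if i == query else 0.0 for i in range(n)]
    for _ in range(iters):
        # With probability `restart` the walker jumps back to the query node.
        nxt = [restart if i == query else 0.0 for i in range(n)]
        for u in range(n):
            # Otherwise it moves to a uniformly chosen neighbour of its current node.
            share = (1.0 - restart) * scores[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        scores = nxt
    return scores

def topk_proximity(adj, query, k, **kwargs):
    """The k nodes (excluding the query) with the highest proximity to the query."""
    scores = rwr_scores(adj, query, **kwargs)
    others = [v for v in range(len(adj)) if v != query]
    return sorted(others, key=lambda v: -scores[v])[:k]
```

On the path graph `[[1], [0, 2], [1, 3], [2]]` with query node 0, `topk_proximity(adj, 0, 2)` returns the two nodes nearest the query along the path.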

14.
Proc Am Stat Assoc ; 2014: 2754-2758, 2014 Aug.
Article in English | MEDLINE | ID: mdl-26345260

ABSTRACT

The Fligner and Verducci (1988) multistage model for rankings is modified to create the moving average maximum likelihood estimator (MAMLE), a locally smooth estimator that measures stage-wise agreement between two long ranked lists and provides a stopping rule for detecting the endpoint of agreement. An application of this MAMLE stopping rule to a bivariate data set in tau-path order (Yu, Verducci and Blower (2011)) is discussed. Data from the National Cancer Institute measuring associations between gene expression and compound potency are studied using this application, providing insights into the length of the relationship between the variables.

15.
Proc Am Stat Assoc ; 2013: 338-347, 2013 Aug.
Article in English | MEDLINE | ID: mdl-26345348

ABSTRACT

For the problem of assessing initial agreement between two rankings of long lists, inference in the Fligner and Verducci (1988) multistage model for rankings is modified to provide a locally smooth estimator of stage-wise agreement. An extension to the case of overlapping but different sets of items in the two lists, and a stopping rule to identify the endpoint of agreement, are also provided. Simulations show that this approach performs very well under several conditions. The methodology is applied to a database of popular names for newborns in the United States and provides insights into trends as well as differences in naming conventions between the two sexes.

16.
Proc Am Stat Assoc ; 2012: 2941-2947, 2012 Aug.
Article in English | MEDLINE | ID: mdl-26361466

ABSTRACT

We propose an innovative approach to the problem recently posed by Hall and Schimek (2012): determining at what point the agreement between two rankings of a long list of items degenerates into noise. We modify the method of estimation in Fligner and Verducci's (1988) multistage model for rankings, from maximum likelihood of conditional agreement over a sample of rankings to a locally smooth estimator of agreement. Through simulations we show that this innovation performs very well under several conditions. Some ramifications are discussed as planned extensions.
