1.
EPJ Data Sci ; 11(1): 12, 2022.
Article in English | MEDLINE | ID: mdl-35261872

ABSTRACT

User-generated content often contains private information, even when it is shared publicly on social media and on the web in general. Although many filtering and natural language approaches for automatically detecting obscenities or hate speech have been proposed, determining whether a shared post contains sensitive information is still an open issue. The problem has been addressed by assuming, for instance, that sensitive content is published anonymously, on anonymous social media platforms or with more restrictive privacy settings, but these assumptions are far from realistic, since the authors of posts often underestimate or overlook their actual exposure to privacy risks. Hence, in this paper, we address the problem of content sensitivity analysis directly, by presenting and characterizing a new annotated corpus of around ten thousand posts, each one annotated as sensitive or non-sensitive by a pool of experts. We characterize our data with respect to the closely related problem of self-disclosure, pointing out the main differences between the two tasks. We also present the results of several deep neural network models that outperform previous naive attempts at classifying social media posts according to their sensitivity, and show that state-of-the-art approaches based on anonymity and lexical analysis do not work in realistic application scenarios.

2.
Sci Rep ; 11(1): 20645, 2021 10 19.
Article in English | MEDLINE | ID: mdl-34667192

ABSTRACT

Owing to their stability and detectability, faecal microRNAs are promising molecules of potential clinical interest as non-invasive diagnostic and prognostic biomarkers. However, there is no evidence on how stool miRNA profiles change according to an individual's age, sex, and body mass index (BMI) or how lifestyle habits influence the expression levels of these molecules. We explored the relationship between stool miRNA levels and common traits (sex, age, BMI, and menopausal status) or lifestyle habits (physical activity, smoking status, and coffee and alcohol consumption), as derived from a self-reported questionnaire, using small RNA-sequencing data of samples from 335 healthy subjects. We detected 151 differentially expressed miRNAs associated with one variable and 52 associated with at least two. Differences in miR-638 levels were associated with age, sex, BMI, and smoking status. The highest number of differentially expressed miRNAs was associated with BMI (n = 92) and smoking status (n = 84), with several miRNAs shared between them. Functional enrichment analyses revealed the involvement of the miRNA target genes in pathways consistent with the analysed variables. Our findings suggest that miRNA profiles in stool may reflect common traits and lifestyle habits and should be considered in relation to disease and association studies based on faecal miRNA expression.


Subjects
Feces/chemistry , Life Style , MicroRNAs/analysis , Adult , Age Factors , Aged , Aged, 80 and over , Body Mass Index , Cigarette Smoking/genetics , Female , Gene Expression/genetics , Gene Expression Profiling/methods , Healthy Volunteers , Humans , Male , MicroRNAs/isolation & purification , Middle Aged , Sex Factors , Transcriptome
3.
IEEE Trans Neural Netw Learn Syst ; 31(11): 5014-5020, 2020 Nov.
Article in English | MEDLINE | ID: mdl-31870997

ABSTRACT

Semisupervised learning (SSL) is a family of classification methods conceived to reduce the amount of labeled information required in the training phase. Graph-based methods are among the most popular semisupervised strategies: the nearest neighbor graph is built in such a way that the manifold of the data is captured and the labeled information is propagated to target samples along the structure of the manifold. Research in graph-based SSL has mainly focused on two aspects: 1) the construction of the k-nearest neighbors graph and/or 2) the propagation algorithm providing the classification. Unlike the previous literature, in this article we focus on the data representation, with the aim of incorporating semisupervision earlier in the process. To this end, we propose an algorithm that learns a new knowledge-aware data embedding via an ensemble of semisupervised autoencoders to enhance graph-based semisupervised classification. The experiments carried out on different classification tasks demonstrate the benefit of our approach.
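The two generic steps named in this abstract (building a k-nearest neighbors graph, then propagating labels along it) can be illustrated with a minimal sketch. It does not reproduce the article's autoencoder-based embedding; the function names, the Euclidean distance, the symmetrized graph and the simple averaging rule are all assumptions chosen for illustration.

```python
import math


def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def propagate_labels(points, labels, k=2, n_iter=20):
    """labels maps a point index to its known class; returns one class per point."""
    classes = sorted(set(labels.values()))
    # Symmetrize the kNN graph so information flows in both directions (assumption).
    adj = [set(n) for n in knn_graph(points, k)]
    for i in range(len(points)):
        for j in list(adj[i]):
            adj[j].add(i)
    # Class scores: labeled points start at 1.0 for their class, others at 0.0.
    scores = [{c: 0.0 for c in classes} for _ in points]
    for i, c in labels.items():
        scores[i][c] = 1.0
    for _ in range(n_iter):
        updated = []
        for i in range(len(points)):
            if i in labels:
                updated.append(scores[i])  # labeled points stay clamped
            else:
                # Unlabeled points take the average score of their graph neighbors.
                updated.append({c: sum(scores[j][c] for j in adj[i]) / len(adj[i])
                                for c in classes})
        scores = updated
    return [max(classes, key=lambda c: scores[i][c]) for i in range(len(points))]


# Toy data: two well-separated groups with one labeled point each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
predicted = propagate_labels(toy_points, {0: "a", 3: "b"}, k=2)
```

In this toy setting each group forms its own connected component, so every point inherits the label of the single labeled point in its group.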

4.
Appl Netw Sci ; 3(1): 50, 2018.
Article in English | MEDLINE | ID: mdl-30596143

ABSTRACT

The success of a film is usually measured through its box-office revenue or through the opinion of professional critics; such measures, however, may be influenced by external factors, such as advertising or trends, and are not able to capture the impact of a film over time. Thanks to the recent availability of data on references among movies, some researchers have started to use citation patterns as an alternative method for ranking movies. In this paper, we propose a novel ranking method for films based on the network of references among movies, calculated by combining four well-known centrality indexes: in-degree, closeness, harmonic and PageRank. Our objective is to measure the success of a movie by accounting for how much it has influenced other movies produced after its release, from both the artistic and the economic point of view. We apply our method to a subset of the IMDb (Internet Movie Database) citation network consisting of around 47,000 international movies, and we derive a list of films that can be considered milestones in the history of cinema. For each movie we also collect data on its year of release, genres and countries of production, to analyze trends and patterns in the film industry according to such features. We also collect data on 20,000 directors and almost 400,000 performers (actors and actresses), and we use the network of references and our movie scores to evaluate their careers and to rank them. Since the IMDb dataset we employ is highly biased toward European and North American movies and personalities, our findings can be considered relevant principally for Western culture.
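As a rough illustration of combining the four centrality indexes named above, the sketch below computes in-degree, closeness, harmonic centrality and PageRank on a tiny reference graph and averages their min-max normalized values. The abstract does not say how the indexes are combined, so the unweighted average, the normalization, the closeness variant (reachable nodes only) and all function names are assumptions, not the paper's method.

```python
from collections import deque


def reverse_graph(graph):
    """graph maps each film to the films it references; return the reversed edges."""
    rev = {node: [] for node in graph}
    for u, targets in graph.items():
        for v in targets:
            rev[v].append(u)
    return rev


def distances_to(rev, target):
    """Shortest path length from every film that can reach `target` (BFS on reversed edges)."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for pred in rev[node]:
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    del dist[target]
    return dist


def pagerank(graph, damping=0.85, n_iter=100):
    """Plain power iteration; rank of films without outgoing references is spread uniformly."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(n_iter):
        dangling = sum(rank[u] for u in graph if not graph[u])
        new = {node: (1 - damping) / n + damping * dangling / n for node in graph}
        for u, targets in graph.items():
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        rank = new
    return rank


def min_max(scores):
    lo, hi = min(scores.values()), max(scores.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 0.0 for k, v in scores.items()}


def combined_score(graph):
    """Average of the four min-max normalized centralities (an assumed combination rule)."""
    rev = reverse_graph(graph)
    in_degree, closeness, harmonic = {}, {}, {}
    for node in graph:
        dist = distances_to(rev, node)
        total = sum(dist.values())
        in_degree[node] = len(rev[node])
        closeness[node] = len(dist) / total if total else 0.0
        harmonic[node] = sum(1.0 / d for d in dist.values())
    parts = [min_max(s) for s in (in_degree, closeness, harmonic, pagerank(graph))]
    return {node: sum(p[node] for p in parts) / len(parts) for node in graph}


# Hypothetical toy network: an edge B -> A means film B references film A.
toy = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["B"]}
scores = combined_score(toy)
```

Distances are measured along incoming references, so a film scores highly when many later films reach it, directly or through chains of references.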

5.
IEEE Trans Neural Netw Learn Syst ; 28(5): 1017-1029, 2017 05.
Article in English | MEDLINE | ID: mdl-26915139

ABSTRACT

In this paper, we introduce a new approach to semisupervised anomaly detection that deals with categorical data. Given a training set of instances (all belonging to the normal class), we analyze the relationships among features to extract a discriminative characterization of the anomalous instances. Our key idea is to build a model that characterizes the features of the normal instances and then use a set of distance-based techniques to discriminate between the normal and the anomalous instances. We compare our approach with the state-of-the-art methods for semisupervised anomaly detection. We empirically show that a technique specifically designed for the management of categorical data outperforms the general-purpose approaches. We also show that, in contrast with other approaches that are opaque because their decisions cannot be easily understood, our proposed approach produces a discriminative model that can be easily interpreted and used for the exploration of the data.
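The general setting described here (train only on normal categorical instances, then score new instances against the learned feature relationships) can be sketched with a deliberately simple baseline. This is not the paper's algorithm: it swaps in a plain pairwise co-occurrence count, and the function names and the scoring rule are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def fit_normal_model(normal_rows):
    """Count how often each pair of (feature index, value) items co-occurs in normal rows."""
    pair_counts = Counter()
    for row in normal_rows:
        pair_counts.update(combinations(enumerate(row), 2))
    return pair_counts, len(normal_rows)


def anomaly_score(row, model):
    """Mean rarity of the row's value pairs.

    0.0 means every pair occurs in all normal rows; 1.0 means no pair was ever seen.
    Rows are assumed to have at least two features.
    """
    pair_counts, n_rows = model
    pairs = list(combinations(enumerate(row), 2))
    return sum(1.0 - pair_counts[p] / n_rows for p in pairs) / len(pairs)


# Hypothetical training set containing only normal instances.
normal = [("red", "small"), ("red", "small"), ("blue", "large")]
model = fit_normal_model(normal)
seen = anomaly_score(("red", "small"), model)
unseen = anomaly_score(("red", "large"), model)
```

Because the model is just a table of value-pair counts, a high score can be traced back to the specific pairs that were never observed together, which loosely mirrors the interpretability argument made in the abstract.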

6.
BMC Bioinformatics ; 9: 378, 2008 Sep 18.
Article in English | MEDLINE | ID: mdl-18801154

ABSTRACT

BACKGROUND: There is an increasing need in transcriptome research for gene expression data and pattern warehouses. It is important to integrate in these warehouses both raw transcriptomic data and some properties encoded in these data, such as local patterns. DESCRIPTION: We have developed an application called SQUAT (SAGE Querying and Analysis Tools), which is available at: http://bsmc.insa-lyon.fr/squat/. This database gives access to both raw SAGE data and patterns mined from these data, for three species (human, mouse and chicken). The database supports simple queries such as "In which biological situations is my favorite gene expressed?" as well as much more complex queries like: <>. Connections with external web databases enrich biological interpretations and enable sophisticated queries. To illustrate the power of SQUAT, we show and analyze the results of three different queries, one of which led to a biological hypothesis that was experimentally validated. CONCLUSION: SQUAT is a user-friendly information retrieval platform that aims to bring some of the state-of-the-art mining tools to biologists.


Subjects
Database Management Systems , Databases, Genetic , Gene Expression Profiling/methods , Information Storage and Retrieval/methods , Internet , Software , Transcription Factors/genetics , Algorithms , Animals , Birds , Humans , Mice , User-Computer Interface
7.
In Silico Biol ; 7(4-5): 467-83, 2007.
Article in English | MEDLINE | ID: mdl-18391238

ABSTRACT

The production of high-throughput gene expression data has generated a crucial need for bioinformatics tools to generate biologically interesting hypotheses. Whereas many tools are available for extracting global patterns, less attention has been focused on local pattern discovery. We propose here an original way to discover knowledge from gene expression data by means of the so-called formal concepts which hold in derived Boolean gene expression datasets. We first encoded the over-expression properties of genes in human cells using human SAGE data. This gave rise to a Boolean matrix from which we extracted the complete collection of formal concepts, i.e., all the largest sets of over-expressed genes associated with a largest set of biological situations in which their over-expression is observed. Complete collections of such patterns tend to be huge. Since their interpretation is a time-consuming task, we propose a new method to rapidly visualize clusters of formal concepts. This designates a reasonable number of Quasi-Synexpression-Groups (QSGs) for further analysis. The interest of our approach is illustrated using human SAGE data and interpreting one of the extracted QSGs. The assessment of its biological relevance leads to the formulation of both previously proposed and new biological hypotheses.
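To make the notion of a formal concept concrete, the sketch below enumerates all formal concepts of a small Boolean context by closing the gene sets under intersection. It is a naive enumeration suitable only for toy data, not the mining algorithm used in the paper; the function name and the example situations and genes are hypothetical.

```python
def formal_concepts(context):
    """Enumerate all formal concepts of a Boolean context.

    context maps each biological situation to the set of genes over-expressed in it.
    Returns a set of (situations, genes) pairs, both as frozensets.
    """
    all_genes = set().union(*context.values())
    # Every concept's gene set is an intersection of rows (or the full gene set),
    # so closing the rows under intersection yields every candidate gene set.
    intents = {frozenset(all_genes)}
    for genes in context.values():
        intents |= {frozenset(intent & genes) for intent in intents}
    concepts = set()
    for intent in intents:
        # The matching situations are exactly those over-expressing every gene in the set.
        extent = frozenset(s for s, genes in context.items() if intent <= genes)
        concepts.add((extent, intent))
    return concepts


# Hypothetical toy context: three situations, three genes.
toy_context = {
    "s1": {"g1", "g2"},
    "s2": {"g1", "g2", "g3"},
    "s3": {"g3"},
}
concepts = formal_concepts(toy_context)
```

Each returned pair is maximal in both directions: no gene can be added without losing a situation, and no situation can be added without losing a gene. The number of concepts can grow exponentially with the size of the matrix, which is consistent with the abstract's remark that complete collections tend to be huge.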


Subjects
Computational Biology/instrumentation , Gene Expression , Pattern Recognition, Automated/methods , Cluster Analysis , Genome, Human , Humans