Pesquisa | Portal Regional da BVS

1.

A new Apache Spark-based framework for big data streaming forecasting in IoT networks.

Fernández-Gómez, Antonio M; Gutiérrez-Avilés, David; Troncoso, Alicia; Martínez-Álvarez, Francisco.

J Supercomput ; 79(10): 11078-11100, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-36845222

RESUMO

Analyzing time-dependent data acquired in a continuous flow is a major challenge for various fields, such as big data and machine learning. Being able to analyze a large volume of data from various sources, such as sensors, networks, and the internet, is essential for improving the efficiency of our society's production processes. Additionally, this vast amount of data is collected dynamically in a continuous stream. The goal of this research is to provide a comprehensive framework for forecasting big data streams from Internet of Things networks and serve as a guide for designing and deploying other third-party solutions. Hence, a new framework for time series forecasting in a big data streaming scenario, using data collected from Internet of Things networks, is presented. This framework comprises of five main modules: Internet of Things network design and deployment, big data streaming architecture, stream data modeling method, big data forecasting method, and a comprehensive real-world application scenario, consisting of a physical Internet of Things network feeding the big data streaming architecture, being the linear regression the algorithm used for illustrative purposes. Comparison with other frameworks reveals that this is the first framework that incorporates and integrates all the aforementioned modules.

2.

Time-Series Clustering Based on the Characterization of Segment Typologies.

Guijo-Rubio, David; Duran-Rosal, Antonio Manuel; Gutierrez, Pedro Antonio; Troncoso, Alicia; Hervas-Martinez, Cesar.

IEEE Trans Cybern ; 51(11): 5409-5422, 2021 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-31945011

RESUMO

Time-series clustering is the process of grouping time series with respect to their similarity or characteristics. Previous approaches usually combine a specific distance measure for time series and a standard clustering method. However, these approaches do not take the similarity of the different subsequences of each time series into account, which can be used to better compare the time-series objects of the dataset. In this article, we propose a novel technique of time-series clustering consisting of two clustering stages. In a first step, a least-squares polynomial segmentation procedure is applied to each time series, which is based on a growing window technique that returns different-length segments. Then, all of the segments are projected into the same dimensional space, based on the coefficients of the model that approximates the segment and a set of statistical features. After mapping, a first hierarchical clustering phase is applied to all mapped segments, returning groups of segments for each time series. These clusters are used to represent all time series in the same dimensional space, after defining another specific mapping process. In a second and final clustering stage, all the time-series objects are grouped. We consider internal clustering quality to automatically adjust the main parameter of the algorithm, which is an error threshold for the segmentation. The results obtained on 84 datasets from the UCR Time Series Classification Archive have been compared against three state-of-the-art methods, showing that the performance of this methodology is very promising, especially on larger datasets.

Assuntos

Algoritmos , Análise por Conglomerados , Fatores de Tempo

3.

Deep Learning for Time Series Forecasting: A Survey.

Torres, José F; Hadjout, Dalil; Sebaa, Abderrazak; Martínez-Álvarez, Francisco; Troncoso, Alicia.

Big Data ; 9(1): 3-21, 2021 02.

Artigo em Inglês | MEDLINE | ID: mdl-33275484

RESUMO

Time series forecasting has become a very intensive field of research, which is even increasing in recent years. Deep neural networks have proved to be powerful and are achieving high accuracy in many application fields. For these reasons, they are one of the most widely used methods of machine learning to solve problems dealing with big data nowadays. In this work, the time series forecasting problem is initially formulated along with its mathematical fundamentals. Then, the most common deep learning architectures that are currently being successfully applied to predict time series are described, highlighting their advantages and limitations. Particular attention is given to feed forward networks, recurrent neural networks (including Elman, long-short term memory, gated recurrent units, and bidirectional networks), and convolutional neural networks. Practical aspects, such as the setting of values for hyper-parameters and the choice of the most suitable frameworks, for the successful application of deep learning to time series are also provided and discussed. Several fruitful research fields in which the architectures analyzed have obtained a good performance are reviewed. As a result, research gaps have been identified in the literature for several domains of application, thus expecting to inspire new and better forms of knowledge.

Assuntos

Aprendizado Profundo , Big Data , Previsões , Aprendizado de Máquina , Redes Neurais de Computação

4.

Pairwise gene GO-based measures for biclustering of high-dimensional expression data.

Nepomuceno, Juan A; Troncoso, Alicia; Nepomuceno-Chamorro, Isabel A; Aguilar-Ruiz, Jesús S.

BioData Min ; 11: 4, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-29610579

RESUMO

BACKGROUND: Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. Nowadays, the biological knowledge available in public repositories can be used to drive these algorithms to find biclusters composed of groups of genes functionally coherent. On the other hand, a distance among genes can be defined according to their information stored in Gene Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each pair of genes which establishes their functional similarity. A scatter search-based algorithm that optimizes a merit function that integrates GO information is studied in this paper. This merit function uses a term that addresses the information through a GO measure. RESULTS: The effect of two possible different gene pairwise GO measures on the performance of the algorithm is analyzed. Firstly, three well known yeast datasets with approximately one thousand of genes are studied. Secondly, a group of human datasets related to clinical data of cancer is also explored by the algorithm. Most of these data are high-dimensional datasets composed of a huge number of genes. The resultant biclusters reveal groups of genes linked by a same functionality when the search procedure is driven by one of the proposed GO measures. Furthermore, a qualitative biological study of a group of biclusters show their relevance from a cancer disease perspective. CONCLUSIONS: It can be concluded that the integration of biological information improves the performance of the biclustering process. The two different GO measures studied show an improvement in the results obtained for the yeast dataset. However, if datasets are composed of a huge number of genes, only one of them really improves the algorithm performance. This second case constitutes a clear option to explore interesting datasets from a clinical point of view.

5.

Applications of Computational Intelligence in Time Series.

Martínez-Álvarez, Francisco; Troncoso, Alicia; Reyes, Jorge; Martínez-Ballesteros, María; Riquelme, José C.

Comput Intell Neurosci ; 2017: 9361749, 2017.

Artigo em Inglês | MEDLINE | ID: mdl-28611841

Assuntos

Inteligência Artificial , Previsões/métodos , Humanos , Fatores de Tempo

6.

Integrating biological knowledge based on functional annotations for biclustering of gene expression data.

Nepomuceno, Juan A; Troncoso, Alicia; Nepomuceno-Chamorro, Isabel A; Aguilar-Ruiz, Jesús S.

Comput Methods Programs Biomed ; 119(3): 163-80, 2015 May.

Artigo em Inglês | MEDLINE | ID: mdl-25843807

RESUMO

Gene expression data analysis is based on the assumption that co-expressed genes imply co-regulated genes. This assumption is being reformulated because the co-expression of a group of genes may be the result of an independent activation with respect to the same experimental condition and not due to the same regulatory regime. For this reason, traditional techniques are recently being improved with the use of prior biological knowledge from open-access repositories together with gene expression data. Biclustering is an unsupervised machine learning technique that searches patterns in gene expression data matrices. A scatter search-based biclustering algorithm that integrates biological information is proposed in this paper. In addition to the gene expression data matrix, the input of the algorithm is only a direct annotation file that relates each gene to a set of terms from a biological repository where genes are annotated. Two different biological measures, FracGO and SimNTO, are proposed to integrate this information by means of its addition to-be-optimized fitness function in the scatter search scheme. The measure FracGO is based on the biological enrichment and SimNTO is based on the overlapping among GO annotations of pairs of genes. Experimental results evaluate the proposed algorithm for two datasets and show the algorithm performs better when biological knowledge is integrated. Moreover, the analysis and comparison between the two different biological measures is presented and it is concluded that the differences depend on both the data source and how the annotation file has been built in the case GO is used. It is also shown that the proposed algorithm obtains a greater number of enriched biclusters than other classical biclustering algorithms typically used as benchmark and an analysis of the overlapping among biclusters reveals that the biclusters obtained present a low overlapping. The proposed methodology is a general-purpose algorithm which allows the integration of biological information from several sources and can be extended to other biclustering algorithms based on the optimization of a merit function.

Assuntos

Algoritmos , Perfilação da Expressão Gênica/estatística & dados numéricos , Anotação de Sequência Molecular/estatística & dados numéricos , Aprendizado de Máquina não Supervisionado/estatística & dados numéricos , Análise por Conglomerados , Mineração de Dados , Bases de Dados Genéticas/estatística & dados numéricos , Ontologia Genética/estatística & dados numéricos , Genes Fúngicos , Bases de Conhecimento , Leveduras/genética

7.

Biclustering of gene expression data by correlation-based scatter search.

Nepomuceno, Juan A; Troncoso, Alicia; Aguilar-Ruiz, Jesús S.

BioData Min ; 4(1): 3, 2011 Jan 24.

Artigo em Inglês | MEDLINE | ID: mdl-21261986

RESUMO

BACKGROUND: The analysis of data generated by microarray technology is very useful to understand how the genetic information becomes functional gene products. Biclustering algorithms can determine a group of genes which are co-expressed under a set of experimental conditions. Recently, new biclustering methods based on metaheuristics have been proposed. Most of them use the Mean Squared Residue as merit function but interesting and relevant patterns from a biological point of view such as shifting and scaling patterns may not be detected using this measure. However, it is important to discover this type of patterns since commonly the genes can present a similar behavior although their expression levels vary in different ranges or magnitudes. METHODS: Scatter Search is an evolutionary technique that is based on the evolution of a small set of solutions which are chosen according to quality and diversity criteria. This paper presents a Scatter Search with the aim of finding biclusters from gene expression data. In this algorithm the proposed fitness function is based on the linear correlation among genes to detect shifting and scaling patterns from genes and an improvement method is included in order to select just positively correlated genes. RESULTS: The proposed algorithm has been tested with three real data sets such as Yeast Cell Cycle dataset, human B-cells lymphoma dataset and Yeast Stress dataset, finding a remarkable number of biclusters with shifting and scaling patterns. In addition, the performance of the proposed method and fitness function are compared to that of CC, OPSM, ISA, BiMax, xMotifs and Samba using Gene the Ontology Database.

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA