Results 1 - 11 of 11
1.
Data Brief ; 51: 109653, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37869625

ABSTRACT

This article presents a dataset comprising signal data collected from Inertial Measurement Unit (IMU) sensors during administration of the Timed Up and Go (TUG) test for assessing fall risk in older adults. The dataset is divided into two main sections. The first section contains personal, behavioral, and health-related data from 34 participants. The second section contains signal data from the tri-axial accelerometer and tri-axial gyroscope embedded in an IMU sensor, which was affixed to the participants' waist area to capture signals while they walked. The chosen assessment method for fall risk analysis is the TUG test, which requires participants to walk a 3-meter distance back and forth. To prepare the dataset for subsequent analysis, the raw signal data were processed to extract only the walking periods of the TUG test, and a low-pass filter was applied to reduce noise interference. The dataset holds potential for the development of effective fall risk detection models based on insights garnered from questionnaires administered to specialists who observed the experiments. It also contains anonymized participant information that can be explored to investigate fall risk, along with other health-related conditions or behaviors that could influence the risk of falling; this information is valuable for devising tailored treatment or rehabilitation plans for individual older adults. The complete dataset is accessible through the Mendeley repository.
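
The low-pass filtering step described above can be illustrated with a minimal sketch. The filter type, cutoff frequency (5 Hz) and sampling rate (100 Hz) below are assumptions for illustration, not values taken from the dataset description.

    # Zero-phase Butterworth low-pass filtering of one IMU channel (sketch).
    # Cutoff and sampling rate are assumed, not taken from the dataset.
    import numpy as np
    from scipy.signal import butter, filtfilt

    def lowpass(signal, fs=100.0, cutoff=5.0, order=4):
        nyquist = 0.5 * fs
        b, a = butter(order, cutoff / nyquist, btype="low")
        return filtfilt(b, a, signal)

    # Example: smooth simulated tri-axial accelerometer data (N samples x 3 axes).
    acc = np.random.randn(1000, 3)
    acc_filtered = np.column_stack([lowpass(acc[:, i]) for i in range(3)])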

2.
Sci Rep ; 13(1): 6795, 2023 04 26.
Article in English | MEDLINE | ID: mdl-37100806

ABSTRACT

The COVID-19 pandemic has placed a huge burden on Indian health infrastructure. With a larger number of people getting affected during the second wave, hospitals were overburdened and running out of supplies and oxygen. Hence, predicting new COVID-19 cases, new deaths, and total active cases multiple days in advance can aid better utilization of scarce medical resources and prudent pandemic-related decision-making. The proposed method uses gated recurrent unit (GRU) networks as the main predictive model. A study is conducted by building four models pre-trained on COVID-19 data from four different countries (United States of America, Brazil, Spain, and Bangladesh) and fine-tuned on India's data. Since the four chosen countries have experienced different types of infection curves, the pre-training provides transfer learning that incorporates diverse situations into the models. Each of the four models then gives 7-day-ahead predictions for the Indian test data using a recursive prediction method. The final prediction comes from an ensemble of the predictions of the different models. The ensemble built from two countries, Spain and Bangladesh, achieves the best performance among all combinations and also outperforms traditional regression models.
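
A minimal sketch of a GRU forecaster with recursive 7-day-ahead prediction is shown below. The window length, layer sizes and fine-tuning schedule are illustrative assumptions; the paper's exact architecture, pre-training data and ensemble weights are not reproduced here.

    # Sketch of a GRU one-step forecaster used recursively for a 7-day horizon.
    import numpy as np
    import tensorflow as tf

    WINDOW = 14  # days of history fed to the model (assumed)

    def build_gru(n_features=1):
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(WINDOW, n_features)),
            tf.keras.layers.GRU(32),
            tf.keras.layers.Dense(n_features),
        ])
        model.compile(optimizer="adam", loss="mse")
        return model

    def recursive_forecast(model, history, horizon=7):
        # Feed each prediction back as input to forecast `horizon` days ahead.
        window = list(history[-WINDOW:])
        preds = []
        for _ in range(horizon):
            x = np.array(window[-WINDOW:]).reshape(1, WINDOW, 1)
            y = float(model.predict(x, verbose=0)[0, 0])
            preds.append(y)
            window.append(y)
        return preds

    # Transfer learning: pre-train on a source country's series, fine-tune on
    # India's series, then average the forecasts of several such models.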


Subject(s)
COVID-19 , Pandemics , Humans , COVID-19/epidemiology , India/epidemiology , Neural Networks, Computer , Machine Learning
3.
Heliyon ; 9(1): e13025, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36820176

ABSTRACT

Employees who have legitimate access to an organization's data may occasionally put sensitive corporate data at risk, either carelessly or maliciously. Ideally, potential breaches should be detected as soon as they occur, but in practice there may be delays, because human analysts cannot recognize data exfiltration behaviors quickly enough with the tools available to them. Visualization may improve cybersecurity situation awareness. In this paper, we present a dashboard application for investigating file activity as a way to improve situation awareness. We developed this dashboard for a wide range of stakeholders within a large financial services company. Cybersecurity experts/analysts, data owners, team leaders/managers, high-level administrators, and other investigators all provided input to its design. The co-design approach helped to create trust between users and the new visualization tools, which were built to be compatible with existing work processes. We discuss the user-centered design process that informed the development of the dashboard and the functionality of its three inter-operable monitoring dashboards. In this case, three dashboards were developed, covering a high-level overview, file volume/type comparison, and individual activity, but the appropriate number and type of dashboards will likely vary according to the nature of the detection task. We also present two use cases with usability results and preliminary usage data. The results examine how much use the dashboards received, as well as measures obtained using the Technology Acceptance Model (TAM). We also report user comments about the dashboards and how to improve them.

4.
SN Comput Sci ; 3(4): 267, 2022.
Article in English | MEDLINE | ID: mdl-35531568

ABSTRACT

Voice assistants (VAs) are an emerging technology that has become an essential tool of the twenty-first century. The ease of access and use of VAs has generated strong interest in their usability. Usability is an essential aspect of any emerging technology, with every technology having a standardized usability measure. Despite the high acceptance rate of VAs, to the best of our knowledge, few studies have been carried out on their usability. In this context, we reviewed studies that used voice assistants for various tasks. Our study highlights the usability measures currently used for voice assistants, as well as the independent variables used and their context of use. We employed the ISO 9241-11 framework as the measuring tool, highlighting the usability measures currently used both within the ISO 9241-11 framework and outside of it, to provide a comprehensive view. A diverse range of independent variables used to measure usability is identified, along with independent variables that have not yet been used to measure certain aspects of the usability experience. We conclude by summarizing what has been done on voice assistant usability measurement and what research gaps remain, and we examine whether the ISO 9241-11 framework can be used as a standard measurement tool for voice assistants.

5.
Methods ; 202: 31-39, 2022 06.
Article in English | MEDLINE | ID: mdl-34090971

ABSTRACT

Digital medical image analysis is a continually evolving task and an area of prominent and growing importance from both research and deployment perspectives. Nonetheless, the algorithms and methodology used, as well as the sources of medical image data, must be strictly scrutinized. As the COVID-19 pandemic has gripped much of the world, much effort has gone into developing affordable testing for the masses, and it has been shown that established and widely available chest X-ray (CXR) images may be used as a screening criterion for assistive diagnosis. Thanks to the dedicated work of various individuals and organizations, CXR images of COVID-19 subjects are publicly available for analytic use. We have also provided a publicly available CXR dataset on the Kaggle platform. As a case study, this paper presents a systematic approach to learning from a typically imbalanced set of CXR images, which contains only a limited number of publicly available COVID-19 images. Our results show that we are able to outperform the top finishers in a related Kaggle multi-class CXR challenge. The proposed methodology should help guide medical personnel in obtaining a robust diagnostic model to discern COVID-19 from other conditions confidently.
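
One common way to learn from an imbalanced image set of this kind is to weight the loss by inverse class frequency; the sketch below shows that generic idea and is not the authors' exact pipeline. The label counts are toy values.

    # Class-weighted training as a generic remedy for class imbalance (sketch).
    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    labels = np.array([0] * 900 + [1] * 50 + [2] * 50)  # toy imbalanced labels
    classes = np.unique(labels)
    weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
    class_weight = dict(zip(classes, weights))
    # Passing `class_weight` to model.fit(...) makes minority classes
    # (e.g. COVID-19) contribute more to the loss than the majority class.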


Subject(s)
COVID-19 , Deep Learning , COVID-19/diagnostic imaging , Humans , Pandemics , SARS-CoV-2 , Tomography, X-Ray Computed/methods , X-Rays
6.
PeerJ ; 8: e9470, 2020.
Article in English | MEDLINE | ID: mdl-32704450

ABSTRACT

Mutations that cause an error in the splicing of a messenger RNA (mRNA) can lead to diseases in humans. Various computational models have been developed to recognize the sequence pattern of splice sites. In recent studies, Convolutional Neural Network (CNN) architectures were shown to outperform other existing models in predicting splice sites. However, insufficient effort has been put into extending CNN models to predict the effect of genomic variants on the splicing of mRNAs. This study proposes a framework that uses CNNs to assess the effect of splice variants and to identify potential disease-causing variants that disrupt the RNA splicing process. Five models, three CNN-based and two based on non-CNN machine learning, were trained and compared using two existing splice site datasets: Genome Wide Human splice sites (GWH) and a dataset provided at the Deep Learning and Artificial Intelligence winter school 2018 (DLAI). The donor sites were also tested with the HSplice tool to evaluate the predictive models. To improve the effectiveness of the predictive models, the two datasets were combined. The CNN model with four convolutional layers showed the best splice site prediction performance, with an AUPRC of 93.4% and 88.8% for donor and acceptor sites, respectively. The effects of variants on splicing were estimated by applying the best model to variant data from the ClinVar database. Based on this estimation, the framework could effectively differentiate pathogenic variants from benign variants (p = 5.9 × 10⁻⁷). These promising results support that the proposed framework could be applied in future genetic studies to identify disease-causing loci involving the splicing mechanism. The datasets and Python scripts used in this study are available in the GitHub repository at https://github.com/smiile8888/rna-splice-sites-recognition.
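
A sketch of a four-convolutional-layer splice-site classifier over one-hot encoded DNA windows is given below. The window length, filter counts and kernel sizes are assumptions, not the paper's exact hyperparameters.

    # Illustrative 1D CNN for donor/acceptor splice-site prediction.
    import tensorflow as tf

    SEQ_LEN = 40  # assumed window length around the candidate splice site

    def build_splice_cnn():
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(SEQ_LEN, 4)),  # A/C/G/T one-hot
            tf.keras.layers.Conv1D(32, 5, activation="relu", padding="same"),
            tf.keras.layers.Conv1D(32, 5, activation="relu", padding="same"),
            tf.keras.layers.Conv1D(64, 3, activation="relu", padding="same"),
            tf.keras.layers.Conv1D(64, 3, activation="relu", padding="same"),
            tf.keras.layers.GlobalMaxPooling1D(),
            tf.keras.layers.Dense(1, activation="sigmoid"),  # site vs. non-site
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=[tf.keras.metrics.AUC(curve="PR", name="auprc")])
        return model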

7.
J Biomed Inform ; 93: 103156, 2019 05.
Article in English | MEDLINE | ID: mdl-30902595

ABSTRACT

To extract and generate a valid metabolic pathway from research articles, biologists need substantial amounts of time to digest unstructured text. Text mining currently plays a central role in this research area, because it provides the ability to automatically discover useful information in a reasonable time. A text mining model can be built from training data or a corpus in a supervised manner. Unfortunately, a corpus for the domain of interest may not always be available, or may be insufficient in practice, because corpus construction is a labor-intensive task that requires specialist annotation. In this paper, we developed an event extraction system, a text-mining task, to extract metabolic interactions from the research literature and then reconstruct metabolic pathways. The proposed system consists of a pipeline of four supervised-learning steps: named entity recognition, trigger detection, edge detection, and event reconstruction. We also introduced a multitask-learning algorithm, a transfer-learning paradigm, that can leverage additional resources from an existing source domain to facilitate classification for metabolic event extraction in the target domain. As a proof of concept, edge detection, a core step in our event extraction system, was used as a case study in multitask-learning classification. The experimental results showed that the proposed event extraction system provided competitive performance against state-of-the-art related systems. In particular, the proposed multitask learning improved the performance of edge detection, and therefore the overall performance of the event extraction system improved accordingly.
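
The multitask-learning idea for the edge detection case study can be sketched as hard parameter sharing: a shared layer over edge-candidate features with one output head per domain. The feature dimension and layer sizes below are assumptions, not the authors' architecture.

    # Hard-parameter-sharing multitask sketch: shared encoder, two task heads.
    import tensorflow as tf

    N_FEATURES = 128  # assumed size of the edge-candidate feature vector

    inputs = tf.keras.Input(shape=(N_FEATURES,))
    shared = tf.keras.layers.Dense(64, activation="relu", name="shared")(inputs)
    source_head = tf.keras.layers.Dense(1, activation="sigmoid", name="source_edge")(shared)
    target_head = tf.keras.layers.Dense(1, activation="sigmoid", name="target_edge")(shared)

    model = tf.keras.Model(inputs, [source_head, target_head])
    model.compile(optimizer="adam",
                  loss={"source_edge": "binary_crossentropy",
                        "target_edge": "binary_crossentropy"},
                  loss_weights={"source_edge": 0.5, "target_edge": 1.0})
    # The source-domain head leverages the existing corpus; the target head
    # learns metabolic edge detection with fewer labeled examples.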


Subject(s)
Metabolism , Concept Formation , Data Mining/methods , Humans , Machine Learning
8.
J Bioinform Comput Biol ; 14(4): 1650015, 2016 08.
Article in English | MEDLINE | ID: mdl-27102089

ABSTRACT

Cancer is a complex disease that cannot be diagnosed reliably using only single-gene expression analysis. Gene-set analysis of high-throughput gene expression profiling controlled by various environmental factors is a commonly adopted technique in the cancer research community. This work develops a comprehensive gene expression analysis tool (gene-set activity toolbox, GAT) that is implemented with a data retriever, traditional data pre-processing, several gene-set analysis methods, network visualization, and data mining tools. The gene-set analysis methods are used to identify subsets of phenotype-relevant genes that are then used to build a classification model. To evaluate GAT's performance, we performed a cross-dataset validation study on three common cancers, namely colorectal, breast, and lung cancer. The results show that GAT can be used to build a reasonable disease diagnostic model and that the predicted markers have biological relevance. GAT can be accessed from http://gat.sit.kmutt.ac.th, where GAT's Java library for gene-set analysis and simple classification, and a database with three cancer benchmark datasets, can be downloaded.
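
As a rough illustration of how gene-set features can feed a classifier, the sketch below computes one common gene-set activity score (mean z-scored expression of member genes). GAT itself offers several gene-set analysis methods; this particular score and the function names are assumptions for illustration only.

    # Mean z-score gene-set activity: one feature per (sample, gene set).
    import numpy as np

    def gene_set_activity(expr, gene_index, gene_set):
        # expr: samples x genes matrix; gene_index: symbol -> column; gene_set: symbols
        idx = [gene_index[g] for g in gene_set if g in gene_index]
        sub = expr[:, idx]
        z = (sub - sub.mean(axis=0)) / (sub.std(axis=0) + 1e-9)
        return z.mean(axis=1)  # one activity value per sample

    # Activity scores over many gene sets form the feature matrix of a
    # standard classifier (e.g. logistic regression) for diagnosis.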


Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/diagnosis , Colorectal Neoplasms/diagnosis , Lung Neoplasms/diagnosis , Software , Analysis of Variance , Breast Neoplasms/genetics , Colorectal Neoplasms/genetics , Data Mining , Databases, Genetic , Female , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Humans , Lung Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/methods
9.
PeerJ ; 4: e1811, 2016.
Article in English | MEDLINE | ID: mdl-27019783

ABSTRACT

Text mining (TM) in the field of biology is fast becoming a routine analysis for the extraction and curation of biological entities (e.g., genes, proteins, simple chemicals) as well as their relationships. Due to the wide applicability of TM in situations involving complex relationships, it is valuable to apply TM to the extraction of metabolic interactions (i.e., enzyme and metabolite interactions) through metabolic events. Here we present an integrated TM framework containing two modules for the extraction of metabolic events (Metabolic Event Extraction module, MEE) and for the construction of a metabolic interaction network (Metabolic Interaction Network Reconstruction module, MINR). The proposed integrated TM framework performed well based on standard measures of recall, precision and F-score. Evaluation of the MEE module using the constructed Metabolic Entities (ME) corpus yielded F-scores of 59.15% and 48.59% for the detection of metabolic events for production and consumption, respectively. When testing the entity tagger for Gene and Protein (GP) and metabolite entities on the test corpus, the obtained F-score was greater than 80% for the Superpathway of leucine, valine, and isoleucine biosynthesis. Mapping of enzyme and metabolite interactions through network reconstruction showed fair performance for the MINR module on the test corpus, with an F-score >70%. Finally, applying our integrated TM framework to large-scale data (i.e., EcoCyc extraction data) for reconstructing a metabolic interaction network showed reasonable precisions of 69.93%, 70.63% and 46.71% for enzyme, metabolite and enzyme-metabolite interactions, respectively. This study presents the first open-source integrated TM framework for reconstructing a metabolic interaction network. This framework can be a powerful tool that helps biologists extract metabolic events for further reconstruction of a metabolic interaction network. The ME corpus, test corpus, source code, and a virtual machine image with pre-configured software are available at www.sbi.kmutt.ac.th/preecha/metrecon.
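
The network reconstruction step can be pictured with a small sketch that turns extracted production/consumption events into a directed interaction graph. The event triples below are invented for illustration; the framework's actual data structures and entity names differ.

    # Building a metabolic interaction network from extracted event triples.
    import networkx as nx

    # (enzyme, relation, metabolite) triples as an event extractor might emit them
    events = [
        ("ilvC", "production", "2,3-dihydroxy-isovalerate"),
        ("ilvD", "consumption", "2,3-dihydroxy-isovalerate"),
    ]

    G = nx.DiGraph()
    for enzyme, relation, metabolite in events:
        # production: enzyme -> metabolite; consumption: metabolite -> enzyme
        if relation == "production":
            G.add_edge(enzyme, metabolite, relation=relation)
        else:
            G.add_edge(metabolite, enzyme, relation=relation)

    print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")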

10.
BMC Med Genomics ; 9(Suppl 3): 70, 2016 12 05.
Article in English | MEDLINE | ID: mdl-28117655

ABSTRACT

BACKGROUND: Gene expression has been used to identify disease gene biomarkers, but there are ongoing challenges. Single-gene or gene-set biomarkers are inadequate to provide sufficient understanding of complex disease mechanisms and the relationships among those genes. Network-based methods have thus been considered for inferring the interactions within a group of genes to further study the disease mechanism. Recently, the Gene-Network-based Feature Set (GNFS) approach, which is capable of handling case-control and multiclass expression data for gene biomarker identification, has been proposed, partly taking network topology into account. However, its performance relies on a greedy search for building subnetworks and thus requires further improvement. In this work, we establish a new approach named Gene Sub-Network-based Feature Selection (GSNFS) by implementing the GNFS framework with two proposed searching and scoring algorithms, namely gene-set-based (GS) search and parent-node-based (PN) search, to identify subnetworks. An additional dataset is used to validate the results. METHODS: The two proposed searching algorithms of the GSNFS method for subnetwork expansion are concerned with the degree of connectivity and the scoring scheme for building subnetworks and their topology. At each expansion iteration, the neighbour genes of the current subnetwork whose expression data improve the overall subnetwork score are recruited. While the GS search calculates the subnetwork score using the activity score of the current subnetwork and the gene expression values of its neighbours, the PN search uses the expression value of the corresponding parent of each neighbour gene. Four lung cancer expression datasets were used for subnetwork identification. In addition, the use of pathway data and protein-protein interactions as network data, in order to consider the interactions among significant genes, is discussed. Classification was performed to compare the performance of the identified gene subnetworks across three subnetwork identification algorithms. RESULTS: The two searching algorithms resulted in better classification and gene/gene-set agreement compared with the original greedy search of the GNFS method. The lung cancer subnetworks identified using the proposed searching algorithms improved cross-dataset validation and increased the consistency of findings between two independent datasets. Homogeneity measurements of the datasets were conducted to assess dataset compatibility in cross-dataset validation. The lung cancer dataset with higher homogeneity showed a better result when using the GS search, while the dataset with low homogeneity showed a better result when using the PN search. The 10-fold cross-dataset validation on the independent lung cancer datasets showed higher classification performance for the proposed algorithms when compared with the greedy search of the original GNFS method. CONCLUSIONS: The proposed searching algorithms provide a higher number of genes in the subnetwork expansion step than the greedy algorithm. As a result, the performance of the subnetworks identified by the GSNFS method was improved in terms of classification performance and gene/gene-set level agreement, depending on the homogeneity of the datasets used in the analysis. Some common genes obtained from the four datasets using the different searching algorithms are known to play a role in lung cancer. The improvement in classification performance, the gene/gene-set level agreement, and the biological relevance indicate the effectiveness of the GSNFS method for gene subnetwork identification using expression data.
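
The neighbour-recruitment idea behind the subnetwork expansion step can be sketched as follows. The scoring function here (absolute t-statistic of the averaged member expression) is an assumption used only to make the loop concrete; the GS and PN searches score candidates differently, as described above.

    # Simplified subnetwork expansion: recruit neighbours that improve the score.
    import numpy as np
    import networkx as nx
    from scipy.stats import ttest_ind

    def subnetwork_score(expr, labels, members):
        # Discriminative score: |t| of the subnetwork's mean expression.
        activity = expr[:, list(members)].mean(axis=1)
        t, _ = ttest_ind(activity[labels == 1], activity[labels == 0])
        return abs(t)

    def expand_subnetwork(graph, expr, labels, seed):
        # graph nodes are assumed to be column indices into expr.
        members = {seed}
        best = subnetwork_score(expr, labels, members)
        improved = True
        while improved:
            improved = False
            neighbours = set().union(*(set(graph[g]) for g in members)) - members
            for n in neighbours:
                score = subnetwork_score(expr, labels, members | {n})
                if score > best:
                    members.add(n)
                    best = score
                    improved = True
        return members, best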


Subject(s)
Algorithms , Biomarkers, Tumor/genetics , Gene Regulatory Networks , Lung Neoplasms/genetics , Transcriptome , Data Collection , Humans
11.
BMC Med Genomics ; 8 Suppl 1: S7, 2015.
Article in English | MEDLINE | ID: mdl-25783485

ABSTRACT

BACKGROUND: A substantial proportion of Autism Spectrum Disorder (ASD) risk resides in de novo germline and rare inherited genetic variation. In particular, rare copy number variation (CNV) contributes to ASD risk in up to 10% of ASD subjects. Despite the striking degree of genetic heterogeneity, case-control studies have detected a specific burden of rare disruptive CNVs in neuronal and neurodevelopmental pathways. Here, we used machine learning methods to classify ASD subjects and controls based on rare CNV data and comprehensive gene annotations. We investigated the performance of different methods and estimated the percentage of ASD subjects that could be reliably classified based on the presumed etiologic CNVs they carry. RESULTS: We analyzed 1,892 Caucasian ASD subjects and 2,342 matched controls. Rare CNVs (frequency 1% or less) were detected using Illumina 1M and 1M-Duo BeadChips. Conditional Inference Forest (CF) typically performed as well as or better than other classification methods. We found a maximum AUC (area under the ROC curve) of 0.533 when considering all ASD subjects with rare genic CNVs, corresponding to 7.9% correctly classified ASD subjects and less than 3% incorrectly classified controls; performance was significantly higher when considering only subjects harboring de novo or pathogenic CNVs. We also found rare losses to be more predictive than gains, and that curated neurally relevant annotations (brain expression, synaptic components and neurodevelopmental phenotypes) outperform Gene Ontology and pathway-based annotations. CONCLUSIONS: CF is an optimal classification approach for case-control rare CNV data, and it can be used to prioritize subjects carrying variants that potentially contribute to ASD risk but have not yet been recognized. The neurally relevant annotations used in this study could be successfully applied to rare CNV case-control datasets for other neuropsychiatric disorders.
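
Conditional Inference Forests are typically fitted with R's party/partykit packages; as a rough Python stand-in, the sketch below trains a random forest on per-subject CNV/annotation features and reports the ROC AUC, the metric quoted above. The feature matrix here is random placeholder data, not the study's genotype data.

    # Random-forest stand-in for CF with ROC-AUC evaluation (illustration only).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 20))    # e.g. counts of rare CNVs hitting annotated gene sets
    y = rng.integers(0, 2, size=400)  # 1 = ASD subject, 0 = control (toy labels)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
    print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))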


Subject(s)
Autistic Disorder/classification , Autistic Disorder/genetics , Computational Biology/methods , DNA Copy Number Variations/genetics , Molecular Sequence Annotation , Case-Control Studies , Female , Gene Ontology , Humans , Machine Learning , Male