Results 1 - 20 of 21
1.
BMC Bioinformatics ; 24(1): 419, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37936066

ABSTRACT

BACKGROUND: The performance of machine learning classification methods relies heavily on the choice of features. In many domains, feature generation can be labor-intensive and require domain knowledge, and feature selection methods do not scale well to high-dimensional datasets. Deep learning has shown success in feature generation but requires large datasets to achieve high classification accuracy. Biology domains typically exhibit both challenges: numerous handcrafted features (high dimensionality) and small amounts of training data (low volume). METHOD: A hybrid learning approach is proposed that first trains a deep network on the training data, extracts features from the deep network, and then uses these features to re-express the data as input to a non-deep learning method, which is trained to perform the final classification. RESULTS: The approach is systematically evaluated to determine the best layer of the deep network from which to extract features and the training-data volume threshold below which this approach is preferred. Results from several domains show that this hybrid approach outperforms standalone deep and non-deep learning methods, especially on low-volume, high-dimensional datasets. The diversity of the datasets further supports the robustness of the approach across domains. CONCLUSIONS: The hybrid approach combines the strengths of deep and non-deep learning paradigms to achieve high performance on the high-dimensional, low-volume learning tasks that are typical in biology domains.
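The hybrid pipeline described above (train a deep network, read out one of its layers, then train a conventional classifier on those activations) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the network has already been trained and its weights are available as plain Python lists, uses ReLU hidden layers, and substitutes a nearest-centroid classifier for whichever non-deep learner is actually used. The names `extract_features` and `NearestCentroid` are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

# A layer is (weights, bias); weights[j][i] connects input i to hidden unit j.
Layer = Tuple[List[List[float]], List[float]]


def dense_relu(x: Sequence[float], layer: Layer) -> List[float]:
    """Apply one fully connected layer followed by a ReLU activation."""
    weights, bias = layer
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def extract_features(x: Sequence[float], layers: Sequence[Layer], k: int) -> List[float]:
    """Re-express one input as the activations of hidden layer k (1-based)."""
    h = list(x)
    for layer in layers[:k]:
        h = dense_relu(h, layer)
    return h


class NearestCentroid:
    """Stand-in for the non-deep learner: predicts the class with the closest mean."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> "NearestCentroid":
        sums: Dict[Hashable, List[float]] = {}
        counts: Dict[Hashable, int] = {}
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
            counts[yi] = counts.get(yi, 0) + 1
        # Per-class mean of the extracted feature vectors.
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[Hashable]:
        return [min(self.centroids, key=lambda c: math.dist(xi, self.centroids[c]))
                for xi in X]
```

Choosing `k` corresponds to the paper's question of which layer to extract features from; in practice one would compare several values of `k` on held-out data before training the final classifier.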


Subject(s)
Deep Learning , Machine Learning
2.
BMC Bioinformatics ; 22(1): 575, 2021 Nov 30.
Article in English | MEDLINE | ID: mdl-34847877

ABSTRACT

BACKGROUND: Deep learning is an active area of artificial intelligence in bioinformatics that is useful for solving many biological problems, including predicting epigenetic alterations such as differential DNA methylation regions. Deep learning (DL) can learn an informative representation, which addresses the need to define relevant features. However, deep learning models are computationally expensive, and they require large training datasets to achieve good classification performance. RESULTS: One approach to addressing these challenges is to use a less complex deep learning network for feature selection and Machine Learning (ML) for classification. In the current study, we introduce a hybrid DL-ML approach that uses a deep neural network to extract molecular features and a non-DL classifier to predict environmentally responsive transgenerational differential DNA methylated regions (DMRs), termed epimutations, from the extracted DL-based features. Sperm epimutations from epigenetic transgenerational inheritance induced by various environmental toxicants were used to train the model on the rat genome DNA sequence, and the model was then used to predict transgenerational DMRs (epimutations) across the entire genome. CONCLUSION: The approach was also used to predict potential DMRs in the human genome. Experimental results show that the hybrid DL-ML approach outperforms both deep learning and traditional machine learning methods.


Subject(s)
Artificial Intelligence , DNA Methylation , Animals , DNA , Epigenesis, Genetic , Genome, Human , Humans , Machine Learning , Rats
3.
Sensors (Basel) ; 19(15)2019 Jul 24.
Article in English | MEDLINE | ID: mdl-31344811

ABSTRACT

IoT sensor networks have an inherent graph structure that can be used to extract graphical features for improving performance in a variety of prediction tasks. We propose a framework that represents IoT sensor network data as a graph, extracts graphical features, and applies feature selection methods to identify the most useful features for a classifier to use in prediction tasks. We show that a set of generic graph-based features can improve the performance of sensor network predictions without the need for application-specific and task-specific feature engineering. We apply this approach to three different prediction tasks: activity recognition from motion sensors in a smart home, demographic prediction from GPS sensor data in a smartphone, and activity recognition from GPS sensor data in a smartphone. Our approach produced results comparable to most state-of-the-art methods, while retaining the additional advantage of general applicability to IoT sensor networks without sophisticated, application-specific feature generation techniques or background knowledge. We further investigate the impact of using edge-transition times, categorical features, different sensor window sizes, and normalization in the smart home domain. We also consider deep learning approaches, including the Graph Convolutional Network (GCN), for eliminating feature engineering in the smart home domain, but our approach provided better performance in most cases. We conclude that the graph-based feature framework, which combines IoT sensor categorization, nodes and edges as features, and feature selection techniques, provides superior results compared to non-graph-based features.
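A minimal sketch of the kind of generic graph features described above, assuming each prediction window is an ordered list of sensor IDs (IDs such as "M1" are made up). Nodes are sensors and directed edges are transitions between consecutively firing sensors; their counts become features. The frequency filter in `vectorize` is only a placeholder for the feature selection step, edge-transition times and sensor categories are omitted, and neither function is the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def graph_features(events: Sequence[str]) -> Dict[str, int]:
    """Node and edge counts of the sensor-transition graph for one window.

    Nodes are sensors; a directed edge a->b is counted each time sensor b
    fires immediately after sensor a (self-loops included).
    """
    features: Dict[str, int] = {}
    for sensor, n in Counter(events).items():
        features[f"node:{sensor}"] = n
    for (a, b), n in Counter(zip(events, events[1:])).items():
        features[f"edge:{a}->{b}"] = n
    return features


def vectorize(windows: Sequence[Sequence[str]],
              min_windows: int = 1) -> Tuple[List[str], List[List[int]]]:
    """Turn windows into fixed-length vectors over a shared feature vocabulary.

    Features seen in fewer than `min_windows` windows are dropped, a crude
    frequency filter standing in for a real feature selection method.
    """
    per_window = [graph_features(w) for w in windows]
    window_freq = Counter(name for f in per_window for name in f)
    vocab = sorted(name for name, df in window_freq.items() if df >= min_windows)
    return vocab, [[f.get(name, 0) for name in vocab] for f in per_window]
```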

4.
Sensors (Basel) ; 18(6)2018 Jun 01.
Article in English | MEDLINE | ID: mdl-29865149

ABSTRACT

Numerous applications rely on data obtained from a wireless sensor network, where application performance is of utmost importance. However, energy usage is also important, and oftentimes a subset of sensors can be selected to maximize application performance. We cast the problem of sensor selection as a local search optimization problem and solve it using a variant of stochastic hill climbing extended with novel heuristics. This paper introduces sensor network configuration learning, a feedback-based heuristic algorithm that dynamically reconfigures the sensor network to maximize the performance of the target application. The proposed algorithm is described in detail, along with the experiments conducted and a scalability study. A quick method for launching the algorithm from a better-than-random starting point is also detailed. The performance of the algorithm is compared to that of two other well-known algorithms and to random selection. Our simulation results from running sensor network configuration learning on a number of scenarios show the effectiveness and scalability of our approach.
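Plain stochastic hill climbing, the base search that the paper extends with its own heuristics, can be sketched like this. The sketch assumes application performance can be evaluated for any on/off configuration through a caller-supplied `score` function (hypothetical; in the paper the feedback comes from the running application) and omits the novel heuristics entirely. The optional `start` argument mirrors the idea of launching from a better-than-random configuration.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Config = List[bool]  # config[i] is True when sensor i is switched on


def hill_climb_sensors(
    n_sensors: int,
    score: Callable[[Config], float],
    iterations: int = 1000,
    start: Optional[Sequence[bool]] = None,
    seed: int = 0,
) -> Tuple[Config, float]:
    """Stochastic hill climbing over on/off sensor configurations.

    Each step flips one randomly chosen sensor and keeps the change only if
    the application score strictly improves.
    """
    rng = random.Random(seed)
    if start is not None:
        current: Config = list(start)
    else:
        current = [rng.random() < 0.5 for _ in range(n_sensors)]
    current_score = score(current)
    for _ in range(iterations):
        i = rng.randrange(n_sensors)
        candidate = current.copy()
        candidate[i] = not candidate[i]
        candidate_score = score(candidate)
        if candidate_score > current_score:
            current, current_score = candidate, candidate_score
    return current, current_score
```

A toy score that rewards useful sensors and charges an energy cost per active sensor (both invented for illustration) is enough to exercise the search.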

5.
BMC Genomics ; 17: 418, 2016 06 01.
Article in English | MEDLINE | ID: mdl-27245821

ABSTRACT

BACKGROUND: A variety of environmental factors have been shown to promote the epigenetic transgenerational inheritance of disease and phenotypic variation in numerous species. Exposure to environmental factors such as toxicants can promote epigenetic changes (epimutations) involving alterations in DNA methylation to produce specific differential DNA methylation regions (DMRs). The germline (e.g., sperm) transmission of epimutations is associated with epigenetic transgenerational inheritance phenomena. The current study was designed to determine the genomic locations of environmentally induced transgenerational DMRs and assess their potential clustering. RESULTS: Exposure-specific DMRs (epimutations) from a number of different studies were used. The clustering approach identified areas of the genome that have statistically significant overrepresented numbers of epimutations. The locations of DMR clusters were compared to clusters of differentially expressed genes found in tissues and cells associated with the transgenerational inheritance of disease. Such gene clusters, termed epigenetic control regions (ECRs), have previously been suggested to regulate gene expression in regions spanning up to 2-5 million bases. DMR clusters were often found to associate with inherent gene clusters within the genome. CONCLUSION: The current study used a number of epigenetic datasets from previous studies to identify novel DMR clusters across the genome. Observations suggest that these clustered DMRs within an ECR may be susceptible to epigenetic reprogramming and dramatically influence genome activity.


Subject(s)
Cluster Analysis , DNA Methylation , Epigenesis, Genetic , Genetic Association Studies , Genetic Diseases, Inborn/genetics , Genomics , Phenotype , Chromosome Mapping , Computational Biology/methods , Databases, Genetic , Environment , Female , Genomics/methods , Humans , Male , Mutation , Organ Specificity/genetics
6.
Environ Epigenet ; 9(1): dvad007, 2023.
Article in English | MEDLINE | ID: mdl-38130880

ABSTRACT

Exposure to environmental toxicants can lead to epimutations in the genome and an increase in differential DNA methylated regions (DMRs) that have been linked to increased susceptibility to various diseases. However, whether particular toxicants have unique effects on the genome, producing DMRs unique to each toxicant, has been less studied. One hurdle to such studies is the low number of observed DMRs per toxicant. To address this hurdle, a previously validated hybrid deep-learning cross-exposure prediction model is trained per exposure and used to predict exposure-specific DMRs in the genome. Given these predicted exposure-specific DMRs, a set of unique DMRs per exposure can be identified. Analysis of these unique DMRs through visualization, DNA sequence motif matching, and gene association reveals known and unknown links between individual exposures and their unique effects on the genome. The results indicate the potential to define exposure-specific epigenetic markers in the genome and the potential relative impact of different exposures. Therefore, a computational approach to predict exposure-specific transgenerational epimutations was developed, which supported the exposure specificity of ancestral toxicant actions and provided epigenome information on the predicted DMR sites.
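Once a model has produced a set of predicted DMR sites for each exposure, one straightforward reading of "unique DMRs per exposure" is a set difference: sites predicted for one exposure and for no other. The sketch below assumes sites are identified by hashable keys such as "chr1:1000-2000" strings (a hypothetical format) and that exact site identity is the matching criterion; the study may match sites differently, for example by genomic overlap.

```python
from typing import Dict, Set


def unique_sites(predicted: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each exposure, keep predicted sites not predicted for any other exposure."""
    unique: Dict[str, Set[str]] = {}
    for exposure, sites in predicted.items():
        # Union of every other exposure's predicted sites.
        others: Set[str] = set().union(
            *(s for e, s in predicted.items() if e != exposure))
        unique[exposure] = sites - others
    return unique
```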

7.
Environ Epigenet ; 9(1): dvad006, 2023.
Article in English | MEDLINE | ID: mdl-38162685

ABSTRACT

Three successive generations of rats were each exposed to a different toxicant and then bred to the transgenerational F5 generation to assess the impacts of different exposures across multiple generations. The current study examines the actions of the agricultural fungicide vinclozolin on the F0 generation, followed by a jet fuel hydrocarbon mixture on the F1 generation, and then the pesticide dichlorodiphenyltrichloroethane (DDT) on the F2 generation gestating females. The subsequent F3 and F4 generations and the F5 transgenerational generation were obtained, and the F1-F5 generations were examined for male sperm epigenetic alterations and for pathology in males and females. Significant impacts on male sperm differential DNA methylation regions were observed. The F3-F5 generations were similar in ∼50% of the DNA methylation regions. The pathology of each generation was assessed in the testis, ovary, kidney, and prostate, along with the presence of obesity and tumors, using a newly developed deep learning, artificial intelligence-based histopathology analysis. Observations demonstrated compounded disease impacts in obesity and metabolic parameters, whereas other pathologies plateaued, with smaller increases at the F5 transgenerational generation. Observations demonstrate that multiple generational exposures, which occur in human populations, appear to increase epigenetic impacts and disease susceptibility.

8.
Methods Inf Med ; 61(3-04): 99-110, 2022 09.
Article in English | MEDLINE | ID: mdl-36220111

ABSTRACT

BACKGROUND: Behavior and health are inextricably linked. As a result, continuous wearable sensor data offer the potential to predict clinical measures. However, interruptions in data collection occur, which create a need for strategic data imputation. OBJECTIVE: The objective of this work is to adapt a data generation algorithm to impute multivariate time series data. This will allow us to create digital behavior markers that can predict clinical health measures. METHODS: We created a bidirectional time series generative adversarial network to impute missing sensor readings. Values are imputed from relationships across multiple fields and multiple points in time, for both single time points and larger time gaps. From the completed data, digital behavior markers are extracted and mapped to predicted clinical measures. RESULTS: We validate our approach using continuous smartwatch data for n = 14 participants. When reconstructing omitted data, we observe an average normalized mean absolute error of 0.0197. We then create machine learning models that predict clinical measures from the reconstructed, complete data, with correlations ranging from r = 0.1230 to r = 0.7623. This work indicates that wearable sensor data collected in the wild can be used to offer insights into a person's health in natural settings.
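The abstract reports an average normalized mean absolute error without defining the normalization. The sketch below assumes one common choice: absolute errors on the reconstructed entries are divided by each field's value range, averaged per field, and then averaged across fields. Treat it as an illustration of the metric's shape, not the paper's exact definition.

```python
from typing import List, Sequence


def normalized_mae(
    true: Sequence[Sequence[float]],
    imputed: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
) -> float:
    """Average over fields of the MAE on imputed entries, each field scaled by its range.

    Rows are time points and columns are sensor fields; mask[t][f] is True
    where the value was omitted and then reconstructed.
    """
    per_field: List[float] = []
    for f in range(len(true[0])):
        column = [row[f] for row in true]
        span = (max(column) - min(column)) or 1.0  # guard against constant fields
        errors = [abs(true[t][f] - imputed[t][f]) / span
                  for t in range(len(true)) if mask[t][f]]
        if errors:
            per_field.append(sum(errors) / len(errors))
    if not per_field:
        raise ValueError("mask selects no imputed values")
    return sum(per_field) / len(per_field)
```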


Subject(s)
Algorithms , Machine Learning , Humans , Time Factors , Data Collection , Cognition
9.
Article in English | MEDLINE | ID: mdl-36381500

ABSTRACT

New modes of technology are offering unprecedented opportunities to unobtrusively collect data about people's behavior. While there are many use cases for such information, we explore its utility for predicting multiple clinical assessment scores. Because clinical assessments are typically used as screening tools for impairment and disease, such as mild cognitive impairment (MCI), automatically mapping behavioral data to assessment scores can help detect changes in health and behavior across time. In this paper, we aim to extract behavior markers from two modalities, a smart home environment and a custom digital memory notebook app, for mapping to ten clinical assessments that are relevant for monitoring MCI onset and changes in cognitive health. Smart home-based behavior markers reflect hourly, daily, and weekly activity patterns, while app-based behavior markers reflect app usage and writing content/style derived from free-form journal entries. We describe machine learning techniques for fusing these multimodal behavior markers and utilizing joint prediction. We evaluate our approach using three regression algorithms and data from 14 participants with MCI living in a smart home environment. We observed moderate to large correlations between predicted and ground-truth assessment scores, ranging from r = 0.601 to r = 0.871 for each clinical assessment.
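The correlations reported above are presumably Pearson's r between predicted and ground-truth scores for one clinical assessment across participants; the abstract does not say so explicitly, so that is an assumption. A dependency-free sketch of the computation:

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Pearson correlation between predicted and ground-truth scores."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mp = sum(predicted) / n
    ma = sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_a = sum((a - ma) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)
```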

10.
J Alzheimers Dis ; 85(1): 73-90, 2022.
Article in English | MEDLINE | ID: mdl-34776442

ABSTRACT

BACKGROUND: Compensatory aids can help mitigate the impact of progressive cognitive impairment on daily living. OBJECTIVE: We evaluate whether the learning and sustained use of an Electronic Memory and Management Aid (EMMA) application can be augmented through a partnership with real-time, activity-aware transition-based prompting delivered by a smart home. METHODS: Thirty-two adults who met criteria for amnestic mild cognitive impairment (aMCI) were randomized to learn to use the EMMA app on its own (N = 17) or when partnered with smart home prompting (N = 15). The four-week, five-session manualized EMMA training was conducted individually in participant homes by trained clinicians. Monthly questionnaires were completed by phone with trained personnel blind to study hypotheses. EMMA data metrics were collected continuously for four months. For the partnered condition, activity-aware prompting was on during training and post-training months 1 and 3, and off during post-training month 2. RESULTS: The analyzed aMCI sample included 15 EMMA-only and 14 partnered participants. Compared to the EMMA-only condition, by week four of training, participants in the partnered condition were engaging with EMMA more times daily and using more basic and advanced features. These advantages were maintained throughout the post-training phase, with less loss of EMMA app use over time. There was little differential impact of the intervention on self-reported primary (everyday functioning, quality of life) and secondary (coping, satisfaction with life) outcomes. CONCLUSION: Activity-aware prompting technology enhanced acquisition, habit formation, and long-term use of a digital device by individuals with aMCI. (ClinicalTrials.gov NCT03453554).


Subject(s)
Cognitive Dysfunction/rehabilitation , Quality of Life , Reminder Systems , Supervised Machine Learning , Activities of Daily Living , Aged , Female , Humans , Independent Living , Male , Middle Aged , Outcome Assessment, Health Care , Pilot Projects , Self Efficacy , Surveys and Questionnaires , Technology Assessment, Biomedical
11.
IEEE Trans Knowl Data Eng ; 23(4): 527-539, 2011.
Article in English | MEDLINE | ID: mdl-21617742

ABSTRACT

The machine learning and pervasive sensing technologies found in smart homes offer unprecedented opportunities for providing health monitoring and assistance to individuals experiencing difficulties living independently at home. In order to monitor the functional health of smart home residents, we need to design technologies that recognize and track activities that people normally perform as part of their daily routines. Although approaches do exist for recognizing activities, they are applied to activities that have been preselected and for which labeled training data are available. In contrast, we introduce an automated approach to activity tracking that identifies frequent activities that naturally occur in an individual's routine. With this capability, we can then track the occurrence of regular activities to monitor functional health and to detect changes in an individual's patterns and lifestyle. In this paper, we describe our activity mining and tracking approach and validate our algorithms on data collected in physical smart environments.
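To make "identifying frequent activities" concrete, here is a deliberately simplified stand-in that counts contiguous runs of sensor events of a fixed length and keeps those occurring at least `min_support` times. It is not the paper's mining algorithm, which the abstract does not detail; the event names and parameters below are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def frequent_patterns(events: Sequence[str], length: int,
                      min_support: int) -> Dict[Tuple[str, ...], int]:
    """Contiguous event subsequences of `length` seen at least `min_support` times."""
    counts = Counter(tuple(events[i:i + length])
                     for i in range(len(events) - length + 1))
    return {p: c for p, c in counts.items() if c >= min_support}
```

Tracking how often each discovered pattern occurs per day would then be one simple way to watch for changes in routine.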

13.
J Med Chem ; 51(3): 648-54, 2008 Feb 14.
Article in English | MEDLINE | ID: mdl-18211009

ABSTRACT

Four different models are used to predict whether a compound will bind to cytochrome P450 2C9 (CYP2C9) with a Ki value of less than 10 µM. A training set of 276 compounds and a diverse validation set of 50 compounds were used to build and assess each model. The modeling methods were chosen to exploit differences in how training sets are used to develop predictive models. Two of the four methods develop partitioning trees based on global descriptions of structure using nine descriptors. A third method uses the same descriptors to develop local descriptions that relate activity to structures with similar descriptor characteristics. The fourth method uses a graph-theoretic approach to predict activity based on molecular structure. When all four methods agree, the predictive accuracy is 94%. An external validation set of 11 compounds gives a predictive accuracy of 91% when all methods agree.
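The headline accuracy is computed only over compounds on which all four models agree. A small sketch of that bookkeeping, assuming each model's output is a list of class labels aligned with the compounds; it also reports coverage (the fraction of compounds with unanimous predictions), a quantity the abstract does not give.

```python
from typing import Hashable, Sequence, Tuple


def consensus_accuracy(
    predictions: Sequence[Sequence[Hashable]],
    labels: Sequence[Hashable],
) -> Tuple[float, float]:
    """Accuracy on compounds where every model agrees, plus the fraction that agree.

    predictions[m][i] is model m's prediction (e.g. binds / does not bind) for compound i.
    """
    agreed = [(votes[0], y) for votes, y in zip(zip(*predictions), labels)
              if len(set(votes)) == 1]
    if not agreed:
        return float("nan"), 0.0
    correct = sum(p == y for p, y in agreed)
    return correct / len(agreed), len(agreed) / len(labels)
```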


Subject(s)
Cytochrome P-450 Enzyme System/chemistry , Drug Interactions , Models, Molecular , Pharmaceutical Preparations/chemistry , Drug Design , Molecular Structure , Protein Binding , Quantitative Structure-Activity Relationship
14.
Epigenetics ; 12(7): 505-514, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28524769

ABSTRACT

Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realizing this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease, together with novel ML approaches, to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems with ML, to develop a more efficient feature selection process, and to address the class imbalance problem present in all genomic data sets. These results suggest the power of this novel ML approach and its ability to predict epigenetic phenomena and associated disease. The current approach requires extensive computation of features over the genome. A promising new approach is to introduce Deep Learning (DL) to generate and simultaneously compute novel genomic features tuned to the classification task. This approach can be used with any genomic or biological data set applied to medicine. The application of molecular epigenetic data in advanced machine learning analysis to medicine is the focus of this review.


Subject(s)
Epigenesis, Genetic , Epigenomics/methods , Genetics, Medical/methods , Machine Learning , Animals , Humans
15.
PLoS One ; 10(11): e0142274, 2015.
Article in English | MEDLINE | ID: mdl-26571271

ABSTRACT

Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline-transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure-specific (i.e., toxicant-specific) signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (<3 CpG / 100 bp), termed CpG deserts, and a number of unique DNA sequence motifs. The rat genome was annotated for these and additional relevant features. The objective of the current study was to use a machine learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified, with low-density CpG deserts being a critical feature among those selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller set of 1,503 genome-wide predicted regions and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the F3 generation of the dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) pesticide exposure lineages. Analysis of this positive validation data set showed 100% prediction accuracy for all the DDT-MXC sperm epimutations. Observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome-wide set of potential epimutations that can be used to facilitate identification of epigenetic diagnostics for ancestral environmental exposures and disease susceptibility.
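The CpG desert criterion quoted above (<3 CpG per 100 bp) is a simple density threshold. A minimal sketch, assuming plain nucleotide strings and non-overlapping 100 bp windows by default; the window size and step are illustrative choices, not parameters taken from the study.

```python
from typing import List


def cpg_density(seq: str) -> float:
    """CpG dinucleotides per 100 bp of the given sequence."""
    s = seq.upper()
    if len(s) < 2:
        return 0.0
    cpg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    return cpg * 100.0 / len(s)


def is_cpg_desert(seq: str, threshold: float = 3.0) -> bool:
    """True when the sequence has fewer than `threshold` CpGs per 100 bp."""
    return cpg_density(seq) < threshold


def desert_windows(seq: str, window: int = 100, step: int = 100,
                   threshold: float = 3.0) -> List[int]:
    """Start offsets of windows that qualify as CpG deserts."""
    return [start for start in range(0, len(seq) - window + 1, step)
            if is_cpg_desert(seq[start:start + window], threshold)]
```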


Subject(s)
DDT/toxicity , Epigenesis, Genetic , Genome-Wide Association Study , Machine Learning , Methoxychlor/toxicity , Mutation , Bayes Theorem , Chromosomes/ultrastructure , Cluster Analysis , Computational Biology/methods , CpG Islands , DNA Methylation , Databases, Genetic , Environmental Exposure , Female , Genetic Predisposition to Disease , Granulosa Cells/drug effects , Granulosa Cells/metabolism , Humans , Male , Phenotype , Reproducibility of Results , Sequence Analysis, DNA , Sertoli Cells/drug effects , Sertoli Cells/metabolism , Spermatozoa/drug effects
16.
J Comput Biol ; 21(7): 492-507, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24798423

ABSTRACT

In machine learning, a balanced dataset is one of the important criteria for achieving high classification accuracy. Datasets with a large ratio between majority and minority classes hinder learning with any classifier. Datasets with an order-of-magnitude difference in the number of instances between classes have an imbalanced class distribution. Such datasets arise in biology, sensor data, medical diagnostics, or any other domain where labeling minority-class instances is time-consuming or costly, or where such data are not readily available. The current study investigates a number of imbalanced class algorithms for handling the imbalanced class distribution present in epigenetic datasets. Epigenetic (DNA methylation) datasets inherently contain few differentially DNA methylated regions (DMRs) and many more non-DMR sites. For this class imbalance problem, a number of algorithms are compared, including the TAN+AdaBoost algorithm. Experiments performed on four epigenetic datasets and several well-known datasets show that learning on an imbalanced dataset can achieve accuracy similar to that of a regular learner on a balanced dataset.
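The abstract compares several imbalanced-class algorithms, including TAN+AdaBoost. The sketch below does not implement those; it shows the imbalance ratio and random undersampling of the majority class, a common baseline remedy, assuming examples and labels are given as parallel lists.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(X: Sequence[T], y: Sequence[Hashable],
                       seed: int = 0) -> Tuple[List[T], List[Hashable]]:
    """Randomly drop examples so every class matches the smallest class size."""
    rng = random.Random(seed)
    target = min(Counter(y).values())
    by_class: Dict[Hashable, List[T]] = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    X_out: List[T] = []
    y_out: List[Hashable] = []
    for cls, items in by_class.items():
        for xi in rng.sample(items, target):  # sample without replacement
            X_out.append(xi)
            y_out.append(cls)
    return X_out, y_out
```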


Subject(s)
Algorithms , Artificial Intelligence , Computational Biology/methods , DNA Methylation , Databases, Genetic , Epigenomics , Humans
17.
Gerontechnology ; 11(4): 534-544, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-24077428

ABSTRACT

Performing daily activities without assistance is important to maintaining an independent functional lifestyle. As a result, automated activity prompting systems can potentially extend the period of time that adults can age in place. In this paper we introduce AP, an algorithm to automate activity prompting based on smart home technology. AP learns prompt rules based on the time when activities are typically performed as well as the relationship between activities that normally occur in a sequence. We evaluate the AP algorithm based on smart home datasets and demonstrate its ability to operate within a physical smart environment.
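The abstract says AP learns prompt rules from when activities are usually performed and from which activities tend to follow one another. A hypothetical sketch of both kinds of rule, assuming activities are logged as (name, minute of day) pairs and as ordered daily sequences; the mean-plus-k-standard-deviations trigger and all names are illustrative choices, not the published AP algorithm.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def learn_time_rules(history: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Typical start time of each activity as (mean, population std) in minutes of day."""
    times: Dict[str, List[float]] = defaultdict(list)
    for activity, minute in history:
        times[activity].append(minute)
    return {a: (mean(t), pstdev(t)) for a, t in times.items()}


def learn_successors(days: Sequence[Sequence[str]]) -> Dict[str, str]:
    """Most frequent next activity after each activity, across daily sequences."""
    follows: Dict[str, Counter] = defaultdict(Counter)
    for seq in days:
        for a, b in zip(seq, seq[1:]):
            follows[a][b] += 1
    return {a: c.most_common(1)[0][0] for a, c in follows.items()}


def due_prompts(time_rules: Dict[str, Tuple[float, float]], done_today: Set[str],
                now: float, k: float = 2.0) -> List[str]:
    """Activities not yet done whose usual time plus k standard deviations has passed."""
    return sorted(a for a, (mu, sd) in time_rules.items()
                  if a not in done_today and now > mu + k * sd)
```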

18.
Article in English | MEDLINE | ID: mdl-24091397

ABSTRACT

Active learning is a supervised learning technique that reduces the number of examples required for building a successful classifier, because it can choose the data it learns from. This technique holds promise for many biological domains in which classified examples are expensive and time-consuming to obtain. Most traditional active learning methods pose very specific queries to the oracle (e.g., a human expert) to label an unlabeled example. The example may consist of numerous features, many of which are irrelevant. Removing such features creates a shorter query with only relevant features, which is easier for the oracle to answer. We propose a generalized query-based active learning (GQAL) approach that constructs generalized queries based on multiple instances. By constructing appropriately generalized queries, we can achieve higher accuracy than traditional active learning methods. We apply our active learning method to find differentially DNA methylated regions (DMRs). DMRs are DNA locations in the genome that are known to be involved in tissue differentiation, epigenetic regulation, and disease. We also apply our method to 13 other data sets and show that it outperforms another popular active learning technique.
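For contrast with the generalized queries proposed above, the sketch below shows the standard pool-based query step that such methods build on: uncertainty sampling, which asks the oracle about the examples the current model is least sure of. This is a swapped-in baseline technique, not GQAL, and it assumes the model exposes a positive-class probability for each unlabeled example.

```python
from typing import Dict, Hashable, List


def query_batch(pool_probs: Dict[Hashable, float], batch_size: int) -> List[Hashable]:
    """Pick the unlabeled examples whose predicted P(positive) is closest to 0.5.

    pool_probs maps an example ID to the current model's probability that the
    example is positive (e.g. a DMR). Ties are broken by ID so the choice is
    deterministic; IDs must therefore be mutually comparable.
    """
    ranked = sorted(pool_probs, key=lambda i: (abs(pool_probs[i] - 0.5), i))
    return ranked[:batch_size]
```

After the oracle labels the returned examples, the model is retrained and the pool probabilities recomputed before the next query.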


Subject(s)
Artificial Intelligence , Computational Biology/methods , DNA Methylation/genetics , DNA/chemistry , DNA/genetics , Algorithms , Databases, Genetic , Humans
19.
Data Min Knowl Discov ; 1(4): 339-351, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21760755

ABSTRACT

The data mining and pervasive sensing technologies found in smart homes offer unprecedented opportunities for providing health monitoring and assistance to individuals experiencing difficulties living independently at home. In order to monitor the functional health of smart home residents, we need to design technologies that recognize and track activities that people normally perform as part of their daily routines. One question that frequently arises, however, is how many smart home sensors are needed and where they should be placed in order to accurately recognize activities. We employ data mining techniques to examine the problem of sensor selection for activity recognition in smart homes. We analyze the results based on six data sets collected in five distinct smart home environments.
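One common data-mining criterion for deciding which sensors matter for activity recognition is the information gain of each sensor's on/off state with respect to the activity label. The sketch below ranks sensors that way; it assumes each time window is summarized as a mapping from sensor ID to whether it fired, uses made-up sensor names, and is not necessarily the selection method used in the paper.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Mapping, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy from splitting on the feature's values."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for f, y in zip(feature, labels) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def rank_sensors(windows: Sequence[Mapping[str, bool]],
                 labels: Sequence[Hashable]) -> Tuple[List[str], Dict[str, float]]:
    """Sensors ordered from most to least informative about the activity label."""
    sensors = sorted({s for w in windows for s in w})
    scores = {s: information_gain([w.get(s, False) for w in windows], labels)
              for s in sensors}
    return sorted(sensors, key=lambda s: -scores[s]), scores
```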
