Results 1 - 9 of 9
1.
J Biomed Inform ; 105: 103411, 2020 05.
Article in English | MEDLINE | ID: mdl-32234546

ABSTRACT

Ensemble learning uses multiple algorithms to obtain better predictive performance than any single one of its constituent algorithms could. With the growing popularity of deep learning technologies, researchers have started to ensemble these technologies for various purposes, but few, if any, have used the deep learning approach as a means to ensemble Alzheimer's disease classification algorithms. This paper presents a deep ensemble learning framework that harnesses deep learning algorithms to integrate multisource data and tap the 'wisdom of experts'. At the voting layer, two sparse autoencoders are trained for feature learning to reduce the correlation of attributes and ultimately diversify the base classifiers. At the stacking layer, a nonlinear feature-weighted method based on a deep belief network is proposed to rank the base classifiers, which may violate the conditional independence assumption; a neural network serves as the meta-classifier. At the optimizing layer, over-sampling and threshold-moving are used to cope with the cost-sensitive problem, and optimized predictions are obtained from an ensemble of probabilistic predictions by similarity calculation. The proposed framework is applied to Alzheimer's disease classification. Experiments on the clinical dataset from the National Alzheimer's Coordinating Center demonstrate that its classification accuracy is 4% better than that of six well-known ensemble approaches, including the standard stacking algorithm. By pooling the wisdom of average physicians, broader coverage of more accurate diagnostic services can be provided. This paper points out a new way to improve the primary care of Alzheimer's disease from a machine learning perspective.
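The abstract does not spell out the over-sampling and threshold-moving procedures used at the optimizing layer; a minimal sketch of the two general techniques, with illustrative function names and cost values that are not taken from the paper, might look like:

```python
import random

def oversample(samples, labels, minority=1):
    """Duplicate randomly chosen minority-class examples until the
    two classes are balanced (the simplest form of over-sampling)."""
    pairs = list(zip(samples, labels))
    majority_n = sum(1 for _, y in pairs if y != minority)
    minority_pairs = [p for p in pairs if p[1] == minority]
    while sum(1 for _, y in pairs if y == minority) < majority_n:
        pairs.append(random.choice(minority_pairs))
    xs, ys = zip(*pairs)
    return list(xs), list(ys)

def threshold_moving(probs, cost_fn=2.0, cost_fp=1.0):
    """Shift the decision threshold toward the cheaper error:
    predict positive when p * cost_fn > (1 - p) * cost_fp,
    i.e. when p exceeds cost_fp / (cost_fp + cost_fn)."""
    threshold = cost_fp / (cost_fp + cost_fn)
    return [1 if p > threshold else 0 for p in probs]
```

With a false negative twice as costly as a false positive, the threshold drops from 0.5 to 1/3, so borderline cases are pushed toward the positive (disease) class.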


Subjects
Alzheimer Disease , Algorithms , Alzheimer Disease/diagnosis , Humans , Machine Learning , Neural Networks, Computer
2.
BMC Bioinformatics ; 18(1): 47, 2017 Jan 19.
Article in English | MEDLINE | ID: mdl-28103789

ABSTRACT

BACKGROUND: Visualizing data by dimensionality reduction is an important strategy in bioinformatics that can help discover hidden data properties and detect data quality issues, e.g. noise and inappropriately labeled data. Since crowdsourcing-based synthetic biology databases face similar data quality issues, we propose visualizing biobricks to tackle them. However, existing dimensionality reduction methods cannot be applied directly to biobrick datasets, so we use normalized edit distance to enhance two such methods, Isomap and Laplacian Eigenmaps. RESULTS: Using biobricks extracted from the synthetic biology database Registry of Standard Biological Parts, six combinations of various types of biobricks were tested. The visualization graphs show well-discriminated biobricks and reveal inappropriately labeled ones. The K-means clustering algorithm was adopted to quantify the reduction results: the average clustering accuracies for Isomap and Laplacian Eigenmaps were 0.857 and 0.844, respectively. Moreover, Laplacian Eigenmaps was 5 times faster than Isomap, and its visualization graph was more compact in discriminating biobricks. CONCLUSIONS: By combining normalized edit distance with Isomap and Laplacian Eigenmaps, synthetic biology biobricks are successfully visualized in two-dimensional space. Various types of biobricks can be discriminated and inappropriately labeled biobricks identified, which helps assess the quality of crowdsourcing-based synthetic biology databases and guides biobrick selection.
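The normalized edit distance that underpins this enhancement is a standard construction: the Levenshtein distance divided by the longer sequence length, yielding a value in [0, 1] that is comparable across sequences of very different sizes. A self-contained sketch (the paper's exact normalization is not given, so this is one common choice):

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance,
    keeping only one row of the DP matrix at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def normalized_edit_distance(a, b):
    """Scale by the longer length so the result lies in [0, 1]."""
    if not a and not b:
        return 0.0
    return edit_distance(a, b) / max(len(a), len(b))
```

A pairwise matrix of such distances can then be handed to manifold learners that accept precomputed distances in place of their default Euclidean metric.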


Subjects
Computational Biology/methods , Models, Theoretical , Nonlinear Dynamics , Sequence Analysis, DNA , Synthetic Biology , Algorithms , Cluster Analysis , Databases, Factual , Reproducibility of Results
3.
Adv Exp Med Biol ; 1028: 55-78, 2017.
Article in English | MEDLINE | ID: mdl-29058216

ABSTRACT

With the rapid increase in global aging, Alzheimer's disease has become a major burden in both social and economic terms. Substantial resources have been devoted to researching this disease, generating rich multimodal data resources. In this chapter, we discuss an ongoing effort to build a data platform that harnesses these data to support research on and prevention of Alzheimer's disease. We detail the platform's architecture, data integration strategy, and data services, and then consider how to leverage its data analytics capability to accelerate risk factor identification and pathogenesis studies. This chapter provides a concrete pathway for developing a data platform for studying and preventing insidious-onset chronic diseases in the data era.


Subjects
Alzheimer Disease/prevention & control , Biomedical Research , Alzheimer Disease/etiology , Biomarkers , Cognition , Humans , Information Storage and Retrieval , Risk Factors , Statistics as Topic
4.
BMC Bioinformatics ; 16: 192, 2015 Jun 11.
Article in English | MEDLINE | ID: mdl-26063651

ABSTRACT

BACKGROUND: As next-generation sequencing (NGS) technologies produce hundreds of millions of reads every day, mapping NGS reads to a given reference genome efficiently poses a tremendous computational challenge. Existing all-mappers, which aim to find all mapping locations of each read, are very time-consuming. Most consist of two main parts, filtration and verification; this work significantly reduces verification time, the dominant part of the running time. RESULTS: An efficient all-mapper, BitMapper, is developed based on a new vectorized bit-vector algorithm that simultaneously calculates the edit distance of one read to multiple locations in a given reference genome. Experimental results on both simulated and real data sets show that BitMapper is several times to an order of magnitude faster than current state-of-the-art all-mappers, while achieving higher sensitivity, i.e., better-quality solutions. CONCLUSIONS: We present BitMapper, which returns all mapping locations of raw reads containing indels as well as mismatches. BitMapper is implemented in C under a GPL license. Binaries are freely available at http://home.ustc.edu.cn/%7Echhy.


Assuntos
Algoritmos , Metodologias Computacionais , Genoma , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Software , Animais , Arabidopsis/genética , Caenorhabditis elegans/genética , Humanos , Mutação INDEL
5.
IEEE Trans Knowl Data Eng ; 26(11): 2599-2609, 2014 Nov.
Artigo em Inglês | MEDLINE | ID: mdl-25400485

RESUMO

The multiple longest common subsequence (MLCS) problem, related to the identification of sequence similarity, is an important problem in many fields. As an NP-hard problem, its exact algorithms have difficulty in handling large-scale data and time- and space-efficient algorithms are required in real-world applications. To deal with time constraints, anytime algorithms have been proposed to generate good solutions with a reasonable time. However, there exists little work on space-efficient MLCS algorithms. In this paper, we formulate the MLCS problem into a graph search problem and present two space-efficient anytime MLCS algorithms, SA-MLCS and SLA-MLCS. SA-MLCS uses an iterative beam widening search strategy to reduce space usage during the iterative process of finding better solutions. Based on SA-MLCS, SLA-MLCS, a space-bounded algorithm, is developed to avoid space usage from exceeding available memory. SLA-MLCS uses a replacing strategy when SA-MLCS reaches a given space bound. Experimental results show SA-MLCS and SLA-MLCS use an order of magnitude less space and time than the state-of-the-art approximate algorithm MLCS-APP while finding better solutions. Compared to the state-of-the-art anytime algorithm Pro-MLCS, SA-MLCS and SLA-MLCS can solve an order of magnitude larger size instances. Furthermore, SLA-MLCS can find much better solutions than SA-MLCS on large size instances.

6.
JMIR Aging ; 6: e50037, 2023 Nov 08.
Artigo em Inglês | MEDLINE | ID: mdl-37962517

RESUMO

Background: Various older adult care settings have embraced the use of the life story approach to enhance the development of comprehensive care plans. However, organizing life stories and extracting useful information is labor-intensive, primarily due to the repetitive, fragmented, and redundant nature of life stories gathered from everyday communication scenarios. Existing life story systems, while available, do not adequately fulfill the requirements of users, especially in the application of care services. Objective: The objective of this study is to design, develop, and evaluate a digital system that provides caregivers with the necessary tools to view and manage the life stories of older adults, enabling expedited access to pertinent information effectively and visually. Methods: This study used a multidisciplinary, user-centered design approach across 4 phases: initial design requirements, prototyping, prototype refinement workshops, and usability testing. During the initial phase, we conducted field research in the Hefei Tianyu Senior Living Service Nursing Home, China, to discover how caregivers currently store and use life stories and their needs, challenges, and obstacles in organizing and retrieving information. Subsequently, we designed a low-fidelity prototype according to the users' requirements. A prototyping workshop involving 6 participants was held to collaboratively design and discuss the prototype's function and interaction. User feedback from the workshops was used to optimize the prototype, leading to the development of the system. We then designed 2 rounds of usability testing with 7 caregivers to evaluate the system's usability and effectiveness. Results: We identified 3 categories of functionalities that are necessary to include in the design of our initial low-fidelity prototype of life story visualizations: life story input, life story organization, and timeline generation. 
Subsequently, through the workshops, we identified 3 categories for functional optimization: feedback on user interface and usability, optimization suggestions for existing features, and the request for additional functionalities. Next, we designed a medium-fidelity prototype based on human-centered design. The Story Mosaic system underwent usability testing in the Hefei Tianyu Senior Living Service Nursing Home. Overall, 7 users recorded and organized 1123 life stories of 16 older adults. The usability testing results indicated that the system was accessible and easy to use for caregivers. Based on the feedback from the usability testing, we finalized the high-fidelity prototype. Conclusions: We designed, developed, and evaluated the Story Mosaic system to support the visual management of older adults' life stories. This system empowers caregivers through digital technology and innovative design, pioneering personal narrative integration in caregiving. This system can expand to include informal caregivers and family members for continued adaptability and empathy.

7.
Comput Biol Med ; 115: 103524, 2019 12.
Artigo em Inglês | MEDLINE | ID: mdl-31698234

RESUMO

Causal graphs play an essential role in the determination of causalities and have been applied in many domains including biology and medicine. Traditional causal graph construction methods are usually data-driven and may not deliver the desired accuracy of a graph. Considering the vast number of publications with causality knowledge, extracting causal relations from the literature to help to establish causal graphs becomes possible. Current supervised-learning-based causality extraction methods requires sufficient labeled data to train a model, and rule-based causality extraction methods are limited by the predefined patterns. This paper proposes a causality extraction framework by integrating rule-based methods and unsupervised learning models to overcome these limitations. The proposed method consists of three modules, including data preprocessing, syntactic pattern matching, and causality determination. In data preprocessing, abstracts are crawled based on attribute names before sentences are extracted and simplified. In syntactic pattern matching, these simplified sentences are parsed to obtain the part-of-speech tags, and triples are achieved based on these tags by matching the two designed syntactic patterns. In causality determination, four verb seed sets are initialized, and word vectors are constructed for the verbs in both the seed sets and the triples by applying an unsupervised machine learning model. Causal relations are identified by comparing the similarity between the verbs in each triple and that in each seed set to overcome the limitation of the seed sets. Causality extraction results on the attributes from the risk factors for Alzheimer's disease show that our method outperforms Bui's method and Alashri's method in terms of precision, recall, specificity, accuracy and F-score, with increases in the F-score of 8.29% and 5.37%, respectively.


Assuntos
Mineração de Dados , Semântica , Máquina de Vetores de Suporte , Humanos
8.
Comput Biol Med ; 86: 31-39, 2017 07 01.
Artigo em Inglês | MEDLINE | ID: mdl-28499216

RESUMO

Synthetic biology databases have collected numerous biobricks to accelerate genetic circuit design. However, selecting biobricks is a tough task. Here, we leverage the fact that these manually designed circuits can provide underlying knowledge to support biobrick selection. We propose to design a recommendation system based on the analysis of available genetic circuits, which can narrow down the biobrick selection range and provide candidate biobricks for users to choose. A recommendation strategy based on a Markov model is established to tackle this issue. Furthermore, a biobrick chain recommendation algorithm Sira is proposed that applies a dynamic programming process on a layered state transition graph to obtain the top k recommendation results. In addition, a weighted filtering strategy, WFSira, is proposed to augment the performance of Sira. The experimental results on the Registry of Standard Biological Parts show that Sira outperforms other algorithms significantly for biobrick recommendations, with approximately 30% improvement in terms of recall rate. It is also able to make biobrick chain recommendations. WFSira can further improve the recall rate of Sira by an average of 7.5% for the top 5 recommendations.


Assuntos
Bases de Dados Genéticas , Modelos Genéticos , Linguagens de Programação , Cadeias de Markov
9.
Artigo em Inglês | MEDLINE | ID: mdl-23929861

RESUMO

An enormous amount of sequence data has been generated with the development of new DNA sequencing technologies, which presents great challenges for computational biology problems such as haplotype phasing. Although arduous efforts have been made to address this problem, the current methods still cannot efficiently deal with the incoming flood of large-scale data. In this paper, we propose a flow network model to tackle haplotype phasing problem, and explain some classical haplotype phasing rules based on this model. By incorporating the heuristic knowledge obtained from these classical rules, we design an algorithm FNphasing based on the flow network model. Theoretically, the time complexity of our algorithm is (O(n(2)m+m(2)), which is better than that of 2SNP, one of the most efficient algorithms currently. After testing the performance of FNphasing with several simulated data sets, the experimental results show that when applied on large-scale data sets, our algorithm is significantly faster than the state-of-the-art Beagle algorithm. FNphasing also achieves an equal or superior accuracy compared with other approaches.


Assuntos
Algoritmos , Biologia Computacional/métodos , Haplótipos/genética , Modelos Genéticos , Simulação por Computador , Genótipo , Projeto HapMap , Humanos , Peptidil Dipeptidase A/genética , Polimorfismo de Nucleotídeo Único , Recombinação Genética , Análise de Sequência de DNA
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA