Results 1 - 19 of 19
1.
Nature ; 624(7991): 378-389, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38092917

ABSTRACT

Recent advances in single-cell technologies have led to the discovery of thousands of brain cell types; however, our understanding of the gene regulatory programs in these cell types is far from complete (refs. 1-4). Here we report a comprehensive atlas of candidate cis-regulatory DNA elements (cCREs) in the adult mouse brain, generated by analysing chromatin accessibility in 2.3 million individual brain cells from 117 anatomical dissections. The atlas includes approximately 1 million cCREs and their chromatin accessibility across 1,482 distinct brain cell populations, adding over 446,000 cCREs to the most recent such annotation in the mouse genome. The mouse brain cCREs are moderately conserved in the human brain. The mouse-specific cCREs (specifically, those identified from a subset of cortical excitatory neurons) are strongly enriched for transposable elements, suggesting a potential role for transposable elements in the emergence of new regulatory programs and neuronal diversity. Finally, we infer the gene regulatory networks in over 260 subclasses of mouse brain cells and develop deep-learning models to predict the activities of gene regulatory elements in different brain cell types from the DNA sequence alone. Our results provide a resource for the analysis of cell-type-specific gene regulation programs in both mouse and human brains.


Subjects
Brain , Chromatin , Single-Cell Analysis , Animals , Humans , Mice , Brain/cytology , Brain/metabolism , Cerebral Cortex/cytology , Chromatin/chemistry , Chromatin/genetics , Chromatin/metabolism , Deep Learning , DNA Transposable Elements/genetics , Gene Regulatory Networks/genetics , Neurons/metabolism
2.
Science ; 382(6667): eadf7044, 2023 10 13.
Article in English | MEDLINE | ID: mdl-37824643

ABSTRACT

Recent advances in single-cell transcriptomics have illuminated the diverse neuronal and glial cell types within the human brain. However, the regulatory programs governing cell identity and function remain unclear. With a single-nucleus assay for transposase-accessible chromatin using sequencing (snATAC-seq), we explored open chromatin landscapes across 1.1 million cells in 42 brain regions from three adults. Integrating these data revealed 107 distinct cell types and their specific utilization of 544,735 candidate cis-regulatory DNA elements (cCREs) in the human genome. Nearly a third of the cCREs showed conservation and chromatin accessibility in mouse brain cells. We reveal strong links between specific brain cell types and neuropsychiatric disorders, including schizophrenia, bipolar disorder, Alzheimer's disease (AD), and major depression, and have developed deep learning models to predict the regulatory roles of noncoding risk variants in these disorders.


Subjects
Atlases as Topic , Brain , Chromatin , Animals , Humans , Mice , Brain/cytology , Brain/metabolism , Chromatin/metabolism , DNA/metabolism , Neurons/metabolism , Regulatory Sequences, Nucleic Acid/genetics , Single-Cell Analysis
3.
Curr Neurovasc Res ; 19(2): 171-180, 2022.
Article in English | MEDLINE | ID: mdl-35652392

ABSTRACT

BACKGROUND: Spinal cord injury (SCI) is regarded as an acute neurological disorder, and astrocytes play a role in the progression of SCI. OBJECTIVE: Herein, we investigated the roles of homeodomain-interacting protein kinase 2 (HIPK2)-modified rat spinal astrocytes in neurofunctional recovery after SCI. METHODS: Rat spinal astrocytes were cultured, isolated, and then identified through microscopic observation and immunofluorescence staining. Astrocytes were modified by infection with an adenovirus vector overexpressing HIPK2, and their proliferation and apoptosis were examined using the Cell Counting Kit-8 method and flow cytometry. SCI rat models were established and treated with astrocytes or HIPK2-modified astrocytes. Subsequently, rat motor ability was analyzed via Basso-Beattie-Bresnahan (BBB) scoring and the inclined-plane test, and damage to spinal cord tissues and neuronal survival were observed via hematoxylin-eosin staining and Nissl staining. The levels of HIPK2, brain-derived neurotrophic factor (BDNF), glial cell line-derived neurotrophic factor (GDNF), interleukin (IL)-1β, tumor necrosis factor (TNF)-α, and nuclear factor erythroid 2-related factor 2 (Nrf2)/antioxidant response element (ARE) pathway-related proteins were detected. RESULTS: Rat spinal astrocytes were harvested successfully. HIPK2 overexpression accelerated the proliferation and repressed the apoptosis of rat spinal astrocytes. Treatment with rat spinal astrocytes increased BBB scores and the maximum angle at which SCI rats remained stable, ameliorated damage to spinal cord tissues, increased the number of neurons, and attenuated neural damage and inflammation, while treatment with HIPK2-modified rat spinal astrocytes had more pronounced effects on the neurofunctional recovery of SCI rats. Meanwhile, HIPK2-modified rat spinal astrocytes further activated the Nrf2/ARE pathway. 
CONCLUSION: HIPK2-modified rat spinal astrocytes facilitated neurofunctional recovery and activated the Nrf2/ARE pathway after SCI.


Subjects
Spinal Cord Injuries , Rats , Animals , Rats, Sprague-Dawley , Spinal Cord Injuries/therapy , Spinal Cord Injuries/pathology , Astrocytes/metabolism , Spinal Cord/metabolism , Apoptosis , Recovery of Function , Protein Serine-Threonine Kinases/metabolism
4.
NAR Genom Bioinform ; 4(2): lqac032, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35493723

ABSTRACT

DNA viruses are important infectious agents known to mediate a large number of human diseases, including cancer. Viral integration into the host genome and the formation of hybrid transcripts are also associated with increased pathogenicity. The high variability of viral genomes, however, requires the use of sensitive ensemble hidden Markov models that add to the computational complexity, often requiring > 40 CPU-hours per sample. Here, we describe FastViFi, a fast two-stage filtering method that reduces the computational burden. On simulated and cancer genomic data, FastViFi improved the running time by two orders of magnitude with comparable accuracy on challenging data sets. Recently published methods have focused on identifying the location of viral integration into the human host genome using local assembly, but do not extend to RNA. To identify human-viral hybrid transcripts, we additionally developed ensemble hidden Markov models for the Epstein-Barr virus (EBV) to complement the models for hepatitis B virus (HBV), hepatitis C virus (HCV), and human papillomavirus (HPV), and used FastViFi to query RNA-seq data from gastric cancer (EBV) and liver cancer (HBV/HCV). FastViFi ran in <10 minutes per sample and identified multiple hybrids that fuse viral and human genes, suggesting new mechanisms for oncoviral pathogenicity. FastViFi is available at https://github.com/sara-javadzadeh/FastViFi.
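The general idea of a two-stage filter (a cheap screen first, then the expensive model only on survivors) can be illustrated with a small Python sketch. It assumes a k-mer containment screen as the cheap first stage and takes the costly model (in FastViFi, ensemble hidden Markov models) as a caller-supplied function. `two_stage_filter`, `k`, `min_shared` and `score_threshold` are hypothetical names and defaults chosen for illustration, not FastViFi's actual interface or parameters.

```python
from typing import Callable, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def two_stage_filter(
    reads: Iterable[str],
    viral_refs: Iterable[str],
    expensive_score: Callable[[str], float],
    k: int = 8,               # hypothetical k-mer length
    min_shared: int = 2,      # hypothetical minimum number of shared k-mers
    score_threshold: float = 0.5,
) -> List[str]:
    # Stage 1 (cheap): index all viral k-mers, keep reads sharing enough of them.
    index: Set[str] = set()
    for ref in viral_refs:
        index |= kmers(ref, k)
    candidates = [r for r in reads if len(kmers(r, k) & index) >= min_shared]
    # Stage 2 (expensive): the costly model is only called on the survivors.
    return [r for r in candidates if expensive_score(r) >= score_threshold]
```

The running-time saving in such a design comes entirely from how many reads the first stage discards, so the screen should be permissive enough not to lose true viral reads.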

5.
Physiol Int ; 108(3): 317-341, 2021 Sep 15.
Article in English | MEDLINE | ID: mdl-34529586

ABSTRACT

OBJECTIVE: To evaluate the in-vivo and in-vitro effects of ferulic acid (FA) on glucocorticoid-induced osteoarthritis (GIO) and to establish its possible underlying mechanisms. METHODS: The effects of FA on cell proliferation, cell viability (MTT assay), ALP activity, mineralization, and oxidative stress markers (ROS, SOD, GSH, LDH, and MDA levels) were investigated in the MC3T3-E1 cell line. Wistar rats received standard saline (control group), dexamethasone (GC, 2 mg/kg), or DEX+FA (50 and 100 mg/kg) orally for 8 weeks. Bone density, micro-architecture, biomechanics, bone turnover markers, and histomorphology were determined. The expression of OPG, RANKL, osteogenic markers, and other signalling proteins was assessed using quantitative RT-PCR and Western blotting. RESULTS: The findings indicated elevated ALP mRNA expression, osteogenic markers (Runx-2, OSX, Col-I, and OSN), and β-catenin, Lrp-5, and GSK-3β protein expression. FA showed the potential to increase MC3T3-E1 cell differentiation, proliferation, and mineralization. FA increased oxidative stress markers (SOD, MDA, and GSH) while decreasing ROS levels and lactate dehydrogenase release in GIO rats. The OPG/RANKL mRNA expression ratio was increased by FA, followed by improved GSK-3β and ERK phosphorylation with enhanced mRNA expression of Lrp-5 and β-catenin. CONCLUSION: These findings showed that FA improved osteoblast proliferation with oxidative stress suppression by controlling the Lrp-5/GSK-3β/ERK pathway in GIO, demonstrating potential pathways involved in the mechanism of action of FA in GIO therapy.


Subjects
Glucocorticoids , Osteoporosis , Animals , Cell Differentiation , Coumaric Acids , Glucocorticoids/pharmacology , Glycogen Synthase Kinase 3 beta , MAP Kinase Signaling System , Osteoblasts , Osteoporosis/chemically induced , Osteoporosis/drug therapy , Rats , Rats, Wistar
6.
Bioinformatics ; 35(10): 1745-1752, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30307536

ABSTRACT

MOTIVATION: State-of-the-art biomedical named entity recognition (BioNER) systems often require handcrafted features specific to each entity type, such as genes, chemicals and diseases. Although recent studies explored using neural network models for BioNER to free experts from manual feature engineering, the performance remains limited by the available training data for each entity type. RESULTS: We propose a multi-task learning framework for BioNER to collectively use the training data of different types of entities and improve the performance on each of them. In experiments on 15 benchmark BioNER datasets, our multi-task model achieves substantially better performance compared with state-of-the-art BioNER systems and baseline neural sequence labeling models. Further analysis shows that the large performance gains come from sharing character- and word-level information among relevant biomedical entities across differently labeled corpora. AVAILABILITY AND IMPLEMENTATION: Our source code is available at https://github.com/yuzhimanhua/lm-lstm-crf. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Neural Networks, Computer , Benchmarking , Software
7.
Biomed Pharmacother ; 103: 127-134, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29649627

ABSTRACT

HIPK2 is considered to be a tumor suppressor. It has also been implicated in several functions, such as apoptosis and inflammation, that are linked to spinal cord injury (SCI). However, whether HIPK2 ameliorates the neurological pain of SCI remains unclear. Here, we investigated the effects of HIPK2 on neurological function, oxidative stress, levels of inflammatory cytokines, and expression of Bcl-2/Bax in an SCI model. First, we evaluated the therapeutic effects of HIPK2 on neurological pain in SCI rats using Basso, Beattie and Bresnahan scores and H&E staining. Overexpression of HIPK2 significantly elevated the levels of brain-derived neurotrophic factor (BDNF) and glial cell line-derived neurotrophic factor (GDNF), and reduced the mRNA expression of Nogo-A and RhoA in SCI rats. Furthermore, terminal deoxynucleotidyl transferase-mediated dUTP nick end labeling (TUNEL) assays showed that overexpression of HIPK2 significantly reduced the number of apoptotic cells. Overexpression of HIPK2 also decreased expression of Bax and Caspase-3 and elevated expression of Bcl-2 in the SCI model, indicating that HIPK2 exerted its protective activity by inhibiting SCI-induced apoptosis. We then measured the serum concentrations of malondialdehyde (MDA), superoxide dismutase (SOD), catalase (CAT), and glutathione peroxidase (GSH-PX). We also determined the mRNA and protein levels of the nuclear factor-κB p65 subunit, tumor necrosis factor-α (TNF-α), and interleukin (IL)-1β. HIPK2 overexpression reduced oxidative stress and the levels of inflammatory cytokines compared with SCI control animals. Additionally, acetylation of HIPK2 was reduced in SCI rats. Overexpression of HIPK2 could enhance autophagy by elevating the expression of Beclin-1 and LC3-II, and autophagy is regarded as a beneficial regulator in spinal cord injury. 
Together, overexpression of HIPK2 improved contusive SCI-induced pain by modulating oxidative stress, Bcl-2 and Bax signaling, and inflammation, and by regulating autophagy.


Subjects
Apoptosis , Inflammation/pathology , Oxidative Stress , Protein Serine-Threonine Kinases/metabolism , Spinal Cord Injuries/enzymology , Spinal Cord Injuries/pathology , Animals , Anti-Inflammatory Agents/metabolism , Antioxidants/metabolism , Behavior, Animal , Brain-Derived Neurotrophic Factor/metabolism , Cell Count , Glial Cell Line-Derived Neurotrophic Factor/metabolism , Male , Nogo Proteins/metabolism , PC12 Cells , Rats , Rats, Sprague-Dawley , Spinal Cord/metabolism , Spinal Cord/pathology , rhoA GTP-Binding Protein/metabolism
8.
IEEE Trans Knowl Data Eng ; 30(7): 1226-1239, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30745791

ABSTRACT

In the literature, two families of models have been proposed to address prediction problems such as classification and regression. Simple models, such as generalized linear models, offer modest performance but strong interpretability on a set of simple features. The other family, which includes tree-based models, organizes numerical, categorical, and high-dimensional features into a comprehensive structure that captures rich, interpretable information in the data. In this paper, we propose a novel Discriminative Pattern-based Prediction framework (DPPred) that accomplishes prediction tasks by combining the advantages of both families: effectiveness and interpretability. Specifically, DPPred adopts concise discriminative patterns that lie on the prefix paths from the root to leaf nodes in tree-based models. DPPred then selects a limited number of useful discriminative patterns by searching for the most effective pattern combination to fit a generalized linear model. Extensive experiments show that in many scenarios, DPPred provides accuracy competitive with the state of the art as well as valuable interpretability for developers and experts. In particular, in a case study on a clinical application dataset, DPPred outperforms the baselines while using only 40 concise discriminative patterns out of a potentially exponentially large set of patterns.
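To make "patterns on prefix paths from the root" concrete, here is a minimal Python sketch that walks a decision tree and turns every root-anchored prefix path into a conjunctive pattern, then encodes samples as binary pattern features that a generalized linear model could consume. The tree is a hand-written nested dict (a hypothetical format, not the output of any particular library), and DPPred's pattern-selection and model-fitting steps are not shown.

```python
from typing import Any, Dict, List, Sequence, Tuple

Condition = Tuple[str, str, float]   # (feature, "<=" or ">", threshold)
Pattern = Tuple[Condition, ...]      # conjunction of conditions


def prefix_patterns(node: Dict[str, Any], prefix: Pattern = ()) -> List[Pattern]:
    """Collect every non-empty prefix path from the root as a conjunctive pattern."""
    if "leaf" in node:
        return []
    f, t = node["feature"], node["threshold"]
    patterns: List[Pattern] = []
    for op, child in (("<=", node["left"]), (">", node["right"])):
        p = prefix + ((f, op, t),)
        patterns.append(p)                            # the path ending at this branch
        patterns.extend(prefix_patterns(child, p))    # and all its extensions
    return patterns


def matches(pattern: Pattern, x: Dict[str, float]) -> bool:
    """True if sample x satisfies every condition in the pattern."""
    return all((x[f] <= t) if op == "<=" else (x[f] > t) for f, op, t in pattern)


def binarize(X: Sequence[Dict[str, float]], patterns: List[Pattern]) -> List[List[int]]:
    """One 0/1 column per pattern; these columns would be the GLM's inputs."""
    return [[int(matches(p, x)) for p in patterns] for x in X]
```

For a tree that splits on a hypothetical `age` feature and then on `bmi`, this yields four patterns, and a sample with age 60 and bmi 35 is encoded as `[0, 1, 0, 1]`. Selecting a small subset of these columns is where the interpretability comes from.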

9.
IEEE Trans Knowl Data Eng ; 30(10): 1825-1837, 2018 Oct.
Article in English | MEDLINE | ID: mdl-31105412

ABSTRACT

As one of the fundamental tasks in text analysis, phrase mining aims to extract quality phrases from a text corpus and has various downstream applications, including information extraction/retrieval, taxonomy construction, and topic modeling. Most existing methods rely on complex, trained linguistic analyzers and thus are likely to perform unsatisfactorily on text corpora of new domains and genres without extra, expensive adaptation. None of the state-of-the-art models, not even data-driven models, is fully automated, because they require human experts to design rules or label phrases. In this paper, we propose a novel framework for automated phrase mining, AutoPhrase, which supports any language as long as a general knowledge base (e.g., Wikipedia) in that language is available, while benefiting from, but not requiring, a POS tagger. Compared to state-of-the-art methods, AutoPhrase has shown significant improvements in both effectiveness and efficiency on five real-world datasets across different domains and languages. In addition, AutoPhrase can be extended to model single-word quality phrases.

10.
Drug Des Devel Ther ; 11: 3491-3495, 2017.
Article in English | MEDLINE | ID: mdl-29255350

ABSTRACT

OBJECTIVE: We aimed to evaluate whether polymorphisms of poly(ADP-ribose) polymerase-1 (PARP-1) are involved as a potential risk factor in the development of spinal cord injury (SCI) among Chinese individuals. PATIENTS AND METHODS: Patients with a confirmed diagnosis of SCI (other than traumatic injury) and healthy individuals with no clinical symptoms of SCI were enrolled at the Spinal Cord Injury Care Center, The Third People's Hospital of Dalian, China. Genetic polymorphisms were studied in plasma samples by a polymerase chain reaction-restriction fragment length polymorphism assay. RESULTS: A total of 130 Chinese patients with SCI and 130 healthy Chinese individuals were included. We found that patients with the GG genotype (odds ratio [OR]: 4.09, 95% confidence interval [CI] 2.42-6.90, P<0.001) and carriers of the G allele (OR 3.96, 95% CI 2.33-6.74, P<0.0001) were at high risk of developing SCI. A del/ins polymorphism of the NF-κB1 gene (OR 3.32, 95% CI 1.96-5.61, P<0.001) was also found to be associated with SCI. CONCLUSION: Our study suggests that PARP-1 polymorphisms are involved in the development of SCI in Chinese individuals. Thus, PARP-1 polymorphisms can be considered one of the potential risk factors for developing SCI.


Subjects
Poly (ADP-Ribose) Polymerase-1/genetics , Spinal Cord Injuries/genetics , Adolescent , Adult , Aged , Asian People/genetics , China , Female , Genotype , Humans , Male , Middle Aged , NF-kappa B/metabolism , Poly (ADP-Ribose) Polymerase-1/blood , Polymorphism, Genetic/genetics , Risk Factors , Spinal Cord Injuries/blood , Spinal Cord Injuries/diagnosis , Young Adult
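The odds ratios and 95% confidence intervals reported in the entry above are standard 2x2 case-control statistics. The abstract does not say which interval method was used; the Python sketch below assumes Woolf's logit method, and the function name and the counts in the usage note are hypothetical, not the study's data.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """2x2 table: a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    Returns (OR, lower, upper) using Woolf's logit interval (assumed method)."""
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf's interval is undefined without a continuity correction")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of ln(OR)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se), math.exp(log_or + z * se)
```

With hypothetical counts `odds_ratio_ci(40, 10, 20, 20)` the point estimate is 4.0; the interval is symmetric around ln(OR) on the log scale, which is why reported intervals such as 2.42-6.90 look asymmetric around the OR itself.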
11.
J Comput Biol ; 24(9): 923-941, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28570104

ABSTRACT

Protein-protein interaction (PPI) networks, which provide a comprehensive landscape of protein interaction patterns, enable us to explore biological processes and cellular components at multiple resolutions. In a biological process, a number of proteins need to work together to perform a task. Proteins densely interact with each other, forming large molecular machines or cellular building blocks. Identification of such densely interconnected clusters or protein complexes from PPI networks enables us to obtain a better understanding of the hierarchy and organization of biological processes and cellular components. However, existing graph clustering algorithms on PPI networks often cannot effectively detect densely connected subgraphs and overlapping subgraphs. In this article, we formulate the problem of complex detection as diversified dense subgraph mining and introduce a novel approximation algorithm to efficiently enumerate putative protein complexes from biological networks. The key insight of our algorithm is that instead of enumerating all dense subgraphs, we only need to find a small, diverse subset of subgraphs that cover as many proteins as possible. The problem is modeled as finding a diverse set of maximal dense subgraphs, for which we develop highly effective pruning techniques to guarantee efficiency. To scale up to large networks, we devise a divide-and-conquer approach to speed up the algorithm in a distributed manner. By comparing with existing clustering and dense subgraph-based algorithms on several yeast and human PPI networks, we demonstrate that our method can detect more putative protein complexes and achieves better prediction accuracy.


Subjects
Protein Interaction Mapping/methods , Protein Interaction Maps , Software , Humans , Protein Interaction Mapping/standards , Yeasts/genetics , Yeasts/metabolism
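The "small diverse subset that covers as many proteins as possible" idea in the entry above resembles a maximum-coverage problem. As a rough illustration only, the Python sketch below filters candidate protein sets by edge density and then greedily picks the set that adds the most not-yet-covered proteins. It assumes the candidate subgraphs are already given, and it uses the classic greedy max-coverage heuristic; it is not the authors' enumeration, pruning, or divide-and-conquer algorithm, and all names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def density(nodes: FrozenSet[str], edges: Set[FrozenSet[str]]) -> float:
    """Fraction of possible node pairs inside `nodes` that are connected."""
    n = len(nodes)
    if n < 2:
        return 0.0
    inside = sum(1 for u, v in combinations(sorted(nodes), 2) if frozenset((u, v)) in edges)
    return inside / (n * (n - 1) / 2)


def diverse_dense(candidates: List[FrozenSet[str]], edges: Set[FrozenSet[str]],
                  min_density: float, k: int) -> List[FrozenSet[str]]:
    """Greedy max-coverage over sufficiently dense candidates (at most k picks)."""
    dense = [c for c in candidates if density(c, edges) >= min_density]
    chosen: List[FrozenSet[str]] = []
    covered: Set[str] = set()
    for _ in range(k):
        # Pick the candidate contributing the most newly covered proteins.
        best = max(dense, key=lambda c: len(c - covered), default=None)
        if best is None or not (best - covered):
            break
        chosen.append(best)
        covered |= best
        dense.remove(best)
    return chosen
```

Rewarding only newly covered proteins is what pushes the selection toward diversity: a second subgraph that overlaps heavily with the first gains little, while overlapping complexes are still allowed.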
12.
Gigascience ; 6(5): 1-10, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28327993

ABSTRACT

Background: The association of differing genotypes with disease-related phenotypic traits offers great potential both to help identify new therapeutic targets and to support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit how efficiently value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Results: Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest, a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set of 6,678 subjects by 645,863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. 
Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Conclusions: Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics.


Subjects
Crowdsourcing , Genome-Wide Association Study , Software , Genotype , Humans , Inventions , Logistic Models , Phenotype
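Genome-wide logistic regression fits one small model per variant, so the per-variant fit is the unit of work being accelerated in the entry above. As a rough illustration, here is a textbook Newton-Raphson fit of logit P(case) = b0 + b1*g for a single variant's genotype dosage g, with no covariates. It is a hypothetical sketch, not PLINK's code or the contest-derived implementation. Because variants are fitted independently, a loop over variants is the natural unit for the coarse-grained parallelization the abstract mentions.

```python
import math
from typing import List, Tuple


def logistic_snp(g: List[float], y: List[int], iters: int = 25,
                 tol: float = 1e-10) -> Tuple[float, float]:
    """Fit logit P(y=1) = b0 + b1*g by Newton-Raphson; return (b0, b1).
    Assumes g is not constant and the data are not perfectly separable."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score vector (s0, s1) and Fisher information matrix [[h00, h01], [h01, h11]].
        s0 = s1 = h00 = h01 = h11 = 0.0
        for gi, yi in zip(g, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * gi)))
            r = yi - p
            w = p * (1.0 - p)
            s0 += r
            s1 += r * gi
            h00 += w
            h01 += w * gi
            h11 += w * gi * gi
        det = h00 * h11 - h01 * h01
        # Newton step: information^{-1} @ score, via the closed-form 2x2 inverse.
        d0 = (h11 * s0 - h01 * s1) / det
        d1 = (-h01 * s0 + h00 * s1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

At convergence the score vector is numerically zero; a genome-wide scan would call `logistic_snp` once per variant with the same phenotype vector `y`.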
13.
Proc SIAM Int Conf Data Min ; 2016: 567-575, 2016 May.
Article in English | MEDLINE | ID: mdl-28163983

ABSTRACT

Pattern-based classification was originally proposed to improve accuracy using selected frequent patterns, and much effort has gone into pruning the huge number of non-discriminative frequent patterns. On the other hand, tree-based models have shown strong performance on many classification tasks because they can easily build high-order interactions between features and handle numerical, categorical, and high-dimensional features. Taking advantage of both modeling methodologies, we propose a natural and effective way to perform pattern-based classification by adopting discriminative patterns that are the prefix paths from the root to nodes in tree-based models (e.g., random forest). We further reduce the number of discriminative patterns by selecting the most effective pattern combinations that fit into a generalized linear model. As a result, our discriminative pattern-based classification framework (DPClass) can perform as well as previous state-of-the-art algorithms, provides strong interpretability by using only a very limited number of discriminative patterns, and predicts new data extremely fast. More specifically, in our experiments, DPClass achieved even better accuracy using only the top 20 discriminative patterns. The resulting framework is very concise and highly explainable to human experts.

14.
Proc ACM Int Conf Inf Knowl Manag ; 2016: 939-948, 2016 Oct.
Article in English | MEDLINE | ID: mdl-28232874

ABSTRACT

Users often write reviews on different themes, involving linguistic structures with complex sentiments. The sentiment polarity of a word can differ across themes. Moreover, contextual valence shifters may change sentiment polarity depending on the contexts in which they appear. Neither challenge can be modeled effectively and explicitly in traditional sentiment analysis. Studying both phenomena requires multi-theme sentiment analysis at the word level, which is very interesting but significantly more challenging than overall polarity classification. To resolve the multi-theme and sentiment-shifting problems simultaneously, we propose a data-driven framework that enables both capabilities: (1) polarity predictions of the same word in reviews of different themes, and (2) discovery and quantification of contextual valence shifters. The framework formulates multi-theme sentiment by factorizing the review sentiments with theme/word embeddings and then derives the shifter effect learning problem as a logistic regression. The improvement in sentiment polarity classification accuracy demonstrates not only the importance of multi-theme and sentiment shifting, but also the effectiveness of our framework. Human evaluations and case studies further show the success of multi-theme word sentiment predictions and automatic effect quantification of contextual valence shifters.

15.
Proc SIAM Int Conf Data Min ; 2016: 558-566, 2016 May.
Article in English | MEDLINE | ID: mdl-28174677

ABSTRACT

Consecutive pattern mining, which aims to find sequential patterns that are substrings, is a special case of frequent pattern mining and has played a crucial role in many real-world applications, especially biological sequence analysis, time series analysis, and network log mining. Approximate matching, allowing insertions, deletions, and substitutions between strings, is widely used in biological sequence comparison. However, most existing string pattern mining methods only consider Hamming distance, without insertions/deletions (indels). Little attention has been paid to general approximate consecutive frequent pattern mining under edit distance, potentially due to its high computational complexity, particularly on DNA sequences with billions of base pairs. In this paper, we introduce an efficient solution to this problem. We first formulate the Maximal Approximate Consecutive Frequent Pattern Mining (MACFP) problem, which identifies substring patterns under edit distance in a long query sequence. Then, we propose a novel algorithm with linear time complexity to check whether the support of a substring pattern is above a predefined threshold in the query sequence, thus greatly reducing the computational complexity of MACFP. With this fast decision algorithm, we can efficiently solve the original pattern discovery problem with several indexing and searching techniques. Comprehensive experiments on sequence pattern analysis and a study on a cancer genomics application demonstrate the effectiveness and efficiency of our algorithm compared to several existing methods.
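The abstract above does not spell out its linear-time support check. As a simpler stand-in that shows what "occurrence under edit distance" means, the classic Sellers dynamic program below reports every position in a text where some substring ending there is within edit distance k of a pattern, in O(m*n) time for a pattern of length m and text of length n. Counting these end positions is only a rough proxy for support (one approximate occurrence can end at several adjacent positions), and the function names are hypothetical.

```python
from typing import List


def approx_end_positions(pattern: str, text: str, k: int) -> List[int]:
    """Sellers' semi-global edit-distance DP: 1-based positions j in text such that
    some substring of text ending at j is within edit distance k of pattern."""
    m = len(pattern)
    prev = list(range(m + 1))      # D[i][0] = i: pattern prefix vs. empty text
    hits: List[int] = []
    for j, c in enumerate(text, start=1):
        cur = [0] * (m + 1)        # D[0][j] = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(prev[i - 1] + cost,   # match / substitution
                         prev[i] + 1,          # extra text character (insertion)
                         cur[i - 1] + 1)       # missing text character (deletion)
        if cur[m] <= k:
            hits.append(j)
        prev = cur
    return hits


def support_at_least(pattern: str, text: str, k: int, threshold: int) -> bool:
    """Crude support test: number of approximate end positions >= threshold."""
    return len(approx_end_positions(pattern, text, k)) >= threshold
```

For example, `approx_end_positions("ACGT", "ACGA", 1)` reports both position 3 ("ACG", one deletion) and position 4 ("ACGA", one substitution), which illustrates why raw end-position counts overstate support.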

16.
ACM BCB ; 2016: 41-49, 2016 Oct.
Article in English | MEDLINE | ID: mdl-28174760

ABSTRACT

Smartphones are now ubiquitous, but it is still unclear which physiological functions they can monitor at clinical quality. Pulmonary function is a standard measure of health status for cardiopulmonary patients. We have shown that predictive models can accurately distinguish cardiopulmonary conditions from healthy status, as well as different severity levels within cardiopulmonary disease, the GOLD stages. Here we propose several universal models to monitor cardiopulmonary conditions, including DPClass, a novel learning approach we designed. We carefully prepared a motion dataset covering statuses from GOLD 0 (healthy), GOLD 1 (mild), and GOLD 2 (moderate) all the way to GOLD 3 (severe). Sixty-six subjects participated in this study. After de-identification, their walking data were used to train the predictive models. The RBF-SVM model yields the highest accuracy, while the DPClass model provides better interpretation of the model mechanisms. We not only provide promising solutions for monitoring health status by simply carrying a smartphone, but also demonstrate how demographics influence predictive models of cardiopulmonary disease.

17.
F1000Res ; 5: 2806, 2016.
Article in English | MEDLINE | ID: mdl-28299178

ABSTRACT

Reliable predictions of the risk and survival time of prostate cancer patients based on their clinical records can help guide treatment and provide hints about the disease mechanism. Cox regression is currently a commonly accepted approach for such tasks in clinical applications. More complex methods, such as ensemble approaches, can potentially reach better prediction accuracy at the cost of harder training and less interpretable results. Better performance on a specific data set may also be obtained by extensive manual exploration of the data space, but models developed this way are prone to overfitting and usually not directly applicable to a different data set. We propose DWCox, a density-weighted Cox model that has improved robustness against outliers and thus can provide more accurate predictions of prostate cancer survival. DWCox assigns weights to the training data according to their local kernel density in the feature space, and incorporates those weights into the partial likelihood function. A linear regression is then used to predict the actual survival times from the predicted risks. In the 2015 Prostate Cancer DREAM Challenge, DWCox obtained the best average ranking in prediction accuracy on risk and survival time. The success of DWCox is remarkable given that it is one of the smallest and most interpretable models submitted to the challenge. In simulations, DWCox performed consistently better than a standard Cox model when the training data contained many sparsely distributed outliers. Although developed for prostate cancer patients, DWCox can be easily re-trained and applied to other survival analysis problems. DWCox is implemented in R and can be downloaded from https://github.com/JinfengXiao/DWCox.
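A minimal Python sketch of the two ingredients this abstract names: per-sample weights from a Gaussian kernel density estimate in feature space, and a Cox partial log-likelihood in which those weights enter. The bandwidth, the leave-one-out density, the normalization to mean 1, the use of weights inside the risk-set sum, and the absence of a tie correction are all assumptions made for illustration; the authors' R implementation at the GitHub link is the reference. Fitting beta and the final risk-to-survival-time regression are not shown.

```python
import math
from typing import List, Sequence


def kde_weights(X: Sequence[Sequence[float]], bandwidth: float) -> List[float]:
    """Leave-one-out Gaussian kernel density of each sample, rescaled to mean 1.
    Samples in sparse regions (likely outliers) receive small weights."""
    n = len(X)
    dens = []
    for i in range(n):
        s = 0.0
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                s += math.exp(-d2 / (2.0 * bandwidth ** 2))
        dens.append(s)
    mean = sum(dens) / n if n else 0.0
    if mean == 0.0:
        return [1.0] * n           # degenerate case: fall back to uniform weights
    return [d / mean for d in dens]


def weighted_cox_loglik(beta: Sequence[float], X: Sequence[Sequence[float]],
                        time: Sequence[float], event: Sequence[int],
                        w: Sequence[float]) -> float:
    """Weighted partial log-likelihood (no tie correction):
    sum over events i of w_i * (eta_i - log sum_{j: t_j >= t_i} w_j * exp(eta_j))."""
    eta = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
    ll = 0.0
    for i in range(len(X)):
        if event[i]:
            risk = sum(w[j] * math.exp(eta[j]) for j in range(len(X)) if time[j] >= time[i])
            ll += w[i] * (eta[i] - math.log(risk))
    return ll
```

Down-weighting low-density samples shrinks their event terms in the likelihood, which is the mechanism the abstract credits for robustness to sparsely distributed outliers.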

18.
Proc Int World Wide Web Conf ; 2016: 1057-1067, 2016 Apr.
Article in English | MEDLINE | ID: mdl-28229132

ABSTRACT

Many text mining approaches adopt bag-of-words or n-gram models to represent documents. Looking beyond just the words, i.e., the explicit surface forms, in a document can improve a computer's understanding of text. With this in mind, researchers have proposed concept-based models that rely on a human-curated knowledge base to incorporate other related concepts in the document representation. However, these methods are less suitable for vertical domains (e.g., literature, enterprise, etc.) due to low coverage of in-domain concepts in the general knowledge base and interference from out-of-domain concepts. In this paper, we propose a data-driven model named Latent Keyphrase Inference (LAKI) that represents documents with a vector of closely related domain keyphrases instead of single words or existing concepts in the knowledge base. We show that, given a corpus of in-domain documents, topical content units can be learned for each domain keyphrase, which enables a computer to infer latent document keyphrases, going beyond just explicit mentions. Compared with state-of-the-art document representation approaches, LAKI fills the gap between bag-of-words and concept-based models by using domain keyphrases as the basic representation unit. It removes the dependency on a knowledge base while providing, through keyphrases, readily interpretable representations. When evaluated against 8 other methods on two text mining tasks over two corpora, LAKI outperformed all of them.

19.
Proc ACM SIGMOD Int Conf Manag Data ; 2015: 1729-1744, 2015.
Article in English | MEDLINE | ID: mdl-26705375

ABSTRACT

Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency of manipulating such data using database technology. Thus, mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quality phrases from text corpora integrated with phrasal segmentation. The framework requires only limited training, yet the quality of the phrases it generates is close to human judgment. Moreover, the method is scalable: both computation time and required space grow linearly with corpus size. Our experiments on large text corpora demonstrate the quality and efficiency of the new method.
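Phrasal segmentation can be illustrated as a dynamic program that splits a token sequence into the segmentation with the highest total phrase-quality score. This Python sketch is a simplified stand-in, not the paper's segmentation model: the `quality` scores are hypothetical inputs, multi-word spans not listed in `quality` are disallowed, single tokens get a default score, and `max_len` is an arbitrary cap.

```python
from typing import Dict, List, Tuple


def segment(tokens: List[str], quality: Dict[Tuple[str, ...], float],
            max_len: int = 4, single_score: float = 0.0) -> List[Tuple[str, ...]]:
    """Return the segmentation of tokens maximizing the summed segment scores."""
    n = len(tokens)
    best = [float("-inf")] * (n + 1)   # best[j]: best score for tokens[:j]
    back = [0] * (n + 1)               # back[j]: start index of the last segment
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            seg = tuple(tokens[i:j])
            if len(seg) == 1:
                s = quality.get(seg, single_score)   # single tokens are always allowed
            elif seg in quality:
                s = quality[seg]
            else:
                continue                              # unknown multi-word span
            if best[i] + s > best[j]:
                best[j] = best[i] + s
                back[j] = i
    # Walk the back-pointers from the end to recover the segments.
    segs: List[Tuple[str, ...]] = []
    j = n
    while j > 0:
        i = back[j]
        segs.append(tuple(tokens[i:j]))
        j = i
    return segs[::-1]
```

With hypothetical scores favouring "support vector machine" over "vector machine", the sentence "support vector machine is used" segments as `[("support", "vector", "machine"), ("is",), ("used",)]`. Because each position is visited once with at most `max_len` candidate starts, this DP runs in time linear in corpus length.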
