1.
J Healthc Inform Res ; 8(2): 370-399, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38681757

ABSTRACT

With an increased interest in the production of personal health technologies designed to track user data (e.g., nutrient intake, step counts), there is now more opportunity than ever to surface meaningful behavioral insights to everyday users in the form of natural language. This knowledge can increase their behavioral awareness and allow them to take action to meet their health goals. It can also bridge the gap between the vast collection of personal health data and the summary generation required to describe an individual's behavioral tendencies. Previous work has focused on rule-based time-series data summarization methods designed to generate natural language summaries of interesting patterns found within temporal personal health data. We examine recurrent, convolutional, and Transformer-based encoder-decoder models to automatically generate natural language summaries from numeric temporal personal health data. We showcase the effectiveness of our models on real user health data logged in MyFitnessPal (Weber and Achananuparp 2016) and show that we can automatically generate high-quality natural language summaries. Our work serves as a first step towards the ambitious goal of automatically generating novel and meaningful temporal summaries from personal health data.

2.
Article in English | MEDLINE | ID: mdl-37093721

ABSTRACT

Knowledge graph (KG) question generation (QG) aims to generate natural language questions from KGs and target answers. Previous works mostly focus on a simple setting: generating questions from a single KG triple. In this work, we focus on a more realistic setting where we aim to generate questions from a KG subgraph and target answers. In addition, most previous works build on either RNN- or Transformer-based models to encode a linearized KG subgraph, which totally discards the explicit structure information of a KG subgraph. To address this issue, we propose to apply a bidirectional Graph2Seq model to encode the KG subgraph. Furthermore, we enhance our RNN decoder with a node-level copying mechanism to allow direct copying of node attributes from the KG subgraph to the output question. Both automatic and human evaluation results demonstrate that our model achieves new state-of-the-art scores, outperforming existing methods by a significant margin on two QG benchmarks. Experimental results also show that our QG model can consistently benefit the question-answering (QA) task as a means of data augmentation.

3.
Front Big Data ; 5: 1044709, 2022.
Article in English | MEDLINE | ID: mdl-36466714

ABSTRACT

The network embedding task is to represent a node in a network as a low-dimensional vector while incorporating the topological and structural information. Most existing approaches solve this problem by factorizing a proximity matrix, either directly or implicitly. In this work, we introduce a network embedding method from a new perspective, which leverages Modern Hopfield Networks (MHN) for associative learning. Our network learns associations between the content of each node and that node's neighbors. These associations serve as memories in the MHN. The recurrent dynamics of the network make it possible to recover the masked node, given that node's neighbors. Our proposed method is evaluated on different benchmark datasets for downstream tasks such as node classification, link prediction, and graph coarsening. The results show competitive performance compared to the common matrix factorization techniques and deep learning based methods.
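
The retrieval step described above can be illustrated with the generic modern Hopfield update, in which the state is repeatedly replaced by a softmax-weighted average of the stored patterns. This is only a minimal sketch under stated assumptions: the toy binary patterns, the `beta` value, and the function names are hypothetical, and the sketch does not reproduce how the paper builds its memories from node content and neighbors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(memories, query, beta=4.0, steps=3):
    """Iterate state <- sum_i softmax(beta * <m_i, state>)_i * m_i."""
    state = list(query)
    for _ in range(steps):
        # Similarity of the current state to every stored pattern.
        scores = [beta * sum(a * b for a, b in zip(m, state)) for m in memories]
        weights = softmax(scores)
        # New state: attention-weighted average of the stored patterns.
        state = [
            sum(w * m[d] for w, m in zip(weights, memories))
            for d in range(len(state))
        ]
    return state


# Hypothetical stored patterns (illustrative values only).
memories = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
]
# The first pattern with its third feature masked out.
query = [1.0, 0.0, 0.0, 0.0]
recovered = hopfield_retrieve(memories, query)
print([round(x) for x in recovered])
```

A larger `beta` makes the softmax sharper, so the state converges to a single stored pattern in fewer iterations.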

4.
J Clin Endocrinol Metab ; 107(4): e1390-e1401, 2022 03 24.
Article in English | MEDLINE | ID: mdl-34888676

ABSTRACT

CONTEXT: Fracture risk is underestimated in people with type 2 diabetes (T2D). OBJECTIVE: To investigate the longitudinal relationship of glycated hemoglobin (HbA1c) and common medications on fracture risk in people with T2D. METHODS: This retrospective population-based cohort study was conducted using de-identified claims and electronic health record data obtained from the OptumLabs Data Warehouse for the period January 1, 2007, to September 30, 2015. For each individual, the study was conducted within a 2-year HbA1c observation period and a 2-year fracture follow-up period. A cohort of 157 439 individuals with T2D [age ≥ 55 years with mean HbA1c value ≥ 6%] was selected from 4 018 250 US Medicare Advantage/Commercial enrollees with a T2D diagnosis. All fractures and fragility fractures were measured. RESULTS: With covariates adjusted, poor glycemic control in T2D individuals was associated with a 29% increase in all fracture risk, compared with T2D individuals who had adequate glycemic control (HR: 1.29; 95% CI, 1.22-1.36). Treatment with metformin (HR: 0.88; 95% CI, 0.85-0.92) and DPP4 inhibitors (HR: 0.93; 95% CI, 0.88-0.98) was associated with a reduced all fracture risk, while insulin (HR: 1.26; 95% CI, 1.21-1.32), thiazolidinediones (HR: 1.23; 95% CI, 1.18-1.29), and meglitinides (HR: 1.12; 95% CI, 1.00-1.26) were associated with an increased all fracture risk (all P values < 0.05). Bisphosphonates were associated similarly with increased fracture risk in the T2D and nondiabetic groups. CONCLUSION: Longitudinal 2-year HbA1c is independently associated with elevated all fracture risk in T2D individuals during a 2-year follow-up period. Metformin and DPP4 inhibitors can be used for management of T2D fracture risk.
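
The link between the reported hazard ratios and statements such as "29% increase" is the arithmetic (HR - 1) x 100. The short sketch below applies that conversion to the hazard ratios quoted in the abstract; the function name and dictionary layout are illustrative choices, not part of the study.

```python
def percent_change_from_hr(hr):
    """Relative change in hazard, in percent, implied by a hazard ratio."""
    return (hr - 1.0) * 100.0


# Hazard ratios as quoted in the abstract above.
reported_hr = {
    "poor glycemic control": 1.29,
    "metformin": 0.88,
    "DPP4 inhibitors": 0.93,
    "insulin": 1.26,
    "thiazolidinediones": 1.23,
    "meglitinides": 1.12,
}

for exposure, hr in reported_hr.items():
    # Positive values mean higher fracture hazard, negative values lower.
    print(f"{exposure}: {percent_change_from_hr(hr):+.0f}%")
```

A hazard ratio below 1 therefore reads as a relative reduction, for example 0.88 for metformin corresponds to roughly 12% lower hazard.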


Subjects
Type 2 Diabetes Mellitus , Dipeptidyl-Peptidase IV Inhibitors , Bone Fractures , Metformin , Aged , Blood Glucose , Cohort Studies , Type 2 Diabetes Mellitus/complications , Type 2 Diabetes Mellitus/drug therapy , Dipeptidyl-Peptidase IV Inhibitors/therapeutic use , Bone Fractures/epidemiology , Bone Fractures/etiology , Glycated Hemoglobin/analysis , Hemoglobins , Humans , Hypoglycemic Agents/therapeutic use , Medicare , Metformin/therapeutic use , Middle Aged , Retrospective Studies , United States/epidemiology
7.
PLoS One ; 9(9): e108011, 2014.
Article in English | MEDLINE | ID: mdl-25264906

ABSTRACT

Novel sequences are DNA sequences present in an individual's genome but absent in the human reference assembly. They are predicted to be biologically important, both individual and population specific, and consistent with the known human migration paths. Recent studies have shown that an average person harbors 2-5 Mb of such sequences and estimated that the human pan-genome contains as much as 19-40 Mb of novel sequences. To identify them in a de novo genome assembly, some existing sequence aligners have been used, but no computational method has been specifically proposed for this task. In this work, we developed NSIT (Novel Sequence Identification Tool), a software tool that can accurately and efficiently identify novel sequences in an individual's de novo whole genome assembly. We identified and characterized 1.1 Mb, 1.2 Mb, and 1.0 Mb of novel sequences in NA18507 (African), YH (Asian), and NA12878 (European) de novo genome assemblies, respectively. Our results show very high concordance with the previous work using the respective reference assembly. In addition, our results using the latest human reference assembly suggest that the amount of novel sequences per individual may not be as high as previously reported. We additionally developed a graphical viewer for comparisons of novel sequence contents. The viewer also helped in identifying sequence contamination; we found 130 kb of Epstein-Barr virus sequence in the previously published NA18507 novel sequences as well as 287 kb of zebrafish repeats in NA12878 de novo assembly. NSIT requires [Formula: see text]2GB of RAM and 1.5-2 hrs on a commodity desktop. The program is applicable to input assemblies with varying contig/scaffold sizes, ranging from 100 bp to as high as 50 Mb. It works in both 32-bit and 64-bit systems and outperforms, by large margins, other fast sequence aligners previously applied to this task. To our knowledge, NSIT is the first software designed specifically for novel sequence identification in a de novo human genome assembly.


Subjects
DNA Sequence Analysis/methods , Sequence Alignment
8.
Algorithms Mol Biol ; 5: 12, 2010 Jan 04.
Article in English | MEDLINE | ID: mdl-20047669

ABSTRACT

BACKGROUND: Proteins have evolved subject to energetic selection pressure for stability and flexibility. Structural similarity between proteins that have gone through conformational changes can be captured effectively if flexibility is considered. Topologically unrelated proteins that preserve secondary structure packing interactions can be detected if both flexibility and sequential permutations are considered. We propose the FlexSnap algorithm for flexible non-topological protein structural alignment. RESULTS: The effectiveness of FlexSnap is demonstrated by measuring the agreement of its alignments with manually curated non-sequential structural alignments. FlexSnap showed competitive results against state-of-the-art algorithms, like DALI, SARF2, MultiProt, FlexProt, and FATCAT. Moreover, on the DynDom dataset, FlexSnap reported longer alignments with smaller rmsd. CONCLUSIONS: We have introduced FlexSnap, a greedy chaining algorithm that reports both sequential and non-sequential alignments and allows twists (hinges). We assessed the quality of the FlexSnap alignments by measuring their agreement with manually curated non-sequential alignments. On the FlexProt dataset, FlexSnap was competitive with state-of-the-art flexible alignment methods. Moreover, we demonstrated the benefits of introducing hinges by showing significant improvements in the alignments reported by FlexSnap for the structure pairs for which rigid alignment methods reported alignments with either low coverage or large rmsd. AVAILABILITY: An implementation of the FlexSnap algorithm will be made available online at http://www.cs.rpi.edu/~zaki/software/flexsnap.

9.
PLoS One ; 4(10): e7627, 2009 Oct 27.
Article in English | MEDLINE | ID: mdl-19859549

ABSTRACT

During atherogenesis and vascular inflammation, quiescent platelets are activated to increase the surface expression and ligand affinity of the integrin αIIbβ3 via inside-out signaling. Diverse signals such as thrombin, ADP and epinephrine transduce signals through their respective GPCRs to activate protein kinases that ultimately lead to the phosphorylation of the cytoplasmic tail of the integrin αIIbβ3 and augment its function. The signaling pathways that transmit signals from the GPCR to the cytosolic domain of the integrin are not well defined. In an effort to better understand these pathways, we employed a combination of proteomic profiling and computational analyses of isolated human platelets. We analyzed ten independent human samples and identified a total of 1507 unique proteins in platelets. This is the most comprehensive platelet proteome assembled to date and includes 190 membrane-associated and 262 phosphorylated proteins, which were identified via independent proteomic and phospho-proteomic profiling. We used this proteomic dataset to create a platelet protein-protein interaction (PPI) network and applied novel contextual information about the phosphorylation step to introduce limited directionality in the PPI graph. This newly developed contextual PPI network computationally recapitulated an integrin signaling pathway. Most importantly, our approach not only provides insights into the mechanism of integrin αIIbβ3 activation in resting platelets but also offers an improved model for analysis and discovery of PPI dynamics and signaling pathways in the future.


Subjects
Blood Platelets/metabolism , Gene Expression Regulation , Integrins/metabolism , Proteomics/methods , Amino Acid Motifs , Computational Biology , Flow Cytometry/methods , Humans , Mass Spectrometry/methods , Phosphorylation , Platelet Aggregation , Proteome , Signal Transduction
10.
J Bioinform Comput Biol ; 7(3): 571-96, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19507290

ABSTRACT

Structural similarity between proteins gives us insights into their evolutionary relationships when there is low sequence similarity. In this paper, we present a novel approach called SNAP for non-sequential pair-wise structural alignment. Starting from an initial alignment, our approach iterates over a two-step process consisting of a superposition step and an alignment step, until convergence. We propose a novel greedy algorithm to construct both sequential and non-sequential alignments. The quality of SNAP alignments was assessed by comparing against the manually curated reference alignments in the challenging SISY and RIPC datasets. Moreover, when applied to a dataset of 4410 protein pairs selected from the CATH database, SNAP produced longer alignments with lower rmsd than several state-of-the-art alignment methods. Classification of folds using SNAP alignments was both highly sensitive and highly selective. The SNAP software along with the datasets is available online at http://www.cs.rpi.edu/~zaki/software/SNAP.
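
To make the alignment step concrete, here is a minimal sketch of one plausible greedy strategy for two structures that are assumed to be already superposed: candidate residue pairs are sorted by distance and accepted closest-first, with no constraint on residue order, so the result may be non-sequential. The abstract does not specify the greedy rule, so the distance cutoff, the function names, and the toy coordinates are all assumptions rather than the SNAP implementation.

```python
import math


def greedy_alignment(coords_a, coords_b, cutoff=3.0):
    """Pair residues closest-first; each residue is used at most once."""
    candidates = []
    for i, a in enumerate(coords_a):
        for j, b in enumerate(coords_b):
            d = math.dist(a, b)
            if d <= cutoff:
                candidates.append((d, i, j))
    candidates.sort()

    used_a, used_b, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return sorted(pairs)


def rmsd(coords_a, coords_b, pairs):
    """Root-mean-square deviation over the aligned residue pairs."""
    if not pairs:
        return 0.0
    total = sum(math.dist(coords_a[i], coords_b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


# Hypothetical, already-superposed coordinates (illustrative values only).
structure_a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
structure_b = [(8.5, 0.0, 0.0), (0.2, 0.0, 0.0), (20.0, 0.0, 0.0)]

pairs = greedy_alignment(structure_a, structure_b)
print(pairs, round(rmsd(structure_a, structure_b, pairs), 3))
```

In an iterative scheme like the one the abstract describes, the pairs returned here would feed the next superposition step, and the loop would stop once the alignment no longer changes.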


Subjects
Algorithms , Proteins/chemistry , Protein Structural Homology , Aldose-Ketose Isomerases/chemistry , Bacterial Proteins/chemistry , Cluster Analysis , Computational Biology , Cytochrome P-450 Enzyme System/chemistry , Protein Databases , Leghemoglobin/chemistry , Molecular Models , NADPH-Ferrihemoprotein Reductase/chemistry , Protein Conformation , ROC Curve , Software , Thiamin Pyrophosphokinase/chemistry
12.
Pac Symp Biocomput ; : 90-101, 2008.
Article in English | MEDLINE | ID: mdl-18229678

ABSTRACT

With advances in high-throughput sequencing methods, and the corresponding exponential growth in sequence data, it has become critical to develop scalable data management techniques for sequence storage, retrieval and analysis. In this paper we present a novel disk-based suffix tree approach, called TRELLIS+, that effectively scales to massive amounts of sequence data using only a limited amount of main memory, based on a novel string buffering strategy. We show experimentally that TRELLIS+ outperforms existing suffix tree approaches; it is able to index genome-scale sequences (e.g., the entire human genome), and it also allows rapid query processing over the disk-based index. AVAILABILITY: TRELLIS+ source code is available online at http://www.cs.rpi.edu/~zaki/software/trellis


Subjects
Algorithms , Genomics/statistics & numerical data , Abstracting and Indexing , Animals , Computational Biology , Database Management Systems , Human Genome , Humans , Sequence Analysis/statistics & numerical data , Software
13.
Methods Mol Biol ; 413: 147-69, 2008.
Article in English | MEDLINE | ID: mdl-18075165

ABSTRACT

Approaches for indexing proteins and fast and scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification and drug discovery. In this chapter, we describe a new method for extracting the local feature vectors of protein structures. Each residue is represented by a triangle, and the correlation between a set of residues is described by the distances between Cα atoms and the angles between the normals of planes in which the triangles lie. The normalized local feature vectors are indexed using a suffix tree. For all query segments, suffix trees can be used effectively to retrieve the maximal matches, which are then chained to obtain alignments with database proteins. Similar proteins are selected by their alignment score against the query. Our results show classification accuracy up to 97.8% and 99.4% at the superfamily and class level according to the SCOP classification and show that on average 7.49 out of 10 proteins from the same superfamily are obtained among the top 10 matches. These results outperform the best previous methods.
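
The two geometric quantities named above, the distance between Cα atoms and the angle between triangle normals, can be computed as in the following minimal sketch. Which three atoms define each residue's triangle is not stated in the abstract, so the triangle vertices, the choice of the second vertex as the Cα position, and the function names are assumptions made for illustration.

```python
import math


def sub(p, q):
    """Vector from q to p."""
    return tuple(a - b for a, b in zip(p, q))


def cross(u, v):
    """Cross product of two 3D vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normal(triangle):
    """Normal of the plane through the triangle's three vertices."""
    p0, p1, p2 = triangle
    return cross(sub(p1, p0), sub(p2, p0))


def angle_between_normals(tri_a, tri_b):
    """Angle in degrees between the normals of two triangles."""
    n1, n2 = normal(tri_a), normal(tri_b)
    cos_angle = dot(n1, n2) / (math.sqrt(dot(n1, n1)) * math.sqrt(dot(n2, n2)))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def pair_feature(tri_a, tri_b, ca_index=1):
    """(C-alpha distance, normal angle); ca_index is an assumed convention."""
    return (
        math.dist(tri_a[ca_index], tri_b[ca_index]),
        angle_between_normals(tri_a, tri_b),
    )


# Hypothetical residue triangles (illustrative coordinates only).
residue_a = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
residue_b = ((3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (3.0, 0.0, 1.0))

distance, angle = pair_feature(residue_a, residue_b)
print(distance, angle)
```

Features like these would still need to be normalized and discretized before they could serve as symbols in a suffix tree, a step this sketch leaves out.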


Subjects
Algorithms , Protein Databases , Protein Conformation , Abstracting and Indexing , Proteins/chemistry
14.
Proteins ; 70(3): 1056-73, 2008 Feb 15.
Article in English | MEDLINE | ID: mdl-17847098

ABSTRACT

We describe an efficient method for partial complementary shape matching for use in rigid protein-protein docking. The local shape features of a protein are represented using boolean data structures called Context Shapes. The relative orientations of the receptor and ligand surfaces are searched using precalculated lookup tables. Energetic quantities are derived from shape complementarity and buried surface area computations, using efficient boolean operations. Preliminary results indicate that our context shapes approach outperforms state-of-the-art geometric shape-based rigid-docking algorithms.


Subjects
Algorithms , Computer Simulation , Proteins/chemistry , Ligands , Protein Conformation , Thermodynamics
15.
Article in English | MEDLINE | ID: mdl-19642279

ABSTRACT

Structural similarity between proteins gives us insights into the evolutionary relationship between proteins which have low sequence similarity. In this paper, we present a novel approach called STSA for non-sequential pair-wise structural alignment. Starting from an initial alignment, our approach iterates over a two-step process, a superposition step and an alignment step, until convergence. Given two superposed structures, we propose a novel greedy algorithm to construct both sequential and non-sequential alignments. The quality of STSA alignments is evident in the high agreement it has with the reference alignments in the challenging-to-align RIPC set. Moreover, on a dataset of 4410 protein pairs selected from the CATH database, STSA has high sensitivity and specificity values, is competitive with state-of-the-art alignment methods, and gives longer alignments with lower rmsd. The STSA software along with the data sets will be made available online at http://www.cs.rpi.edu/~zaki/software/STSA.


Subjects
Algorithms , Proteins/chemistry , Proteins/ultrastructure , Sequence Alignment/methods , Protein Sequence Analysis/methods , Amino Acid Sequence , Molecular Sequence Data , Amino Acid Sequence Homology
16.
Algorithms Mol Biol ; 2: 4, 2007 Apr 11.
Article in English | MEDLINE | ID: mdl-17428327
17.
Algorithms Mol Biol ; 1: 22, 2006 Nov 21.
Article in English | MEDLINE | ID: mdl-17118189

ABSTRACT

BACKGROUND: A structured motif allows variable length gaps between several components, where each component is a simple motif, which allows either no gaps or only fixed length gaps. The motif can either be represented as a pattern or a profile (also called positional weight matrix). We propose an efficient algorithm, called SMOTIF, to solve the structured motif search problem, i.e., given one or more sequences and a structured motif, SMOTIF searches the sequences for all occurrences of the motif. Potential applications include searching for long terminal repeat (LTR) retrotransposons and composite regulatory binding sites in DNA sequences. RESULTS: SMOTIF can search for both pattern and profile motifs, and it is efficient in terms of both time and space; it outperforms SMARTFINDER, a state-of-the-art algorithm for structured motif search. Experimental results show that SMOTIF is about 7 times faster and consumes 100 times less memory than SMARTFINDER. It can effectively search for LTR retrotransposons and is well suited to searching for motifs with long range gaps. It is also successful in finding potential composite transcription factor binding sites. CONCLUSION: SMOTIF is a useful and efficient tool in searching for structured pattern and profile motifs. The algorithm is available as open-source at: http://www.cs.rpi.edu/~zaki/software/sMotif/.
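
The search problem defined above can be stated as a brute-force sketch for pattern motifs (not profiles): a structured motif is given as a list of exact components plus one allowed gap range between each consecutive pair, and every occurrence is reported as the tuple of component start positions. This is not the SMOTIF algorithm, whose internals the abstract does not describe; the representation, function name, and example sequence are assumptions chosen for illustration.

```python
def find_structured_motif(sequence, components, gaps):
    """Return every occurrence as a tuple of component start positions.

    components: list of exact strings, e.g. ["ATG", "CC"]
    gaps: list of (min_gap, max_gap) pairs, one per adjacent component pair
    """
    if len(gaps) != len(components) - 1:
        raise ValueError("need exactly one gap range between adjacent components")
    occurrences = []

    def extend(k, start, starts):
        component = components[k]
        if not sequence.startswith(component, start):
            return
        starts = starts + (start,)
        if k == len(components) - 1:
            occurrences.append(starts)
            return
        min_gap, max_gap = gaps[k]
        end = start + len(component)
        # Try every allowed gap length before the next component.
        for gap in range(min_gap, max_gap + 1):
            extend(k + 1, end + gap, starts)

    for position in range(len(sequence)):
        extend(0, position, ())
    return occurrences


# Hypothetical DNA sequence and motif: "ATG", then 1 to 3 bases, then "CC".
hits = find_structured_motif("ATGACCGATGTTTCC", ["ATG", "CC"], [(1, 3)])
print(hits)
```

The brute-force search tries every gap length at every position, which is the kind of cost a dedicated algorithm is designed to avoid on long sequences and long-range gaps.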

18.
Algorithms Mol Biol ; 1: 21, 2006 Nov 16.
Article in English | MEDLINE | ID: mdl-17109757

ABSTRACT

BACKGROUND: Extracting motifs from sequences is a mainstay of bioinformatics. We look at the problem of mining structured motifs, which allow variable length gaps between simple motif components. We propose an efficient algorithm, called EXMOTIF, that, given one or more sequences and a structured motif template, extracts all frequent structured motifs that have quorum q. Potential applications of our method include the extraction of single/composite regulatory binding sites in DNA sequences. RESULTS: EXMOTIF is efficient in terms of both time and space and is shown empirically to outperform RISO, a state-of-the-art algorithm. It is also successful in finding potential single/composite transcription factor binding sites. CONCLUSION: EXMOTIF is a useful and efficient tool in discovering structured motifs, especially in DNA sequences. The algorithm is available as open-source at: http://www.cs.rpi.edu/~zaki/software/exMotif/.
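
As a rough illustration of the mining task, the sketch below enumerates every instantiation of a template (component lengths plus gap ranges) in each sequence and keeps those found in at least `quorum` sequences. It is a naive enumeration, not the EXMOTIF algorithm, and it assumes that the quorum counts the number of input sequences containing the motif; the function names and example sequences are hypothetical.

```python
from collections import defaultdict


def instantiations(sequence, lengths, gaps):
    """All component tuples in `sequence` that fit the template."""
    found = set()

    def extend(k, start, parts):
        end = start + lengths[k]
        if end > len(sequence):
            return
        parts = parts + (sequence[start:end],)
        if k == len(lengths) - 1:
            found.add(parts)
            return
        min_gap, max_gap = gaps[k]
        for gap in range(min_gap, max_gap + 1):
            extend(k + 1, end + gap, parts)

    for position in range(len(sequence)):
        extend(0, position, ())
    return found


def extract_structured_motifs(sequences, lengths, gaps, quorum):
    """Map each motif to its sequence support, keeping support >= quorum."""
    support = defaultdict(int)
    for sequence in sequences:
        # A set per sequence, so each sequence counts at most once per motif.
        for motif in instantiations(sequence, lengths, gaps):
            support[motif] += 1
    return {motif: count for motif, count in support.items() if count >= quorum}


# Hypothetical input: two components of length 2 separated by 2 to 3 bases.
sequences = ["ACGTTTGA", "ACGGTGA", "CCCCCCC"]
frequent = extract_structured_motifs(sequences, [2, 2], [(2, 3)], quorum=2)
print(frequent)
```

Enumerating every instantiation grows quickly with sequence length and gap width, which is why the abstract stresses efficiency in both time and space.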

19.
Article in English | MEDLINE | ID: mdl-16447979

ABSTRACT

Approaches for indexing proteins, and for fast and scalable searching for structures similar to a query structure, have important applications such as protein structure and function prediction, protein classification and drug discovery. In this paper, we developed a new method for extracting the local feature vectors of protein structures. Each residue is represented by a triangle, and the correlation between a set of residues is described by the distances between Cα atoms and the angles between the normals of planes in which the triangles lie. The normalized local feature vectors are indexed using a suffix tree. For all query segments, suffix trees can be used effectively to retrieve the maximal matches, which are then chained to obtain alignments with database proteins. Similar proteins are selected by their alignment score against the query. Our results show classification accuracy up to 97.8% and 99.4% at the superfamily and class level according to the SCOP classification, and show that on average 7.49 out of 10 proteins from the same superfamily are obtained among the top 10 matches. These results are competitive with the best previous methods.


Subjects
Algorithms , Protein Databases , Proteins/chemistry , Proteins/ultrastructure , Sequence Alignment/methods , Protein Sequence Analysis/methods , Software , Amino Acid Sequence , Information Storage and Retrieval/methods , Molecular Sequence Data , Protein Conformation
20.
Bioinformatics ; 20 Suppl 1: i386-93, 2004 Aug 04.
Article in English | MEDLINE | ID: mdl-15262824

ABSTRACT

A structured folding pathway, which is a time-ordered sequence of folding events, plays an important role in the protein folding process and hence in the conformational search. Pathway prediction thus gives more insight into the folding process and is a valuable guiding tool for searching the conformation space. In this paper, we propose a novel 'unfolding' approach to predict the folding pathway. We apply graph-based methods on a weighted secondary structure graph of a protein to predict the sequence of unfolding events. When viewed in reverse, this yields the folding pathway. We demonstrate the success of our approach on several proteins whose pathway is partially known.


Subjects
Chemical Models , Molecular Models , Protein Folding , Proteins/chemistry , Proteins/ultrastructure , Protein Sequence Analysis/methods , Amino Acid Sequence , Computer Simulation , Molecular Conformation , Protein Conformation