1.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36648314

ABSTRACT

MOTIVATION: Timetrees depict evolutionary relationships between species and the geological times of their divergence. Hundreds of research articles containing timetrees are published in scientific journals every year. The TimeTree (TT) project has been manually locating, curating and synthesizing timetrees from these articles for almost two decades into a TimeTree of Life, delivered through a unique, user-friendly web interface (timetree.org). The manual process of finding articles containing timetrees is becoming increasingly expensive and time-consuming. So, we have explored the effectiveness of text-mining approaches and developed optimizations to find research articles containing timetrees automatically. RESULTS: We have developed an optimized machine learning system to determine if a research article contains an evolutionary timetree appropriate for inclusion in the TT resource. We found that BERT classification fine-tuned on whole-text articles achieved an F1 score of 0.67, which we increased to 0.88 by text-mining article excerpts surrounding the mentioning of figures. The new method is implemented in the TimeTreeFinder (TTF) tool, which automatically processes millions of articles to discover timetree-containing articles. We estimate that the TTF tool would produce twice as many timetree-containing articles as those discovered manually, whose inclusion in the TT database would potentially double the knowledge accessible to a wider community. Manual inspection showed that the precision on out-of-distribution recently published articles is 87%. This automation will speed up the collection and curation of timetrees with much lower human and time costs. AVAILABILITY AND IMPLEMENTATION: https://github.com/marija-stanojevic/time-tree-classification. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Biological Evolution; Data Mining; Humans; Phylogeny; Databases, Factual; Machine Learning
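
The abstract of entry 1 reports that classifying excerpts around figure mentions worked better than classifying whole articles. The snippet below is a minimal sketch of that excerpt step only, not the TimeTreeFinder implementation: the function name `figure_excerpts`, the regular expression, the naive sentence splitter, and the `window` parameter are all assumptions made for illustration.

```python
import re

# Assumed pattern for figure mentions: "Fig. 2", "Figure 3", "Figs. 1", "Figures 4".
FIGURE_MENTION = re.compile(r"\bfig(?:ure)?s?\.?\s*\d+", re.IGNORECASE)


def figure_excerpts(text, window=1):
    """Return sentences that mention a figure, plus `window` neighbours on each side."""
    # Naive split: sentence-ending punctuation, whitespace, then an uppercase letter.
    # "Fig. 2" is not split because a digit, not an uppercase letter, follows the period.
    sentences = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    keep = set()
    for i, sentence in enumerate(sentences):
        if FIGURE_MENTION.search(sentence):
            lo = max(0, i - window)
            hi = min(len(sentences), i + window + 1)
            keep.update(range(lo, hi))
    # Overlapping windows are merged; original sentence order is preserved.
    return [sentences[i] for i in sorted(keep)]
```

The excerpts returned here would then be passed to a text classifier in place of the full article; the classifier itself is outside the scope of this sketch.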
2.
Nucleic Acids Res ; 49(D1): D298-D308, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33119734

ABSTRACT

We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.


Subjects
Amino Acids/chemistry; Databases, Protein; Genome; Proteins/genetics; Proteome/genetics; Software; Amino Acid Sequence; Amino Acids/metabolism; Animals; Archaea/genetics; Archaea/metabolism; Bacteria/genetics; Bacteria/metabolism; Binding Sites; Conserved Sequence; Fungi/genetics; Fungi/metabolism; Humans; Internet; Plants/genetics; Plants/metabolism; Prokaryotic Cells/metabolism; Protein Binding; Protein Structure, Secondary; Proteins/chemistry; Proteins/classification; Proteins/metabolism; Proteome/chemistry; Proteome/metabolism; Sequence Analysis, Protein; Viruses/genetics; Viruses/metabolism
3.
BMC Nephrol ; 23(1): 414, 2022 12 29.
Article in English | MEDLINE | ID: mdl-36581930

ABSTRACT

BACKGROUND: Hemodialysis clinic patient social networks may reinforce positive and negative attitudes towards kidney transplantation. We examined whether a patient's position within the hemodialysis clinic social network could improve machine learning classification of the patient's positive or negative attitude towards kidney transplantation when compared to sociodemographic and clinical variables. METHODS: We conducted a cross-sectional social network survey of hemodialysis patients in two geographically and demographically different hemodialysis clinics. We evaluated whether machine learning logistic regression models using sociodemographic or network data best predicted the participant's transplant attitude. Models were evaluated for accuracy, precision, recall, and F1-score. RESULTS: The 110 surveyed participants' mean age was 60 ± 13 years old. Half (55%) identified as male, and 74% identified as Black. At facility 1, 69% of participants had a positive attitude towards transplantation whereas at facility 2, 45% of participants had a positive attitude. The machine learning logistic regression model using network data alone obtained a higher accuracy and F1 score than the sociodemographic and clinical data model (accuracy 65% ± 5% vs. 61% ± 7%, F1 score 76% ± 2% vs. 70% ± 7%). A model with a combination of both sociodemographic and network data had a higher accuracy of 74% ± 3%, and an F1-score of 81% ± 2%. CONCLUSION: Social network data improved the machine learning algorithm's ability to classify attitudes towards kidney transplantation, further emphasizing the importance of hemodialysis clinic social networks on attitudes towards transplant.


Subjects
Kidney Transplantation; Humans; Male; Middle Aged; Aged; Cross-Sectional Studies; Renal Dialysis; Machine Learning; Algorithms; Attitude; Social Networking
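
Entry 3 evaluates its models by accuracy, precision, recall, and F1 score. As a reminder of how these four numbers relate, here is a small sketch that computes them from true and predicted labels; the function name `classification_metrics` and the toy labels in the usage note are illustrative assumptions, not data from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with `y_true = [1, 1, 1, 0, 0, 1, 0, 1]` and `y_pred = [1, 1, 0, 0, 1, 1, 0, 1]` there are 4 true positives, 1 false positive, 1 false negative and 2 true negatives, so accuracy is 0.75 while precision, recall and F1 are each 0.8. F1 can exceed accuracy when the positive class is the majority, which is consistent with the pattern of figures reported in the abstract.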
4.
Hum Brain Mapp ; 41(9): 2263-2280, 2020 06 15.
Article in English | MEDLINE | ID: mdl-32034846

ABSTRACT

Detection of the relevant brain regions for characterizing the distinction between cognitive conditions is one of the most sought after objectives in neuroimaging research. A popular approach for achieving this goal is the multivariate pattern analysis which is currently conducted through a number of approaches such as the popular searchlight procedure. This is due to several advantages such as being automatic and flexible with regards to size of the search region. However, these approaches suffer from a number of limitations which can lead to misidentification of truly informative regions which in turn results in imprecise information maps. These limitations mainly stem from several factors such as the fact that the information value of the search spheres are assigned to the voxel at the center of them (in case of searchlight), the requirement for manual tuning of parameters such as searchlight radius and shape, and high complexity and low interpretability in commonly used machine learning-based approaches. Other drawbacks include overlooking the structure and interactions within the regions, and the disadvantages of using certain regularization techniques in analysis of datasets with characteristics of common functional magnetic resonance imaging data. In this article, we propose a fully data-driven maximum relevance minimum redundancy search algorithm for detecting precise information value of the clusters within brain regions while alleviating the above-mentioned limitations. Moreover, in order to make the proposed method faster, we propose an efficient algorithmic implementation. We evaluate and compare the proposed algorithm with the searchlight procedure as well as least absolute shrinkage and selection operator regularization-based mapping approach using both real and synthetic datasets. The analysis results of the proposed approach demonstrate higher information detection precision and map specificity compared to the benchmark approaches.


Subjects
Algorithms; Brain Mapping/methods; Brain/physiology; Heuristics; Magnetic Resonance Imaging/methods; Brain/diagnostic imaging; Brain Mapping/standards; Humans; Magnetic Resonance Imaging/standards
5.
J Biomed Inform ; 105: 103409, 2020 05.
Article in English | MEDLINE | ID: mdl-32304869

ABSTRACT

The accurate prediction of progression of Chronic Kidney Disease (CKD) to End Stage Renal Disease (ESRD) is of great importance to clinicians and a challenge to researchers, as there are many causes and even more comorbidities that are ignored by the traditional prediction models. We examine whether a novel low-dimensional embedding model, disease2disease (D2D), learned from large-scale electronic health records (EHRs), can cluster the causes of kidney diseases and comorbidities well and further improve prediction of progression of CKD to ESRD compared to traditional risk factors. The study cohort consists of 2,507 hospitalized Stage 3 CKD patients, of which 1,375 (54.8%) progressed to ESRD within 3 years. We evaluated the proposed unsupervised learning framework by applying a regularized logistic regression model and a Cox proportional hazards model, respectively, and compared the accuracies with the ones obtained by four alternative models. The results demonstrate that the learned low-dimensional disease representations from EHRs can capture the relationship between vast arrays of diseases, and can outperform traditional risk factors in a CKD progression prediction model. These results can be used both by clinicians in patient care and by researchers to develop new prediction methods.


Subjects
Kidney Failure, Chronic; Renal Insufficiency, Chronic; Disease Progression; Glomerular Filtration Rate; Humans; Kidney Failure, Chronic/diagnosis; Kidney Failure, Chronic/epidemiology; Renal Insufficiency, Chronic/diagnosis; Renal Insufficiency, Chronic/epidemiology; Risk Factors
6.
PLoS Biol ; 14(4): e1002430, 2016 Apr.
Article in English | MEDLINE | ID: mdl-27058055

ABSTRACT

Numerous studies demonstrating that statistical errors are common in basic science publications have led to calls to improve statistical training for basic scientists. In this article, we sought to evaluate statistical requirements for PhD training and to identify opportunities for improving biostatistics education in the basic sciences. We provide recommendations for improving statistics training for basic biomedical scientists, including: 1. Encouraging departments to require statistics training, 2. Tailoring coursework to the students' fields of research, and 3. Developing tools and strategies to promote education and dissemination of statistical knowledge. We also provide a list of statistical considerations that should be addressed in statistics education for basic scientists.


Subjects
Biostatistics; Science/education
7.
J Biomed Inform ; 93: 103161, 2019 05.
Article in English | MEDLINE | ID: mdl-30940598

ABSTRACT

INTRODUCTION: The objective of this study is to improve the understanding of the spatial spread of complicated cases of influenza that required hospitalization by creating heatmaps and social networks. These make it possible to identify critical hubs and routes of influenza spread in specific geographic locations, in order to contain infections and prevent complications that require hospitalization. MATERIAL AND METHODS: Data were downloaded from the Healthcare Cost and Utilization Project (HCUP) - SID, New York State database. Patients hospitalized with flu complications between 2003 and 2012 were included in the research (30,380 cases). A novel approach was designed, constructing heatmaps for specific geographic regions in New York State and power law networks in order to analyze the distribution of hospitalized flu cases. RESULTS: Heatmaps revealed that the distribution of patients follows urban areas and major roads, indicating that flu spreads along the routes that people use to travel. A scale-free network, created from correlations among zip codes, showed that the most populated zip codes did not have the largest number of patients with flu complications. Among the top five most affected zip codes, four were in the Bronx. Demographics of the most affected zip codes are presented in the results. Normalized numbers of cases per population revealed that none of the zip codes from the Bronx were in the top 20. All zip codes with the highest node degrees were in the New York City area. DISCUSSION: Heatmaps identified the geographic distribution of hospitalized flu patients, and network analysis identified hubs of the infection. Our results will enable better estimation of resources for prevention and treatment of hospitalized patients with complications of influenza. CONCLUSION: Analyses of the geographic distribution of hospitalized patients with influenza and of the demographic characteristics of populations help us to better plan and manage resources for influenza patients who require hospitalization. The results obtained could potentially help to save many lives and improve the health of the population.


Subjects
Influenza, Human/epidemiology; Social Networking; Hospitalization; Humans; New York/epidemiology; Travel
8.
J Biomed Inform ; 100: 103326, 2019 12.
Article in English | MEDLINE | ID: mdl-31678589

ABSTRACT

The primary goal of a time-to-event estimation model is to accurately infer the occurrence time of a target event. Most existing studies focus on developing new models to effectively utilize the information in the censored observations. In this paper, we propose a model to tackle the time-to-event estimation problem from a completely different perspective. Our model relaxes a fundamental constraint that the target variable, time, is a univariate number which satisfies a partial order. Instead, the proposed model interprets each event occurrence time as a time concept with a vector representation. We hypothesize that the model will be more accurate and interpretable by capturing (1) the relationships between features and time concept vectors and (2) the relationships among time concept vectors. We also propose a scalable framework to simultaneously learn the model parameters and time concept vectors. Rigorous experiments and analysis have been conducted in medical event prediction task on seven gene expression datasets. The results demonstrate the efficiency and effectiveness of the proposed model. Furthermore, similarity information among time concept vectors helped in identifying time regimes, thus leading to a potential knowledge discovery related to the human cancer considered in our experiments.


Subjects
Models, Theoretical; Time and Motion Studies; Algorithms
9.
BMC Bioinformatics ; 18(1): 9, 2017 Jan 03.
Article in English | MEDLINE | ID: mdl-28049413

ABSTRACT

BACKGROUND: Feature selection, aiming to identify a subset of features among a possibly large set of features that are relevant for predicting a response, is an important preprocessing step in machine learning. In gene expression studies this is not a trivial task for several reasons, including potential temporal character of data. However, most feature selection approaches developed for microarray data cannot handle multivariate temporal data without previous data flattening, which results in loss of temporal information. We propose a temporal minimum redundancy - maximum relevance (TMRMR) feature selection approach, which is able to handle multivariate temporal data without previous data flattening. In the proposed approach we compute relevance of a gene by averaging F-statistic values calculated across individual time steps, and we compute redundancy between genes by using a dynamical time warping approach. RESULTS: The proposed method is evaluated on three temporal gene expression datasets from human viral challenge studies. Obtained results show that the proposed method outperforms alternatives widely used in gene expression studies. In particular, the proposed method achieved improvement in accuracy in 34 out of 54 experiments, while the other methods outperformed it in no more than 4 experiments. CONCLUSION: We developed a filter-based feature selection method for temporal gene expression data based on maximum relevance and minimum redundancy criteria. The proposed method incorporates temporal information by combining relevance, which is calculated as an average F-statistic value across different time steps, with redundancy, which is calculated by employing dynamical time warping approach. As evident in our experiments, incorporating the temporal information into the feature selection process leads to selection of more discriminative features.


Subjects
Algorithms; Gene Expression; Analysis of Variance; Bayes Theorem; Humans; Influenza A Virus, H3N2 Subtype/genetics; Influenza A Virus, H3N2 Subtype/pathogenicity; Respiratory Syncytial Viruses/genetics; Respiratory Syncytial Viruses/pathogenicity; Rhinovirus/genetics; Rhinovirus/pathogenicity; Support Vector Machine
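
The abstract of entry 9 names two ingredients: relevance as an F-statistic averaged over time steps, and redundancy derived from a time warping distance. The sketch below shows textbook versions of both (one-way ANOVA F and classic dynamic time warping) in plain Python. The data layout `gene[i][t]`, the function names, and the absolute-difference local cost are assumptions; how the authors turn the warping distance into a redundancy score and combine it with relevance is not stated in the abstract and is not attempted here.

```python
from collections import defaultdict
from math import inf


def f_statistic(values, labels):
    """One-way ANOVA F-statistic of `values` grouped by class `labels`."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return inf  # degenerate case: no within-class variance
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def temporal_relevance(gene, labels):
    """Average the per-time-step F-statistic; gene[i][t] is sample i at time step t."""
    n_steps = len(gene[0])
    scores = [f_statistic([sample[t] for sample in gene], labels) for t in range(n_steps)]
    return sum(scores) / n_steps


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, or step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because warping lets one point align with several, `dtw_distance([0, 1, 2], [0, 1, 1, 2])` is zero even though the sequences differ in length, which is the property that makes this kind of distance attractive for comparing expression profiles that are shifted or stretched in time.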
10.
Methods ; 111: 45-55, 2016 12 01.
Article in English | MEDLINE | ID: mdl-27477211

ABSTRACT

Data-driven phenotype discoveries on Electronic Health Records (EHR) data have recently drawn benefits across many aspects of clinical practice. In the method described in this paper, we map a very large EHR database containing more than a million inpatient cases into a low dimensional space where diseases with similar phenotypes have similar representation. This embedding allows for an effective segmentation of diseases into more homogeneous categories, an important task of discovering disease types for precision medicine. In particular, many diseases have heterogeneous nature. For instance, sepsis, a systemic and progressive inflammation, can be caused by many factors, and can have multiple manifestations on different human organs. Understanding such heterogeneity of the disease can help in addressing many important issues regarding sepsis, including early diagnosis and treatment, which is of huge importance as sepsis is one of the main causes of in-hospital deaths in the United States. This study analyzes state of the art embedding models that have had huge success in various fields, applying them to disease embedding from EHR databases. Particular interest is given to learning multi-type representation of heterogeneous diseases, which leads to more homogeneous groups. Our results show evidence that such representations have phenotypes of higher quality and also provide benefit when predicting mortality of inpatient visits.


Subjects
Databases, Factual; Medical Informatics/methods; Precision Medicine; Sepsis/epidemiology; Algorithms; Electronic Health Records; Humans; Inpatients; Sepsis/physiopathology
11.
BMC Bioinformatics ; 17(1): 359, 2016 Sep 09.
Article in English | MEDLINE | ID: mdl-27612635

ABSTRACT

BACKGROUND: Machine learning models have been adapted in biomedical research and practice for knowledge discovery and decision support. While mainstream biomedical informatics research focuses on developing more accurate models, the importance of data preprocessing draws less attention. We propose the Generalized Logistic (GL) algorithm that scales data uniformly to an appropriate interval by learning a generalized logistic function to fit the empirical cumulative distribution function of the data. The GL algorithm is simple yet effective; it is intrinsically robust to outliers, so it is particularly suitable for diagnostic/classification models in clinical/medical applications where the number of samples is usually small; it scales the data in a nonlinear fashion, which leads to potential improvement in accuracy. RESULTS: To evaluate the effectiveness of the proposed algorithm, we conducted experiments on 16 binary classification tasks that have different variable types and cover a wide range of applications. The resultant performance in terms of area under the receiver operating characteristic curve (AUROC) and percentage of correct classification showed that models learned using data scaled by the GL algorithm outperform the ones using data scaled by the Min-max and the Z-score algorithms, which are the most commonly used data scaling algorithms. CONCLUSION: The proposed GL algorithm is simple and effective. It is robust to outliers, so no additional denoising or outlier detection step is needed in data preprocessing. Empirical results also show that models learned from data scaled by the GL algorithm have higher accuracy compared to the commonly used data scaling algorithms.


Subjects
Algorithms; Biomedical Research; Models, Theoretical; Databases as Topic; Humans; ROC Curve
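
Entry 11 describes scaling data through a logistic-shaped curve fitted to the empirical distribution. The abstract gives neither the exact generalized logistic form nor the fitting procedure, so the sketch below is a simplified stand-in rather than the GL algorithm: it uses an ordinary two-parameter logistic whose midpoint is the sample median and whose slope is chosen so that the interquartile range spans the logit distance between 0.25 and 0.75. The function names and the quartile-matching rule are assumptions.

```python
from math import exp, log
from statistics import quantiles


def fit_logistic_by_quartiles(values):
    """Return (slope, midpoint) of a logistic curve matched to the sample quartiles."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    if q3 == q1:
        raise ValueError("interquartile range is zero; slope is undefined")
    # logit(0.75) - logit(0.25) = 2 * ln(3), spread over the interquartile range.
    slope = 2 * log(3) / (q3 - q1)
    return slope, median


def logistic_scale(x, slope, midpoint):
    """Map x into [0, 1] with a numerically stable logistic function."""
    z = slope * (x - midpoint)
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)
```

Because the parameters depend only on quartiles, a single extreme value changes them very little, and any input, however large, still maps into the unit interval. That bounded, nonlinear behaviour is the property the abstract contrasts with Min-max and Z-score scaling.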
12.
BMC Bioinformatics ; 17: 158, 2016 Apr 08.
Article in English | MEDLINE | ID: mdl-27059502

ABSTRACT

BACKGROUND: Existing feature selection methods typically do not consider prior knowledge in the form of structural relationships among features. In this study, the features are structured based on prior knowledge into groups. The problem addressed in this article is how to select one representative feature from each group such that the selected features are jointly discriminating the classes. The problem is formulated as a binary constrained optimization and the combinatorial optimization is relaxed as a convex-concave problem, which is then transformed into a sequence of convex optimization problems so that the problem can be solved by any standard optimization algorithm. Moreover, a block coordinate gradient descent optimization algorithm is proposed for high dimensional feature selection, which in our experiments was four times faster than using a standard optimization algorithm. RESULTS: In order to test the effectiveness of the proposed formulation, we used microarray analysis as a case study, where genes with similar expressions or similar molecular functions were grouped together. In particular, the proposed block coordinate gradient descent feature selection method is evaluated on five benchmark microarray gene expression datasets and evidence is provided that the proposed method gives more accurate results than the state-of-the-art gene selection methods. Out of 25 experiments, the proposed method achieved the highest average AUC in 13 experiments while the other methods achieved higher average AUC in no more than 6 experiments. CONCLUSION: A method is developed to select a feature from each group. When the features are grouped based on similarity in gene expression, we showed that the proposed algorithm is more accurate than state-of-the-art gene selection methods that are particularly developed to select highly discriminative and less redundant genes. In addition, the proposed method can exploit any grouping structure among features, while alternative methods are restricted to using similarity based grouping.


Subjects
Algorithms; Models, Theoretical; Corneal Neovascularization/diagnosis; Corneal Neovascularization/genetics; Databases, Genetic; Gene Expression Regulation; Gene Ontology; Genetic Variation; HIV Infections/diagnosis; HIV Infections/genetics; Hemoglobinuria/diagnosis; Hemoglobinuria/genetics; Humans; Melanoma/diagnosis; Melanoma/genetics; Microarray Analysis; Multiple Myeloma/diagnosis; Multiple Myeloma/genetics; Neuroendocrine Tumors/diagnosis; Neuroendocrine Tumors/genetics; Nevus/diagnosis; Nevus/genetics; Stress, Physiological/genetics; Virus Diseases/diagnosis; Virus Diseases/genetics
13.
Nucleic Acids Res ; 41(Database issue): D508-16, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203878

ABSTRACT

We present the Database of Disordered Protein Prediction (D(2)P(2)), available at http://d2p2.pro (including website source code). A battery of disorder predictors and their variants, VL-XT, VSL2b, PrDOS, PV2, Espritz and IUPred, were run on all protein sequences from 1765 complete proteomes (to be updated as more genomes are completed). Integrated with these results are all of the predicted (mostly structured) SCOP domains using the SUPERFAMILY predictor. These disorder/structure annotations together enable comparison of the disorder predictors with each other and examination of the overlap between disordered predictions and SCOP domains on a large scale. D(2)P(2) will increase our understanding of the interplay between disorder and structure, the genomic distribution of disorder, and its evolutionary history. The parsed data are made available in a unified format for download as flat files or SQL tables either by genome, by predictor, or for the complete set. An interactive website provides a graphical view of each protein annotated with the SCOP domains and disordered regions from all predictors overlaid (or shown as a consensus). There are statistics and tools for browsing and comparing genomes and their disorder within the context of their position on the tree of life.


Subjects
Databases, Protein; Protein Conformation; Genome; Internet; Protein Structure, Tertiary; Proteins/chemistry; Proteins/genetics; Sequence Analysis, Protein
14.
BMC Bioinformatics ; 14 Suppl 12: S5, 2013.
Article in English | MEDLINE | ID: mdl-24268030

ABSTRACT

BACKGROUND: In many biomedical applications, there is a need for developing classification models based on noisy annotations. Recently, various methods addressed this scenario by relying on unreliable annotations obtained from multiple sources. RESULTS: We proposed a probabilistic classification algorithm based on labels obtained by multiple noisy annotators. The new algorithm is capable of eliminating annotations provided by novice labellers and of providing a more accurate estimate of the ground truth by consensus labelling according to higher quality annotations. The approach is evaluated on text classification and prediction of protein disorder. Our study suggests that higher levels of accuracy, effectiveness and performance can be achieved by the new method as compared to alternatives. CONCLUSIONS: The proposed method is applicable for meta-learning from multiple existing classification models and noisy annotations obtained by humans. It is particularly beneficial when many annotations are obtained by novice labellers. In addition, the proposed method can provide further characterization of each annotator that can help in developing more accurate classifiers by identifying the most competent annotators for each data instance.


Subjects
Algorithms; Computational Biology/methods; Crowdsourcing; Proteins/chemistry; Proteins/classification; Logistic Models
15.
Netw Neurosci ; 7(1): 22-47, 2023.
Article in English | MEDLINE | ID: mdl-37334006

ABSTRACT

Representation learning is a core component in data-driven modeling of various complex phenomena. Learning a contextually informative representation can especially benefit the analysis of fMRI data because of the complexities and dynamic dependencies present in such datasets. In this work, we propose a framework based on transformer models to learn an embedding of the fMRI data by taking the spatiotemporal contextual information in the data into account. This approach takes the multivariate BOLD time series of the regions of the brain as well as their functional connectivity network simultaneously as the input to create a set of meaningful features that can in turn be used in various downstream tasks such as classification, feature extraction, and statistical analysis. The proposed spatiotemporal framework uses the attention mechanism as well as the graph convolution neural network to jointly inject the contextual information regarding the dynamics in time series data and their connectivity into the representation. We demonstrate the benefits of this framework by applying it to two resting-state fMRI datasets, and provide further discussion on various aspects and advantages of it over a number of other commonly adopted architectures.

16.
Kidney Med ; 5(6): 100640, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37235041

ABSTRACT

Rationale & Objective: Most living kidney donors are members of a hemodialysis patient's social network. Network members are divided into core members, those strongly connected to the patient and other members; and peripheral members, those weakly connected to the patient and other members. We identify how many hemodialysis patients' network members offered to become kidney donors, whether these offers were from core or peripheral network members, and whose offers the patients accepted. Study Design: A cross-sectional interviewer-administered hemodialysis patient social network survey. Setting & Participants: Prevalent hemodialysis patients in 2 facilities. Predictors: Network size and constraint, a donation from a peripheral network member. Outcomes: Number of living donor offers, accepting an offer. Analytical Approach: We performed egocentric network analyses for all participants. Poisson regression models evaluated associations between network measures and number of offers. Logistic regression models determined the associations between network factors and accepting a donation offer. Results: The mean age of the 106 participants was 60 years. Forty-five percent were female, and 75% self-identified as Black. Fifty-two percent of participants received at least one living donor offer (range 1-6); 42% of the offers were from peripheral members. Participants with larger networks received more offers (incident rate ratio [IRR], 1.26; 95% CI, 1.12-1.42; P = 0.001), including networks with more peripheral members (constraint, IRR, 0.97; 95% CI, 0.96-0.98; P < 0.001). Participants who received a peripheral member offer had 3.6 times greater odds of accepting an offer (OR, 3.56; 95% CI, 1.15-10.8; P = 0.02) than those who did not receive a peripheral member offer. Limitations: A small sample of only hemodialysis patients. Conclusions: Most participants received at least one living donor offer, often from peripheral network members. Future living donor interventions should focus on both core and peripheral network members.

17.
BMC Bioinformatics ; 13: 195, 2012 Aug 08.
Article in English | MEDLINE | ID: mdl-22873729

ABSTRACT

BACKGROUND: Early classification of time series is beneficial for biomedical informatics problems including, but not limited to, disease change detection. Early classification can be of tremendous help by identifying the onset of a disease before it has time to fully take hold. In addition, extracting patterns from the original time series helps domain experts to gain insights into the classification results. This problem has been studied recently using time series segments called shapelets. In this paper, we present a method, which we call Multivariate Shapelets Detection (MSD), that allows for early and patient-specific classification of multivariate time series. The method extracts time series patterns, called multivariate shapelets, from all dimensions of the time series that distinctly manifest the target class locally. The time series were classified by searching for the earliest closest patterns. RESULTS: The proposed early classification method for multivariate time series has been evaluated on eight gene expression datasets from viral infection and drug response studies in humans. In our experiments, the MSD method outperformed the baseline methods, achieving highly accurate classification by using as little as 40%-64% of the time series. The obtained results provide evidence that using conventional classification methods on short time series is not as accurate as using the proposed methods specialized for early classification. CONCLUSION: For the early classification task, we proposed a method called Multivariate Shapelets Detection (MSD), which extracts patterns from all dimensions of the time series. We showed that the MSD method can classify the time series early by using as little as 40%-64% of the time series' length.


Subjects
Medical Informatics/methods , Algorithms , Classification/methods , Gene Expression , Humans , Influenza, Human/genetics , Influenza, Human/metabolism , Multiple Sclerosis/drug therapy , Multiple Sclerosis/genetics , Multiple Sclerosis/metabolism , Multivariate Analysis
18.
BMC Bioinformatics ; 13 Suppl 3: S1, 2012 Mar 21.
Article in English | MEDLINE | ID: mdl-22536893

ABSTRACT

BACKGROUND: The relationships between the gene functional similarity and gene expression profile, and between gene function annotation and gene sequence have been studied extensively. However, not much work has considered the connection between gene functions and location of a gene's expression in the mammalian tissues. On the other hand, although unsupervised learning methods have been commonly used in functional genomics, supervised learning cannot be directly applied to a set of normal genes without having a target (class) attribute. RESULTS: Here, we propose a supervised learning methodology to predict pair-wise gene functional similarity from multiplex gene expression maps that provide information about the location of gene expression. The features are extracted from expression maps and the labels denote the functional similarities of pairs of genes. We make use of wavelet features, original expression values, difference and average values of neighboring voxels and other features to perform boosting analysis. The experimental results show that with increasing similarities of gene expression maps, the functional similarities are increased too. The model predicts the functional similarities between genes to a certain degree. The weights of the features in the model indicate the features that are more significant for this prediction. CONCLUSIONS: By considering pairs of genes, we propose a supervised learning methodology to predict pair-wise gene functional similarity from multiplex gene expression maps. We also explore the relationship between similarities of gene maps and gene functions. By using AdaBoost coupled with our proposed weak classifier we analyze a large-scale gene expression dataset and predict gene functional similarities. We also detect the most significant single voxels and pairs of neighboring voxels and visualize them in the expression map image of a mouse brain. This work is very important for predicting functions of unknown genes. 
It also has broader applicability since the methodology can be applied to analyze any large-scale dataset without a target attribute and is not restricted to gene expressions.
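
The boosting step described above can be illustrated with a minimal AdaBoost over threshold stumps. This is a generic sketch, not the authors' weak classifier: the two features stand for hypothetical absolute expression differences between a gene pair at two voxels, the labels (+1 similar, -1 dissimilar) are invented toy values, and the per-feature sum of stump weights is only one possible reading of "weights of the features".

```python
import math

def stump_predict(x, feature, threshold, sign):
    """Threshold stump: returns sign if x[feature] > threshold, else -sign."""
    return sign if x[feature] > threshold else -sign

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, f, t, s) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, s)
    return best

def adaboost(X, y, rounds=5):
    """Train AdaBoost; returns a list of (alpha, feature, threshold, sign)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, t, s = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, t, s))
        # Increase the weight of misclassified pairs, then renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, t, s))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    score = sum(a * stump_predict(x, f, t, s) for a, f, t, s in model)
    return 1 if score > 0 else -1

def feature_importance(model, n_features):
    """Sum of stump weights per feature, as a rough significance measure."""
    totals = [0.0] * n_features
    for alpha, f, _, _ in model:
        totals[f] += alpha
    return totals

# Hypothetical pair features: [|difference at voxel A|, |difference at voxel B|]
X = [[0.1, 0.9], [0.2, 0.4], [0.15, 0.7], [0.8, 0.5], [0.9, 0.3], [0.7, 0.8]]
y = [1, 1, 1, -1, -1, -1]
model = adaboost(X, y)
print([predict(model, x) for x in X], feature_importance(model, 2))
```

In the toy data only the first feature separates the classes, so all of the accumulated weight falls on it.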


Subjects
Brain/metabolism , Transcriptome/methods , Algorithms , Animals , Artificial Intelligence , Mice , Regression Analysis
19.
Proteome Sci ; 10 Suppl 1: S19, 2012 Jun 21.
Article in English | MEDLINE | ID: mdl-22759577

ABSTRACT

BACKGROUND: Intrinsically disordered proteins (IDPs) and regions (IDRs) perform a variety of crucial biological functions despite lacking stable tertiary structure under physiological conditions in vitro. State-of-the-art sequence-based predictors of intrinsic disorder are achieving per-residue accuracies over 80%. In a genome-wide study of intrinsic disorder in the human genome we observed a large difference in predicted disorder content between confirmed and putative human proteins. We investigated a hypothesis that this discrepancy is not correct, and that it is due to incorrectly annotated parts of the putative protein sequences that exhibit some similarities to confirmed IDRs, which lead to high predicted disorder content. METHODS: To test this hypothesis we trained a predictor to discriminate sequences of real proteins from synthetic sequences that mimic errors of gene finding algorithms. We developed a procedure to create synthetic peptide sequences by translation of non-coding regions of genomic sequences and translation of coding regions with incorrect codon alignment. RESULTS: Application of the developed predictor to putative human protein sequences showed that they contain a substantial fraction of incorrectly assigned regions. These regions are predicted to have higher levels of disorder content than correctly assigned regions. This partially, albeit not completely, explains the observed discrepancy in predicted disorder content between confirmed and putative human proteins. CONCLUSIONS: Our findings provide the first evidence that the current practice of predicting disorder content in putative sequences should be reconsidered, as such estimates may be biased.
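
Translation with incorrect codon alignment can be illustrated by reading the same DNA in a shifted frame. The sketch below uses the standard genetic code. The `translate` helper and the short example sequence are illustrative assumptions, not the authors' procedure or data.

```python
# Standard genetic code, codons ordered by base T, C, A, G at each position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna, frame=0):
    """Translate DNA starting at offset `frame` (0, 1 or 2).
    Stop codons are written as '*'; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[pos:pos + 3]]
        for pos in range(frame, len(dna) - 2, 3)
    )

coding = "ATGGCCAAGTAA"               # hypothetical toy coding sequence
in_frame = translate(coding)          # correct codon alignment
shifted = translate(coding, frame=1)  # mimics a frame error in gene finding
print(in_frame, shifted)
```

The shifted reading yields a peptide unrelated to the in-frame one, which is the kind of synthetic sequence the predictor is trained to separate from real proteins.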

20.
J Biomed Inform ; 45(2): 316-22, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22179054

ABSTRACT

BACKGROUND: Domperidone treatment for gastroparesis is associated with variable efficacy as well as the potential for side effects. DNA microarray single nucleotide polymorphism (SNP) analysis may help to elucidate the role of genetic variability on the therapeutic effectiveness and toxicity of domperidone. AIM: The aim of this study was to identify SNPs that are associated with clinical efficacy and side effects of domperidone treatment for gastroparesis from DNA microarray experiments. This will help develop a strategy for rational selection of patients for domperidone therapy. METHODS: DNA samples extracted from the saliva of 46 patients treated with domperidone were analyzed using Affymetrix 6.0 SNP microarrays. Then least angle regression (LARS) was used to select SNPs that are related to domperidone efficacy and side effects. Decision tree based prediction models were constructed with the most correlated features selected by LARS. RESULTS: Using the most stable SNP selected by LARS, a prediction model for side effects of domperidone achieved (95 ± 0)% true negative rate (TN) and (78 ± 11)% true positive rate (TP) in nested leave-one-out tests. For domperidone efficacy, the prediction based on the five most stable SNPs achieved (85 ± 7)% TP and (61 ± 4)% TN. Five identified SNPs are related to ubiquitin mediated proteolysis, epithelial cell signaling, leukocyte, cell adhesion, and tight junction signaling pathways. Genetic polymorphisms in three genes that are related to cancer and hedgehog signaling were found to significantly correlate with efficacy of domperidone. CONCLUSION: LARS was found to be a useful tool for statistical analysis of domperidone-related DNA microarray data generated from a small number of patients.
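
The structure of a nested leave-one-out test, and the two rates reported above, can be sketched as follows. This is a generic outline under stated assumptions: the abstract does not describe how the inner loop was used, so the split generator only shows how an outer held-out patient stays excluded from every inner split, and all function names and toy labels are hypothetical.

```python
def leave_one_out(indices):
    """Yield (training indices, held-out index) for each sample."""
    for held_out in indices:
        yield [i for i in indices if i != held_out], held_out

def nested_leave_one_out(n_samples):
    """Outer loop holds out one sample for testing; the inner splits, built
    only from the remaining samples, would serve feature selection/tuning."""
    all_indices = list(range(n_samples))
    for outer_train, outer_test in leave_one_out(all_indices):
        inner_splits = list(leave_one_out(outer_train))
        yield outer_train, outer_test, inner_splits

def true_rates(y_true, y_pred):
    """Return (true positive rate, true negative rate) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    return tp / positives, tn / negatives

splits = list(nested_leave_one_out(4))
for outer_train, outer_test, inner_splits in splits:
    print(outer_test, outer_train, len(inner_splits))

# Toy labels only, to show the rate calculation
print(true_rates([1, 1, 0, 0], [1, 0, 0, 0]))
```

Keeping the outer test sample out of every inner split is what prevents the selected features from being tuned on the sample used for evaluation.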


Subjects
Domperidone/adverse effects , Gastroparesis/drug therapy , Gastroparesis/genetics , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide , Adult , Domperidone/therapeutic use , Humans , Middle Aged