Results 1 - 20 of 109
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38678587

ABSTRACT

Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples in integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.
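
The "local connections" idea described in this abstract can be illustrated with a masked layer in which a gene feeds a pathway node only if it is annotated to that pathway. The sketch below is a minimal pure-Python illustration and not the DeepKEGG implementation: the function name `pathway_layer`, the gene and pathway names, the weights and the ReLU activation are all assumptions made for the example.

```python
from typing import Dict, List


def pathway_layer(
    expression: Dict[str, float],
    membership: Dict[str, List[str]],
    weights: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Compute one activation per pathway from its member genes only.

    Genes outside a pathway contribute nothing, which is equivalent to
    multiplying a dense weight matrix by a binary gene-pathway mask.
    """
    activations = {}
    for pathway, genes in membership.items():
        total = sum(weights[pathway][g] * expression[g] for g in genes)
        activations[pathway] = max(0.0, total)  # ReLU (an assumed choice)
    return activations


# Toy, made-up data: three genes and two pathways.
expression = {"geneA": 1.0, "geneB": 2.0, "geneC": -1.0}
membership = {"pathway1": ["geneA", "geneB"], "pathway2": ["geneC"]}
weights = {
    "pathway1": {"geneA": 0.5, "geneB": 0.25},
    "pathway2": {"geneC": 1.0},
}
result = pathway_layer(expression, membership, weights)
```

Because each pathway node depends only on its annotated genes, an attribution score for a node can be traced back to a named biological unit, which is the interpretability argument the abstract makes.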


Subjects
Tumor Biomarkers , Deep Learning , Local Neoplasm Recurrence , Humans , Tumor Biomarkers/metabolism , Tumor Biomarkers/genetics , Local Neoplasm Recurrence/metabolism , Local Neoplasm Recurrence/genetics , Computational Biology/methods , Neoplasms/genetics , Neoplasms/metabolism , Neoplasms/pathology , Genomics/methods , Multiomics
2.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36611256

ABSTRACT

Accumulating evidence demonstrates that circular RNA (circRNA) plays an important role in human diseases. Identifying circRNA-disease associations can aid the diagnosis of human diseases, but the traditional approach based on biological experiments is time-consuming. To address this limitation, a series of computational methods have been proposed in recent years. However, few works have summarized these methods or compared their performance. In this paper, we divide the existing methods into three categories: information propagation, traditional machine learning and deep learning. The baseline methods in each category are then introduced in detail. Further, 5 different datasets are collected, and 14 representative methods of each category are selected and compared in 5-fold cross-validation, 10-fold cross-validation and the de novo experiment. To further evaluate the effectiveness of these methods, six common cancers are selected to compare the number of correctly identified circRNA-disease associations in the top-10, top-20, top-50, top-100 and top-200 predictions. In addition, based on the results, observations about the robustness and characteristics of these methods are summarized. Finally, future directions and challenges are discussed.


Subjects
Neoplasms , Circular RNA , Humans , Circular RNA/genetics , Benchmarking , Machine Learning , Neoplasms/genetics , Computational Biology/methods
3.
Methods ; 226: 89-101, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38642628

ABSTRACT

Obtaining an accurate segmentation of pulmonary nodules in computed tomography (CT) images is challenging. This is due to: (1) the heterogeneous nature of lung nodules; (2) comparable visual characteristics between the nodules and their surroundings. A robust multi-scale feature extraction mechanism that can effectively obtain multi-scale representations at a granular level can improve segmentation accuracy. UNet, the most commonly used network in lung nodule segmentation, its variants, and other image segmentation methods lack this robust feature extraction mechanism. In this study, we propose a multi-stride residual 3D UNet (MRUNet-3D) to improve the segmentation accuracy of lung nodules in CT images. It incorporates a multi-stride Res2Net block (MSR), which replaces the simple sequence of convolution layers in each encoder stage to effectively extract multi-scale features at a granular level from different receptive fields and resolutions while conserving the strengths of 3D UNet. The proposed method has been extensively evaluated on the publicly available LUNA16 dataset. Experimental results show that it achieves competitive segmentation performance with an average dice similarity coefficient of 83.47% and an average surface distance of 0.35 mm on the dataset. More notably, our method has proven to be robust to the heterogeneity of lung nodules. It has also proven to perform better at segmenting small lung nodules. Ablation studies have shown that the proposed MSR and RFIA modules are fundamental to improving the performance of the proposed model.
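
For readers unfamiliar with the metric reported above, the Dice similarity coefficient of a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch on flattened binary masks; the function name, the toy masks and the convention for two empty masks are assumptions for illustration and are not taken from the paper.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks flattened to 1-D."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size_sum


# Toy masks: 3 predicted voxels, 3 reference voxels, 2 of them shared.
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 0, 1, 1, 1, 0]
score = dice_coefficient(pred, truth)  # 2 * 2 / (3 + 3)
```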


Subjects
Three-Dimensional Imaging , Lung Neoplasms , X-Ray Computed Tomography , Humans , X-Ray Computed Tomography/methods , Lung Neoplasms/diagnostic imaging , Three-Dimensional Imaging/methods , Solitary Pulmonary Nodule/diagnostic imaging , Algorithms , Computer-Assisted Radiographic Image Interpretation/methods , Lung/diagnostic imaging
4.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34864877

ABSTRACT

Increasing evidence has shown that circRNA plays a significant role in the development of many diseases. In addition, many studies have shown that circRNA can be considered a potential biomarker for the clinical diagnosis and treatment of disease. Some computational methods have been proposed to predict circRNA-disease associations. However, the performance of these methods is limited by the sparsity of low-order interaction information. In this paper, we propose a new computational method (KGANCDA) to predict circRNA-disease associations based on a knowledge graph attention network. The circRNA-disease knowledge graphs are constructed by collecting multiple relationship data among circRNA, disease, miRNA and lncRNA. Then, the knowledge graph attention network is designed to obtain embeddings of each entity by distinguishing the importance of information from neighbors. Besides the low-order neighbor information, it can also capture high-order neighbor information from multisource associations, which alleviates the problem of data sparsity. Finally, a multilayer perceptron is applied to predict the affinity score of circRNA-disease associations based on the embeddings of circRNA and disease. The experimental results show that KGANCDA outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, the case study demonstrates that KGANCDA is an effective tool to predict potential circRNA-disease associations.


Subjects
MicroRNAs , Circular RNA , Computational Biology/methods , MicroRNAs/genetics , Computer Neural Networks , Automated Pattern Recognition
5.
Brief Bioinform ; 22(2): 1884-1901, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32349125

ABSTRACT

Traditional machine learning methods used to detect the side effects of drugs pose significant challenges as feature engineering processes are labor-intensive, expert-dependent, time-consuming and cost-ineffective. Moreover, these methods only focus on detecting the association between drugs and their side effects or classifying drug-drug interaction. Motivated by technological advancements and the availability of big data, we provide a review on the detection and classification of side effects using deep learning approaches. It is shown that the effective integration of heterogeneous, multidimensional drug data sources, together with the innovative deployment of deep learning approaches, helps reduce or prevent the occurrence of adverse drug reactions (ADRs). Deep learning approaches can also be exploited to find replacements for drugs which have side effects or help to diversify the utilization of drugs through drug repurposing.


Subjects
Deep Learning , Drug Discovery , Drug-Related Side Effects and Adverse Reactions , Drug Interactions , Drug Repositioning , Humans , Computer Neural Networks
6.
Brief Bioinform ; 21(2): 511-526, 2020 03 23.
Article in English | MEDLINE | ID: mdl-30759195

ABSTRACT

In recent times, the reduced cost of DNA sequencing has resulted in a plethora of genomic data that is being used to advance biomedical research and improve clinical procedures and healthcare delivery. These advances are revolutionizing areas in genome-wide association studies (GWASs), diagnostic testing, personalized medicine and drug discovery. This, however, comes with security and privacy challenges as the human genome is sensitive in nature and uniquely identifies an individual. In this article, we discuss the genome privacy problem and review relevant privacy attacks, classified into identity tracing, attribute disclosure and completion attacks, which have been used to breach the privacy of an individual. We then classify the state-of-the-art genomic privacy-preserving solutions that have been proposed to mitigate these attacks, based on their application and computational domains (genomic aggregation, GWASs and statistical analysis, sequence comparison and genetic testing), and compare them in terms of their underlying cryptographic primitives, security goals and complexities (computation and transmission overheads). Finally, we identify and discuss the open issues, research challenges and future directions in the field of genomic privacy. We believe this article will provide researchers with the current trends and insights on the importance and challenges of privacy and security issues in the area of genomics.


Subjects
Computer Security , Genetic Privacy/legislation & jurisprudence , Genomics/methods , Human Genome , Genome-Wide Association Study , Humans
7.
Int J Intell Syst ; 37(3): 2371-2392, 2022 Mar.
Article in English | MEDLINE | ID: mdl-37520859

ABSTRACT

Coronavirus disease 2019 (COVID-19) was declared a global pandemic by the World Health Organization in March 2020. Effective testing is crucial to slow the spread of the pandemic. Artificial intelligence and machine learning techniques can help COVID-19 detection using various clinical symptom data. While the deep learning (DL) approach, which requires centralized data, is susceptible to a high risk of data privacy breaches, the federated learning (FL) approach, which rests on decentralized data, can preserve data privacy, a critical factor in the health domain. This paper reviews recent advances in applying DL and FL techniques for COVID-19 detection, with a focus on the latter. A model FL implementation use case in health systems, with COVID-19 detection using chest X-ray image data sets, is studied. We have also reviewed applications of previously published FL experiments for COVID-19 research to demonstrate the applicability of FL in tackling health research issues. Last, several challenges in FL implementation in the healthcare domain are discussed in terms of potential future work.

8.
Int J Intell Syst ; 36(9): 5085-5115, 2021 Sep.
Article in English | MEDLINE | ID: mdl-38607786

ABSTRACT

The novel coronavirus disease 2019 (COVID-19) is considered to be a significant health challenge worldwide because of its rapid human-to-human transmission, leading to a rise in the number of infected people and deaths. The detection of COVID-19 at the earliest stage is therefore of paramount importance for controlling the pandemic spread and reducing the mortality rate. The real-time reverse transcription-polymerase chain reaction, the primary method of diagnosis for coronavirus infection, has a relatively high false negative rate while detecting early stage disease. Meanwhile, the manifestations of COVID-19, as seen through medical imaging methods such as computed tomography (CT), radiograph (X-ray), and ultrasound imaging, show individual characteristics that differ from those of healthy cases or other types of pneumonia. Machine learning (ML) applications for COVID-19 diagnosis, detection, and the assessment of disease severity based on medical imaging have gained considerable attention. Herein, we review the recent progress of ML in COVID-19 detection with a particular focus on ML models using CT and X-ray images published in high-ranking journals, including a discussion of the predominant features of medical imaging in patients with COVID-19. Deep Learning algorithms, particularly convolutional neural networks, have been utilized widely for image segmentation and classification to identify patients with COVID-19 and many ML modules have achieved remarkable predictive results using datasets with limited sample sizes.

9.
Entropy (Basel) ; 22(7)2020 Jul 09.
Article in English | MEDLINE | ID: mdl-33286530

ABSTRACT

The main challenge of classification systems is the processing of undesirable data. Filter-based feature selection is an effective solution to improve the performance of classification systems by selecting the significant features and discarding the undesirable ones. The success of this solution depends on the information extracted from data characteristics. For this reason, many research theories have been introduced to extract different feature relations. Unfortunately, traditional feature selection methods estimate feature significance based on either individual or dependency discriminative ability. This paper introduces a new ensemble feature selection method, called fuzzy feature selection based on relevancy, redundancy, and dependency (FFS-RRD). The proposed method considers both individual and dependency discriminative ability to extract all possible feature relations. To evaluate the proposed method, experimental comparisons are conducted with eight state-of-the-art and conventional feature selection methods. Based on 13 benchmark datasets, the experimental results over four well-known classifiers show that our proposed method outperforms the others in terms of classification performance and stability.

10.
Brief Bioinform ; 18(2): 291-305, 2017 03 01.
Article in English | MEDLINE | ID: mdl-26984617

ABSTRACT

RNA secondary structure alignment has received more attention since the discovery of the structure-function relationships in some non-protein-encoding RNAs. However, unlike the pure sequence alignment problem, which has been solved in polynomial time, secondary structure alignment incorporates the base pairings as another information dimension in addition to the base sequence. This problem therefore becomes more challenging. In this study, we classify the selected approaches, and algorithmically illustrate how these methods address the alignment problems with different structure types. Other features such as the types of base pair edit operations supported and the time complexity are also compared.


Subjects
Algorithms , Base Sequence , Nucleic Acid Conformation , RNA , Sequence Alignment , RNA Sequence Analysis
11.
BMC Genomics ; 19(1): 237, 2018 Apr 05.
Article in English | MEDLINE | ID: mdl-29618315

ABSTRACT

BACKGROUND: An exceedingly large number of sequence variants have been discovered through whole genome sequencing in most populations, including cattle. Deciphering which of these affect complex traits is a major challenge. In this study we hypothesize that variants in some functional classes, such as splice site regions, coding regions, DNA methylated regions and long noncoding RNA, will explain more variance in complex traits than others. Two variance component approaches were used to test this hypothesis: the first determines whether variants in a functional class capture a greater proportion of the variance than expected by chance; the second uses the proportion of variance explained when variants in all annotations are fitted simultaneously. RESULTS: Our data set consisted of 28.3 million imputed whole genome sequence variants in 16,581 dairy cattle with records for 6 complex trait phenotypes, including production and fertility. We found that sequence variants in splice site regions and synonymous classes captured the greatest proportion of the variance, explaining up to 50% of the variance across all traits. We also found that sequence variants in target sites for DNA methylation (genomic regions that are found to be highly methylated in bovine placentas) captured a significant proportion of the variance. Per sequence variant, splice site variants explain the highest proportion of variance in this study. The proportion of variance captured by the missense predicted deleterious (from SIFT) and missense tolerated classes was relatively small. CONCLUSION: The results demonstrate that using functional annotations to filter whole genome sequence variants into more informative subsets could be useful for prioritizing the variants that are more likely to be associated with complex traits. In addition to variants found in splice sites and protein-coding genes, regulatory variants and those found in DNA methylated regions explained considerable variation in milk production and fertility traits. In our analysis synonymous variants captured a significant proportion of the variance, which raises the possible explanation that synonymous mutations might have some effects, or more likely that these variants are misannotated, or alternatively that the results reflect imperfect imputation of the actual causative variants.


Subjects
Gene Regulatory Networks , Genetic Variation , Quantitative Trait Loci , Whole Genome Sequencing/veterinary , Animals , Cattle , Female , Fertility , Gene Frequency , Molecular Sequence Annotation , Pregnancy , RNA Splice Sites
12.
Bioinformatics ; 33(11): 1681-1688, 2017 Jun 01.
Article in English | MEDLINE | ID: mdl-28130237

ABSTRACT

MOTIVATION: Protein complexes are one of the keys to studying the behavior of a cell system. Many biological functions are carried out by protein complexes. During the past decade, the main strategy used to identify protein complexes from high-throughput network data has been to extract near-cliques or highly dense subgraphs from a single protein-protein interaction (PPI) network. Although experimental PPI data have increased significantly over recent years, most PPI networks still have many false positive interactions and false negative edge loss due to the limitations of high-throughput experiments. In particular, the false negative errors restrict the search space of such conventional protein complex identification approaches. Thus, it has become one of the most challenging tasks in systems biology to automatically identify protein complexes. RESULTS: In this study, we propose a new algorithm, NEOComplex (NECC- and Ortholog-based Complex identification by multiple network alignment), which integrates functional orthology information that can be obtained from different types of multiple network alignment (MNA) approaches to expand the search space of protein complex detection. As part of our approach, we also define a new edge clustering coefficient (NECC) to assign weights to interaction edges in PPI networks so that protein complexes can be identified more accurately. The NECC is based on the intuition that there is functional information captured in the common neighbors of the common neighbors as well. Our results show that our algorithm outperforms well-known protein complex identification tools in a balance between precision and recall on three eukaryotic species: human, yeast, and fly. As a result of MNAs of the species, the proposed approach can tolerate edge loss in PPI networks and even discover sparse protein complexes which have traditionally been a challenge to predict.
AVAILABILITY AND IMPLEMENTATION: http://acolab.ie.nthu.edu.tw/bionetwork/NEOComplex. CONTACT: bab@csail.mit.edu or csliao@ie.nthu.edu.tw. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
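
The abstract does not give the formula for the new edge clustering coefficient (NECC), so it is not reproduced here. As background only, the sketch below computes a conventional edge clustering coefficient, taken here to be the number of common neighbors of an edge's endpoints divided by the maximum possible, min(deg(u) - 1, deg(v) - 1). That definition, the function names and the toy network are assumptions for illustration and are not the NEOComplex method.

```python
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def edge_clustering_coefficient(adj: Dict[str, Set[str]], u: str, v: str) -> float:
    """Common neighbors of (u, v) over the maximum number they could share."""
    common = len(adj[u] & adj[v])
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / possible if possible > 0 else 0.0


# Toy PPI network (made-up proteins): a triangle P1-P2-P3 plus a pendant P4.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
adj = build_adjacency(edges)
ecc_12 = edge_clustering_coefficient(adj, "P1", "P2")  # edge inside the triangle
ecc_34 = edge_clustering_coefficient(adj, "P3", "P4")  # edge to a pendant node
```

Edges inside dense regions score high and edges to sparsely connected proteins score low, which is why such coefficients are used as edge weights before extracting candidate complexes.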


Subjects
Algorithms , Computational Biology/methods , Protein Interaction Maps , Animals , Cluster Analysis , Drosophila melanogaster/metabolism , Humans , Multiprotein Complexes , Protein Multimerization , Saccharomyces cerevisiae/metabolism
13.
J Theor Biol ; 455: 131-139, 2018 10 14.
Article in English | MEDLINE | ID: mdl-30036526

ABSTRACT

Functionally similar non-coding RNAs are expected to be similar in certain regions of their secondary structures. These similar regions are called common structure motifs, and are structurally conserved throughout evolution to maintain their functional roles. Common structure motif identification is one of the critical tasks in RNA secondary structure analysis. Nevertheless, current approaches suffer from several limitations and/or do not scale with both structure size and the number of input secondary structures. In this work, we present a method to transform the conserved base pair stems into transaction items and apply frequent itemset mining to identify common structure motifs existing in a majority of input structures. Our experimental results on telomerase and ribosomal RNA secondary structures report frequent stem patterns that are of biological significance. Moreover, the algorithms utilized in our method are scalable, and frequent stem patterns can be identified efficiently among many large structures.
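
The idea of treating each structure as a transaction of stem items can be sketched as follows. This is a brute-force enumeration for illustration only, not the scalable mining algorithm the authors use; the stem labels, the support threshold and the `max_size` cap are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(
    transactions: List[Set[str]],
    min_support: float,
    max_size: int = 3,
) -> Dict[FrozenSet[str], int]:
    """Count every item combination up to max_size and keep the frequent ones.

    min_support is the fraction of transactions that must contain the set.
    """
    threshold = min_support * len(transactions)
    counts: Dict[FrozenSet[str], int] = {}
    for items in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {k: c for k, c in counts.items() if c >= threshold}


# Each set stands for one RNA secondary structure; items are stem labels.
structures = [
    {"stem1", "stem2", "stem3"},
    {"stem1", "stem2"},
    {"stem1", "stem2", "stem4"},
    {"stem2", "stem4"},
]
patterns = frequent_itemsets(structures, min_support=0.75)
```

With these toy inputs, the only multi-stem pattern present in at least three of the four structures is the pair `stem1` and `stem2`, which plays the role of a common structure motif.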


Subjects
Algorithms , Computer Simulation , Nucleic Acid Conformation , Ribosomal RNA/chemistry , RNA/chemistry , RNA Sequence Analysis , Telomerase/chemistry , RNA/genetics , Ribosomal RNA/genetics , Telomerase/genetics
14.
BMC Genomics ; 18(1): 618, 2017 Aug 15.
Article in English | MEDLINE | ID: mdl-28810831

ABSTRACT

BACKGROUND: Using whole genome sequence data might improve genomic prediction accuracy, when compared with high-density SNP arrays, and could lead to identification of causal mutations affecting complex traits. For some traits, the most accurate genomic predictions are achieved with non-linear Bayesian methods. However, as the number of variants and the size of the reference population increase, the computational time required to implement these Bayesian methods (typically with Markov chain Monte Carlo sampling) becomes infeasibly long. RESULTS: Here, we applied a new method, HyB_BR (for Hybrid BayesR), which implements a mixture model of normal distributions and hybridizes an Expectation-Maximization (EM) algorithm followed by Markov Chain Monte Carlo (MCMC) sampling, to genomic prediction in a large dairy cattle population with imputed whole genome sequence data. The imputed whole genome sequence data included 994,019 variant genotypes of 16,214 Holstein and Jersey bulls and cows. Traits included fat yield, milk volume, protein kg, fat% and protein% in milk, as well as fertility and heat tolerance. HyB_BR achieved genomic prediction accuracies as high as the full MCMC implementation of BayesR, both for predicting a validation set of Holstein and Jersey bulls (multi-breed prediction) and a validation set of Australian Red bulls (across-breed prediction). HyB_BR had a ten-fold reduction in compute time, compared with the MCMC implementation of BayesR (48 hours versus 594 hours). We also demonstrate that in many cases HyB_BR identified sequence variants with a high posterior probability of affecting the milk production or fertility traits that were similar to those identified in BayesR. For heat tolerance, both HyB_BR and BayesR found variants in or close to promising candidate genes associated with this trait and not detected by previous studies.
CONCLUSIONS: The results demonstrate that HyB_BR is a feasible method for simultaneous genomic prediction and QTL mapping with whole genome sequence in large reference populations.


Subjects
Chromosome Mapping , Genomics , Nonlinear Dynamics , Quantitative Trait Loci/genetics , Whole Genome Sequencing , Algorithms , Animals , Bayes Theorem , Cattle , Female , Fertility/genetics , Genotype , Markov Chains , Milk/metabolism , Monte Carlo Method , Phenotype , Single Nucleotide Polymorphism
15.
BMC Genomics ; 17(1): 744, 2016 Sep 21.
Article in English | MEDLINE | ID: mdl-27654580

ABSTRACT

BACKGROUND: Bayesian mixture models in which the effects of SNP are assumed to come from normal distributions with different variances are attractive for simultaneous genomic prediction and QTL mapping. These models are usually implemented with Markov chain Monte Carlo (MCMC) sampling, which requires long compute times with large genomic data sets. Here, we present an efficient approach (termed HyB_BR), which is a hybrid of an Expectation-Maximisation algorithm followed by a limited number of MCMC iterations without the requirement for burn-in. RESULTS: To test prediction accuracy from HyB_BR, dairy cattle and human disease trait data were used. In the dairy cattle data, there were four quantitative traits (milk volume, protein kg, fat% in milk and fertility) measured in 16,214 cattle from two breeds genotyped for 632,002 SNPs. Validation of genomic predictions was in a subset of cattle either from the reference set or in animals from a third breed that were not in the reference set. In all cases, HyB_BR gave almost identical accuracies to Bayesian mixture models implemented with full MCMC; however, computational time was reduced to as little as 1/17 of that required by full MCMC. The SNPs with high posterior probability of a non-zero effect were also very similar between full MCMC and HyB_BR, with several known genes affecting milk production in this category, as well as some novel genes. HyB_BR was also applied to seven human diseases with 4890 individuals genotyped for around 300 K SNPs in a case/control design, from the Wellcome Trust Case Control Consortium (WTCCC). In this data set, the results demonstrated again that HyB_BR performed as well as Bayesian mixture models with full MCMC for genomic predictions and genetic architecture inference while reducing the computational time from 45 h with full MCMC to 3 h with HyB_BR.
CONCLUSIONS: The results for quantitative traits in cattle and disease in humans demonstrate that HyB_BR can perform as well as Bayesian mixture models implemented with full MCMC in terms of prediction accuracy, while running up to 17 times faster than the full MCMC implementations. The HyB_BR algorithm makes simultaneous genomic prediction, QTL mapping and inference of genetic architecture feasible in large genomic data sets.

16.
Bioinformatics ; 31(24): 3914-21, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26275897

ABSTRACT

MOTIVATION: The regulatory functions performed by non-coding RNAs are related to their 3D structures, which are, in turn, determined by their secondary structures. Pairwise secondary structure alignment gives insight into the functional similarity between a pair of RNA sequences. Numerous exact or heuristic approaches have been proposed for computational alignment. However, the alignment becomes intractable when arbitrary pseudoknots are allowed. Also, since non-coding RNAs are, in general, more conserved in structures than sequences, it is more effective to perform alignment based on the common structural motifs discovered. RESULTS: We devised a method to approximate the true conserved stem pattern for a secondary structure pair, and constructed the alignment from it. Experimental results suggest that our method identified similar RNA secondary structures better than the existing tools, especially for large structures. It also successfully indicated the conservation of some pseudoknot features with biological significance. More importantly, even for large structures with arbitrary pseudoknots, the alignment can usually be obtained efficiently. AVAILABILITY AND IMPLEMENTATION: Our algorithm has been implemented in a tool called PSMAlign. The source code of PSMAlign is freely available at http://homepage.cs.latrobe.edu.au/ypchen/psmalign/.


Subjects
Algorithms , Untranslated RNA/chemistry , Nucleic Acid Conformation , Sequence Alignment , RNA Sequence Analysis/methods , Software
17.
Genet Sel Evol ; 47: 34, 2015 Apr 30.
Article in English | MEDLINE | ID: mdl-25926276

ABSTRACT

BACKGROUND: Genomic prediction of breeding values from dense single nucleotide polymorphism (SNP) genotypes is used for livestock and crop breeding, and can also be used to predict disease risk in humans. For some traits, the most accurate genomic predictions are achieved with non-linear estimates of SNP effects from Bayesian methods that treat SNP effects as random effects from a heavy-tailed prior distribution. These Bayesian methods are usually implemented via Markov chain Monte Carlo (MCMC) schemes to sample from the posterior distribution of SNP effects, which is computationally expensive. Our aim was to develop an efficient expectation-maximisation algorithm (emBayesR) that gives estimates of SNP effects and accuracies of genomic prediction similar to those of the MCMC implementation of BayesR (a Bayesian method for genomic prediction), but with greatly reduced computation time. METHODS: emBayesR is an approximate EM algorithm that retains the BayesR model assumption with SNP effects sampled from a mixture of normal distributions with increasing variance. emBayesR differs from other proposed non-MCMC implementations of Bayesian methods for genomic prediction in that it estimates the effect of each SNP while allowing for the error associated with estimation of all other SNP effects. emBayesR was compared to BayesR using simulated data, and real dairy cattle data with 632 003 SNPs genotyped, to determine if the MCMC and the expectation-maximisation approaches give similar accuracies of genomic prediction. RESULTS: We were able to demonstrate that allowing for the error associated with estimation of other SNP effects when estimating the effect of each SNP in emBayesR improved the accuracy of genomic prediction over emBayesR without including this error correction, with both simulated and real data. When averaged over nine dairy traits, the accuracy of genomic prediction with emBayesR was only 0.5% lower than that from BayesR.
However, emBayesR reduced computing time up to 8-fold compared to BayesR. CONCLUSIONS: The emBayesR algorithm described here achieved similar accuracies of genomic prediction to BayesR for a range of simulated and real 630 K dairy SNP data. emBayesR needs less computing time than BayesR, which will allow it to be applied to larger datasets.
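
The model assumption named in the METHODS section, SNP effects drawn from a mixture of normal distributions with increasing variance, can be sketched by simulating from such a prior. The abstract does not state the number of components, their variances or their mixing proportions, so the values below (a zero-variance component plus three components with variances 1e-4, 1e-3 and 1e-2) and all names are assumptions for illustration; this is a prior simulation only, not the emBayesR estimation algorithm.

```python
import random
from typing import List, Sequence

# Assumed, illustrative settings (not taken from the abstract).
VARIANCES = [0.0, 1e-4, 1e-3, 1e-2]          # increasing component variances
PROPORTIONS = [0.95, 0.03, 0.015, 0.005]     # hypothetical mixing proportions


def sample_snp_effects(
    n_snps: int,
    proportions: Sequence[float],
    variances: Sequence[float],
    seed: int = 0,
) -> List[float]:
    """Draw each SNP effect from a zero-mean normal mixture."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_snps):
        # Pick a mixture component, then draw from N(0, variance).
        k = rng.choices(range(len(variances)), weights=proportions)[0]
        sd = variances[k] ** 0.5
        effects.append(rng.gauss(0.0, sd) if sd > 0 else 0.0)
    return effects


effects = sample_snp_effects(1000, PROPORTIONS, VARIANCES, seed=42)
n_nonzero = sum(1 for e in effects if e != 0.0)
```

Under these assumed proportions most simulated SNPs have exactly zero effect and a few have comparatively large effects, which is the heavy-tailed behaviour the BACKGROUND section refers to.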


Subjects
Algorithms , Breeding , Genomics/methods , Animals , Bayes Theorem , Cattle , Male , Genetic Models , Statistical Models , Single Nucleotide Polymorphism
18.
Biochim Biophys Acta ; 1834(1): 317-28, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23457710

ABSTRACT

Histone deacetylases (HDACs) are an important class of enzymes that deacetylate the ε-amino group of the lysine residues in the histone tails to form a closed chromatin configuration, resulting in the regulation of gene expression. Inhibition of these HDAC enzymes has been identified as one of the promising approaches for cancer treatment. The type-specific inhibition of class I HDAC enzymes is known to elicit improved therapeutic effects, and thus the search for promising type-specific HDAC inhibitor compounds remains an ongoing research interest in cancer drug discovery. Several different strategies are employed to identify the features that determine isoform specificity in these HDAC enzymes. This study combines in silico docking and energy-optimized pharmacophore (e-pharmacophore) mapping of several known HDAC inhibitors (HDACi's) to identify the structural variants that are significant for the interactions against each of the four class I HDAC enzymes. Our hybrid approach shows that all the inhibitors with at least one aromatic ring in their linker regions hold higher affinities against target enzymes, while those without any aromatic rings remain poor binders. We propose e-pharmacophore models for the HDACi's against all four class I HDAC enzymes, which are not reported elsewhere. The results from this work will be useful in the rational design and virtual screening of more isoform-specific HDACi's against the class I HDAC family of proteins.


Subjects
Histone Deacetylase Inhibitors/chemistry , Histone Deacetylases/chemistry , Molecular Docking Simulation , Binding Sites , Humans , Protein Binding , Thermodynamics
19.
BMC Genomics ; 15: 436, 2014 Jun 06.
Article in English | MEDLINE | ID: mdl-24903263

ABSTRACT

BACKGROUND: In livestock, as in humans, the number of genetic variants that can be tested for association with complex quantitative traits, or used in genomic predictions, is increasing exponentially as whole genome sequencing becomes more common. The power to identify variants associated with traits, particularly those of small effects, could be increased if certain regions of the genome were known a priori to be enriched for associations. Here, we investigate whether twelve genomic annotation classes were enriched or depleted for significant associations in genome-wide association studies for complex traits in beef and dairy cattle. We also describe a variance component approach to determine the proportion of genetic variance captured by each annotation class. RESULTS: P-values from large GWAS using 700K SNP in both dairy and beef cattle were available for 11 and 10 traits respectively. We found significant enrichment for trait-associated variants (SNP significant in the GWAS) in the missense class along with regions 5 kilobases upstream and downstream of coding genes. We found that the non-coding conserved regions (across mammals) were not enriched for trait-associated variants. The results from the enrichment or depletion analysis were not in complete agreement with the results from variance component analysis, where the missense and synonymous classes gave the greatest increase in variance explained, while the upstream and downstream classes showed a more modest increase in the variance explained. CONCLUSION: Our results indicate that functional annotations could assist in prioritization of variants to a subset more likely to be associated with complex traits, including missense variants and upstream and downstream regions. The differences between the two sets of results (GWAS enrichment or depletion versus variance component approaches) might be explained by the fact that the variance component approach has greater power to capture the cumulative effect of mutations of small effect, while the enrichment or depletion approach only captures the variants that are significant in GWAS, which is restricted to a limited number of common variants of moderate effects.


Subjects
Cattle/genetics , Dairy Products , Meat , Quantitative Trait Loci , Animals , Genetic Variation , Genome-Wide Association Study , Genomics , Single Nucleotide Polymorphism
20.
J Theor Biol ; 340: 146-54, 2014 Jan 07.
Article in English | MEDLINE | ID: mdl-24056214

ABSTRACT

Many studies aim to identify dense clusters/subgraphs in protein-protein interaction (PPI) networks for protein function prediction. However, the prediction performance based on the dense clusters is actually worse than a simple guilt-by-association method using neighbor counting ideas. This indicates that the local topological structures and properties of PPI networks are still open to new theoretical investigation and empirical exploration. We introduce a novel topological structure called k-partite cliques of protein interactions, a functionally coherent but not necessarily dense subgraph topology in PPI networks, to study PPI networks. A k-partite protein clique is a maximal k-partite clique comprising two or more nonoverlapping protein subsets between any two of which full interactions are exhibited. In the detection of PPI's maximal k-partite cliques, we propose to transform PPI networks into induced K-partite graphs where edges exist only between the partites. Then, we present a maximal k-partite clique mining (MaCMik) algorithm to enumerate maximal k-partite cliques from K-partite graphs. Our MaCMik algorithm is then applied to a yeast PPI network. We observed interesting and unusually high functional coherence in k-partite protein cliques: the majority of the proteins in k-partite protein cliques, especially those in the same partites, share the same functions, although k-partite protein cliques are not restricted to be dense compared with dense subgraph patterns or (quasi-)cliques. The idea of k-partite protein cliques provides a novel approach to characterizing PPI networks, and so it will help function prediction for unknown proteins.
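
The definition of a k-partite clique given in the abstract can be illustrated with a minimal Python sketch. This is not the MaCMik algorithm, which the abstract does not specify; it only checks whether a given set of protein subsets satisfies the stated conditions (two or more nonoverlapping subsets, with full interactions between any two of them), and it does not test maximality. The function name `is_k_partite_clique` and the protein labels are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def is_k_partite_clique(partites, edges):
    """Check the k-partite clique conditions for the given protein subsets.

    partites: list of iterables of protein identifiers (the candidate subsets).
    edges: iterable of (u, v) pairs, treated as undirected interactions.
    Maximality is NOT checked; this only verifies the defining conditions.
    """
    sets = [set(p) for p in partites]
    # The definition requires two or more non-empty subsets.
    if len(sets) < 2 or any(not s for s in sets):
        return False
    # Subsets must be nonoverlapping.
    for a, b in combinations(sets, 2):
        if a & b:
            return False
    edge_set = {frozenset(e) for e in edges}
    # Every protein in one subset must interact with every protein
    # in each of the other subsets.
    for a, b in combinations(sets, 2):
        for u in a:
            for v in b:
                if frozenset((u, v)) not in edge_set:
                    return False
    return True


# Hypothetical toy network: P1 and P2 both interact with P3 and P4,
# and P3 and P4 both interact with P5, as do P1 and P2.
toy_edges = [
    ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P2", "P4"),
    ("P1", "P5"), ("P2", "P5"), ("P3", "P5"), ("P4", "P5"),
]
print(is_k_partite_clique([["P1", "P2"], ["P3", "P4"], ["P5"]], toy_edges))
```

Note that no interactions are required inside a subset (P1 and P2 do not interact in the toy network), which is why such a structure need not be dense.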


Subjects
Computational Biology/methods , Gene Expression Regulation , Protein Interaction Mapping/methods , Algorithms , Computer Simulation , Fungal Proteins/metabolism , Biological Models , Proteins/metabolism , Signal Transduction , Software , Yeasts/metabolism