Results 1 - 12 of 12
1.
Brief Bioinform ; 20(2): 426-435, 2019 03 22.
Article in English | MEDLINE | ID: mdl-28673025

ABSTRACT

We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed.


Subjects
Evolution, Molecular; Genome; Phylogeny; Algorithms; Animals; Humans; Microbiota/genetics; Models, Genetic; Sequence Alignment; Sequence Analysis, DNA; Viruses/genetics
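The review above focuses on k-mer-based alignment-free (AF) comparison. As a minimal sketch of the idea, assuming exact k-mer sets and the Jaccard distance (one simple AF measure among many, not necessarily one the authors recommend), pairwise distances can be computed with no alignment step at all. The function names and the default k = 4 are illustrative choices.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """Distinct substrings of length k (k-mers) in seq, case-insensitive."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: str, b: str, k: int = 4) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the k-mer sets A and B of two sequences."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:  # both sequences are shorter than k
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def distance_matrix(seqs: dict[str, str], k: int = 4) -> dict[tuple[str, str], float]:
    """Alignment-free distances for every unordered pair of named sequences."""
    return {(x, y): jaccard_distance(seqs[x], seqs[y], k) for x, y in combinations(seqs, 2)}
```

For example, "ACGTACGTGG" and "ACGTACGTGC" share 5 of their 7 distinct 4-mers (distance 2/7), while "TTTTGGGGCC" shares none with either (distance 1.0). Such a matrix could feed any distance-based tree method such as neighbor joining; tools built for whole genomes typically add sketching (e.g. MinHash) to scale.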
2.
Brief Bioinform ; 15(2): 195-211, 2014 Mar.
Article in English | MEDLINE | ID: mdl-23698722

ABSTRACT

Inference of gene regulatory networks from expression data is a challenging task. Many methods have been developed for this purpose, but a comprehensive evaluation that covers unsupervised, semi-supervised and supervised methods, and provides guidelines for their practical application, is lacking. We performed an extensive evaluation of inference methods on simulated and experimental expression data. The results reveal low prediction accuracies for unsupervised techniques, with the notable exception of the Z-SCORE method on knockout data. In all other cases, the supervised approach achieved the highest accuracies and, even in a semi-supervised setting with small numbers of only positive samples, outperformed the unsupervised techniques.


Subjects
Computational Biology/methods; Gene Regulatory Networks; Algorithms; Artificial Intelligence; Computer Simulation; Databases, Genetic/statistics & numerical data; Escherichia coli/genetics; Gene Expression Profiling/statistics & numerical data; Genes, Bacterial; Genes, Fungal; Saccharomyces cerevisiae/genetics; Software; Support Vector Machine; Systems Biology
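The Z-SCORE method singled out above ranks candidate edges from knockout experiments: if knocking out gene i shifts gene j far from its typical level, the edge i → j is scored highly. The sketch below assumes one simple formulation, in which gene j's mean and standard deviation are taken over the knockouts of the other genes; it is illustrative and not necessarily the exact variant evaluated in the paper.

```python
from statistics import mean, pstdev


def zscore_edges(knockouts: list[list[float]]) -> list[tuple[float, int, int]]:
    """Rank candidate edges i -> j from single-gene knockout data.

    knockouts[i][j] is the expression of gene j when gene i is knocked out
    (a square matrix, one row per knocked-out gene). Returns
    (score, regulator, target) triples, highest score first.
    """
    n = len(knockouts)
    edges = []
    for j in range(n):
        # Reference distribution for gene j: knockouts of every other gene.
        ref = [knockouts[i][j] for i in range(n) if i != j]
        mu = mean(ref)
        sd = pstdev(ref) or 1.0  # assumption: avoid dividing by zero for invariant genes
        for i in range(n):
            if i != j:
                edges.append((abs(knockouts[i][j] - mu) / sd, i, j))
    return sorted(edges, reverse=True)
```

With four genes where only the knockout of gene 0 depresses gene 1, the top-ranked triple is the edge 0 → 1.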
3.
Bioinformatics ; 30(9): 1273-9, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24407221

ABSTRACT

MOTIVATION: Cancer is a heterogeneous progressive disease caused by perturbations of the underlying gene regulatory network that can be described by dynamic models. These dynamics are commonly modeled as Boolean networks or as ordinary differential equations. Their inference from data is computationally challenging, and at least partial knowledge of the regulatory network and its kinetic parameters is usually required to construct predictive models. RESULTS: Here, we construct Hopfield networks from static gene-expression data and demonstrate that cancer subtypes can be characterized by different attractors of the Hopfield network. We evaluate the clustering performance of the network and find that it is comparable with traditional methods but offers additional advantages including a dynamic model of the energy landscape and a unification of clustering, feature selection and network inference. We visualize the Hopfield attractor landscape and propose a pruning method to generate sparse networks for feature selection and improved understanding of feature relationships.


Subjects
Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Neoplasms/genetics; Algorithms; Cluster Analysis; Humans; Kinetics; Software
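For readers unfamiliar with Hopfield networks, the core mechanics are small: Hebbian weights store a set of ±1 patterns, and asynchronous updates never raise the network energy, so the state settles in an attractor. The sketch assumes binarized (+1/-1) expression patterns and the textbook Hebbian rule; the paper's construction from static expression data, and its pruning method, may differ.

```python
def train_hebbian(patterns: list[list[int]]) -> list[list[float]]:
    """Hebbian weights W = (1/P) * sum_p x_p x_p^T with a zero diagonal (states are +1/-1)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / len(patterns)
    return w


def energy(w: list[list[float]], s: list[int]) -> float:
    """Hopfield energy E = -1/2 * sum_ij w_ij s_i s_j (no bias terms)."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def recall(w: list[list[float]], state: list[int], max_sweeps: int = 100) -> list[int]:
    """Asynchronous updates until no unit changes, i.e. the state sits in an attractor."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))  # local field on unit i
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s
```

Storing two patterns and then presenting a copy of the first with one flipped entry recovers the first pattern, at lower energy than the corrupted input.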
4.
BMC Bioinformatics ; 14 Suppl 16: S14, 2013.
Article in English | MEDLINE | ID: mdl-24564496

ABSTRACT

BACKGROUND: Cell survival and development are orchestrated by complex interlocking programs of gene activation and repression. Understanding how this gene regulatory network (GRN) functions in normal states, and is altered in cancer subtypes, offers fundamental insight into oncogenesis and disease progression, and holds great promise for guiding clinical decisions. Inferring a GRN from empirical microarray gene expression data is a challenging task in cancer systems biology. In recent years, module-based approaches for GRN inference have been proposed to address this challenge. Despite the demonstrated success of module-based approaches in uncovering biologically meaningful regulatory interactions, their application remains limited to a single condition, without supporting the comparison of multiple disease subtypes/conditions. Also, their use remains unnecessarily restricted to computational biologists, as accurate inference of modules and their regulators requires integration of diverse tools and heterogeneous data sources, which in turn requires scripting skills, data infrastructure and powerful computational facilities. New analytical frameworks are required to make module-based GRN inference approaches more generally useful to the research community. RESULTS: We present the RMaNI (Regulatory Module Network Inference) framework, which supports cancer subtype-specific or condition-specific GRN inference and differential network analysis. It combines transcriptomic and genomic data sources, integrates heterogeneous knowledge resources and a set of complementary bioinformatic methods for automated inference of modules and their condition-specific regulators, and facilitates downstream network analyses and data visualization. To demonstrate its utility, we applied RMaNI to a hepatocellular microarray dataset containing one normal and three disease conditions. We demonstrate how RMaNI can be employed to understand the genetic architecture underlying the three disease conditions. RMaNI is freely available at http://inspect.braembl.org.au/bi/inspect/rmani. CONCLUSION: RMaNI makes available a workflow with a comprehensive set of tools that would otherwise be challenging for non-expert users to install and apply. The framework presented in this paper is flexible and can be easily extended to analyse any dataset with multiple disease conditions.


Subjects
Carcinoma, Hepatocellular/genetics; Computational Biology/methods; Gene Regulatory Networks; Liver Neoplasms/genetics; Cluster Analysis; Gene Expression; Humans; Internet; Systems Biology/methods
5.
Bioinformatics ; 28(6): 851-7, 2012 Mar 15.
Article in English | MEDLINE | ID: mdl-22219205

ABSTRACT

MOTIVATION: Phylogenetic profiling methods can achieve good accuracy in predicting protein-protein interactions, especially in prokaryotes. Recent studies have shown that the choice of reference taxa (RT) is critical for accurate prediction, but with more than 2500 fully sequenced taxa publicly available, identifying the most-informative RT is becoming increasingly difficult. Previous studies on the selection of RT have provided guidelines for manual taxon selection, and for eliminating closely related taxa. However, no general strategy for automatic selection of RT is currently available. RESULTS: We present three novel methods for automating the selection of RT, using machine learning based on known protein-protein interaction networks. One of these methods in particular, Tree-Based Search, yields greatly improved prediction accuracies. We further show that different methods for constituting phylogenetic profiles often require very different RT sets to support high prediction accuracy.


Subjects
Archaea/genetics; Artificial Intelligence; Bacteria/genetics; Eukaryota/genetics; Phylogeny; Protein Interaction Maps; Proteins/genetics; Archaea/classification; Archaea/metabolism; Bacteria/classification; Bacteria/metabolism; Eukaryota/classification; Eukaryota/metabolism; Proteins/chemistry; Proteins/metabolism
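A phylogenetic profile records, for each reference taxon, whether a protein has a homolog there; proteins with similar profiles are predicted to interact. The sketch below shows why the choice of RT matters: the same pair of profiles can look similar or dissimilar depending on which taxa are kept. Greedy forward selection, scored on known interacting (positive) and non-interacting (negative) pairs, is a stand-in chosen for brevity. It is not the paper's Tree-Based Search or either of its other two methods, and the agreement-based similarity is an assumed, deliberately simple profile measure.

```python
from statistics import mean

Profile = dict[str, int]  # taxon name -> 1 if a homolog is present, else 0


def profile_similarity(p: Profile, q: Profile, ref_taxa: list[str]) -> float:
    """Fraction of reference taxa on which two presence/absence profiles agree."""
    if not ref_taxa:
        raise ValueError("need at least one reference taxon")
    return sum(p[t] == q[t] for t in ref_taxa) / len(ref_taxa)


def rt_score(profiles: dict[str, Profile], positives, negatives, ref_taxa) -> float:
    """Mean similarity of known interacting pairs minus that of non-interacting pairs."""
    pos = mean(profile_similarity(profiles[a], profiles[b], ref_taxa) for a, b in positives)
    neg = mean(profile_similarity(profiles[a], profiles[b], ref_taxa) for a, b in negatives)
    return pos - neg


def greedy_select(profiles, positives, negatives, candidates: list[str], size: int) -> list[str]:
    """Hypothetical greedy forward selection: repeatedly add the taxon that best separates pairs."""
    chosen: list[str] = []
    remaining = list(candidates)
    while remaining and len(chosen) < size:
        best = max(remaining, key=lambda t: rt_score(profiles, positives, negatives, chosen + [t]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

In a toy case with four taxa, keeping all of them separates the known interacting pair from the non-interacting pair only weakly (score 0.25), whereas a two-taxon subset separates them perfectly (score 1.0).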
6.
Bioinformatics ; 28(1): 69-75, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22057159

ABSTRACT

MOTIVATION: Protein-protein interactions (PPIs) are pivotal for many biological processes and similarity in Gene Ontology (GO) annotation has been found to be one of the strongest indicators for PPI. Most GO-driven algorithms for PPI inference combine machine learning and semantic similarity techniques. We introduce the concept of inducers as a method to integrate both approaches more effectively, leading to superior prediction accuracies. RESULTS: An inducer (ULCA) in combination with a Random Forest classifier compares favorably to several sequence-based methods, semantic similarity measures and multi-kernel approaches. On a newly created set of high-quality interaction data, the proposed method achieves high cross-species prediction accuracies (Area under the ROC curve ≤ 0.88), rendering it a valuable companion to sequence-based methods. AVAILABILITY: Software and datasets are available at http://bioinformatics.org.au/go2ppi/ CONTACT: m.ragan@uq.edu.au.


Subjects
Algorithms; Molecular Sequence Annotation; Proteins/genetics; Software; Vocabulary, Controlled; Databases, Protein; Humans; Protein Interaction Maps; ROC Curve; Yeasts/genetics; Yeasts/metabolism
7.
Bioinformatics ; 26(6): 737-44, 2010 Mar 15.
Article in English | MEDLINE | ID: mdl-20130028

ABSTRACT

MOTIVATION: Protein sequences are often composed of regions that have distinct evolutionary histories as a consequence of domain shuffling, recombination or gene conversion. New approaches are required to discover, visualize and analyze these sequence regions and thus enable a better understanding of protein evolution. RESULTS: Here, we have developed an alignment-free and visual approach to analyze sequence relationships. We use the number of shared n-grams between sequences as a measure of sequence similarity and rearrange the resulting affinity matrix by applying a spectral technique. Heat maps of the affinity matrix are employed to identify and visualize clusters of related sequences or outliers, while n-gram-based dot plots and conservation profiles allow detailed analysis of similarities among selected sequences. Using this approach, we have identified signatures of domain shuffling in an otherwise poorly characterized family, and homology clusters in another. We conclude that this approach may be generally useful as a framework to analyze related but highly divergent protein sequences. It is particularly useful as a fast method to study sequence relationships prior to much more time-consuming multiple sequence alignment and phylogenetic analysis. AVAILABILITY: A software implementation (MOSAIC) of the framework described here can be downloaded from http://bioinformatics.org.au/mosaic/ CONTACT: m.ragan@uq.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Sequence Analysis, Protein/methods; Computer Graphics; Databases, Protein; Proteins/chemistry; Sequence Alignment
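Two basic ingredients described above, a shared-n-gram affinity matrix and an n-gram dot plot, can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions (exact n-gram matches, raw counts of shared distinct n-grams as affinity, n = 3 by default), not MOSAIC's code, and it omits the spectral reordering step.

```python
def ngrams(seq: str, n: int) -> set[str]:
    """Distinct substrings of length n in a protein sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def shared_ngrams(a: str, b: str, n: int = 3) -> int:
    """Number of distinct n-grams the two sequences have in common."""
    return len(ngrams(a, n) & ngrams(b, n))


def affinity_matrix(seqs: list[str], n: int = 3) -> list[list[int]]:
    """Symmetric matrix of shared-n-gram counts; the diagonal holds each sequence's own count."""
    return [[shared_ngrams(a, b, n) for b in seqs] for a in seqs]


def ngram_dot_plot(a: str, b: str, n: int = 3) -> list[tuple[int, int]]:
    """Points (i, j) where a[i:i+n] == b[j:j+n], i.e. the dots of an n-gram dot plot."""
    index: dict[str, list[int]] = {}
    for j in range(len(b) - n + 1):
        index.setdefault(b[j:j + n], []).append(j)
    return [(i, j) for i in range(len(a) - n + 1) for j in index.get(a[i:i + n], [])]
```

Runs of dots along a diagonal mark conserved regions; a diagonal that breaks off and resumes elsewhere is the kind of pattern that can hint at shuffled domains.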
8.
Ophthalmol Glaucoma ; 4(1): 102-112, 2021.
Article in English | MEDLINE | ID: mdl-32826205

ABSTRACT

PURPOSE: To evaluate the accuracy with which visual field global indices could be estimated from OCT scans of the retina using deep neural networks and to quantify the contributions to the estimates by the macula (MAC) and the optic nerve head (ONH). DESIGN: Observational cohort study. PARTICIPANTS: A total of 10 370 eyes from 109 healthy patients, 697 glaucoma suspects, and 872 patients with glaucoma over multiple visits (median = 3). METHODS: Three-dimensional convolutional neural networks were trained to estimate global visual field indices derived from automated Humphrey perimetry (SITA 24-2) tests (Zeiss, Dublin, CA), using OCT scans centered on MAC, ONH, or both (MAC + ONH) as inputs. MAIN OUTCOME MEASURES: Spearman's rank correlation coefficients, Pearson's correlation coefficient, and absolute errors calculated for 2 indices: visual field index (VFI) and mean deviation (MD). RESULTS: The MAC + ONH model achieved 0.76 Spearman's correlation coefficient and 0.87 Pearson's correlation for VFI and MD. Median absolute error was 2.7 for VFI and 1.57 decibels (dB) for MD. Separate MAC or ONH estimates were significantly less correlated and less accurate. Accuracy was dependent on the OCT signal strength and the stage of glaucoma severity. CONCLUSIONS: The accuracy of global visual field index estimates is improved by integrating information from MAC and ONH in advanced glaucoma, suggesting that structural changes of the 2 regions have different time courses in the disease severity spectrum.


Subjects
Glaucoma; Optic Disk; Glaucoma/diagnosis; Humans; Neural Networks, Computer; Optic Disk/diagnostic imaging; Tomography, Optical Coherence; Visual Fields
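The outcome measures above are easy to confuse. Spearman's coefficient is Pearson's coefficient computed on ranks, so it rewards monotonic rather than strictly linear agreement between estimated and measured indices, while the median absolute error stays in the index's own units (dB for MD). A minimal standard-library sketch of all three measures follows; it is not the authors' evaluation code.

```python
from statistics import mean, median


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(x: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1  # extend the block of tied values
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: Pearson's r on the ranks."""
    return pearson(ranks(x), ranks(y))


def median_abs_error(pred: list[float], true: list[float]) -> float:
    """Median of |prediction - measurement|."""
    return median(abs(p - t) for p, t in zip(pred, true))
```

For instance, y = x² on x = 1..4 gives a Spearman coefficient of exactly 1 but a Pearson coefficient below 1, because the relationship is monotonic but not linear.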
9.
BMC Bioinformatics ; 10: 341, 2009 Oct 18.
Article in English | MEDLINE | ID: mdl-19835626

ABSTRACT

BACKGROUND: RNA-protein interactions are important for a wide range of biological processes. Current computational methods to predict interacting residues in RNA-protein interfaces predominantly rely on sequence data. It is, however, known that interface residue propensity is closely correlated with structural properties. In this paper we systematically study information obtained from sequences and structures and compare their contributions in this prediction problem. Particularly, different geometrical and network topological properties of protein structures are evaluated to improve interface residue prediction accuracy. RESULTS: We have quantified the impact of structural information on the prediction accuracy in comparison to the purely sequence based approach using two machine learning techniques: Naïve Bayes classifiers and Support Vector Machines. The highest AUC of 0.83 was achieved by a Support Vector Machine, exploiting PSI-BLAST profile, accessible surface area, betweenness-centrality and retention coefficient as input features. Taking into account that our results are based on a larger non-redundant data set, the prediction accuracy is considerably higher than reported in previous, comparable studies. A protein-RNA interface predictor (PRIP) and the data set have been made available at http://www.qfab.org/PRIP. CONCLUSION: Graph-theoretic properties of residue contact maps derived from protein structures such as betweenness-centrality can supplement sequence or structure features to improve the prediction accuracy for binding residues in RNA-protein interactions. While Support Vector Machines perform better on this task, Naïve Bayes classifiers have also been found to achieve good prediction accuracies but require much less training time and are an attractive choice for large scale predictions.


Subjects
Computational Biology/methods; RNA-Binding Proteins/chemistry; RNA/chemistry; Binding Sites; Databases, Protein; Models, Molecular; Protein Conformation; RNA/metabolism; RNA-Binding Proteins/metabolism
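The AUC reported above has a direct probabilistic reading: it is the probability that a randomly chosen interface residue receives a higher classifier score than a randomly chosen non-interface residue, with ties counting one half (the Mann-Whitney formulation). A minimal sketch, assuming labels coded as 1 (interface) and 0 (non-interface):

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # pair ranked correctly
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))
```

For example, roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]) gives 0.75, because three of the four positive/negative pairs are ordered correctly. The pairwise loop is quadratic; sorting-based implementations scale better on large residue sets.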
10.
Cancer Inform ; 13: 59-66, 2014.
Article in English | MEDLINE | ID: mdl-24653643

ABSTRACT

The emergence of transcriptomics, fuelled by high-throughput sequencing technologies, has changed the nature of cancer research and resulted in a massive accumulation of data. Computational analysis, integration, and data visualization are now major bottlenecks in cancer biology and translational research. Although many tools have been brought to bear on these problems, their use remains unnecessarily restricted to computational biologists, as many tools require scripting skills, data infrastructure, and powerful computational facilities. New user-friendly, integrative, and automated analytical approaches are required to make computational methods more generally useful to the research community. Here we present INsPeCT (INtegrative Platform for Cancer Transcriptomics), which allows users with basic computer skills to perform comprehensive in-silico analyses of microarray, ChIP-seq, and RNA-seq data. INsPeCT supports the selection of interesting genes for advanced functional analysis. Included in its automated workflows are (i) a novel analytical framework, RMaNI (regulatory module network inference), which supports the inference of cancer subtype-specific transcriptional module networks and the analysis of modules; and (ii) WGCNA (weighted gene co-expression network analysis), which infers modules of highly correlated genes across microarray samples, associated with sample traits, e.g., survival time. INsPeCT is available free of cost from Bioinformatics Resource Australia-EMBL and can be accessed at http://inspect.braembl.org.au.
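WGCNA, one of the two INsPeCT workflows named above, builds its network from soft-thresholded correlations rather than a hard cutoff. A minimal sketch of the unsigned adjacency a_ij = |cor(x_i, x_j)|^β and per-gene connectivity follows. β = 6 is only an illustrative default (in WGCNA the choice of β is guided by a scale-free topology fit), and module detection by hierarchical clustering of topological overlap is omitted.

```python
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two expression profiles measured on the same samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def soft_threshold_adjacency(expr: dict[str, list[float]], beta: int = 6) -> dict[tuple[str, str], float]:
    """Unsigned adjacency |cor|**beta for each unordered gene pair; large beta suppresses weak links."""
    return {(a, b): abs(pearson(expr[a], expr[b])) ** beta for a, b in combinations(expr, 2)}


def connectivity(expr: dict[str, list[float]], adj: dict[tuple[str, str], float]) -> dict[str, float]:
    """Whole-network connectivity k_i = sum of a_ij over all j != i."""
    k = {g: 0.0 for g in expr}
    for (a, b), w in adj.items():
        k[a] += w
        k[b] += w
    return k
```

A correlation of about 0.45 contributes only 0.008 after raising to the sixth power, while a perfect correlation keeps weight 1; this is how soft thresholding keeps the network weighted yet dominated by strong co-expression.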

11.
Genome Med ; 4(5): 41, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22548828

ABSTRACT

BACKGROUND: Altered networks of gene regulation underlie many complex conditions, including cancer. Inferring gene regulatory networks from high-throughput microarray expression data is a fundamental but challenging task in computational systems biology and its translation to genomic medicine. Although diverse computational and statistical approaches have been brought to bear on the gene regulatory network inference problem, their relative strengths and disadvantages remain poorly understood, largely because comparative analyses usually consider only small subsets of methods, use only synthetic data, and/or fail to adopt a common measure of inference quality. METHODS: We report a comprehensive comparative evaluation of nine state-of-the-art gene regulatory network inference methods encompassing the main algorithmic approaches (mutual information, correlation, partial correlation, random forests, support vector machines) using 38 simulated datasets and empirical serous papillary ovarian adenocarcinoma expression-microarray data. We then apply the best-performing method to infer normal and cancer networks. We assess the druggability of the proteins encoded by our predicted target genes using the CancerResource and PharmGKB webtools and databases. RESULTS: We observe large differences in the accuracy with which these methods predict the underlying gene regulatory network depending on features of the data, network size, topology, experiment type, and parameter settings. Applying the best-performing method (the supervised method SIRENE) to the serous papillary ovarian adenocarcinoma dataset, we infer and rank regulatory interactions, some previously reported and others novel. For selected novel interactions we propose testable mechanistic models linking gene regulation to cancer. Using network analysis and visualization, we uncover cross-regulation of angiogenesis-specific genes through three key transcription factors in normal and cancer conditions. Druggability analysis of proteins encoded by the 10 highest-confidence target genes, and by 15 genes with differential regulation in normal and cancer conditions, reveals 75% to be potential drug targets. CONCLUSIONS: Our study represents a concrete application of gene regulatory network inference to ovarian cancer, demonstrating the complete cycle of computational systems biology research, from genome-scale data analysis via network inference, evaluation of methods, to the generation of novel testable hypotheses, their prioritization for experimental validation, and discovery of potential drug targets.
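Among the algorithmic families compared above, mutual-information relevance networks are the simplest to illustrate: discretize each gene's profile, estimate pairwise mutual information, and rank gene pairs by it. The sketch below assumes equal-width binning and a plug-in MI estimate; it is not SIRENE, the supervised method the study found to perform best, and the gene names in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def discretize(values: list[float], bins: int = 3) -> list[int]:
    """Equal-width binning into 0..bins-1 (an assumed, simple discretization scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant profiles all fall into bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: list[int], y: list[int]) -> float:
    """Plug-in estimate of I(X;Y) in nats from paired discrete observations."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # sum over observed (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def rank_edges(expr: dict[str, list[float]], bins: int = 3) -> list[tuple[float, str, str]]:
    """Undirected candidate edges as (MI, gene_a, gene_b), strongest first."""
    disc = {g: discretize(v, bins) for g, v in expr.items()}
    edges = [(mutual_information(disc[a], disc[b]), a, b) for a, b in combinations(expr, 2)]
    return sorted(edges, reverse=True)
```

In a toy dataset where a hypothetical "target" profile tracks a "TF" profile linearly across six samples, their pair ranks first with MI = ln 3, ahead of pairs involving an unrelated "noise" gene. Note that MI edges are undirected, which is one reason supervised methods that learn from known regulations can do better.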

12.
J Clin Bioinforma ; 2(1): 22, 2012 Dec 10.
Article in English | MEDLINE | ID: mdl-23216803

ABSTRACT

BACKGROUND: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset. RESULTS: We compare our method to other feature-selection approaches, and demonstrate that mCOPA frequently selects more-informative features than do differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours. CONCLUSIONS: We demonstrate that mCOPA offers advantages, compared to differential expression or variance, in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis. mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers.
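COPA's core transformation is simple: median-centre each gene across samples, scale by its median absolute deviation (MAD), and score the gene by a high percentile of the transformed values, so that a gene strongly over-expressed in only a subset of samples stands out. Scoring the matching low percentile captures the under-expressed outliers that mCOPA adds. The sketch assumes linear-interpolation percentiles and an illustrative q = 90; mCOPA's other refinements are not reproduced.

```python
from statistics import median


def copa_transform(values: list[float]) -> list[float]:
    """Median-centre one gene's expression and scale by its median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        mad = 1.0  # assumption: guard against genes with near-constant expression
    return [(v - med) / mad for v in values]


def percentile(values: list[float], q: float) -> float:
    """Linear-interpolation percentile for q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def outlier_scores(expr: dict[str, list[float]], q: float = 90.0) -> dict[str, tuple[float, float]]:
    """Per gene: (upper-tail score at the q-th percentile, lower-tail score at the (100 - q)-th)."""
    out = {}
    for gene, vals in expr.items():
        t = copa_transform(vals)
        out[gene] = (percentile(t, q), percentile(t, 100 - q))
    return out
```

Genes are then ranked separately by the upper score (candidate over-expressed outliers, such as fusion-driven oncogenes) and by the lower score (candidate under-expressed outliers, such as lost tumour suppressors).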
