Results 1 - 17 of 17
1.
EBioMedicine ; 66: 103275, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33745882

ABSTRACT

BACKGROUND: Assistive automatic seizure detection can empower human annotators to shorten patient monitoring data review times. We present a proof-of-concept for a seizure detection system that is sensitive, automated, patient-specific, and tunable to maximise sensitivity while minimizing human annotation times. The system uses custom data preparation methods, deep learning analytics and electroencephalography (EEG) data. METHODS: Scalp EEG data of 365 patients containing 171,745 s ictal and 2,185,864 s interictal samples obtained from clinical monitoring systems were analysed as part of a crowdsourced artificial intelligence (AI) challenge. Participants were tasked to develop an ictal/interictal classifier with high sensitivity and low false alarm rates. We built a challenge platform that prevented participants from downloading or directly accessing the data while allowing crowdsourced model development. FINDINGS: The automatic detection system achieved tunable sensitivities between 75.00% and 91.60%, allowing a reduction in the amount of raw EEG data to be reviewed by a human annotator by factors between 142x and 22x, respectively. The algorithm enables instantaneous reviewer-managed optimization of the balance between sensitivity and the amount of raw EEG data to be reviewed. INTERPRETATION: This study demonstrates the utility of deep learning for patient-specific seizure detection in EEG data. Furthermore, deep learning in combination with a human reviewer can provide the basis for an assistive data labelling system lowering the time of manual review while maintaining human expert annotation performance. FUNDING: IBM employed all IBM Research authors. Temple University employed all Temple University authors. The Icahn School of Medicine at Mount Sinai employed Eren Ahsen. The corresponding authors Stefan Harrer and Gustavo Stolovitzky declare that they had full access to all the data in the study and that they had final responsibility for the decision to submit for publication.
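
The trade-off reported in the findings, sensitivity against the amount of EEG left for human review, can be illustrated with a sweep over a decision threshold. The sketch below is not the challenge system: the function name, the toy scores, and the assumption that every window scoring at or above the threshold is passed to the reviewer are all illustrative.

```python
def review_tradeoff(scores, labels, threshold):
    """Return (sensitivity, reduction_factor) for one decision threshold.

    scores: classifier output per EEG window (higher = more likely ictal).
    labels: 1 for ictal windows, 0 for interictal windows.
    Windows scoring >= threshold are flagged for human review (assumption).
    """
    flagged = [s >= threshold for s in scores]
    true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    total_pos = sum(labels)
    n_flagged = sum(flagged)
    sensitivity = true_pos / total_pos if total_pos else 0.0
    # Reduction factor: total windows divided by windows left to review.
    reduction = len(scores) / n_flagged if n_flagged else float("inf")
    return sensitivity, reduction


# Toy data, invented for illustration only.
scores = [0.95, 0.80, 0.40, 0.30, 0.20, 0.10, 0.05, 0.02]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
for t in (0.9, 0.5, 0.35):
    sens, red = review_tradeoff(scores, labels, t)
    print(f"threshold={t}: sensitivity={sens:.2f}, reduction={red:.1f}x")
```

Lowering the threshold raises sensitivity but shrinks the reduction factor, which is the balance the abstract describes as reviewer-managed.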


Subjects
Artificial Intelligence , Brain/physiopathology , Electroencephalography , Neurologists , Seizures/diagnosis , Algorithms , Data Analysis , Deep Learning , Electroencephalography/methods , Electroencephalography/standards , Epilepsy/diagnosis , Humans , Reproducibility of Results
2.
Ophthalmol Glaucoma ; 4(1): 102-112, 2021.
Article in English | MEDLINE | ID: mdl-32826205

ABSTRACT

PURPOSE: To evaluate the accuracy with which visual field global indices could be estimated from OCT scans of the retina using deep neural networks and to quantify the contributions to the estimates by the macula (MAC) and the optic nerve head (ONH). DESIGN: Observational cohort study. PARTICIPANTS: A total of 10 370 eyes from 109 healthy patients, 697 glaucoma suspects, and 872 patients with glaucoma over multiple visits (median = 3). METHODS: Three-dimensional convolutional neural networks were trained to estimate global visual field indices derived from automated Humphrey perimetry (SITA 24-2) tests (Zeiss, Dublin, CA), using OCT scans centered on MAC, ONH, or both (MAC + ONH) as inputs. MAIN OUTCOME MEASURES: Spearman's rank correlation coefficients, Pearson's correlation coefficient, and absolute errors calculated for 2 indices: visual field index (VFI) and mean deviation (MD). RESULTS: The MAC + ONH model achieved a Spearman's correlation coefficient of 0.76 and a Pearson's correlation of 0.87 for VFI and MD. Median absolute error was 2.7 for VFI and 1.57 decibels (dB) for MD. Separate MAC or ONH estimates were significantly less correlated and less accurate. Accuracy was dependent on the OCT signal strength and the stage of glaucoma severity. CONCLUSIONS: The accuracy of global visual field index estimates is improved by integrating information from MAC and ONH in advanced glaucoma, suggesting that structural changes of the 2 regions have different time courses in the disease severity spectrum.
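
The three outcome measures named above (Pearson's correlation, Spearman's rank correlation, and median absolute error) can be computed as in the following sketch. It uses only the Python standard library; the mean deviation values are invented toy numbers, not data from the study.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def median_absolute_error(truth, estimate):
    return median(abs(t - e) for t, e in zip(truth, estimate))


# Toy mean deviation (MD) values in dB, invented for illustration.
measured_md = [-1.0, -3.5, -8.0, -15.0]
estimated_md = [-1.5, -3.0, -9.0, -12.0]
print(pearson(measured_md, estimated_md))
print(spearman(measured_md, estimated_md))
print(median_absolute_error(measured_md, estimated_md))
```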


Subjects
Glaucoma , Optic Disk , Glaucoma/diagnosis , Humans , Neural Networks, Computer , Optic Disk/diagnostic imaging , Tomography, Optical Coherence , Visual Fields
3.
PLoS One ; 14(7): e0219126, 2019.
Article in English | MEDLINE | ID: mdl-31260494

ABSTRACT

Optical coherence tomography (OCT) based measurements of retinal layer thickness, such as the retinal nerve fibre layer (RNFL) and the ganglion cell with inner plexiform layer (GCIPL), are commonly employed for the diagnosis and monitoring of glaucoma. Previously, machine learning techniques have relied on segmentation-based imaging features such as the peripapillary RNFL thickness and the cup-to-disc ratio. Here, we propose a deep learning technique that classifies eyes as healthy or glaucomatous directly from raw, unsegmented OCT volumes of the optic nerve head (ONH) using a 3D Convolutional Neural Network (CNN). We compared the accuracy of this technique with various feature-based machine learning algorithms and demonstrated the superiority of the proposed deep-learning-based method. Logistic regression was found to be the best performing classical machine learning technique with an AUC of 0.89. In direct comparison, the deep learning approach achieved a substantially higher AUC of 0.94 with the additional advantage of providing insight into which regions of an OCT volume are important for glaucoma detection. Computing Class Activation Maps (CAM), we found that the CNN identified neuroretinal rim and optic disc cupping as well as the lamina cribrosa (LC) and its surrounding areas as the regions significantly associated with the glaucoma classification. These regions anatomically correspond to the well-established and commonly used clinical markers for glaucoma diagnosis such as increased cup volume, cup diameter, and neuroretinal rim thinning at the superior and inferior segments.
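
The AUC used to compare the classifiers is the probability that a randomly chosen positive (glaucomatous) case receives a higher score than a randomly chosen negative (healthy) case. A minimal sketch of that definition follows; the scores and labels are toy values, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy classifier outputs, invented for illustration.
print(auc([0.9, 0.8, 0.7, 0.3], [1, 0, 1, 0]))
```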


Subjects
Deep Learning , Glaucoma/diagnostic imaging , Tomography, Optical Coherence/methods , Adult , Aged , Aged, 80 and over , Algorithms , Female , Glaucoma/classification , Glaucoma/pathology , Humans , Logistic Models , Machine Learning , Male , Middle Aged , Neural Networks, Computer , Optic Disk/diagnostic imaging , Optic Disk/pathology , Retinal Ganglion Cells/pathology , Tomography, Optical Coherence/statistics & numerical data , Young Adult
4.
PLoS One ; 14(5): e0203726, 2019.
Article in English | MEDLINE | ID: mdl-31083678

ABSTRACT

Spectral-domain optical coherence tomography (SDOCT) is a non-invasive imaging modality that generates high-resolution volumetric images. This modality finds widespread usage in ophthalmology for the diagnosis and management of various ocular conditions. The volumes generated can contain 200 or more B-scans. Manual inspection of such a large quantity of scans is time-consuming and error-prone in most clinical settings. Here, we present a method for the generation of visual summaries of SDOCT volumes, wherein a small set of B-scans that highlight the most clinically relevant features in a volume are extracted. The method was trained and evaluated on data acquired from age-related macular degeneration patients, and "relevance" was defined as the presence of visibly discernible structural abnormalities. The summarisation system consists of a detection module, where relevant B-scans are extracted from the volume, and a set of rules that determines which B-scans are included in the visual summary. Two deep learning approaches are presented and compared for the classification of B-scans: transfer learning and de novo learning. Both approaches performed comparably with AUCs of 0.97 and 0.96, respectively, obtained on an independent test set. The de novo network, however, was 98% smaller than the transfer learning approach, and had a run-time that was also significantly shorter.


Subjects
Deep Learning , Image Processing, Computer-Assisted , Tomography, Optical Coherence , Algorithms , Area Under Curve , Humans , Image Processing, Computer-Assisted/methods , Image Processing, Computer-Assisted/standards , Neural Networks, Computer , Reproducibility of Results , Tomography, Optical Coherence/methods , Tomography, Optical Coherence/standards
5.
Brief Bioinform ; 20(2): 426-435, 2019 Mar 22.
Article in English | MEDLINE | ID: mdl-28673025

ABSTRACT

We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error-prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed.
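
To make the k-mer idea concrete, the sketch below computes one simple alignment-free distance: the Jaccard distance between the sets of k-mers found in two sequences. This particular measure is chosen here for illustration only; the abstract does not say which k-mer statistics the review covers, and the sequences are toy strings.

```python
def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=3):
    """1 - |shared k-mers| / |all k-mers|; 0.0 means identical k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0  # both sequences shorter than k
    return 1.0 - len(ka & kb) / len(union)


print(jaccard_distance("ACGTAC", "ACGTTC", k=3))
print(jaccard_distance("ACGTAC", "ACGTAC", k=3))
```

A matrix of such pairwise distances can be passed to a standard distance-based tree-building method without ever aligning the sequences.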


Subjects
Evolution, Molecular , Genome , Phylogeny , Algorithms , Animals , Humans , Microbiota/genetics , Models, Genetic , Sequence Alignment , Sequence Analysis, DNA , Viruses/genetics
6.
Cancer Inform ; 13: 59-66, 2014.
Article in English | MEDLINE | ID: mdl-24653643

ABSTRACT

The emergence of transcriptomics, fuelled by high-throughput sequencing technologies, has changed the nature of cancer research and resulted in a massive accumulation of data. Computational analysis, integration, and data visualization are now major bottlenecks in cancer biology and translational research. Although many tools have been brought to bear on these problems, their use remains unnecessarily restricted to computational biologists, as many tools require scripting skills, data infrastructure, and powerful computational facilities. New user-friendly, integrative, and automated analytical approaches are required to make computational methods more generally useful to the research community. Here we present INsPeCT (INtegrative Platform for Cancer Transcriptomics), which allows users with basic computer skills to perform comprehensive in-silico analyses of microarray, ChIP-seq, and RNA-seq data. INsPeCT supports the selection of interesting genes for advanced functional analysis. Included in its automated workflows are (i) a novel analytical framework, RMaNI (regulatory module network inference), which supports the inference of cancer subtype-specific transcriptional module networks and the analysis of modules; and (ii) WGCNA (weighted gene co-expression network analysis), which infers modules of highly correlated genes across microarray samples, associated with sample traits, e.g. survival time. INsPeCT is available free of cost from Bioinformatics Resource Australia-EMBL and can be accessed at http://inspect.braembl.org.au.

7.
Bioinformatics ; 30(9): 1273-9, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24407221

ABSTRACT

MOTIVATION: Cancer is a heterogeneous progressive disease caused by perturbations of the underlying gene regulatory network that can be described by dynamic models. These dynamics are commonly modeled as Boolean networks or as ordinary differential equations. Their inference from data is computationally challenging, and at least partial knowledge of the regulatory network and its kinetic parameters is usually required to construct predictive models. RESULTS: Here, we construct Hopfield networks from static gene-expression data and demonstrate that cancer subtypes can be characterized by different attractors of the Hopfield network. We evaluate the clustering performance of the network and find that it is comparable with traditional methods but offers additional advantages including a dynamic model of the energy landscape and a unification of clustering, feature selection and network inference. We visualize the Hopfield attractor landscape and propose a pruning method to generate sparse networks for feature selection and improved understanding of feature relationships.
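
As background for the attractor idea, here is a minimal textbook Hopfield network: Hebbian weights are built from stored patterns, and a noisy input relaxes to the nearest stored pattern while its energy decreases. This is a generic sketch, not the authors' implementation; in particular, representing expression profiles as +1/-1 vectors and the two stored patterns are assumptions made for the example.

```python
def train_hopfield(patterns):
    """Hebbian weight matrix for equal-length +1/-1 patterns (zero diagonal)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / len(patterns)
    return w


def recall(w, state, max_sweeps=20):
    """Update units one at a time until a full sweep changes nothing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w[i][j] * state[j] for j in range(len(state)))
            new = 1 if field >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state


def energy(w, state):
    """Hopfield energy; stored patterns sit at local minima."""
    n = len(state)
    return -0.5 * sum(w[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


# Two hypothetical "subtype" patterns (binarised expression, invented).
subtype_a = [1, 1, 1, 1, -1, -1, -1, -1]
subtype_b = [1, 1, -1, -1, 1, 1, -1, -1]
weights = train_hopfield([subtype_a, subtype_b])

noisy_sample = [-1, 1, 1, 1, -1, -1, -1, -1]  # subtype_a with one gene flipped
print(recall(weights, noisy_sample))
print(energy(weights, noisy_sample), energy(weights, subtype_a))
```

Assigning each sample to the attractor it converges to is what allows a network of this kind to act as a clustering method.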


Subjects
Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Neoplasms/genetics , Algorithms , Cluster Analysis , Humans , Kinetics , Software
8.
Brief Bioinform ; 15(2): 195-211, 2014 Mar.
Article in English | MEDLINE | ID: mdl-23698722

ABSTRACT

Inference of gene regulatory networks from expression data is a challenging task. Many methods have been developed for this purpose but a comprehensive evaluation that covers unsupervised, semi-supervised and supervised methods, and provides guidelines for their practical application, is lacking. We performed an extensive evaluation of inference methods on simulated and experimental expression data. The results reveal low prediction accuracies for unsupervised techniques with the notable exception of the Z-SCORE method on knockout data. In all other cases, the supervised approach achieved the highest accuracies and even in a semi-supervised setting with small numbers of only positive samples, outperformed the unsupervised techniques.
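
The abstract does not define the Z-SCORE method, so the sketch below shows one common formulation as an assumption: when gene i is knocked out, each other gene j is scored by how many standard deviations its expression lies from its level across the remaining knockout experiments, and large scores suggest an edge i -> j. Leaving a gene's own knockout out of its baseline is a choice made here for the toy example. The matrix is invented.

```python
from statistics import mean, pstdev


def zscore_edges(knockout_expr):
    """Rank candidate regulatory edges from single-gene knockout data.

    knockout_expr[i][j]: expression of gene j when gene i is knocked out.
    Returns (score, regulator, target) tuples, highest score first.
    """
    n = len(knockout_expr)
    edges = []
    for j in range(n):
        # Baseline for gene j: all experiments except its own knockout.
        baseline = [knockout_expr[i][j] for i in range(n) if i != j]
        mu, sigma = mean(baseline), pstdev(baseline)
        if sigma == 0:
            continue  # gene j never changes; no evidence for any regulator
        for i in range(n):
            if i != j:
                edges.append((abs(knockout_expr[i][j] - mu) / sigma, i, j))
    return sorted(edges, reverse=True)


# Toy data: knocking out gene 0 lowers gene 1; nothing else responds.
expr = [
    [0.0, 0.2, 1.0, 1.0],
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]
for score, regulator, target in zscore_edges(expr):
    print(f"gene {regulator} -> gene {target}: z = {score:.2f}")
```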


Subjects
Computational Biology/methods , Gene Regulatory Networks , Algorithms , Artificial Intelligence , Computer Simulation , Databases, Genetic/statistics & numerical data , Escherichia coli/genetics , Gene Expression Profiling/statistics & numerical data , Genes, Bacterial , Genes, Fungal , Saccharomyces cerevisiae/genetics , Software , Support Vector Machine , Systems Biology
9.
BMC Bioinformatics ; 14 Suppl 16: S14, 2013.
Article in English | MEDLINE | ID: mdl-24564496

ABSTRACT

BACKGROUND: Cell survival and development are orchestrated by complex interlocking programs of gene activation and repression. Understanding how this gene regulatory network (GRN) functions in normal states, and is altered in cancer subtypes, offers fundamental insight into oncogenesis and disease progression, and holds great promise for guiding clinical decisions. Inferring a GRN from empirical microarray gene expression data is a challenging task in cancer systems biology. In recent years, module-based approaches for GRN inference have been proposed to address this challenge. Despite the demonstrated success of module-based approaches in uncovering biologically meaningful regulatory interactions, their application remains limited to a single condition, without supporting the comparison of multiple disease subtypes/conditions. Also, their use remains unnecessarily restricted to computational biologists, as accurate inference of modules and their regulators requires integration of diverse tools and heterogeneous data sources, which in turn requires scripting skills, data infrastructure and powerful computational facilities. New analytical frameworks are required to make module-based GRN inference approaches more generally useful to the research community. RESULTS: We present the RMaNI (Regulatory Module Network Inference) framework, which supports cancer subtype-specific or condition-specific GRN inference and differential network analysis. It combines transcriptomic and genomic data sources, and integrates heterogeneous knowledge resources and a set of complementary bioinformatic methods for automated inference of modules and their condition-specific regulators, and facilitates downstream network analyses and data visualization. To demonstrate its utility, we applied RMaNI to a hepatocellular microarray dataset containing normal and three disease conditions. We demonstrate how RMaNI can be employed to understand the genetic architecture underlying three disease conditions. RMaNI is freely available at http://inspect.braembl.org.au/bi/inspect/rmani CONCLUSION: RMaNI makes available a workflow with a comprehensive set of tools that would otherwise be challenging for non-expert users to install and apply. The framework presented in this paper is flexible and can be easily extended to analyse any dataset with multiple disease conditions.


Subjects
Carcinoma, Hepatocellular/genetics , Computational Biology/methods , Gene Regulatory Networks , Liver Neoplasms/genetics , Cluster Analysis , Gene Expression , Humans , Internet , Systems Biology/methods
10.
J Clin Bioinforma ; 2(1): 22, 2012 Dec 10.
Article in English | MEDLINE | ID: mdl-23216803

ABSTRACT

BACKGROUND: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset. RESULTS: We compare our method to other feature-selection approaches, and demonstrate that mCOPA frequently selects more-informative features than do differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours. CONCLUSIONS: We demonstrate that mCOPA offers advantages, compared to differential expression or variance, in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis. mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers.
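
The core COPA transform can be sketched as follows: each gene's expression values are centred on their median and scaled by the median absolute deviation (MAD), and a percentile of the transformed values is used as the gene's outlier score. Reading a low percentile to capture under-expressed outliers is the extension the abstract describes; mCOPA's further refinements are not specified there and are not reproduced. The expression values, percentile choices, and nearest-rank rule are assumptions for illustration.

```python
import math
from statistics import median


def copa_transform(values):
    """Centre one gene's values on the median and scale by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0] * len(values)  # no spread: nothing can be an outlier
    return [(v - med) / mad for v in values]


def copa_score(values, percentile):
    """Nearest-rank percentile of the transformed values.

    High percentiles (e.g. 95) pick up over-expressed outliers;
    low percentiles (e.g. 5) pick up under-expressed ones.
    """
    transformed = sorted(copa_transform(values))
    rank = max(1, math.ceil(percentile * len(transformed) / 100))
    return transformed[rank - 1]


# Toy expression of two genes across ten samples (invented values).
over_gene = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 9.0, 4.9]
under_gene = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 1.0, 4.9]
print(copa_score(over_gene, 95))
print(copa_score(under_gene, 5))
```

Because the median and MAD are robust to a few extreme samples, a gene that is strongly altered in only a subset of tumours still stands out, which a mean-based statistic would tend to dilute.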

11.
Genome Med ; 4(5): 41, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22548828

ABSTRACT

BACKGROUND: Altered networks of gene regulation underlie many complex conditions, including cancer. Inferring gene regulatory networks from high-throughput microarray expression data is a fundamental but challenging task in computational systems biology and its translation to genomic medicine. Although diverse computational and statistical approaches have been brought to bear on the gene regulatory network inference problem, their relative strengths and disadvantages remain poorly understood, largely because comparative analyses usually consider only small subsets of methods, use only synthetic data, and/or fail to adopt a common measure of inference quality. METHODS: We report a comprehensive comparative evaluation of nine state-of-the-art gene regulatory network inference methods encompassing the main algorithmic approaches (mutual information, correlation, partial correlation, random forests, support vector machines) using 38 simulated datasets and empirical serous papillary ovarian adenocarcinoma expression-microarray data. We then apply the best-performing method to infer normal and cancer networks. We assess the druggability of the proteins encoded by our predicted target genes using the CancerResource and PharmGKB webtools and databases. RESULTS: We observe large differences in the accuracy with which these methods predict the underlying gene regulatory network depending on features of the data, network size, topology, experiment type, and parameter settings. Applying the best-performing method (the supervised method SIRENE) to the serous papillary ovarian adenocarcinoma dataset, we infer and rank regulatory interactions, some previously reported and others novel. For selected novel interactions we propose testable mechanistic models linking gene regulation to cancer. Using network analysis and visualization, we uncover cross-regulation of angiogenesis-specific genes through three key transcription factors in normal and cancer conditions. Druggability analysis of proteins encoded by the 10 highest-confidence target genes, and by 15 genes with differential regulation in normal and cancer conditions, reveals 75% to be potential drug targets. CONCLUSIONS: Our study represents a concrete application of gene regulatory network inference to ovarian cancer, demonstrating the complete cycle of computational systems biology research, from genome-scale data analysis via network inference, evaluation of methods, to the generation of novel testable hypotheses, their prioritization for experimental validation, and discovery of potential drug targets.

12.
Bioinformatics ; 28(6): 851-7, 2012 Mar 15.
Article in English | MEDLINE | ID: mdl-22219205

ABSTRACT

MOTIVATION: Phylogenetic profiling methods can achieve good accuracy in predicting protein-protein interactions, especially in prokaryotes. Recent studies have shown that the choice of reference taxa (RT) is critical for accurate prediction, but with more than 2500 fully sequenced taxa publicly available, identifying the most-informative RT is becoming increasingly difficult. Previous studies on the selection of RT have provided guidelines for manual taxon selection, and for eliminating closely related taxa. However, no general strategy for automatic selection of RT is currently available. RESULTS: We present three novel methods for automating the selection of RT, using machine learning based on known protein-protein interaction networks. One of these methods in particular, Tree-Based Search, yields greatly improved prediction accuracies. We further show that different methods for constituting phylogenetic profiles often require very different RT sets to support high prediction accuracy.
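
To show why the choice of reference taxa matters, the sketch below represents each protein as a presence/absence profile across taxa and scores a protein pair by the Jaccard similarity of the two profiles. Restricting the comparison to a subset of taxa changes the score. Jaccard similarity and the toy profiles are assumptions for illustration; the abstract does not state which profile similarity or which taxa the authors use, and its selection methods (such as Tree-Based Search) are not reproduced.

```python
def profile_similarity(profile_a, profile_b, taxa=None):
    """Jaccard similarity of two presence/absence profiles.

    profile_a, profile_b: sequences of 0/1, one entry per reference taxon.
    taxa: optional list of taxon indices to restrict the comparison to.
    """
    if taxa is None:
        taxa = range(len(profile_a))
    taxa = list(taxa)
    both = sum(1 for t in taxa if profile_a[t] and profile_b[t])
    either = sum(1 for t in taxa if profile_a[t] or profile_b[t])
    return both / either if either else 0.0


# Toy profiles over six hypothetical reference taxa.
protein_x = [1, 1, 0, 1, 0, 1]
protein_y = [1, 1, 0, 0, 1, 1]
print(profile_similarity(protein_x, protein_y))                     # all taxa
print(profile_similarity(protein_x, protein_y, taxa=[0, 1, 2, 5]))  # subset
```

An automated selection method would search over such subsets, keeping the one whose scores best separate known interacting pairs from non-interacting ones.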


Subjects
Archaea/genetics , Artificial Intelligence , Bacteria/genetics , Eukaryota/genetics , Phylogeny , Protein Interaction Maps , Proteins/genetics , Archaea/classification , Archaea/metabolism , Bacteria/classification , Bacteria/metabolism , Eukaryota/classification , Eukaryota/metabolism , Proteins/chemistry , Proteins/metabolism
13.
Bioinformatics ; 28(1): 69-75, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22057159

ABSTRACT

MOTIVATION: Protein-protein interactions (PPIs) are pivotal for many biological processes and similarity in Gene Ontology (GO) annotation has been found to be one of the strongest indicators for PPI. Most GO-driven algorithms for PPI inference combine machine learning and semantic similarity techniques. We introduce the concept of inducers as a method to integrate both approaches more effectively, leading to superior prediction accuracies. RESULTS: An inducer (ULCA) in combination with a Random Forest classifier compares favorably to several sequence-based methods, semantic similarity measures and multi-kernel approaches. On a newly created set of high-quality interaction data, the proposed method achieves high cross-species prediction accuracies (Area under the ROC curve ≤ 0.88), rendering it a valuable companion to sequence-based methods. AVAILABILITY: Software and datasets are available at http://bioinformatics.org.au/go2ppi/ CONTACT: m.ragan@uq.edu.au.


Subjects
Algorithms , Molecular Sequence Annotation , Proteins/genetics , Software , Vocabulary, Controlled , Databases, Protein , Humans , Protein Interaction Maps , ROC Curve , Yeasts/genetics , Yeasts/metabolism
14.
Bioinformatics ; 26(6): 737-44, 2010 Mar 15.
Article in English | MEDLINE | ID: mdl-20130028

ABSTRACT

MOTIVATION: Protein sequences are often composed of regions that have distinct evolutionary histories as a consequence of domain shuffling, recombination or gene conversion. New approaches are required to discover, visualize and analyze these sequence regions and thus enable a better understanding of protein evolution. RESULTS: Here, we have developed an alignment-free and visual approach to analyze sequence relationships. We use the number of shared n-grams between sequences as a measure of sequence similarity and rearrange the resulting affinity matrix applying a spectral technique. Heat maps of the affinity matrix are employed to identify and visualize clusters of related sequences or outliers, while n-gram-based dot plots and conservation profiles allow detailed analysis of similarities among selected sequences. Using this approach, we have identified signatures of domain shuffling in an otherwise poorly characterized family, and homology clusters in another. We conclude that this approach may be generally useful as a framework to analyze related, but highly divergent protein sequences. It is particularly useful as a fast method to study sequence relationships prior to much more time-consuming multiple sequence alignment and phylogenetic analysis. AVAILABILITY: A software implementation (MOSAIC) of the framework described here can be downloaded from http://bioinformatics.org.au/mosaic/ CONTACT: m.ragan@uq.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
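
Two of the building blocks described above, the affinity matrix of shared n-gram counts and the n-gram dot plot, can be sketched in a few lines. This is not the MOSAIC implementation: counting distinct shared n-grams, the value n = 3, and the toy sequences are assumptions, and the spectral rearrangement step is omitted.

```python
def ngram_positions(seq, n):
    """Map each n-gram to the list of positions where it starts."""
    positions = {}
    for i in range(len(seq) - n + 1):
        positions.setdefault(seq[i:i + n], []).append(i)
    return positions


def shared_ngrams(a, b, n=3):
    """Number of distinct n-grams that occur in both sequences."""
    return len(set(ngram_positions(a, n)) & set(ngram_positions(b, n)))


def affinity_matrix(seqs, n=3):
    """Pairwise shared n-gram counts; entry [i][j] compares seqs[i], seqs[j]."""
    return [[shared_ngrams(a, b, n) for b in seqs] for a in seqs]


def dot_plot(a, b, n=3):
    """(i, j) pairs where the n-gram starting at a[i] also starts at b[j]."""
    pos_b = ngram_positions(b, n)
    return [(i, j)
            for i in range(len(a) - n + 1)
            for j in pos_b.get(a[i:i + n], [])]


# Toy protein fragments, invented for illustration.
seqs = ["MKVLAAG", "MKVLTTG", "GGSPQRW"]
for row in affinity_matrix(seqs):
    print(row)
print(dot_plot(seqs[0], seqs[1]))
```

Diagonal runs in the dot plot mark locally conserved regions, and a run that appears at different offsets in different sequences is the kind of signal that could indicate domain shuffling.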


Subjects
Sequence Analysis, Protein/methods , Computer Graphics , Databases, Protein , Proteins/chemistry , Sequence Alignment
15.
BMC Bioinformatics ; 10: 341, 2009 Oct 18.
Article in English | MEDLINE | ID: mdl-19835626

ABSTRACT

BACKGROUND: RNA-protein interactions are important for a wide range of biological processes. Current computational methods to predict interacting residues in RNA-protein interfaces predominantly rely on sequence data. It is, however, known that interface residue propensity is closely correlated with structural properties. In this paper we systematically study information obtained from sequences and structures and compare their contributions in this prediction problem. Particularly, different geometrical and network topological properties of protein structures are evaluated to improve interface residue prediction accuracy. RESULTS: We have quantified the impact of structural information on the prediction accuracy in comparison to the purely sequence-based approach using two machine learning techniques: Naïve Bayes classifiers and Support Vector Machines. The highest AUC of 0.83 was achieved by a Support Vector Machine, exploiting PSI-BLAST profile, accessible surface area, betweenness-centrality and retention coefficient as input features. Taking into account that our results are based on a larger non-redundant data set, the prediction accuracy is considerably higher than reported in previous, comparable studies. A protein-RNA interface predictor (PRIP) and the data set have been made available at http://www.qfab.org/PRIP. CONCLUSION: Graph-theoretic properties of residue contact maps derived from protein structures such as betweenness-centrality can supplement sequence or structure features to improve the prediction accuracy for binding residues in RNA-protein interactions. While Support Vector Machines perform better on this task, Naïve Bayes classifiers also have been found to achieve good prediction accuracies but require much less training time and are an attractive choice for large-scale predictions.


Subjects
Computational Biology/methods , RNA-Binding Proteins/chemistry , RNA/chemistry , Binding Sites , Databases, Protein , Models, Molecular , Protein Conformation , RNA/metabolism , RNA-Binding Proteins/metabolism
16.
Res Microbiol ; 158(8-9): 685-93, 2007.
Article in English | MEDLINE | ID: mdl-18039561

ABSTRACT

Currently, there is a lack of phylogenetic footprinting programmes that can take advantage of multiple whole genome sequences of different species within the same bacterial genus. Therefore, we have developed and tested a position weight matrix-based programme called Footy that performs genome-wide analysis of bacterial genomes for promoters that phylogenetically footprint. When Footy was used to analyse the non-coding regions upstream of genes from three chlamydial species for promoters that phylogenetically footprint, it predicted a total of 42 promoters, of which 41 were new. Ten of the 41 new promoters predicted by Footy were biologically assayed in Chlamydia trachomatis by mapping the 5' end of the transcripts for the associated genes. The primer extension assay validated seven of the 10 promoters. When Footy was compared to two other accepted methods for genome-wide prediction of promoters in bacteria (the standard PWM method and MITRA), Footy performed as well as or better than these programmes. This paper, therefore, shows the value of a bioinformatics programme able to perform genome-wide analysis of bacteria for promoters that phylogenetically footprint.
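
For readers unfamiliar with position weight matrices (PWMs), the sketch below builds a log-odds PWM from a handful of aligned sites and slides it along an upstream sequence to find the best-scoring window. It illustrates the standard PWM idea only; Footy's matrices, scoring, and phylogenetic footprinting step are not described in the abstract and are not reproduced. The sites, the uniform background, and the pseudocount are assumptions.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds PWM from equal-length aligned DNA sites (A, C, G, T only)."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background)
                    for b in "ACGT"})
    return pwm


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence.

    The sequence must be at least as long as the PWM.
    """
    width = len(pwm)
    scored = [
        (sum(col[sequence[i + k]] for k, col in enumerate(pwm)), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)


# Toy -10-box-like sites and upstream region, invented for illustration.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pwm = build_pwm(sites)
score, start = best_hit(pwm, "GGCCTATAATGCGC")
print(start, round(score, 2))
```

A footprinting approach would additionally require that a high-scoring site recur at a comparable position upstream of the orthologous genes in the other species.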


Subjects
Chlamydia/genetics , Promoter Regions, Genetic , Base Sequence , Chlamydia/classification , Molecular Sequence Data , Phylogeny , Transcription Initiation Site
17.
Proteins ; 69(3): 606-16, 2007 Nov 15.
Article in English | MEDLINE | ID: mdl-17636571

ABSTRACT

Peroxisomes are small subcellular compartments responsible for a range of essential metabolic processes. Efforts in predicting peroxisomal protein import are challenged by species variation and sparse sequence data sets with experimentally confirmed localization. We present a predictor of peroxisomal import based on the presence of the dominant peroxisomal targeting signal one (PTS1), a seemingly well-conserved but highly unspecific motif. The signal appears to rely on subtle dependencies with the preceding residues. We evaluate prediction accuracies against two alternative predictor services, PEROXIP and the PTS1 PREDICTOR. We test the integrity of prediction on a range of prokaryotic and eukaryotic proteomes lacking peroxisomes. Similarly we test the accuracy on peroxisomal proteins known to not overlap with training data. The model identified a number of proteins within the RIKEN IPS7 mouse protein dataset as potentially novel peroxisomal proteins. Three were confirmed in vitro using immunofluorescent detection of myc-epitope-tagged proteins in transiently transfected BHK-21 cells (Dhrs2, Serhl, and Ehhadh). The final model has a superior specificity to both alternatives, and an accuracy better than PEROXIP and on par with PTS1 PREDICTOR. Thus, the model we present should prove invaluable for labeling PTS1 targeted proteins with high confidence. We use the predictor to screen several additional eukaryotic genomes to revise previously estimated numbers of peroxisomal proteins. Available at http://pprowler.itee.uq.edu.au.


Subjects
Artificial Intelligence , Computer Simulation , Models, Chemical , Peroxisomes/chemistry , Protein Sorting Signals , Proteins/analysis , Amino Acids/chemistry , Animals , Cell Line , Cricetinae , Databases, Protein , Eukaryotic Cells , Genes, myc , Humans , Mesocricetus , Mice , Peroxisomes/metabolism , Protein Transport , Proteins/chemistry , Proteins/metabolism , Proteome , Software , Species Specificity