Search | VHL Regional Portal

1.

Influencing public health policy with data-informed mathematical models of infectious diseases: Recent developments and new challenges.

Alahmadi, Amani; Belet, Sarah; Black, Andrew; Cromer, Deborah; Flegg, Jennifer A; House, Thomas; Jayasundara, Pavithra; Keith, Jonathan M; McCaw, James M; Moss, Robert; Ross, Joshua V; Shearer, Freya M; Tun, Sai Thein Than; Walker, James; White, Lisa; Whyte, Jason M; Yan, Ada W C; Zarebski, Alexander E.

Epidemics ; 32: 100393, 2020 Sep.

Article in English | MEDLINE | ID: mdl-32674025

ABSTRACT

Modern data and computational resources, coupled with algorithmic and theoretical advances to exploit these, allow disease dynamic models to be parameterised with increasing detail and accuracy. While this enhances models' usefulness in prediction and policy, major challenges remain. In particular, lack of identifiability of a model's parameters may limit the usefulness of the model. While lack of parameter identifiability may be resolved through incorporation into an inference procedure of prior knowledge, formulating such knowledge is often difficult. Furthermore, there are practical challenges associated with acquiring data of sufficient quantity and quality. Here, we discuss recent progress on these issues.

Subject(s)

Communicable Diseases/epidemiology; Health Policy; Models, Theoretical; Public Health/statistics & numerical data; Bayes Theorem; Humans; Models, Biological

2.

A comparison of approximate versus exact techniques for Bayesian parameter inference in nonlinear ordinary differential equation models.

Alahmadi, Amani A; Flegg, Jennifer A; Cochrane, Davis G; Drovandi, Christopher C; Keith, Jonathan M.

R Soc Open Sci ; 7(3): 191315, 2020 Mar.

Article in English | MEDLINE | ID: mdl-32269786

ABSTRACT

The behaviour of many processes in science and engineering can be accurately described by dynamical system models consisting of a set of ordinary differential equations (ODEs). Often these models have several unknown parameters that are difficult to estimate from experimental data, in which case Bayesian inference can be a useful tool. In principle, exact Bayesian inference using Markov chain Monte Carlo (MCMC) techniques is possible; however, in practice, such methods may suffer from slow convergence and poor mixing. To address this problem, several approaches based on approximate Bayesian computation (ABC) have been introduced, including Markov chain Monte Carlo ABC (MCMC ABC) and sequential Monte Carlo ABC (SMC ABC). While the system of ODEs describes the underlying process that generates the data, the observed measurements invariably include errors. In this paper, we argue that several popular ABC approaches fail to adequately model these errors because the acceptance probability depends on the choice of the discrepancy function and the tolerance without any consideration of the error term. We observe that the so-called posterior distributions derived from such methods do not accurately reflect the epistemic uncertainties in parameter values. Moreover, we demonstrate that these methods provide minimal computational advantages over exact Bayesian methods when applied to two ODE epidemiological models with simulated data and one with real data concerning malaria transmission in Afghanistan.
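As a minimal sketch of the kind of rejection step this abstract criticises, the Python below runs plain rejection ABC on a toy decay ODE, dy/dt = -k*y. The model, the uniform prior, the Euclidean discrepancy, the tolerance and all function names are illustrative assumptions and are not taken from the paper. A draw is accepted purely because its simulated trajectory lies within `tol` of the data; no measurement-error model enters the decision.

```python
import math
import random


def simulate(k, times, y0=1.0, dt=0.01):
    """Forward-Euler solution of dy/dt = -k*y, reported at the requested times."""
    out, y, step = [], y0, 0
    for target in times:
        while step < round(target / dt):
            y += dt * (-k * y)
            step += 1
        out.append(y)
    return out


def discrepancy(a, b):
    """Euclidean distance between two trajectories (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def abc_rejection(observed, times, n_draws, tol, rng):
    """Keep prior draws whose simulated trajectory is within tol of the data."""
    accepted = []
    for _ in range(n_draws):
        k = rng.uniform(0.0, 2.0)  # hypothetical Uniform(0, 2) prior on k
        if discrepancy(simulate(k, times), observed) <= tol:
            accepted.append(k)
    return accepted


rng = random.Random(1)
times = [0.5, 1.0, 1.5, 2.0]
true_k = 0.5
# Synthetic observations: ODE solution plus Gaussian measurement noise.
observed = [y + rng.gauss(0.0, 0.02) for y in simulate(true_k, times)]
samples = abc_rejection(observed, times, n_draws=2000, tol=0.1, rng=rng)
abc_mean = sum(samples) / len(samples)
```

Shrinking or enlarging `tol` changes the spread of `samples` directly, which is one way to see why the accepted draws need not reflect the uncertainty implied by the noise level.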

3.

Delimiting a species' geographic range using posterior sampling and computational geometry.

Keith, Jonathan M; Spring, Daniel; Kompas, Tom.

Sci Rep ; 9(1): 8938, 2019 Jun 20.

Article in English | MEDLINE | ID: mdl-31222114

ABSTRACT

Accurate delimitation of the geographic range of a species is important for control of biological invasions, conservation of threatened species, and understanding species range dynamics under environmental change. However, estimating range boundaries is challenging because monitoring methods are imperfect, the area that might contain individuals is often incompletely surveyed, and species may have patchy distributions. In these circumstances, large areas can be surveyed without finding individuals despite occupancy extending beyond surveyed areas, resulting in underestimation of range limits. We developed a delimitation method that can be applied with imperfect survey data and patchy distributions. The approach is to construct polygons indicative of the geographic range of a species. Each polygon is associated with a specific probability such that each interior point of the polygon has at least that posterior probability of being interior to the true boundary according to a Bayesian model. The method uses the posterior distribution of latent quantities derived from an agent-based Bayesian model and calculates the posterior distribution of the range as a derived quantity from Markov chain Monte Carlo samples. An application of this method described here informed the Australian campaign to eradicate red imported fire ants (Solenopsis invicta).

4.

Agent-based models of malaria transmission: a systematic review.

Smith, Neal R; Trauer, James M; Gambhir, Manoj; Richards, Jack S; Maude, Richard J; Keith, Jonathan M; Flegg, Jennifer A.

Malar J ; 17(1): 299, 2018 Aug 17.

Article in English | MEDLINE | ID: mdl-30119664

ABSTRACT

BACKGROUND: Much of the extensive research regarding transmission of malaria is underpinned by mathematical modelling. Compartmental models, which focus on interactions and transitions between population strata, have been a mainstay of such modelling for more than a century. However, modellers are increasingly adopting agent-based approaches, which model hosts, vectors and/or their interactions on an individual level. One reason for the increasing popularity of such models is their potential to provide enhanced realism by allowing system-level behaviours to emerge as a consequence of accumulated individual-level interactions, as occurs in real populations. METHODS: A systematic review of 90 articles published between 1998 and May 2018 was performed, characterizing agent-based models (ABMs) relevant to malaria transmission. The review provides an overview of approaches used to date, determines the advantages of these approaches, and proposes ideas for progressing the field. RESULTS: The rationale for ABM use over other modelling approaches centres around three points: the need to accurately represent increased stochasticity in low-transmission settings; the benefits of high-resolution spatial simulations; and heterogeneities in drug and vaccine efficacies due to individual patient characteristics. The success of these approaches provides avenues for further exploration of agent-based techniques for modelling malaria transmission. Potential extensions include varying elimination strategies across spatial landscapes, extending the size of spatial models, incorporating human movement dynamics, and developing increasingly comprehensive parameter estimation and optimization techniques. CONCLUSION: Collectively, the literature covers an extensive array of topics, including the full spectrum of transmission and intervention regimes. Bringing these elements together under a common framework may enhance knowledge of, and guide policies towards, malaria elimination. 
However, because of the diversity of available models, endorsing a standardized approach to ABM implementation may not be possible. Instead it is recommended that model frameworks be contextually appropriate and sufficiently described. One key recommendation is to develop enhanced parameter estimation and optimization techniques. Extensions of current techniques will provide the robust results required to enhance current elimination efforts.

Subject(s)

Disease Transmission, Infectious; Host-Parasite Interactions; Malaria/transmission; Models, Statistical; Mosquito Vectors/physiology; Animals; Humans

5.

Bayesian change-point modeling with segmented ARMA model.

Sadia, Farhana; Boyd, Sarah; Keith, Jonathan M.

PLoS One ; 13(12): e0208927, 2018.

Article in English | MEDLINE | ID: mdl-30596668

ABSTRACT

Time series segmentation aims to identify segment boundary points in a time series, and to determine the dynamical properties corresponding to each segment. To segment time series data, this article presents a Bayesian change-point model in which the data within segments follows an autoregressive moving average (ARMA) model. A prior distribution is defined for the number of change-points, their positions, segment means and error terms. To quantify uncertainty about the location of change-points, the resulting posterior probability distributions are sampled using the Generalized Gibbs sampler Markov chain Monte Carlo technique. This methodology is illustrated by applying it to simulated data and to real data known as the well-log time series data. This well-log data records the measurements of nuclear magnetic response of underground rocks during the drilling of a well. Our approach has high sensitivity, and detects a larger number of change-points than have been identified by comparable methods in the existing literature.
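To make the idea of a posterior over change-point locations concrete, here is a deliberately simplified Python sketch. It is not the segmented ARMA model or the Generalized Gibbs sampler described above: it assumes exactly one change-point, independent Gaussian noise with known standard deviation, flat priors on the two segment means, and a uniform prior on the location, so the posterior can be enumerated exactly. All names are hypothetical.

```python
import math
import random


def segment_log_marginal(xs, sigma):
    """Log marginal likelihood of one segment, up to a constant, with a flat prior on its mean."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return -ss / (2.0 * sigma ** 2) - 0.5 * math.log(n)


def changepoint_posterior(series, sigma=1.0):
    """Posterior over the index tau at which the second segment starts."""
    logs = {
        tau: segment_log_marginal(series[:tau], sigma)
        + segment_log_marginal(series[tau:], sigma)
        for tau in range(1, len(series))
    }
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {tau: math.exp(v - top) for tau, v in logs.items()}
    total = sum(weights.values())
    return {tau: w / total for tau, w in weights.items()}


rng = random.Random(0)
# Synthetic series: mean 0 for 30 points, then mean 5 for 30 points.
series = [rng.gauss(0.0, 1.0) for _ in range(30)] + [rng.gauss(5.0, 1.0) for _ in range(30)]
posterior = changepoint_posterior(series)
map_tau = max(posterior, key=posterior.get)
```

With an unknown number of change-points and correlated within-segment noise, enumeration is no longer feasible, which is where a sampler such as the one named in the abstract comes in.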

Subject(s)

Bayes Theorem; Magnetic Resonance Spectroscopy/statistics & numerical data; Models, Statistical; Humans; Markov Chains; Monte Carlo Method

6.

Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach.

Algama, Manjula; Tasker, Edward; Williams, Caitlin; Parslow, Adam C; Bryson-Richardson, Robert J; Keith, Jonathan M.

BMC Genomics ; 18(1): 259, 2017 Mar 27.

Article in English | MEDLINE | ID: mdl-28347272

ABSTRACT

BACKGROUND: Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. RESULTS: We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. CONCLUSIONS: This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. 
We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.

Subject(s)

Genome; RNA, Untranslated/metabolism; Animals; Bayes Theorem; Binding Sites; Conserved Sequence; Humans; Introns; Mice; Muscle Development/genetics; Nucleic Acid Conformation; RNA, Untranslated/chemistry; RNA, Untranslated/genetics; Transcription Factors/genetics; Transcription Factors/metabolism; User-Computer Interface; Zebrafish/genetics

7.

Bayesian model of signal rewiring reveals mechanisms of gene dysregulation in acquired drug resistance in breast cancer.

Azad, A K M; Lawen, Alfons; Keith, Jonathan M.

PLoS One ; 12(3): e0173331, 2017.

Article in English | MEDLINE | ID: mdl-28288164

ABSTRACT

Small molecule inhibitors, such as lapatinib, are effective against breast cancer in clinical trials, but tumor cells ultimately acquire resistance to the drug. Maintaining sensitization to drug action is essential for durable growth inhibition. Recently, adaptive reprogramming of signaling circuitry has been identified as a major cause of acquired resistance. We developed a computational framework using a Bayesian statistical approach to model signal rewiring in acquired resistance. We used the p1-model to infer potential aberrant gene-pairs with differential posterior probabilities of appearing in resistant-vs-parental networks. Results were obtained using matched gene expression profiles under resistant and parental conditions. Using two lapatinib-treated ErbB2-positive breast cancer cell-lines: SKBR3 and BT474, our method identified similar dysregulated signaling pathways including EGFR-related pathways as well as other receptor-related pathways, many of which were reported previously as compensatory pathways of EGFR-inhibition via signaling cross-talk. A manual literature survey provided strong evidence that aberrant signaling activities in dysregulated pathways are closely related to acquired resistance in EGFR tyrosine kinase inhibitors. Our approach predicted literature-supported dysregulated pathways complementary to both node-centric (SPIA, DAVID, and GATHER) and edge-centric (ESEA and PAGI) methods. Moreover, by proposing a novel pattern of aberrant signaling called V-structures, we observed that genes were dysregulated in resistant-vs-sensitive conditions when they were involved in the switch of dependencies from targeted to bypass signaling events. 
A literature survey of some important V-structures suggested they play a role in breast cancer metastasis and/or acquired resistance to EGFR-TKIs, where the mRNA changes of TGFBR2, LEF1 and TP53 in resistant-vs-sensitive conditions were related to the dependency switch from targeted to bypass signaling links. Our results suggest many signaling pathway structures are compromised in acquired resistance, and V-structures of aberrant signaling within/among those pathways may provide further insights into the bypass mechanism of targeted inhibition.

Subject(s)

Antineoplastic Agents/therapeutic use; Bayes Theorem; Breast Neoplasms/drug therapy; Drug Resistance, Neoplasm/genetics; Gene Expression Regulation, Neoplastic; Breast Neoplasms/genetics; Female; Humans; Probability

8.

Erratum to: Sequence Segmentation with changeptGUI.

Tasker, Edward; Keith, Jonathan M.

Methods Mol Biol ; 1525: E1, 2017.

Article in English | MEDLINE | ID: mdl-28220404

9.

Sequence Segmentation with changeptGUI.

Tasker, Edward; Keith, Jonathan M.

Methods Mol Biol ; 1525: 293-312, 2017.

Article in English | MEDLINE | ID: mdl-27896726

ABSTRACT

Many biological sequences have a segmental structure that can provide valuable clues to their content, structure, and function. The program changept is a tool for investigating the segmental structure of a sequence, and can also be applied to multiple sequences in parallel to identify a common segmental structure, thus providing a method for integrating multiple data types to identify functional elements in genomes. In the previous edition of this book, a command line interface for changept is described. Here we present a graphical user interface for this package, called changeptGUI. This interface also includes tools for pre- and post-processing of data and results to facilitate investigation of the number and characteristics of segment classes.

Subject(s)

Computational Biology/methods; Genome/genetics; Software; User-Computer Interface

10.

Discovery of putative small non-coding RNAs from the obligate intracellular bacterium Wolbachia pipientis.

Woolfit, Megan; Algama, Manjula; Keith, Jonathan M; McGraw, Elizabeth A; Popovici, Jean.

PLoS One ; 10(3): e0118595, 2015.

Article in English | MEDLINE | ID: mdl-25739023

ABSTRACT

Wolbachia pipientis is an endosymbiotic bacterium that induces a wide range of effects in its insect hosts, including manipulation of reproduction and protection against pathogens. Little is known of the molecular mechanisms underlying the insect-Wolbachia interaction, though it is likely to be mediated via the secretion of proteins or other factors. There is an increasing amount of evidence that bacteria regulate many cellular processes, including secretion of virulence factors, using small non-coding RNAs (sRNAs), but sRNAs have not previously been described from Wolbachia. We have used two independent approaches, one based on comparative genomics and the other using RNA-Seq data generated for gene expression studies, to identify candidate sRNAs in Wolbachia. We experimentally characterized the expression of one of these candidates in four Wolbachia strains, and showed that it is differentially regulated in different host tissues and sexes. Given the roles played by sRNAs in other host-associated bacteria, the conservation of the candidate sRNAs between different Wolbachia strains, and the sex- and tissue-specific differential regulation we have identified, we hypothesise that sRNAs may play a significant role in the biology of Wolbachia, and in particular in its interactions with its host.

Subject(s)

Intracellular Space/microbiology; RNA, Small Untranslated/genetics; Wolbachia/genetics; Wolbachia/physiology; Animals; Computational Biology; Conserved Sequence; Drosophila melanogaster/microbiology; Female; Host Specificity; Male; Organ Specificity; RNA, Messenger/genetics; RNA, Messenger/metabolism; Sequence Analysis, RNA; Transcription, Genetic

11.

Prediction of signaling cross-talks contributing to acquired drug resistance in breast cancer cells by Bayesian statistical modeling.

Azad, A K M; Lawen, Alfons; Keith, Jonathan M.

BMC Syst Biol ; 9: 2, 2015 Jan 20.

Article in English | MEDLINE | ID: mdl-25599599

ABSTRACT

BACKGROUND: Initial success of inhibitors targeting oncogenes is often followed by tumor relapse due to acquired resistance. In addition to mutations in targeted oncogenes, signaling cross-talks among pathways play a vital role in such drug inefficacy. These include activation of compensatory pathways and altered activities of key effectors in other cell survival and growth-associated pathways. RESULTS: We propose a computational framework using Bayesian modeling to systematically characterize potential cross-talks among breast cancer signaling pathways. We employed a fully Bayesian approach known as the p1-model to infer posterior probabilities of gene-pairs in networks derived from the gene expression datasets of ErbB2-positive breast cancer cell-lines (parental, lapatinib-sensitive cell-line SKBR3 and the lapatinib-resistant cell-line SKBR3-R, derived from SKBR3). Using this computational framework, we searched for cross-talks between EGFR/ErbB and other signaling pathways from Reactome, KEGG and WikiPathway databases that contribute to lapatinib resistance. We identified 104, 188 and 299 gene-pairs as putative drug-resistant cross-talks, respectively, each comprised of a gene in the EGFR/ErbB signaling pathway and a gene from another signaling pathway, that appear to be interacting in resistant cells but not in parental cells. In 168 of these (distinct) gene-pairs, both of the interacting partners are up-regulated in resistant conditions relative to parental conditions. These gene-pairs are prime candidates for novel cross-talks contributing to lapatinib resistance. They associate EGFR/ErbB signaling with six other signaling pathways: Notch, Wnt, GPCR, hedgehog, insulin receptor/IGF1R and TGF-β receptor signaling. We conducted a literature survey to validate these cross-talks, and found evidence supporting a role for many of them in contributing to drug resistance.
We also analyzed an independent study of lapatinib resistance in the BT474 breast cancer cell-line and found the same signaling pathways making cross-talks with the EGFR/ErbB signaling pathway as in the primary dataset. CONCLUSIONS: Our results indicate that the activation of compensatory pathways can potentially cause up-regulation of EGFR/ErbB pathway genes (counteracting the inhibiting effect of lapatinib) via signaling cross-talk. Thus, the up-regulated members of these compensatory pathways along with the members of the EGFR/ErbB signaling pathway are interesting as potential targets for designing novel anti-cancer therapeutics.

Subject(s)

Breast Neoplasms/pathology; Computational Biology/methods; Drug Resistance, Neoplasm; Models, Statistical; Signal Transduction/drug effects; Bayes Theorem; Cell Line, Tumor; ErbB Receptors/metabolism; Humans; Receptor, IGF Type 1; Receptor, Insulin/metabolism; Receptors, Notch/metabolism; Receptors, Somatomedin/metabolism; Wnt Signaling Pathway/drug effects

12.

Sampling phylogenetic tree space with the generalized Gibbs sampler.

Keith, Jonathan M.

Cladistics ; 31(4): 438-440, 2015 Aug.

Article in English | MEDLINE | ID: mdl-34772263

ABSTRACT

A recent article published in Cladistics is critical of a number of heuristic methods for phylogenetic inference based on parsimony scores. One of my papers is among those criticized, and I would appreciate the opportunity to make a public response. The specific criticism is that I have re-invented an algorithm for economizing parsimony calculations on trees that differ by a subtree pruning and regrafting (SPR) rearrangement. This criticism is justified, and I apologize for incorrectly claiming originality for my presentation of this algorithm. However, I would like to clarify the intent of my paper, if I can do so without detracting from the sincerity of my apology. My paper is not about that algorithm, nor even primarily about parsimony. Rather, it is about a novel strategy for Markov chain Monte Carlo (MCMC) sampling in a state space consisting of trees. The sampler involves drawing from conditional distributions over sets of trees: a Gibbs-like strategy that had not previously been used to sample tree-space. I would like to see this technique incorporated into MCMC samplers for phylogenetics, as it may have advantages over commonly used Metropolis-like strategies. I have recently used it to sample phylogenies of a biological invasion, and I am finding many applications for it in agent-based Bayesian ecological modelling. It is thus my contention that my 2005 paper retains substantial value.

13.

Investigating genomic structure using changept: A Bayesian segmentation model.

Algama, Manjula; Keith, Jonathan M.

Comput Struct Biotechnol J ; 10(17): 107-15, 2014 Jul.

Article in English | MEDLINE | ID: mdl-25349679

ABSTRACT

Genomes are composed of a wide variety of elements with distinct roles and characteristics. Some of these elements are well-characterised functional components such as protein-coding exons. Other elements play regulatory or structural roles, encode functional non-protein-coding RNAs, or perform some other function yet to be characterised. Still others may have no functional importance, though they may nevertheless be of interest to biologists. One technique for investigating the composition of genomes is to segment sequences into compositionally homogenous blocks. This technique, known as 'sequence segmentation' or 'change-point analysis', is used to identify patterns of variation across genomes such as GC-rich and GC-poor regions, coding and non-coding regions, slowly evolving and rapidly evolving regions and many other types of variation. In this mini-review we outline many of the genome segmentation methods currently available and then focus on a Bayesian DNA segmentation algorithm, with examples of its various applications.

14.

Drosophila 3' UTRs are more complex than protein-coding sequences.

Algama, Manjula; Oldmeadow, Christopher; Tasker, Edward; Mengersen, Kerrie; Keith, Jonathan M.

PLoS One ; 9(5): e97336, 2014.

Article in English | MEDLINE | ID: mdl-24824035

ABSTRACT

The 3' UTRs of eukaryotic genes participate in a variety of post-transcriptional (and some transcriptional) regulatory interactions. Some of these interactions are well characterised, but an undetermined number remain to be discovered. While some regulatory sequences in 3' UTRs may be conserved over long evolutionary time scales, others may have only ephemeral functional significance as regulatory profiles respond to changing selective pressures. Here we propose a sensitive segmentation methodology for investigating patterns of composition and conservation in 3' UTRs based on comparison of closely related species. We describe encodings of pairwise and three-way alignments integrating information about conservation, GC content and transition/transversion ratios and apply the method to three closely related Drosophila species: D. melanogaster, D. simulans and D. yakuba. Incorporating multiple data types greatly increased the number of segment classes identified compared to similar methods based on conservation or GC content alone. We propose that the number of segments and number of types of segment identified by the method can be used as proxies for functional complexity. Our main finding is that the number of segments and segment classes identified in 3' UTRs is greater than in the same length of protein-coding sequence, suggesting greater functional complexity in 3' UTRs. There is thus a need for sustained and extensive efforts by bioinformaticians to delineate functional elements in this important genomic fraction. C code, data and results are available upon request.

Subject(s)

3' Untranslated Regions/genetics; Drosophila/genetics; Genetic Variation; Models, Genetic; Animals; Base Sequence; Computational Biology; Molecular Sequence Data; Open Reading Frames/genetics; Species Specificity

15.

Agent-based Bayesian approach to monitoring the progress of invasive species eradication programs.

Keith, Jonathan M; Spring, Daniel.

Proc Natl Acad Sci U S A ; 110(33): 13428-33, 2013 Aug 13.

Article in English | MEDLINE | ID: mdl-23878210

ABSTRACT

Eradication of an invasive species can provide significant environmental, economic, and social benefits, but eradication programs often fail. Constant and careful monitoring improves the chance of success, but an invasion may seem to be in decline even when it is expanding in abundance or spatial extent. Determining whether an invasion is in decline is a challenging inference problem for two reasons. First, it is typically infeasible to regularly survey the entire infested region owing to high cost. Second, surveillance methods are imperfect and fail to detect some individuals. These two factors also make it difficult to determine why an eradication program is failing. Agent-based methods enable inferences to be made about the locations of undiscovered individuals over time to identify trends in invader abundance and spatial extent. We develop an agent-based Bayesian method and apply it to Australia's largest eradication program: the campaign to eradicate the red imported fire ant (Solenopsis invicta) from Brisbane. The invasion was deemed to be almost eradicated in 2004 but our analyses indicate that its geographic range continued to expand despite a sharp decline in number of nests. We also show that eradication would probably have been achieved with a relatively small increase in the area searched and treated. Our results demonstrate the importance of inferring temporal and spatial trends in ongoing invasions. The method can handle incomplete observations and takes into account the effects of human intervention. It has the potential to transform eradication practices.
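The imperfect-detection problem mentioned in this abstract can be illustrated with a one-line application of Bayes' theorem. The Python sketch below is not the paper's agent-based model: it assumes a single site, independent surveys, a fixed per-survey detection probability and no false positives, and the function name and numbers are hypothetical.

```python
def prob_present_after_negative_surveys(prior, detection_prob, n_surveys):
    """Posterior probability that a site is occupied after n surveys that all found nothing."""
    # Probability that every survey misses an occupant that is really there.
    miss = (1.0 - detection_prob) ** n_surveys
    return prior * miss / (prior * miss + (1.0 - prior))


# Assumed values: 50% prior occupancy, 60% detection probability per survey.
p_after_0 = prob_present_after_negative_surveys(0.5, 0.6, 0)
p_after_3 = prob_present_after_negative_surveys(0.5, 0.6, 3)
```

Under these assumptions three clean surveys still leave roughly a 6% chance of occupancy, which is the sense in which an invasion can appear to be in decline while undetected individuals persist.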

Subject(s)

Ants/physiology; Conservation of Natural Resources/methods; Environmental Monitoring/methods; Insect Control/methods; Introduced Species/statistics & numerical data; Models, Biological; Animals; Bayes Theorem; Population Dynamics; Queensland

16.

A Bayesian method for comparing and combining binary classifiers in the absence of a gold standard.

Keith, Jonathan M; Davey, Christian M; Boyd, Sarah E.

BMC Bioinformatics ; 13: 179, 2012 Jul 27.

Article in English | MEDLINE | ID: mdl-22838505

ABSTRACT

BACKGROUND: Many problems in bioinformatics involve classification based on features such as sequence, structure or morphology. Given multiple classifiers, two crucial questions arise: how does their performance compare, and how can they best be combined to produce a better classifier? A classifier can be evaluated in terms of sensitivity and specificity using benchmark, or gold standard, data, that is, data for which the true classification is known. However, a gold standard is not always available. Here we demonstrate that a Bayesian model for comparing medical diagnostics without a gold standard can be successfully applied in the bioinformatics domain, to genomic scale data sets. We present a new implementation, which unlike previous implementations is applicable to any number of classifiers. We apply this model, for the first time, to the problem of finding the globally optimal logical combination of classifiers. RESULTS: We compared three classifiers of protein subcellular localisation, and evaluated our estimates of sensitivity and specificity against estimates obtained using a gold standard. The method overestimated sensitivity and specificity with only a small discrepancy, and correctly ranked the classifiers. Diagnostic tests for swine flu were then compared on a small data set. Lastly, classifiers for a genome-wide association study of macular degeneration with 541094 SNPs were analysed. In all cases, run times were feasible, and results precise. The optimal logical combination of classifiers was also determined for all three data sets. Code and data are available from http://bioinformatics.monash.edu.au/downloads/. CONCLUSIONS: The examples demonstrate the methods are suitable for both small and large data sets, applicable to the wide range of bioinformatics classification problems, and robust to dependence between classifiers. 
In all three test cases, the globally optimal logical combination of the classifiers was found to be their union, according to three out of four ranking criteria. We propose as a general rule of thumb that the union of classifiers will be close to optimal.
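The trade-off behind the union rule of thumb can be seen on a toy example. The Python sketch below assumes, purely for illustration, that the true labels are known (the setting the paper avoids) and uses made-up predictions from two hypothetical classifiers; it shows the union gaining sensitivity at the cost of specificity.

```python
def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) of binary predictions against known labels."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    positives = sum(truth)
    negatives = len(truth) - positives
    return tp / positives, tn / negatives


# Made-up labels and predictions (1 = positive, 0 = negative).
truth = [1, 1, 1, 1, 0, 0, 0, 0]
classifier_a = [1, 1, 0, 0, 1, 0, 0, 0]
classifier_b = [0, 1, 1, 0, 0, 0, 1, 0]

# Union: call an item positive if either classifier does.
union = [int(a or b) for a, b in zip(classifier_a, classifier_b)]

sens_a, spec_a = sensitivity_specificity(classifier_a, truth)
sens_b, spec_b = sensitivity_specificity(classifier_b, truth)
sens_union, spec_union = sensitivity_specificity(union, truth)
```

Whether the gain in sensitivity outweighs the loss in specificity depends on the ranking criterion, which is consistent with the abstract reporting the union as optimal under three of four criteria rather than all of them.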

Subject(s)

Computational Biology/methods; Algorithms; Bayes Theorem; Classification/methods; Genome-Wide Association Study; Humans; Macular Degeneration/genetics; Polymorphism, Single Nucleotide; Proteins/analysis; Sensitivity and Specificity

17.

Computational characterization of 3' splice variants in the GFAP isoform family.

Boyd, Sarah E; Nair, Betina; Ng, Sze Woei; Keith, Jonathan M; Orian, Jacqueline M.

PLoS One ; 7(3): e33565, 2012.

Article in English | MEDLINE | ID: mdl-22479412

ABSTRACT

Glial fibrillary acidic protein (GFAP) is an intermediate filament (IF) protein specific to central nervous system (CNS) astrocytes. It has been the subject of intense interest due to its association with neurodegenerative diseases, and because of growing evidence that IF proteins not only modulate cellular structure, but also cellular function. Moreover, GFAP has a family of splicing isoforms apparently more complex than that of other CNS IF proteins, consistent with it possessing a range of functional and structural roles. The gene consists of 9 exons, and to date all isoforms associated with 3' end splicing have been identified from modifications within intron 7, resulting in the generation of exon 7a (GFAPδ/ε) and 7b (GFAPκ). To better understand the nature and functional significance of variation in this region, we used a Bayesian multiple change-point approach to identify conserved regions. This is the first successful application of this method to a single gene; it has previously only been used in whole-genome analyses. We identified several highly or moderately conserved regions throughout the intron 7/7a/7b regions, including untranslated regions and regulatory features, consistent with the biology of GFAP. Several putative unconfirmed features were also identified, including a possible new isoform. We then integrated multiple computational analyses on both the DNA and protein sequences from the mouse, rat and human, showing that the major isoform, GFAPα, has highly conserved structure and features across the three species, whereas the minor isoforms GFAPδ/ε and GFAPκ have low conservation of structure and features at the distal 3' end, both relative to each other and relative to GFAPα. The overall picture suggests distinct and tightly regulated functions for the 3' end isoforms, consistent with complex astrocyte biology. The results illustrate a computational approach for characterising splicing isoform families, using both DNA and protein sequences.

Subject(s)

Computational Biology/methods , Glial Fibrillary Acidic Protein/chemistry , Alternative Splicing , Amino Acid Sequence , Animals , Base Sequence , Conserved Sequence , Exons , Glial Fibrillary Acidic Protein/genetics , Humans , Hydrophobic and Hydrophilic Interactions , Mice , Molecular Sequence Data , Phosphorylation , Protein Isoforms/chemistry , Protein Isoforms/genetics , RNA Splice Sites , Rats , Transcriptional Regulatory Elements

18.

Model selection in Bayesian segmentation of multiple DNA alignments.

Oldmeadow, Christopher; Keith, Jonathan M.

Bioinformatics ; 27(5): 604-10, 2011 Mar 01.

Article in English | MEDLINE | ID: mdl-21208984

ABSTRACT

MOTIVATION: The analysis of multiple sequence alignments is allowing researchers to glean valuable insights into evolution, as well as identify genomic regions that may be functional, or discover novel classes of functional elements. Understanding the distribution of conservation levels that constitutes the evolutionary landscape is crucial to distinguishing functional regions from non-functional. Recent evidence suggests that a binary classification of evolutionary rates is inappropriate for this purpose and finds only highly conserved functional elements. Given that the distribution of evolutionary rates is multi-modal, determining the number of modes is of paramount concern. Through simulation, we evaluate the performance of a number of information criterion approaches derived from MCMC simulations in determining the dimension of a model. RESULTS: We utilize a deviance information criterion (DIC) approximation that is more robust than the approximations from other information criteria, and show our information criteria approximations do not produce superfluous modes when estimating conservation distributions under a variety of circumstances. We analyse the distribution of conservation for a multiple alignment comprising four primate species and mouse, and repeat this on two additional multiple alignments of similar species. We find evidence of six distinct classes of evolutionary rates that appear to be robust to the species used. AVAILABILITY: Source code and data are available at http://dl.dropbox.com/u/477240/changept.zip.
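As an illustration of the kind of quantity this abstract refers to, the following is a minimal sketch of the standard deviance information criterion computed from MCMC output, DIC = D̄ + p_D with p_D = D̄ − D(θ̄). It is not the DIC approximation used by the authors, which the abstract does not specify; the toy normal model, the function names `deviance` and `dic`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def deviance(y, mu, sigma=1.0):
    """Deviance, i.e. -2 * log-likelihood of y under Normal(mu, sigma^2)."""
    n = y.size
    return n * np.log(2 * np.pi * sigma**2) + np.sum((y - mu) ** 2) / sigma**2

def dic(y, mu_samples, sigma=1.0):
    """Standard DIC from posterior samples of the mean (hypothetical helper)."""
    d_samples = np.array([deviance(y, m, sigma) for m in mu_samples])
    d_bar = d_samples.mean()                          # posterior mean deviance
    d_at_mean = deviance(y, mu_samples.mean(), sigma)  # deviance at posterior mean
    p_d = d_bar - d_at_mean                           # effective number of parameters
    return d_bar + p_d, p_d

# Toy data and a stand-in for MCMC output: with a flat prior and known sigma = 1,
# the posterior of the mean is Normal(mean(y), 1/n), so we sample it directly.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=50)
mu_samples = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(y.size), size=5000)

dic_value, p_d = dic(y, mu_samples)
print(f"DIC = {dic_value:.2f}, effective parameters p_D = {p_d:.2f}")
```

In this one-parameter toy model p_D should come out close to 1. When comparing candidate models, such as different numbers of rate classes, the model with the lowest DIC would be preferred under this criterion.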

Subject(s)

Molecular Evolution , Statistical Models , Sequence Alignment/methods , DNA Sequence Analysis/methods , Algorithms , Animals , Bayes Theorem , Computational Biology/methods , Computer Simulation , DNA/analysis , Genomics/methods , Mice , Primates

19.

Multiple evolutionary rate classes in animal genome evolution.

Oldmeadow, Christopher; Mengersen, Kerrie; Mattick, John S; Keith, Jonathan M.

Mol Biol Evol ; 27(4): 942-53, 2010 Apr.

Article in English | MEDLINE | ID: mdl-19955480

ABSTRACT

The proportion of functional sequence in the human genome is currently a subject of debate. The most widely accepted figure is that approximately 5% is under purifying selection. In Drosophila, estimates are an order of magnitude higher, though this corresponds to a similar quantity of sequence. These estimates depend on the difference between the distribution of genomewide evolutionary rates and that observed in a subset of sequences presumed to be neutrally evolving. Motivated by the widening gap between these estimates and experimental evidence of genome function, especially in mammals, we developed a sensitive technique for evaluating such distributions and found that they are much more complex than previously apparent. We found strong evidence for at least nine well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least seven classes in an alignment of four mammals, including human. We also identified at least three rate classes in human ancestral repeats. By positing that the largest of these ancestral repeat classes is neutrally evolving, we estimate that the proportion of nonneutrally evolving sequence is 30% of human ancestral repeats and 45% of the aligned portion of the genome. However, we also question whether any of the classes represent neutrally evolving sequences and argue that a plausible alternative is that they reflect variable structure-function constraints operating throughout the genomes of complex organisms.

Subject(s)

Drosophila/genetics , Mammals/genetics , Animals , Conserved Sequence , Molecular Evolution , Human Genome , Humans , Genetic Recombination , Sequence Alignment

20.

Bayesian latent trait modeling of migraine symptom data.

Chen, Carla Chia Ming; Keith, Jonathan M; Nyholt, Dale R; Martin, Nicholas G; Mengersen, Kerrie L.

Hum Genet ; 126(2): 277-88, 2009 Aug.

Article in English | MEDLINE | ID: mdl-19390863

ABSTRACT

Definition of disease phenotype is a necessary preliminary to research into genetic causes of a complex disease. Clinical diagnosis of migraine is currently based on diagnostic criteria developed by the International Headache Society. Previously, we examined the natural clustering of these diagnostic symptoms using latent class analysis (LCA) and found that a four-class model was preferred. However, the classes can be ordered such that all symptoms progressively intensify, suggesting that a single continuous variable representing disease severity may provide a better model. Here, we compare two models: item response theory and LCA, each constructed within a Bayesian context. A deviance information criterion is used to assess model fit. We phenotyped our population sample using these models, estimated heritability and conducted genome-wide linkage analysis using Merlin-qtl. LCA with four classes was again preferred. After transformation, phenotypic trait values derived from both models are highly correlated (correlation = 0.99) and consequently results from subsequent genetic analyses were similar. Heritability was estimated at 0.37, while multipoint linkage analysis produced genome-wide significant linkage to chromosome 7q31-q33 and suggestive linkage to chromosomes 1 and 2. We argue that such continuous measures are a powerful tool for identifying genes contributing to migraine susceptibility.

Subject(s)

Migraine Disorders/diagnosis , Migraine Disorders/genetics , Adult , Aged , Aged, 80 and over , Bayes Theorem , Cluster Analysis , Diseases in Twins , Female , Genetic Linkage , Genetic Predisposition to Disease , Humans , Lod Score , Male , Middle Aged , Migraine Disorders/physiopathology , Phenotype
