1.

Single B cell transcriptomics identifies multiple isotypes of broadly neutralizing antibodies against flaviviruses.

Lubow, Jay; Levoir, Lisa M; Ralph, Duncan K; Belmont, Laura; Contreras, Maya; Cartwright-Acar, Catiana H; Kikawa, Caroline; Kannan, Shruthi; Davidson, Edgar; Duran, Veronica; Rebellon-Sanchez, David E; Sanz, Ana M; Rosso, Fernando; Doranz, Benjamin J; Einav, Shirit; Matsen IV, Frederick A; Goo, Leslie.

PLoS Pathog ; 19(10): e1011722, 2023 10.

Article En | MEDLINE | ID: mdl-37812640

Sequential dengue virus (DENV) infections often generate neutralizing antibodies against all four DENV serotypes and sometimes against Zika virus. Characterizing cross-flavivirus broadly neutralizing antibody (bnAb) responses can inform countermeasures that avoid enhancement of infection associated with non-neutralizing antibodies. Here, we used single cell transcriptomics to mine the bnAb repertoire following repeated DENV infections. We identified several new bnAbs with comparable or superior breadth and potency to known bnAbs, and with distinct recognition determinants. Unlike all known flavivirus bnAbs, which are IgG1, one newly identified cross-flavivirus bnAb (F25.S02) was derived from IgA1. Both IgG1 and IgA1 versions of F25.S02 and known bnAbs displayed neutralizing activity, but only IgG1 enhanced infection in monocytes expressing IgG and IgA Fc receptors. Moreover, IgG-mediated enhancement of infection was inhibited by IgA1 versions of bnAbs. We demonstrate a role for IgA in flavivirus infection and immunity with implications for vaccine and therapeutic strategies.

Flavivirus , Zika Virus Infection , Zika Virus , Humans , Broadly Neutralizing Antibodies , Transcriptome , Antibodies, Neutralizing , Immunoglobulin G , Immunoglobulin A , Antibodies, Viral

2.

Representing and extending ensembles of parsimonious evolutionary histories with a directed acyclic graph.

Dumm, Will; Barker, Mary; Howard-Snyder, William; DeWitt III, William S; Matsen IV, Frederick A.

J Math Biol ; 87(5): 75, 2023 10 25.

Article En | MEDLINE | ID: mdl-37878119

In many situations, it would be useful to know not just the best phylogenetic tree for a given data set, but the collection of high-quality trees. This goal is typically addressed using Bayesian techniques; however, current Bayesian methods do not scale to large data sets. Furthermore, for large data sets with relatively low signal one cannot even store every good tree individually, especially when the trees are required to be bifurcating. In this paper, we develop a novel object called the "history subpartition directed acyclic graph" (or "history sDAG" for short) that compactly represents an ensemble of trees with labels (e.g. ancestral sequences) mapped onto the internal nodes. The history sDAG can be built efficiently and can also be efficiently trimmed to only represent maximally parsimonious trees. We show that the history sDAG allows us to find many additional equally parsimonious trees, extending combinatorially beyond the ensemble used to construct it. We argue that this object could be useful as the "skeleton" of a more complete uncertainty quantification.

Biological Evolution , Radiopharmaceuticals , Phylogeny , Bayes Theorem , Uncertainty

3.

Jointly modeling deep mutational scans identifies shifted mutational effects among SARS-CoV-2 spike homologs.

Haddox, Hugh K; Galloway, Jared G; Dadonaite, Bernadeta; Bloom, Jesse D; Matsen IV, Frederick A; DeWitt, William S.

bioRxiv ; 2023 Aug 02.

Article En | MEDLINE | ID: mdl-37577604

Deep mutational scanning (DMS) is a high-throughput experimental technique that measures the effects of thousands of mutations to a protein. These experiments can be performed on multiple homologs of a protein or on the same protein selected under multiple conditions. It is often of biological interest to identify mutations with shifted effects across homologs or conditions. However, it is challenging to determine if observed shifts arise from biological signal or experimental noise. Here, we describe a method for jointly inferring mutational effects across multiple DMS experiments while also identifying mutations that have shifted in their effects among experiments. A key aspect of our method is to regularize the inferred shifts, so that they are nonzero only when strongly supported by the data. We apply this method to DMS experiments that measure how mutations to spike proteins from SARS-CoV-2 variants (Delta, Omicron BA.1, and Omicron BA.2) affect cell entry. Most mutational effects are conserved between these spike homologs, but a fraction have markedly shifted. We experimentally validate a subset of the mutations inferred to have shifted effects, and confirm differences of > 1,000-fold in the impact of the same mutation on spike-mediated viral infection across spikes from different SARS-CoV-2 variants. Overall, our work establishes a general approach for comparing sets of DMS experiments to identify biologically important shifts in mutational effects.
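
As a rough illustration of the regularization idea in this abstract (not the authors' model, which is fit jointly across full DMS data sets), the sketch below considers one mutation measured in two homologs. It assumes a toy objective, (y_ref - beta)^2 + (y_other - beta - shift)^2 + penalty * |shift|, where beta is the shared effect. Minimizing over beta reduces this to soft-thresholding the observed difference y_other - y_ref at the penalty, so small differences are set exactly to zero. The function name, the penalty value, and the data are all hypothetical.

```python
import numpy as np

def lasso_shift(effect_ref, effect_other, penalty):
    """Soft-threshold the observed difference in effects at `penalty`.

    Toy stand-in for a regularized shift estimate (an assumption for
    illustration, not the method from the paper).
    """
    diff = np.asarray(effect_other, dtype=float) - np.asarray(effect_ref, dtype=float)
    return np.sign(diff) * np.maximum(np.abs(diff) - penalty, 0.0)

# Hypothetical effects of three mutations in a reference and a second homolog.
ref = np.array([-1.00, 0.20, -3.00])
other = np.array([-0.95, 0.10, -1.00])
shifts = lasso_shift(ref, other, penalty=0.5)
```

Only the third mutation, whose effect differs by far more than the penalty, keeps a nonzero shift; the two small differences are treated as noise.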

4.

Automatic Differentiation is no Panacea for Phylogenetic Gradient Computation.

Fourment, Mathieu; Swanepoel, Christiaan J; Galloway, Jared G; Ji, Xiang; Gangavarapu, Karthik; Suchard, Marc A; Matsen IV, Frederick A.

Genome Biol Evol ; 15(6), 2023 06 01.

Article En | MEDLINE | ID: mdl-37265233

Gradients of probabilistic model likelihoods with respect to their parameters are essential for modern computational statistics and machine learning. These calculations are readily available for arbitrary models via "automatic differentiation" implemented in general-purpose machine-learning libraries such as TensorFlow and PyTorch. Although these libraries are highly optimized, it is not clear if their general-purpose nature will limit their algorithmic complexity or implementation speed for the phylogenetic case compared to phylogenetics-specific code. In this paper, we compare six gradient implementations of the phylogenetic likelihood functions, in isolation and also as part of a variational inference procedure. We find that although automatic differentiation can scale approximately linearly in tree size, it is much slower than the carefully implemented gradient calculation for tree likelihood and ratio transformation operations. We conclude that a mixed approach combining phylogenetic libraries with machine learning libraries will provide the optimal combination of speed and model flexibility moving forward.

Machine Learning , Models, Statistical , Phylogeny , Likelihood Functions , Algorithms

5.

Comparing T cell receptor repertoires using optimal transport.

Olson, Branden J; Schattgen, Stefan A; Thomas, Paul G; Bradley, Philip; Matsen IV, Frederick A.

PLoS Comput Biol ; 18(12): e1010681, 2022 Dec.

Article En | MEDLINE | ID: mdl-36476997

The complexity of entire T cell receptor (TCR) repertoires makes their comparison a difficult but important task. Current methods of TCR repertoire comparison can incur a high loss of distributional information by considering overly simplistic sequence- or repertoire-level characteristics. Optimal transport methods form a suitable approach for such comparison given some distance or metric between values in the sample space, with appealing theoretical and computational properties. In this paper we introduce a nonparametric approach to comparing empirical TCR repertoires that applies the Sinkhorn distance, a fast, contemporary optimal transport method, and a recently created distance between TCRs called TCRdist. We show that our methods identify meaningful differences between samples from distinct TCR distributions for several case studies, and compete with more complicated methods despite minimal modeling assumptions and a simpler pipeline.
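
To make the Sinkhorn step concrete, here is a minimal sketch of entropy-regularized optimal transport between two small empirical repertoires. It is not the authors' pipeline: the pairwise cost is a simple placeholder (mismatch count plus length difference) standing in for TCRdist, and the sequences, the regularization strength `reg`, and the iteration count are all hypothetical choices.

```python
import numpy as np

def mismatch_distance(s, t):
    """Placeholder distance (NOT TCRdist): mismatches plus length difference."""
    return sum(x != y for x, y in zip(s, t)) + abs(len(s) - len(t))

def sinkhorn(a, b, cost, reg=0.5, n_iter=2000):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    Returns the transport plan and the transport cost sum(plan * cost).
    """
    kernel = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (kernel.T @ u)  # rescale to match the column marginals
        u = a / (kernel @ v)    # rescale to match the row marginals
    plan = u[:, None] * kernel * v[None, :]
    return plan, float(np.sum(plan * cost))

def repertoire_distance(rep1, rep2):
    """Compare two repertoires, weighting every sequence equally."""
    cost = np.array([[mismatch_distance(s, t) for t in rep2] for s in rep1], dtype=float)
    a = np.full(len(rep1), 1.0 / len(rep1))
    b = np.full(len(rep2), 1.0 / len(rep2))
    return sinkhorn(a, b, cost)

# Hypothetical CDR3-like sequences.
rep_a = ["CASSL", "CASSF", "CATSL"]
rep_b = ["CASSL", "CASRF", "CSARG"]
plan_ab, cost_ab = repertoire_distance(rep_a, rep_b)
plan_aa, cost_aa = repertoire_distance(rep_a, rep_a)
```

Comparing a repertoire with itself gives a smaller cost than comparing it with a different one. The self-comparison is not exactly zero because the entropy term spreads some mass off the diagonal.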

6.

Detailed analysis of antibody responses to SARS-CoV-2 vaccination and infection in macaques.

Willcox, Alexandra C; Sung, Kevin; Garrett, Meghan E; Galloway, Jared G; Erasmus, Jesse H; Logue, Jennifer K; Hawman, David W; Chu, Helen Y; Hasenkrug, Kim J; Fuller, Deborah H; Matsen IV, Frederick A; Overbaugh, Julie.

PLoS Pathog ; 18(4): e1010155, 2022 04.

Article En | MEDLINE | ID: mdl-35404959

Macaques are a commonly used model for studying immunity to human viruses, including for studies of SARS-CoV-2 infection and vaccination. However, it is unknown whether macaque antibody responses resemble the response in humans. To answer this question, we employed a phage-based deep mutational scanning approach (Phage-DMS) to compare which linear epitopes are targeted on the SARS-CoV-2 Spike protein in convalescent humans, convalescent (re-infected) rhesus macaques, mRNA-vaccinated humans, and repRNA-vaccinated pigtail macaques. We also used Phage-DMS to determine antibody escape pathways within each epitope, enabling a granular comparison of antibody binding specificities at the locus level. Overall, we identified some common epitope targets in both macaques and humans, including in the fusion peptide (FP) and stem helix-heptad repeat 2 (SH-H) regions. Differences between groups included a response to epitopes in the N-terminal domain (NTD) and C-terminal domain (CTD) in vaccinated humans but not vaccinated macaques, as well as recognition of a CTD epitope and epitopes flanking the FP in convalescent macaques but not convalescent humans. There was also considerable variability in the escape pathways among individuals within each group. Sera from convalescent macaques showed the least variability in escape overall and converged on a common response with vaccinated humans in the SH-H epitope region, suggesting highly similar antibodies were elicited. Collectively, these findings suggest that the antibody response to SARS-CoV-2 in macaques shares many features with humans, but with substantial differences in the recognition of certain epitopes and considerable individual variability in antibody escape profiles, suggesting a diverse repertoire of antibodies that can respond to major epitopes in both humans and macaques. Differences in macaque species and exposure type may also contribute to these findings.

COVID-19 , SARS-CoV-2 , Animals , Antibodies, Neutralizing , Antibodies, Viral , Antibody Formation , COVID-19/prevention & control , COVID-19 Vaccines , Epitopes , Humans , Macaca mulatta , Spike Glycoprotein, Coronavirus , Vaccination

7.

Dynamics of HIV DNA reservoir seeding in a cohort of superinfected Kenyan women.

Pankau, Mark D; Reeves, Daniel B; Harkins, Elias; Ronen, Keshet; Jaoko, Walter; Mandaliya, Kishor; Graham, Susan M; McClelland, R Scott; Matsen IV, Frederick A; Schiffer, Joshua T; Overbaugh, Julie; Lehman, Dara A.

PLoS Pathog ; 16(2): e1008286, 2020 02.

Article En | MEDLINE | ID: mdl-32023326

A reservoir of HIV-infected cells that persists despite suppressive antiretroviral therapy (ART) is the source of viral rebound upon ART cessation and the major barrier to a cure. Understanding reservoir seeding dynamics will help identify the best timing for HIV cure strategies. Here we characterize reservoir seeding using longitudinal samples from before and after ART initiation in individuals who sequentially became infected with genetically distinct HIV variants (superinfected). We previously identified cases of superinfection in a cohort of Kenyan women, and the dates of both initial infection and superinfection were determined. Six women, superinfected 0.2-5.2 years after initial infection, were subsequently treated with ART 5.4-18.0 years after initial infection. We performed next-generation sequencing of HIV gag and env RNA from plasma collected during acute infection as well as every ~2 years thereafter until ART initiation, and of HIV DNA from PBMCs collected 0.9-4.8 years after viral suppression on ART. We assessed phylogenetic relationships between HIV DNA reservoir sequences and longitudinal plasma RNA sequences prior to ART, to determine proportions of initial and superinfecting variants in the reservoir. The proportions of initial and superinfection lineage variants present in the HIV DNA reservoir were most similar to the proportions present in HIV RNA immediately prior to ART initiation. Phylogenetic analysis confirmed that the majority of HIV DNA reservoir sequences had the smallest pairwise distance to RNA sequences from timepoints closest to ART initiation. Our data suggest that while reservoir cells are created throughout pre-ART infection, the majority of HIV-infected cells that persist during ART entered the reservoir near the time of ART initiation. We estimate the half-life of pre-ART DNA reservoir sequences to be ~25 months, which is shorter than estimated reservoir decay rates during suppressive ART, implying continual decay and reseeding of the reservoir up to the point of ART initiation.

DNA, Viral , HIV Infections , HIV-1 , Phylogeny , env Gene Products, Human Immunodeficiency Virus , gag Gene Products, Human Immunodeficiency Virus , Adult , DNA, Viral/blood , DNA, Viral/genetics , Female , Follow-Up Studies , HIV Infections/blood , HIV Infections/drug therapy , HIV Infections/genetics , HIV-1/genetics , HIV-1/metabolism , High-Throughput Nucleotide Sequencing , Humans , Kenya , env Gene Products, Human Immunodeficiency Virus/genetics , env Gene Products, Human Immunodeficiency Virus/metabolism , gag Gene Products, Human Immunodeficiency Virus/genetics , gag Gene Products, Human Immunodeficiency Virus/metabolism

8.

Combining Viral Genetics and Statistical Modeling to Improve HIV-1 Time-of-infection Estimation towards Enhanced Vaccine Efficacy Assessment.

Rossenkhan, Raabya; Rolland, Morgane; Labuschagne, Jan P L; Ferreira, Roux-Cil; Magaret, Craig A; Carpp, Lindsay N; Matsen IV, Frederick A; Huang, Yunda; Rudnicki, Erika E; Zhang, Yuanyuan; Ndabambi, Nonkululeko; Logan, Murray; Holzman, Ted; Abrahams, Melissa-Rose; Anthony, Colin; Tovanabutra, Sodsai; Warth, Christopher; Botha, Gordon; Matten, David; Nitayaphan, Sorachai; Kibuuka, Hannah; Sawe, Fred K; Chopera, Denis; Eller, Leigh Anne; Travers, Simon; Robb, Merlin L; Williamson, Carolyn; Gilbert, Peter B; Edlefsen, Paul T.

Viruses ; 11(7), 2019 07 03.

Article En | MEDLINE | ID: mdl-31277299

Knowledge of the time of HIV-1 infection and the multiplicity of viruses that establish HIV-1 infection is crucial for the in-depth analysis of clinical prevention efficacy trial outcomes. Better estimation methods would improve the ability to characterize immunological and genetic sequence correlates of efficacy within preventive efficacy trials of HIV-1 vaccines and monoclonal antibodies. We developed new methods for infection timing and multiplicity estimation using maximum likelihood estimators that shift and scale (calibrate) estimates by fitting true infection times and founder virus multiplicities to a linear regression model with independent variables defined by data on HIV-1 sequences, viral load, diagnostics, and sequence alignment statistics. Using Poisson models of measured mutation counts and phylogenetic trees, we analyzed longitudinal HIV-1 sequence data together with diagnostic and viral load data from the RV217 and CAPRISA 002 acute HIV-1 infection cohort studies. We used leave-one-out cross validation to evaluate the prediction error of these calibrated estimators versus that of existing estimators and found that both infection time and founder multiplicity can be estimated with improved accuracy and precision by calibration. Calibration considerably improved all estimators of time since HIV-1 infection, in terms of reducing bias to near zero and reducing root mean squared error (RMSE) to 5-10 days for sequences collected 1-2 months after infection. The calibration of multiplicity assessments yielded strong improvements with accurate predictions (ROC-AUC above 0.85) in all cases. These results have not yet been validated on external data, and the best-fitting models are likely to be less robust than simpler models to variation in sequencing conditions. For all evaluated models, these results demonstrate the value of calibration for improved estimation of founder multiplicity and of time since HIV-1 infection.

AIDS Vaccines , HIV Infections/prevention & control , HIV-1/genetics , Models, Statistical , Evolution, Molecular , Genetic Variation , HIV Infections/virology , Humans , Mutation , Phylogeny , Sequence Analysis , Time Factors , Viral Load

9.

Kappa chain maturation helps drive rapid development of an infant HIV-1 broadly neutralizing antibody lineage.

Simonich, Cassandra A; Doepker, Laura; Ralph, Duncan; Williams, James A; Dhar, Amrit; Yaffe, Zak; Gentles, Lauren; Small, Christopher T; Oliver, Brian; Vigdorovich, Vladimir; Mangala Prasad, Vidya; Nduati, Ruth; Sather, D Noah; Lee, Kelly K; Matsen IV, Frederick A; Overbaugh, Julie.

Nat Commun ; 10(1): 2190, 2019 05 16.

Article En | MEDLINE | ID: mdl-31097697

HIV-infected infants develop broadly neutralizing plasma responses with more rapid kinetics than adults, suggesting the ontogeny of infant responses could better inform a path to achievable vaccine targets. Here we reconstruct the developmental lineage of BF520.1, an infant-derived HIV-specific broadly neutralizing antibody (bnAb), using computational methods developed specifically for this purpose. We find that the BF520.1 inferred naive precursor binds HIV Env. We also show that heterologous cross-clade neutralizing activity evolved in the infant within six months of infection and that, ultimately, only 2% SHM is needed to achieve the full breadth of the mature antibody. Mutagenesis and structural analyses reveal that, for this infant bnAb, substitutions in the kappa chain were critical for activity, particularly in CDRL1. Overall, the developmental pathway of this infant antibody includes features distinct from adult antibodies, including several that may be amenable to better vaccine responses.

Antibodies, Neutralizing/immunology , HIV Antibodies/immunology , HIV Infections/prevention & control , HIV-1/immunology , Immunoglobulin kappa-Chains/immunology , AIDS Vaccines/immunology , Age Factors , Antibodies, Neutralizing/genetics , Antibodies, Neutralizing/isolation & purification , Antibodies, Neutralizing/metabolism , Computational Biology/methods , Cross Reactions/immunology , Drug Design , HIV Antibodies/genetics , HIV Antibodies/isolation & purification , HIV Antibodies/metabolism , HIV Infections/blood , HIV Infections/immunology , HIV Infections/virology , Humans , Immunoglobulin kappa-Chains/genetics , Immunoglobulin kappa-Chains/metabolism , Infant , Leukocytes, Mononuclear , Mutagenesis , Sequence Analysis, DNA , env Gene Products, Human Immunodeficiency Virus/immunology

10.

Inferred Allelic Variants of Immunoglobulin Receptor Genes: A System for Their Evaluation, Documentation, and Naming.

Ohlin, Mats; Scheepers, Cathrine; Corcoran, Martin; Lees, William D; Busse, Christian E; Bagnara, Davide; Thörnqvist, Linnea; Bürckert, Jean-Philippe; Jackson, Katherine J L; Ralph, Duncan; Schramm, Chaim A; Marthandan, Nishanth; Breden, Felix; Scott, Jamie; Matsen IV, Frederick A; Greiff, Victor; Yaari, Gur; Kleinstein, Steven H; Christley, Scott; Sherkow, Jacob S; Kossida, Sofia; Lefranc, Marie-Paule; van Zelm, Menno C; Watson, Corey T; Collins, Andrew M.

Front Immunol ; 10: 435, 2019.

Article En | MEDLINE | ID: mdl-30936866

Immunoglobulins or antibodies are the main effector molecules of the B-cell lineage and are encoded by hundreds of variable (V), diversity (D), and joining (J) germline genes, which recombine to generate enormous IG diversity. Recently, high-throughput adaptive immune receptor repertoire sequencing (AIRR-seq) of recombined V-(D)-J genes has offered unprecedented insights into the dynamics of IG repertoires in health and disease. Faithful biological interpretation of AIRR-seq studies depends upon the annotation of raw AIRR-seq data, using reference germline gene databases to identify the germline genes within each rearrangement. Existing reference databases are incomplete, as shown by recent AIRR-seq studies that have inferred the existence of many previously unreported polymorphisms. Completing the documentation of genetic variation in germline gene databases is therefore of crucial importance. Lymphocyte receptor genes and alleles are currently assigned by the Immunoglobulins, T cell Receptors and Major Histocompatibility Nomenclature Subcommittee of the International Union of Immunological Societies (IUIS) and managed in IMGT®, the international ImMunoGeneTics information system® (IMGT). In 2017, the IMGT Group reached agreement with a group of AIRR-seq researchers on the principles of a streamlined process for identifying and naming inferred allelic sequences, for their incorporation into IMGT®. These researchers represented the AIRR Community, a network of over 300 researchers whose objective is to promote all aspects of immunoglobulin and T-cell receptor repertoire studies, including the standardization of experimental and computational aspects of AIRR-seq data generation and analysis. The Inferred Allele Review Committee (IARC) was established by the AIRR Community to devise policies, criteria, and procedures to perform this function. Formalized evaluations of novel inferred sequences have now begun and submissions are invited via a new dedicated portal (https://ogrdb.airr-community.org). Here, we summarize recommendations developed by the IARC, focusing to begin with on human IGHV genes, with the goal of facilitating the acceptance of inferred allelic variants of germline IGHV genes. We believe that this initiative will improve the quality of AIRR-seq studies by facilitating the description of human IG germline gene variation, and that in time, it will expand to the documentation of TR and IG genes in many vertebrate species.

Alleles , Genes, Immunoglobulin , Genetic Variation/genetics , Terminology as Topic , V(D)J Recombination , Base Sequence , Databases, Genetic , Datasets as Topic , Gene Library , Germ-Line Mutation , High-Throughput Nucleotide Sequencing , Humans , Immunoglobulin Heavy Chains/genetics , Immunoglobulin Variable Region/genetics , Polymerase Chain Reaction/methods , Sequence Alignment , Sequence Homology, Nucleic Acid , VDJ Exons/genetics

11.

SANTA-SIM: simulating viral sequence evolution dynamics under selection and recombination.

Jariani, Abbas; Warth, Christopher; Deforche, Koen; Libin, Pieter; Drummond, Alexei J; Rambaut, Andrew; Matsen IV, Frederick A; Theys, Kristof.

Virus Evol ; 5(1): vez003, 2019 Jan.

Article En | MEDLINE | ID: mdl-30863552

Simulations are widely used to provide expectations and predictive distributions under known conditions against which to compare empirical data. Such simulations are also invaluable for testing and comparing the behaviour and power of inference methods. We describe SANTA-SIM, a software package to simulate the evolution of a population of gene sequences forwards through time. It models the underlying biological processes as discrete components: replication, recombination, point mutations, insertion-deletions, and selection under various fitness models and population size dynamics. The software is designed to be intuitive to work with for a wide range of users and executable in a cross-platform manner.

12.

AIRR Community Standardized Representations for Annotated Immune Repertoires.

Vander Heiden, Jason Anthony; Marquez, Susanna; Marthandan, Nishanth; Bukhari, Syed Ahmad Chan; Busse, Christian E; Corrie, Brian; Hershberg, Uri; Kleinstein, Steven H; Matsen IV, Frederick A; Ralph, Duncan K; Rosenfeld, Aaron M; Schramm, Chaim A; Christley, Scott; Laserson, Uri.

Front Immunol ; 9: 2206, 2018.

Article En | MEDLINE | ID: mdl-30323809

Increased interest in the immune system's involvement in pathophysiological phenomena, coupled with decreased DNA sequencing costs, has led to an explosion of antibody and T cell receptor sequencing data, collectively termed "adaptive immune receptor repertoire sequencing" (AIRR-seq or Rep-Seq). The AIRR Community has been actively working to standardize protocols, metadata, formats, APIs, and other guidelines to promote open and reproducible studies of the immune repertoire. In this paper, we describe the work of the AIRR Community's Data Representation Working Group to develop standardized data representations for storing and sharing annotated antibody and T cell receptor data. Our file format emphasizes ease-of-use, accessibility, scalability to large data sets, and a commitment to open and transparent science. It is composed of a tab-delimited format with a specific schema. Several popular repertoire analysis tools and data repositories already utilize this AIRR-seq data format. We hope that others will follow suit in the interest of promoting interoperable standards.

Antibodies/genetics , Base Sequence , Database Management Systems , Information Dissemination/methods , Receptors, Antigen, T-Cell/genetics , Adaptive Immunity/genetics , Databases, Genetic , Datasets as Topic , High-Throughput Nucleotide Sequencing/economics , Humans , Receptors, Immunologic/genetics , Research Design
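
Because the format described in this abstract is tab-delimited with a fixed schema, a file of this kind can be read with nothing more than a standard library. The sketch below is only illustrative: the column names and the T/F encoding of the boolean column are assumptions made for the example and should be checked against the published AIRR schema, and the three records are invented.

```python
import csv
import io

# Hypothetical three-record file. Column names and the T/F boolean encoding
# are assumptions for illustration; verify them against the AIRR schema.
AIRR_TSV = (
    "sequence_id\tv_call\tj_call\tjunction_aa\tproductive\n"
    "seq1\tIGHV1-69*01\tIGHJ4*02\tCARDYW\tT\n"
    "seq2\tIGHV3-23*01\tIGHJ6*02\tCAKGGW\tT\n"
    "seq3\tIGHV1-69*01\tIGHJ4*02\t\tF\n"
)

def read_rearrangements(handle):
    """Parse tab-delimited rows into dicts, decoding the boolean column."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["productive"] = row["productive"] == "T"
        rows.append(row)
    return rows

records = read_rearrangements(io.StringIO(AIRR_TSV))
# V gene calls seen among productive rearrangements only.
productive_v_genes = sorted({r["v_call"] for r in records if r["productive"]})
```

A plain header-plus-rows layout like this is what lets general-purpose tools (spreadsheets, data-frame libraries, command-line utilities) consume the data without a custom parser.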

13.

Effective Online Bayesian Phylogenetics via Sequential Monte Carlo with Guided Proposals.

Fourment, Mathieu; Claywell, Brian C; Dinh, Vu; McCoy, Connor; Matsen IV, Frederick A; Darling, Aaron E.

Syst Biol ; 67(3): 490-502, 2018 May 01.

Article En | MEDLINE | ID: mdl-29186587

Modern infectious disease outbreak surveillance produces continuous streams of sequence data which require phylogenetic analysis as data arrives. Current software packages for Bayesian phylogenetic inference are unable to quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this article, we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior suffers from poor performance, and we develop "guided" proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that the situation can be resolved by heating the proposal density. The results demonstrate that relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy.

Classification/methods , Models, Biological , Phylogeny , Algorithms , Bayes Theorem , Internet , Monte Carlo Method

14.

Online Bayesian Phylogenetic Inference: Theoretical Foundations via Sequential Monte Carlo.

Dinh, Vu; Darling, Aaron E; Matsen IV, Frederick A.

Syst Biol ; 67(3): 503-517, 2018 05 01.

Article En | MEDLINE | ID: mdl-29244177

Phylogenetics, the inference of evolutionary trees from molecular sequence data such as DNA, is an enterprise that yields valuable evolutionary understanding of many biological systems. Bayesian phylogenetic algorithms, which approximate a posterior distribution on trees, have become a popular if computationally expensive means of doing phylogenetics. Modern data collection technologies are quickly adding new sequences to already substantial databases. With all current techniques for Bayesian phylogenetics, computation must start anew each time a sequence becomes available, making it costly to maintain an up-to-date estimate of a phylogenetic posterior. These considerations highlight the need for an online Bayesian phylogenetic method which can update an existing posterior with new sequences. Here, we provide theoretical results on the consistency and stability of methods for online Bayesian phylogenetic inference based on Sequential Monte Carlo (SMC) and Markov chain Monte Carlo. We first show a consistency result, demonstrating that the method samples from the correct distribution in the limit of a large number of particles. Next, we derive the first reported set of bounds on how phylogenetic likelihood surfaces change when new sequences are added. These bounds enable us to characterize the theoretical performance of sampling algorithms by bounding the effective sample size (ESS) with a given number of particles from below. We show that the ESS is guaranteed to grow linearly as the number of particles in an SMC sampler grows. Surprisingly, this result holds even though the dimensions of the phylogenetic model grow with each new added sequence.

Classification/methods , Models, Biological , Phylogeny , Algorithms , Bayes Theorem , Monte Carlo Method
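
The effective sample size (ESS) mentioned in this abstract is commonly estimated from particle weights as (sum of weights)^2 / (sum of squared weights). The sketch below uses that common estimator on log-weights; whether it matches the exact definition used in the paper is an assumption, and the function name and example weights are hypothetical.

```python
import numpy as np

def effective_sample_size(log_weights):
    """Kish-style ESS, (sum w)^2 / sum(w^2), computed from log-weights."""
    lw = np.asarray(log_weights, dtype=float)
    w = np.exp(lw - lw.max())  # subtract the max for numerical stability
    return float(w.sum() ** 2 / np.sum(w ** 2))

# 100 equally weighted particles: every particle counts.
uniform = effective_sample_size(np.zeros(100))
# One particle dominates: the sample is effectively a single particle.
degenerate = effective_sample_size([0.0] + [-50.0] * 99)
```

The two extremes bracket what an SMC sampler can experience after new sequences are added: an ESS close to the particle count means the old particles still represent the updated posterior well, while an ESS close to 1 means the weights have collapsed.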

15.

A Surrogate Function for One-Dimensional Phylogenetic Likelihoods.

Claywell, Brian C; Dinh, Vu; Fourment, Mathieu; McCoy, Connor O; Matsen IV, Frederick A.

Mol Biol Evol ; 35(1): 242-246, 2018 01 01.

Article En | MEDLINE | ID: mdl-29029199

Phylogenetics has seen a steady increase in data set size and substitution model complexity, which require increasing amounts of computational power to compute likelihoods. This motivates strategies to approximate the likelihood functions for branch length optimization and Bayesian sampling. In this article, we develop an approximation to the 1D likelihood function as parametrized by a single branch length. Our method uses a four-parameter surrogate function abstracted from the simplest phylogenetic likelihood function, the binary symmetric model. We show that it offers a surrogate that can be fit over a variety of branch lengths, that it is applicable to a wide variety of models and trees, and that it can be used effectively as a proposal mechanism for Bayesian sampling. The method is implemented as a stand-alone open-source C library for calling from phylogenetics algorithms; it has proven essential for good performance of our online phylogenetic algorithm sts.

Likelihood Functions , Phylogeny , Sequence Analysis, DNA/methods , Algorithms , Bayes Theorem , Evolution, Molecular , Markov Chains , Models, Genetic , Monte Carlo Method , Sequence Analysis, DNA/statistics & numerical data
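
As an illustration of what a four-parameter surrogate derived from the binary symmetric model can look like, the sketch below uses the curve c*log((1 + e)/2) + m*log((1 - e)/2) with e = exp(-r*(t + b)), where t is the branch length. The abstract does not spell out the parameterization, so this particular form, the parameter names (c, m, r, b), and the example values are assumptions. For c > m the curve has a closed-form maximizer, which the code checks against a grid search.

```python
import numpy as np

def bsm_log_like(t, c, m, r, b):
    """Binary-symmetric-model-style log-likelihood curve in branch length t.

    c, m: weights on unchanged and changed sites; r: rate; b: branch-length
    offset. This parameterization is an assumption made for illustration.
    """
    e = np.exp(-r * (t + b))
    return c * np.log((1.0 + e) / 2.0) + m * np.log((1.0 - e) / 2.0)

def bsm_argmax(c, m, r, b):
    """Closed-form maximizer of the curve above (requires c > m)."""
    return -np.log((c - m) / (c + m)) / r - b

# Hypothetical parameter values.
c, m, r, b = 90.0, 10.0, 1.0, 0.05
grid = np.linspace(0.001, 1.0, 100_000)
t_numeric = grid[np.argmax(bsm_log_like(grid, c, m, r, b))]
t_closed = bsm_argmax(c, m, r, b)
```

A cheap closed-form curve of this kind is what makes a surrogate attractive: once its parameters are fit to a few evaluations of the true likelihood, the optimum and the overall shape are available without further expensive likelihood calls.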

16.

Reproducibility and Reuse of Adaptive Immune Receptor Repertoire Data.

Breden, Felix; Luning Prak, Eline T; Peters, Bjoern; Rubelt, Florian; Schramm, Chaim A; Busse, Christian E; Vander Heiden, Jason A; Christley, Scott; Bukhari, Syed Ahmad Chan; Thorogood, Adrian; Matsen IV, Frederick A; Wine, Yariv; Laserson, Uri; Klatzmann, David; Douek, Daniel C; Lefranc, Marie-Paule; Collins, Andrew M; Bubela, Tania; Kleinstein, Steven H; Watson, Corey T; Cowell, Lindsay G; Scott, Jamie K; Kepler, Thomas B.

Front Immunol ; 8: 1418, 2017.

Article En | MEDLINE | ID: mdl-29163494

High-throughput sequencing (HTS) of immunoglobulin (B-cell receptor, antibody) and T-cell receptor repertoires has increased dramatically since the technique was introduced in 2009 (1-3). This experimental approach explores the maturation of the adaptive immune system and its response to antigens, pathogens, and disease conditions in exquisite detail. It holds significant promise for diagnostic and therapy-guiding applications. New technology often spreads rapidly, sometimes more rapidly than the understanding of how to make the products of that technology reliable, reproducible, or usable by others. As complex technologies have developed, scientific communities have come together to adopt common standards, protocols, and policies for generating and sharing data sets, such as the MIAME protocols developed for microarray experiments. The Adaptive Immune Receptor Repertoire (AIRR) Community formed in 2015 to address similar issues for HTS data of immune repertoires. The purpose of this perspective is to provide an overview of the AIRR Community's founding principles and present the progress that the AIRR Community has made in developing standards of practice and data sharing protocols. Finally, and most important, we invite all interested parties to join this effort to facilitate sharing and use of these powerful data sets (join@airr-community.org).

17.

Population dynamics of rhesus macaques and associated foamy virus in Bangladesh.

Feeroz, Mostafa M; Soliven, Khanh; Small, Christopher T; Engel, Gregory A; Andreina Pacheco, M; Yee, JoAnn L; Wang, Xiaoxing; Kamrul Hasan, M; Oh, Gunwha; Levine, Kathryn L; Rabiul Alam, S M; Craig, Karen L; Jackson, Dana L; Lee, Eun-Gyung; Barry, Peter A; Lerche, Nicholas W; Escalante, Ananias A; Matsen Iv, Frederick A; Linial, Maxine L; Jones-Engel, Lisa.

Emerg Microbes Infect ; 2(5): e29, 2013 May.

Article En | MEDLINE | ID: mdl-26038465

Foamy viruses are complex retroviruses that have been shown to be transmitted from nonhuman primates to humans. In Bangladesh, infection with simian foamy virus (SFV) is ubiquitous among rhesus macaques, which come into contact with humans in diverse locations and contexts throughout the country. We analyzed microsatellite DNA from 126 macaques at six sites in Bangladesh in order to characterize geographic patterns of macaque population structure. We also included in this study 38 macaques owned by nomadic people who train them to perform for audiences. PCR was used to analyze a portion of the proviral gag gene from all SFV-positive macaques, and multiple clones were sequenced. Phylogenetic analysis was used to infer long-term patterns of viral transmission. Analyses of SFV gag gene sequences indicated that macaque populations from different areas harbor genetically distinct strains of SFV, suggesting that geographic features such as forest cover play a role in determining the dispersal of macaques and SFV. We also found evidence suggesting that humans traveling the region with performing macaques likely play a role in the translocation of macaques and SFV. Our studies found that individual animals can harbor more than one strain of SFV and that the presence of more than one SFV strain is more common among older animals. Some macaques are infected with SFV that appears to be recombinant. These findings paint a more detailed picture of how geographic and sociocultural factors influence the spectrum of simian-borne retroviruses.

18.

Zoonotic simian foamy virus in Bangladesh reflects diverse patterns of transmission and co-infection.

Engel, Gregory A; Small, Christopher T; Soliven, Khanh; Feeroz, Mostafa M; Wang, Xiaoxing; Kamrul Hasan, M; Oh, Gunwha; Rabiul Alam, S M; Craig, Karen L; Jackson, Dana L; Matsen Iv, Frederick A; Linial, Maxine L; Jones-Engel, Lisa.

Emerg Microbes Infect ; 2(9): e58, 2013 Sep.

Article En | MEDLINE | ID: mdl-26038489

Simian foamy viruses (SFVs) are ubiquitous in non-human primates (NHPs). As in all retroviruses, reverse transcription of SFV leads to recombination and mutation. Because more humans have been shown to be infected with SFV than with any other simian-borne virus, SFV is a potentially powerful model for studying the virology and epidemiology of viruses at the human/NHP interface. In Asia, SFV is likely transmitted to humans through macaque bites and scratches that occur in the context of everyday life. We analyzed multiple proviral sequences from the SFV gag gene from both humans and macaques in order to characterize retroviral transmission at the human/NHP interface in Bangladesh. Here we report evidence that humans can be concurrently infected with multiple SFV strains, with some individuals infected by both an autochthonous SFV strain and a strain similar to SFV found in macaques from another geographic area. These data, combined with previous results, suggest that both human-facilitated movement of macaques leading to the introduction of non-resident strains of SFV and retroviral recombination in macaques contribute to SFV diversity among humans in Bangladesh.