Search | VHL Regional Portal

1.

Electronic health record signatures identify undiagnosed patients with common variable immunodeficiency disease.

Johnson, Ruth; Stephens, Alexis V; Mester, Rachel; Knyazev, Sergey; Kohn, Lisa A; Freund, Malika K; Bondhus, Leroy; Hill, Brian L; Schwarz, Tommer; Zaitlen, Noah; Arboleda, Valerie A; A Bastarache, Lisa; Pasaniuc, Bogdan; Butte, Manish J.

Sci Transl Med ; 16(745): eade4510, 2024 May.

Article in English | MEDLINE | ID: mdl-38691621

ABSTRACT

Human inborn errors of immunity include rare disorders entailing functional and quantitative antibody deficiencies due to impaired B cells called the common variable immunodeficiency (CVID) phenotype. Patients with CVID face delayed diagnoses and treatments for 5 to 15 years after symptom onset because the disorders are rare (prevalence of ~1/25,000), and there is extensive heterogeneity in CVID phenotypes, ranging from infections to autoimmunity to inflammatory conditions, overlapping with other more common disorders. The prolonged diagnostic odyssey drives excessive system-wide costs before diagnosis. Because there is no single causal mechanism, there are no genetic tests to definitively diagnose CVID. Here, we present PheNet, a machine learning algorithm that identifies patients with CVID from their electronic health records (EHRs). PheNet learns phenotypic patterns from verified CVID cases and uses this knowledge to rank patients by likelihood of having CVID. PheNet could have diagnosed more than half of our patients with CVID 1 or more years earlier than they had been diagnosed. When applied to a large EHR dataset, followed by blinded chart review of the top 100 patients ranked by PheNet, we found that 74% were highly probable to have CVID. We externally validated PheNet using >6 million records from disparate medical systems in California and Tennessee. As artificial intelligence and machine learning make their way into health care, we show that algorithms such as PheNet can offer clinical benefits by expediting the diagnosis of rare diseases.

Subject(s)

Common Variable Immunodeficiency , Electronic Health Records , Humans , Common Variable Immunodeficiency/diagnosis , Machine Learning , Algorithms , Male , Female , Phenotype , Adult , Undiagnosed Diseases/diagnosis

2.

High HIV diversity, recombination, and superinfection revealed in a large outbreak among persons who inject drugs in Kentucky and Ohio, USA.

Switzer, William M; Shankar, Anupama; Jia, Hongwei; Knyazev, Sergey; Ambrosio, Frank; Kelly, Reagan; Zheng, HaoQiang; Campbell, Ellsworth M; Cintron, Roxana; Pan, Yi; Saduvala, Neeraja; Panneer, Nivedha; Richman, Rhiannon; Singh, Manny B; Thoroughman, Douglas A; Blau, Erin F; Khalil, George M; Lyss, Sheryl; Heneine, Walid.

Virus Evol ; 10(1): veae015, 2024.

Article in English | MEDLINE | ID: mdl-38510920

ABSTRACT

We investigated transmission dynamics of a large human immunodeficiency virus (HIV) outbreak among persons who inject drugs (PWID) in KY and OH during 2017-20 by using detailed phylogenetic, network, recombination, and cluster dating analyses. Using polymerase (pol) sequences from 193 people associated with the investigation, we document high HIV-1 diversity, including Subtype B (44.6 per cent); numerous circulating recombinant forms (CRFs) including CRF02_AG (2.5 per cent) and CRF02_AG-like (21.8 per cent); and many unique recombinant forms composed of CRFs with major subtypes and sub-subtypes [CRF02_AG/B (24.3 per cent), B/CRF02_AG/B (0.5 per cent), and A6/D/B (6.4 per cent)]. Cluster analysis of sequences using a 1.5 per cent genetic distance identified thirteen clusters, including a seventy-five-member cluster composed of CRF02_AG-like and CRF02_AG/B, an eighteen-member CRF02_AG/B cluster, Subtype B clusters of sizes ranging from two to twenty-three, and a nine-member A6/D and A6/D/B cluster. Recombination and phylogenetic analyses identified CRF02_AG/B variants with ten unique breakpoints likely originating from Subtype B and CRF02_AG-like viruses in the largest clusters. The addition of contact tracing results from OH to the genetic networks identified linkage between persons with Subtype B, CRF02_AG, and CRF02_AG/B sequences in the clusters supporting de novo recombinant generation. Superinfection prevalence was 13.3 per cent (8/60) in persons with multiple specimens and included infection with B and CRF02_AG; B and CRF02_AG/B; or B and A6/D/B. In addition to the presence of multiple, distinct molecular clusters associated with this outbreak, cluster dating inferred transmission associated with the largest molecular cluster occurred as early as 2006, with high transmission rates during 2017-8 in certain other molecular clusters. 
This outbreak among PWID in KY and OH was likely driven by rapid transmission of multiple HIV-1 variants including de novo viral recombinants from circulating viruses within the community. Our findings documenting the high HIV-1 transmission rate and clustering through partner services and molecular clusters emphasize the importance of leveraging multiple different data sources and analyses, including those from disease intervention specialist investigations, to better understand outbreak dynamics and interrupt HIV spread.
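
The 1.5 per cent genetic-distance clustering mentioned in the abstract above can be illustrated with a minimal sketch. It assumes aligned, equal-length sequences, a simple proportion-of-differing-sites distance, and single-linkage grouping. The abstract does not state which distance model or software the authors used, so `p_distance` and `threshold_clusters` are hypothetical names and this is not the study's pipeline.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def threshold_clusters(seqs: dict[str, str], threshold: float = 0.015) -> list[set[str]]:
    """Link pairs with distance <= threshold; return connected components of size >= 2."""
    # Union-find over sequence names (assumed single-linkage rule).
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Singletons are not reported as clusters.
    return [g for g in groups.values() if len(g) >= 2]


if __name__ == "__main__":
    # Toy, made-up sequences purely for illustration.
    toy = {
        "s1": "A" * 200,
        "s2": "A" * 198 + "CC",
        "s3": "A" * 196 + "CCCC",
        "s4": "G" * 200,
    }
    print(threshold_clusters(toy))
```

With single linkage, two sequences can share a cluster even when their direct distance exceeds the threshold, as long as a chain of closer sequences connects them.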

3.

Meta-analysis of (single-cell method) benchmarks reveals the need for extensibility and interoperability.

Sonrel, Anthony; Luetge, Almut; Soneson, Charlotte; Mallona, Izaskun; Germain, Pierre-Luc; Knyazev, Sergey; Gilis, Jeroen; Gerber, Reto; Seurinck, Ruth; Paul, Dominique; Sonder, Emanuel; Crowell, Helena L; Fanaswala, Imran; Al-Ajami, Ahmad; Heidari, Elyas; Schmeing, Stephan; Milosavljevic, Stefan; Saeys, Yvan; Mangul, Serghei; Robinson, Mark D.

Genome Biol ; 24(1): 119, 2023 05 17.

Article in English | MEDLINE | ID: mdl-37198712

ABSTRACT

Computational methods represent the lifeblood of modern molecular biology. Benchmarking is important for all methods, but with a focus here on computational methods, benchmarking is critical to dissect important steps of analysis pipelines, formally assess performance across common situations as well as edge cases, and ultimately guide users on what tools to use. Benchmarking can also be important for community building and advancing methods in a principled way. We conducted a meta-analysis of recent single-cell benchmarks to summarize the scope, extensibility, and neutrality, as well as technical features and whether best practices in open data and reproducible research were followed. The results highlight that while benchmarks often make code available and are in principle reproducible, they remain difficult to extend, for example, as new methods and new ways to assess methods emerge. In addition, embracing containerization and workflow systems would enhance reusability of intermediate benchmarking results, thus also driving wider adoption.

Subject(s)

Benchmarking , Computational Biology , Computational Biology/methods , Workflow

4.

The UCLA ATLAS Community Health Initiative: Promoting precision health research in a diverse biobank.

Johnson, Ruth; Ding, Yi; Bhattacharya, Arjun; Knyazev, Sergey; Chiu, Alec; Lajonchere, Clara; Geschwind, Daniel H; Pasaniuc, Bogdan.

Cell Genom ; 3(1): 100243, 2023 Jan 11.

Article in English | MEDLINE | ID: mdl-36777178

ABSTRACT

The UCLA ATLAS Community Health Initiative (ATLAS) has an initial target to recruit 150,000 participants from across the UCLA Health system with the goal of creating a genomic database to accelerate precision medicine efforts in California. This initiative includes a biobank embedded within the UCLA Health system that comprises de-identified genomic data linked to electronic health records (EHRs). The first freeze of data from September 2020 contains 27,987 genotyped samples imputed to 7.9 million SNPs across the genome and is linked with de-identified versions of the EHRs from UCLA Health. Here, we describe a centralized repository of the genotype data and provide tools and pipelines to perform genome- and phenome-wide association studies across a wide range of EHR-derived phenotypes and genetic ancestry groups. We demonstrate the utility of this resource through the analysis of 7 well-studied traits and recapitulate many previous genetic and phenotypic associations.

5.

Author Correction: Leveraging genomic diversity for discovery in an electronic health record linked biobank: the UCLA ATLAS Community Health Initiative.

Johnson, Ruth; Ding, Yi; Venkateswaran, Vidhya; Bhattacharya, Arjun; Boulier, Kristin; Chiu, Alec; Knyazev, Sergey; Schwarz, Tommer; Freund, Malika; Zhan, Lingyu; Burch, Kathryn S; Caggiano, Christa; Hill, Brian; Rakocz, Nadav; Balliu, Brunilda; Denny, Christopher T; Sul, Jae Hoon; Zaitlen, Noah; Arboleda, Valerie A; Halperin, Eran; Sankararaman, Sriram; Butte, Manish J; Lajonchere, Clara; Geschwind, Daniel H; Pasaniuc, Bogdan.

Genome Med ; 14(1): 128, 2022 Nov 16.

Article in English | MEDLINE | ID: mdl-36384576

6.

Leveraging genomic diversity for discovery in an electronic health record linked biobank: the UCLA ATLAS Community Health Initiative.

Johnson, Ruth; Ding, Yi; Venkateswaran, Vidhya; Bhattacharya, Arjun; Boulier, Kristin; Chiu, Alec; Knyazev, Sergey; Schwarz, Tommer; Freund, Malika; Zhan, Lingyu; Burch, Kathryn S; Caggiano, Christa; Hill, Brian; Rakocz, Nadav; Balliu, Brunilda; Denny, Christopher T; Sul, Jae Hoon; Zaitlen, Noah; Arboleda, Valerie A; Halperin, Eran; Sankararaman, Sriram; Butte, Manish J; Lajonchere, Clara; Geschwind, Daniel H; Pasaniuc, Bogdan.

Genome Med ; 14(1): 104, 2022 Sep 09.

Article in English | MEDLINE | ID: mdl-36085083

ABSTRACT

BACKGROUND: Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative, an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients (N=36,736). METHODS: We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture. We assess the relationship between genetically inferred ancestry (GIA) and >1500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHR to perform ancestry-specific and multi-ancestry genome and phenome-wide scans across a broad set of disease phenotypes. RESULTS: We identify 5 continental-scale GIA clusters including European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA) and East Asian American (EAA) individuals and 7 subcontinental GIA clusters within the EAA GIA corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals' SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease in the 22q13.31 locus across the HL and EAA GIA groups (HL p-value=2.32×10^-16, EAA p-value=6.73×10^-11). 
A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group. CONCLUSIONS: Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping.

Subject(s)

Electronic Health Records , Public Health , Asian People , Biological Specimen Banks , Genomics , Humans

7.

Unlocking capacities of genomics for the COVID-19 response and future pandemics.

Knyazev, Sergey; Chhugani, Karishma; Sarwal, Varuni; Ayyala, Ram; Singh, Harman; Karthikeyan, Smruthi; Deshpande, Dhrithi; Baykal, Pelin Icer; Comarova, Zoia; Lu, Angela; Porozov, Yuri; Vasylyeva, Tetyana I; Wertheim, Joel O; Tierney, Braden T; Chiu, Charles Y; Sun, Ren; Wu, Aiping; Abedalthagafi, Malak S; Pak, Victoria M; Nagaraj, Shivashankar H; Smith, Adam L; Skums, Pavel; Pasaniuc, Bogdan; Komissarov, Andrey; Mason, Christopher E; Bortz, Eric; Lemey, Philippe; Kondrashov, Fyodor; Beerenwinkel, Niko; Lam, Tommy Tsan-Yuk; Wu, Nicholas C; Zelikovsky, Alex; Knight, Rob; Crandall, Keith A; Mangul, Serghei.

Nat Methods ; 19(4): 374-380, 2022 04.

Article in English | MEDLINE | ID: mdl-35396471

Subject(s)

COVID-19 , Pandemics , Genomics , Humans , SARS-CoV-2/genetics

8.

Creation and Use of Highly Adaptive Productive and Technological Red Currant Genotypes to Improve the Assortment and Introduction into Different Ecological and Geographical Zones.

Panfilova, Olga; Kahramanoglu, Ibrahim; Ondrasek, Gabrijel; Okatan, Volkan; Ryago, Nelly; Tsoy, Mikhail; Golyaeva, Olga; Knyazev, Sergey.

Plants (Basel) ; 11(6), 2022 Mar 17.

Article in English | MEDLINE | ID: mdl-35336684

ABSTRACT

Global climate change, together with the cyclicity of natural and climatic processes during the growing season of berry plants, weakens the plants' defense system against (a)biotic stressors, which makes accelerated cultivar-improving breeding more urgent. A new hybrid red currant material was obtained and studied by the method of interspecific hybridization. Correlation analysis was used to assess the relationship between adaptively significant and economical and biological traits. To assess intergenotypic variability, hierarchical clustering was used according to the studied features, which allowed combining three standard methods of multidimensional data analysis. Genotypes adapted to different stressors were identified. The genotypes 271-58-24, 44-5-2, 261-65-19, and 'Jonkheer van Tets' were found to have a higher ratio of bound water to free water as compared with the others. Moreover, the genotypes 271-58-24, 261-65-19, 77-1-47, and 'Jonkheer van Tets' were found to have less cold damage during the cold periods. The most productive genotypes were found to be 44-5-2, 143-23-35, and 1426-21-80. A dependence of yield on the beginning of differentiation of flower buds, which led to the abundance of flower inflorescences, was revealed. Rapid restoration of leaf hydration ensured successful adaptation of genotypes to the "temperature shock" of the growing season. The genotypes 271-58-24 and 'Jonkheer van Tets' were then observed to be far from the test traits and none of these traits were observed to characterize these two genotypes. The genotypes 261-65-19 and 77-1-47 were then observed to be characterized by their high stability to Cecidophyopsis ribis scores. Genotypes 261-65-19 and 271-58-24, obtained with the participation of 'Jonkheer van Tets' as the maternal form, showed sufficient resistance to Pseudopeziza ribis and Cecidophyopsis ribis. 
Overall results suggested that the hydration recovery of red currant plants is important for yield improvement. A new cultivar 'Podarok Pobediteliam' (genotype 44-5-2) was obtained that meets the requirements of intensive gardening and is characterized by high adaptability, productivity, and technological effectiveness.

9.

Correction for Rando et al., "Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure".

Rando, Halie M; MacLean, Adam L; Lee, Alexandra J; Lordan, Ronan; Ray, Sandipan; Bansal, Vikas; Skelly, Ashwin N; Sell, Elizabeth; Dziak, John J; Shinholster, Lamonica; D'Agostino McGowan, Lucy; Ben Guebila, Marouen; Wellhausen, Nils; Knyazev, Sergey; Boca, Simina M; Capone, Stephen; Qi, Yanjun; Park, YoSon; Mai, David; Sun, Yuchen; Boerckel, Joel D; Brueffer, Christian; Byrd, James Brian; Kamil, Jeremy P; Wang, Jinhui; Velazquez, Ryan; Szeto, Gregory L; Barton, John P; Goel, Rishi Raj; Mangul, Serghei; Lubiana, Tiago; Gitter, Anthony; Greene, Casey S.

mSystems ; 7(1): e0144721, 2022 Feb 22.

Article in English | MEDLINE | ID: mdl-35076276

10.

From Alpha to Zeta: Identifying Variants and Subtypes of SARS-CoV-2 Via Clustering.

Melnyk, Andrew; Mohebbi, Fatemeh; Knyazev, Sergey; Sahoo, Bikram; Hosseini, Roya; Skums, Pavel; Zelikovsky, Alex; Patterson, Murray.

J Comput Biol ; 28(11): 1113-1129, 2021 11.

Article in English | MEDLINE | ID: mdl-34698508

ABSTRACT

The availability of millions of SARS-CoV-2 (Severe Acute Respiratory Syndrome-Coronavirus-2) sequences in public databases such as GISAID (Global Initiative on Sharing All Influenza Data) and EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute) (the United Kingdom) allows a detailed study of the evolution, genomic diversity, and dynamics of a virus such as never before. Here, we identify novel variants and subtypes of SARS-CoV-2 by clustering sequences, adapting methods originally designed for haplotyping intrahost viral populations. We assess our results using clustering entropy, the first time it has been used in this context. Our clustering approach reaches lower entropies compared with other methods, and we are able to boost this even further through gap filling and Monte Carlo-based entropy minimization. Moreover, our method clearly identifies the well-known Alpha variant in the U.K. and GISAID data sets, and is also able to detect the much less represented (<1% of the sequences) Beta (South Africa), Epsilon (California), and Gamma and Zeta (Brazil) variants in the GISAID data set. Finally, we show that each variant identified has high selective fitness, based on the growth rate of its cluster over time. This demonstrates that our clustering approach is a viable alternative for detecting even rare subtypes in very large data sets.

Subject(s)

Cluster Analysis , Computational Biology/methods , Brazil , Databases, Genetic , Entropy , Humans , Monte Carlo Method , South Africa , United Kingdom , United States
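
The idea of scoring a clustering of sequences by its entropy, as mentioned in the abstract above, can be illustrated with a minimal sketch. It assumes one plausible definition: the per-site Shannon entropy of the characters within each cluster, summed over sites and averaged over clusters weighted by cluster size. The abstract does not give the exact formula, so this definition and the names `column_entropy` and `clustering_entropy` are assumptions and may differ from the paper's.

```python
import math
from collections import Counter


def column_entropy(column: list[str]) -> float:
    """Shannon entropy (in bits) of the characters observed at one aligned site."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def clustering_entropy(clusters: list[list[str]]) -> float:
    """Size-weighted average, over clusters, of the summed per-site entropy.

    Each cluster is a non-empty list of aligned, equal-length sequences.
    Lower values mean the sequences within each cluster are more homogeneous.
    """
    total = sum(len(cluster) for cluster in clusters)
    score = 0.0
    for cluster in clusters:
        length = len(cluster[0])
        within = sum(
            column_entropy([seq[i] for seq in cluster]) for i in range(length)
        )
        score += (len(cluster) / total) * within
    return score


if __name__ == "__main__":
    # Toy, made-up sequences purely for illustration.
    homogeneous = [["AAAA", "AAAA"], ["CCCC", "CCCC"]]
    mixed = [["AAAA", "CCCC"], ["AAAA", "CCCC"]]
    print(clustering_entropy(homogeneous), clustering_entropy(mixed))
```

Under this definition, a clustering that separates distinct sequence types scores lower than one that mixes them, which is the sense in which a lower entropy indicates a better partition.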

11.

Scalable Reconstruction of SARS-CoV-2 Phylogeny with Recurrent Mutations.

Novikov, Daniel; Knyazev, Sergey; Grinshpon, Mark; Icer, Pelin; Skums, Pavel; Zelikovsky, Alex.

J Comput Biol ; 28(11): 1130-1141, 2021 11.

Article in English | MEDLINE | ID: mdl-34698524

ABSTRACT

This article presents a novel scalable character-based phylogeny algorithm for dense viral sequencing data called SPHERE (Scalable PHylogEny with REcurrent mutations). The algorithm is based on an evolutionary model where recurrent mutations are allowed, but backward mutations are prohibited. The algorithm creates rooted character-based phylogeny trees, wherein all leaves and internal nodes are labeled by observed taxa. We show that SPHERE phylogeny is more stable than Nextstrain's, and that it accurately infers known transmission links from the early pandemic. SPHERE is a fast algorithm that can process >200,000 sequences in <2 hours, which offers a compact phylogenetic visualization of Global Initiative on Sharing All Influenza Data (GISAID).

Subject(s)

Mutation , Phylogeny , SARS-CoV-2/genetics , Algorithms , COVID-19/transmission , COVID-19/virology , Databases, Genetic , Humans

12.

Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure.

Rando, Halie M; MacLean, Adam L; Lee, Alexandra J; Lordan, Ronan; Ray, Sandipan; Bansal, Vikas; Skelly, Ashwin N; Sell, Elizabeth; Dziak, John J; Shinholster, Lamonica; D'Agostino McGowan, Lucy; Ben Guebila, Marouen; Wellhausen, Nils; Knyazev, Sergey; Boca, Simina M; Capone, Stephen; Qi, Yanjun; Park, YoSon; Mai, David; Sun, Yuchen; Boerckel, Joel D; Brueffer, Christian; Byrd, James Brian; Kamil, Jeremy P; Wang, Jinhui; Velazquez, Ryan; Szeto, Gregory L; Barton, John P; Goel, Rishi Raj; Mangul, Serghei; Lubiana, Tiago; Gitter, Anthony; Greene, Casey S.

mSystems ; 6(5): e0009521, 2021 10 26.

Article in English | MEDLINE | ID: mdl-34698547

ABSTRACT

The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus's structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis and in understanding potential differences among variants. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system's protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease. IMPORTANCE COVID-19 involves a number of organ systems and can present with a wide range of symptoms. From how the virus infects cells to how it spreads between people, the available research suggests that these patterns are very similar to those seen in the closely related viruses SARS-CoV-1 and possibly Middle East respiratory syndrome-related CoV (MERS-CoV). Understanding the pathogenesis of the SARS-CoV-2 virus also contextualizes how the different biological systems affected by COVID-19 connect. 
Exploring the structure, phylogeny, and pathogenesis of the virus therefore helps to guide interpretation of the broader impacts of the virus on the human body and on human populations. For this reason, an in-depth exploration of viral mechanisms is critical to a robust understanding of SARS-CoV-2 and, potentially, future emergent human CoVs (HCoVs).

13.

MicrobeTrace: Retooling molecular epidemiology for rapid public health response.

Campbell, Ellsworth M; Boyles, Anthony; Shankar, Anupama; Kim, Jay; Knyazev, Sergey; Cintron, Roxana; Switzer, William M.

PLoS Comput Biol ; 17(9): e1009300, 2021 09.

Article in English | MEDLINE | ID: mdl-34492010

ABSTRACT

Outbreak investigations use data from interviews, healthcare providers, laboratories and surveillance systems. However, integrated use of data from multiple sources requires a patchwork of software that presents challenges in usability, interoperability, confidentiality, and cost. Rapid integration, visualization and analysis of data from multiple sources can guide effective public health interventions. We developed MicrobeTrace to facilitate rapid public health responses by overcoming barriers to data integration and exploration in molecular epidemiology. MicrobeTrace is a web-based, client-side, JavaScript application (https://microbetrace.cdc.gov) that runs in Chromium-based browsers and remains fully operational without an internet connection. Using publicly available data, we demonstrate the analysis of viral genetic distance networks and introduce a novel approach to minimum spanning trees that simplifies results. We also illustrate the potential utility of MicrobeTrace in support of contact tracing by analyzing and displaying data from an outbreak of SARS-CoV-2 in South Korea in early 2020. MicrobeTrace is developed and actively maintained by the Centers for Disease Control and Prevention. Users can email microbetrace@cdc.gov for support. The source code is available at https://github.com/cdcgov/microbetrace.

Subject(s)

Communicable Diseases/epidemiology , Data Visualization , Molecular Epidemiology/methods , Public Health/methods , Software , Centers for Disease Control and Prevention, U.S. , Disease Outbreaks , Humans , United States

14.

Technology dictates algorithms: recent developments in read alignment.

Alser, Mohammed; Rotman, Jeremy; Deshpande, Dhrithi; Taraszka, Kodi; Shi, Huwenbo; Baykal, Pelin Icer; Yang, Harry Taegyun; Xue, Victor; Knyazev, Sergey; Singer, Benjamin D; Balliu, Brunilda; Koslicki, David; Skums, Pavel; Zelikovsky, Alex; Alkan, Can; Mutlu, Onur; Mangul, Serghei.

Genome Biol ; 22(1): 249, 2021 08 26.

Article in English | MEDLINE | ID: mdl-34446078

ABSTRACT

Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.

Subject(s)

Algorithms , Computational Biology/methods , Sequence Alignment , Genome, Human , HIV/physiology , Humans , Metagenomics , Sulfites

15.

Pipeline for Analyzing Activity of Metabolic Pathways in Planktonic Communities Using Metatranscriptomic Data.

Rondel, Filipp Martin; Hosseini, Roya; Sahoo, Bikram; Knyazev, Sergey; Mandric, Igor; Stewart, Frank; Mandoiu, Ion I; Pasaniuc, Bogdan; Porozov, Yuri; Zelikovsky, Alexander.

J Comput Biol ; 28(8): 842-855, 2021 08.

Article in English | MEDLINE | ID: mdl-34264744

ABSTRACT

In this article, we present our novel pipeline for analysis of metabolic activity using a microbial community's metatranscriptome sequence data set for validation. Our method is based on the expectation-maximization (EM) algorithm and provides enzyme expression and pathway activity levels. Further expanding our analysis, we consider individual enzymatic activity and compute enzyme participation coefficients to approximate the metabolic pathway activity more accurately. We apply our EM pathways pipeline to a metatranscriptomic data set of a plankton community from surface waters of the Northern Gulf of Mexico. The data set consists of RNA-seq data and respective environmental parameters, which were sampled at two depths, six times a day over multiple 24-hour cycles. Furthermore, we discuss microbial dependence on day-night cycle within our findings based on a three-way correlation of the enzyme expression during antipodal times, midnight and noon. We show that the enzyme participation levels strongly affect the metabolic activity estimates: that is, marginal and multiple linear regression of enzymatic and metabolic pathway activity correlated significantly with the recorded environmental parameters. Our analysis statistically validates that EM-based methods produce meaningful results, as our method confirms statistically significant dependence of metabolic pathway activity on the environmental parameters, such as salinity, temperature, brightness, and a few others.

Subject(s)

Bacteria/genetics , Gene Expression Profiling/methods , Metabolic Networks and Pathways , Plankton/microbiology , Algorithms , Gulf of Mexico , Linear Models , Metagenomics , Sequence Analysis, RNA

16.

Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction.

Knyazev, Sergey; Tsyvina, Viachaslau; Shankar, Anupama; Melnyk, Andrew; Artyomenko, Alexander; Malygina, Tatiana; Porozov, Yuri B; Campbell, Ellsworth M; Switzer, William M; Skums, Pavel; Mangul, Serghei; Zelikovsky, Alex.

Nucleic Acids Res ; 49(17): e102, 2021 09 27.

Article in English | MEDLINE | ID: mdl-34214168

ABSTRACT

Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient's treatment plan preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that still remains open. Here we propose CliqueSNV based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables identifying minority haplotypes with the frequency below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short and long bases sequencing platforms.

Subject(s)

Algorithms , Computational Biology/methods , Haplotypes , High-Throughput Nucleotide Sequencing/methods , RNA Virus Infections/diagnosis , RNA Viruses/genetics , COVID-19/diagnosis , COVID-19/virology , Gene Frequency , HIV Infections/diagnosis , HIV Infections/virology , HIV-1/genetics , Humans , Mutation , Polymorphism, Single Nucleotide , RNA Virus Infections/virology , Reproducibility of Results , SARS-CoV-2/genetics , Sensitivity and Specificity

17.

Unlocking capacities of viral genomics for the COVID-19 pandemic response.

Knyazev, Sergey; Chhugani, Karishma; Sarwal, Varuni; Ayyala, Ram; Singh, Harman; Karthikeyan, Smruthi; Deshpande, Dhrithi; Comarova, Zoia; Lu, Angela; Porozov, Yuri; Wu, Aiping; Abedalthagafi, Malak S; Nagaraj, Shivashankar H; Smith, Adam L; Skums, Pavel; Ladner, Jason; Lam, Tommy Tsan-Yuk; Wu, Nicholas C; Zelikovsky, Alex; Knight, Rob; Crandall, Keith A; Mangul, Serghei.

ArXiv ; 2021 Apr 28.

Article in English | MEDLINE | ID: mdl-33948451

ABSTRACT

More than any other infectious disease epidemic, the COVID-19 pandemic has been characterized by the generation of large volumes of viral genomic data at an incredible pace due to recent advances in high-throughput sequencing technologies, the rapid global spread of SARS-CoV-2, and its persistent threat to public health. However, distinguishing the most epidemiologically relevant information encoded in these vast amounts of data requires substantial effort across the research and public health communities. Studies of SARS-CoV-2 genomes have been critical in tracking the spread of variants and understanding its epidemic dynamics, and may prove crucial for controlling future epidemics and alleviating significant public health burdens. Together, genomic data and bioinformatics methods enable broad-scale investigations of the spread of SARS-CoV-2 at the local, national, and global scales and allow researchers the ability to efficiently track the emergence of novel variants, reconstruct epidemic dynamics, and provide important insights into drug and vaccine development and disease control. Here, we discuss the tremendous opportunities that genomics offers to unlock the effective use of SARS-CoV-2 genomic data for efficient public health surveillance and guiding timely responses to COVID-19.

18.

Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure.

Rando, Halie M; MacLean, Adam L; Lee, Alexandra J; Lordan, Ronan; Ray, Sandipan; Bansal, Vikas; Skelly, Ashwin N; Sell, Elizabeth; Dziak, John J; Shinholster, Lamonica; McGowan, Lucy D'Agostino; Guebila, Marouen Ben; Wellhausen, Nils; Knyazev, Sergey; Boca, Simina M; Capone, Stephen; Qi, Yanjun; Park, YoSon; Sun, Yuchen; Mai, David; Boerckel, Joel D; Brueffer, Christian; Byrd, James Brian; Kamil, Jeremy P; Wang, Jinhui; Velazquez, Ryan; Szeto, Gregory L; Barton, John P; Goel, Rishi Raj; Mangul, Serghei; Lubiana, Tiago; Gitter, Anthony; Greene, Casey S.

ArXiv ; 2021 Feb 01.

Article in English | MEDLINE | ID: mdl-33594340

ABSTRACT

The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus's structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis and in understanding potential differences among variants. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system's protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease.

19.

Molecular Epidemiological Analysis of the Origin and Transmission Dynamics of the HIV-1 CRF01_AE Sub-Epidemic in Bulgaria.

Alexiev, Ivailo; Campbell, Ellsworth M; Knyazev, Sergey; Pan, Yi; Grigorova, Lyubomira; Dimitrova, Reneta; Partsuneva, Aleksandra; Gancheva, Anna; Kostadinova, Asya; Seguin-Devaux, Carole; Elenkov, Ivaylo; Yancheva, Nina; Switzer, William M.

Viruses ; 13(1), 2021 Jan 16.

Article in English | MEDLINE | ID: mdl-33467166

ABSTRACT

HIV-1 subtype CRF01_AE is the second most predominant strain in Bulgaria, yet little is known about the molecular epidemiology of its origin and transmissibility. We used a phylodynamics approach to better understand this sub-epidemic by analyzing 270 HIV-1 polymerase (pol) sequences collected from persons diagnosed with HIV/AIDS between 1995 and 2019. Using network analyses at a 1.5% genetic distance threshold (d), we found a large 154-member outbreak cluster composed mostly of persons who inject drugs (PWID) that were predominantly men. At d = 0.5%, which was used to identify more recent transmission, the large cluster dissociated into three clusters of 18, 12, and 7 members, respectively, five dyads, and 107 singletons. Phylogenetic analysis of the Bulgarian sequences with publicly available global sequences showed that CRF01_AE likely originated from multiple Asian countries, with Vietnam as the likely source of the outbreak cluster between 1988 and 1990. Our findings indicate that CRF01_AE was introduced into Bulgaria multiple times since 1988, and infections then rapidly spread among PWID locally with bridging to other risk groups and countries. CRF01_AE continues to spread in Bulgaria as evidenced by the more recent large clusters identified at d = 0.5%, highlighting the importance of public health prevention efforts in the PWID communities.

Subject(s)

Genotype , HIV Infections/epidemiology , HIV Infections/transmission , HIV Infections/virology , HIV-1/classification , HIV-1/genetics , Adolescent , Adult , Aged , Bulgaria/epidemiology , Female , Genetic Variation , HIV Infections/prevention & control , HIV-1/drug effects , Humans , Male , Middle Aged , Molecular Epidemiology , Phylogeny , Phylogeography , Public Health Surveillance , Reassortant Viruses , Genetic Recombination , Young Adult

20.

Epidemiological data analysis of viral quasispecies in the next-generation sequencing era.

Knyazev, Sergey; Hughes, Lauren; Skums, Pavel; Zelikovsky, Alexander.

Brief Bioinform ; 22(1): 96-108, 2021 Jan 18.

Article in English | MEDLINE | ID: mdl-32568371

ABSTRACT

The unprecedented coverage offered by next-generation sequencing (NGS) technology has facilitated the assessment of the population complexity of intra-host RNA viral populations at an unprecedented level of detail. Consequently, analysis of NGS datasets could be used to extract and infer crucial epidemiological and biomedical information on the levels of both infected individuals and susceptible populations, thus enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters and structures of transmission networks. However, NGS data require sophisticated analysis dealing with millions of error-prone short reads per patient. Prior to the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology; now, they must be redesigned to handle the large-scale NGS datasets and properly model the evolution of heterogeneous rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools analyzing NGS data for (i) characterization of intra-host viral population complexity including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistant mutations, age of infection and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response and control of outbreaks.

Subject(s)

Epidemiological Monitoring , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , RNA Virus Infections/virology , RNA Viruses/genetics , Humans , RNA Virus Infections/epidemiology , RNA Viruses/classification , RNA Viruses/isolation & purification , RNA Viruses/pathogenicity
