Results 1 - 20 of 29
1.
Bioinformatics ; 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39012512

ABSTRACT

MOTIVATION: Genomic distance estimation is a critical workload since exact computation for whole-genome similarity metrics such as Average Nucleotide Identity (ANI) incurs prohibitive runtime overhead. Genome sketching is a fast and memory-efficient solution to estimate ANI similarity by distilling representative k-mers from the original sequences. In this work, we present HyperGen, which improves accuracy, runtime performance, and memory efficiency for large-scale ANI estimation. Unlike existing genome sketching algorithms that convert large genome files into discrete k-mer hashes, HyperGen leverages emerging hyperdimensional computing (HDC) to encode genomes into quasi-orthogonal vectors (hypervectors, HVs) in high-dimensional space. HVs are compact and can preserve more information, allowing for accurate ANI estimation while reducing required sketch sizes. In particular, the HV sketch representation in HyperGen allows efficient ANI estimation using vector multiplication, which naturally benefits from highly optimized general matrix multiply (GEMM) routines. As a result, HyperGen enables efficient sketching and ANI estimation for massive genome collections. RESULTS: We evaluate HyperGen's sketching and database search performance using several genome datasets at various scales. HyperGen achieves comparable or superior ANI estimation error and linearity compared to other sketch-based counterparts. The measurement results show that HyperGen is one of the fastest tools for both genome sketching and database search. Meanwhile, HyperGen produces memory-efficient sketch files while ensuring high ANI estimation accuracy. AVAILABILITY: A Rust implementation of HyperGen is freely available under the MIT license as an open-source software project at https://github.com/wh-xu/Hyper-Gen. The scripts to reproduce the experimental results can be accessed at https://github.com/wh-xu/experiment-hyper-gen.
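
The hypervector idea in this abstract can be illustrated with a small Python sketch. It is not HyperGen's Rust implementation: the dimension, the k-mer length, the SHA-256 seeding scheme, and the Mash-style conversion from Jaccard similarity to ANI are all assumptions chosen for illustration. Each distinct k-mer is mapped to a pseudo-random bipolar vector, the vectors are summed into one hypervector per genome, and dot products (the operation a GEMM routine would batch) estimate set sizes and overlaps.

```python
import hashlib
import math
import random

DIM = 2048  # hypothetical hypervector dimension
K = 8       # hypothetical k-mer length


def kmer_hv(kmer, dim=DIM):
    """Map a k-mer to a deterministic pseudo-random bipolar (+1/-1) vector."""
    seed = int.from_bytes(hashlib.sha256(kmer.encode()).digest()[:8], "big")
    bits = random.Random(seed).getrandbits(dim)
    return [1 if (bits >> i) & 1 else -1 for i in range(dim)]


def sketch(seq, k=K, dim=DIM):
    """Sum the hypervectors of all distinct k-mers of a sequence."""
    hv = [0] * dim
    for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
        for i, v in enumerate(kmer_hv(kmer, dim)):
            hv[i] += v
    return hv


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def estimate_jaccard(hv_a, hv_b, dim=DIM):
    """Estimate Jaccard similarity of the underlying k-mer sets.

    Random bipolar vectors are quasi-orthogonal, so dot(a, b) / dim
    approximates the number of shared k-mers and dot(a, a) / dim the set size.
    """
    inter = dot(hv_a, hv_b) / dim
    union = dot(hv_a, hv_a) / dim + dot(hv_b, hv_b) / dim - inter
    if union <= 0:
        return 0.0
    return max(0.0, min(1.0, inter / union))


def estimate_ani(jaccard, k=K):
    """Mash-style conversion (an assumption here): ANI ~ 1 + ln(2j / (1 + j)) / k."""
    if jaccard <= 0.0:
        return 0.0
    return max(0.0, 1.0 + math.log(2.0 * jaccard / (1.0 + jaccard)) / k)
```

For example, `estimate_ani(estimate_jaccard(sketch(a), sketch(b)))` gives a rough identity estimate for two sequences `a` and `b`; a larger `DIM` reduces the noise of the dot-product estimates at the cost of a bigger sketch.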

2.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38200583

ABSTRACT

MOTIVATION: The genomic surveillance of viral pathogens such as SARS-CoV-2 and HIV-1 has been critical to modern epidemiology and public health, but the use of sequence analysis pipelines requires computational expertise, and web-based platforms require sending potentially sensitive raw sequence data to remote servers. RESULTS: We introduce ViralWasm, a user-friendly graphical web application suite for viral genomics. All ViralWasm tools utilize WebAssembly to execute the original command line tools client-side directly in the web browser without any user setup, with a cost of just 2-3x slowdown with respect to their command line counterparts. AVAILABILITY AND IMPLEMENTATION: The ViralWasm tool suite can be accessed at: https://niema-lab.github.io/ViralWasm.


Subjects
Genomics, Software, Humans, Genomics/methods, Web Browser, Viral Genome
3.
bioRxiv ; 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-37873270

ABSTRACT

Coronaviruses exhibit many mechanisms of genetic innovation [1-5], including the acquisition of accessory genes that originate by capture of cellular genes or through duplication of existing viral genes [6,7]. Accessory genes influence viral host range and cellular tropism, but little is known about how selection acts on these variable regions of virus genomes. We used experimental evolution of mouse hepatitis virus (MHV) encoding a cellular AKAP7 phosphodiesterase and an inactive native phosphodiesterase, NS2 [8], to simulate the capture of a host gene and analyze its evolution. After courses of serial infection, the gene encoding inactive NS2, ORF2, unexpectedly remained intact, suggesting it is under cryptic constraint uncoupled from the function of NS2. In contrast, AKAP7 was retained under strong selection but rapidly lost under relaxed selection. Guided by the retention of ORF2 and similar patterns in related betacoronaviruses, we analyzed ORF8 of SARS-CoV-2, which arose via gene duplication [6] and contains premature stop codons in several globally successful lineages. As with MHV ORF2, the coding-defective SARS-CoV-2 ORF8 gene remains largely intact, mirroring patterns observed during MHV experimental evolution, challenging assumptions on the dynamics of gene loss in virus genomes and extending these findings to viruses currently adapting to humans.

4.
AIDS ; 38(2): 245-254, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-37890471

ABSTRACT

OBJECTIVES: This study investigates primary peer-referral engagement (PRE) strategies to assess which strategy results in engaging higher numbers of people with HIV (PWH) who are virally unsuppressed. DESIGN: We develop a modeling study that simulates an HIV epidemic (transmission, disease progression, and viral evolution) over 6 years using an agent-based model followed by simulating PRE strategies. We investigate two PRE strategies where referrals are based on social network strategies (SNS) or sexual partner contact tracing (SPCT). METHODS: We parameterize, calibrate, and validate our study using data from Chicago on Black sexual minority men to assess these strategies for a population with high incidence and prevalence of HIV. For each strategy, we calculate the number of PWH recruited who are undiagnosed or out-of-care (OoC) and the number of direct or indirect transmissions. RESULTS: SNS and SPCT identified 256.5 [95% confidence interval (CI) 234-279] and 15 (95% CI 7-27) PWH, respectively. Of these, SNS identified 159 (95% CI 142-177) PWH OoC and 32 (95% CI 21-43) PWH undiagnosed compared with 9 (95% CI 3-18) and 2 (95% CI 0-5) for SPCT. SNS identified 15.5 (95% CI 6-25) and 7.5 (95% CI 2-11) indirect and direct transmission pairs, whereas SPCT identified 6 (95% CI 0-8) and 5 (95% CI 0-8), respectively. CONCLUSION: With no testing constraints, SNS is the more effective strategy to identify undiagnosed and OoC PWH. Neither strategy is successful at identifying sufficient indirect or direct transmission pairs to investigate transmission networks.


Subjects
HIV Infections, Sexual and Gender Minorities, Male, Humans, HIV Infections/epidemiology, Sexual Partners, Social Networking, Contact Tracing
5.
Cell ; 186(26): 5690-5704.e20, 2023 12 21.
Article in English | MEDLINE | ID: mdl-38101407

ABSTRACT

The maturation of genomic surveillance in the past decade has enabled tracking of the emergence and spread of epidemics at an unprecedented level. During the COVID-19 pandemic, for example, genomic data revealed that local epidemics varied considerably in the frequency of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) lineage importation and persistence, likely due to a combination of COVID-19 restrictions and changing connectivity. Here, we show that local COVID-19 epidemics are driven by regional transmission, including across international boundaries, but can become increasingly connected to distant locations following the relaxation of public health interventions. By integrating genomic, mobility, and epidemiological data, we find abundant transmission occurring between both adjacent and distant locations, supported by dynamic mobility patterns. We find that changing connectivity significantly influences local COVID-19 incidence. Our findings demonstrate a complex meaning of "local" when investigating connected epidemics and emphasize the importance of collaborative interventions for pandemic prevention and mitigation.


Subjects
COVID-19, Humans, COVID-19/epidemiology, COVID-19/transmission, COVID-19/virology, Genomics, Pandemics/prevention & control, Public Health, SARS-CoV-2/genetics, Infection Control, Geography
6.
bioRxiv ; 2023 Sep 14.
Article in English | MEDLINE | ID: mdl-37745602

ABSTRACT

Zoonotic spillovers of viruses have occurred through the animal trade worldwide. The start of the COVID-19 pandemic was traced epidemiologically to the Huanan Wholesale Seafood Market, the site with the most reported wildlife vendors in the city of Wuhan, China. Here, we analyze publicly available qPCR and sequencing data from environmental samples collected in the Huanan market in early 2020. We demonstrate that the SARS-CoV-2 genetic diversity linked to this market is consistent with market emergence, and find increased SARS-CoV-2 positivity near and within a particular wildlife stall. We identify wildlife DNA in all SARS-CoV-2 positive samples from this stall. This includes species such as civets, bamboo rats, porcupines, hedgehogs, and one species, raccoon dogs, known to be capable of SARS-CoV-2 transmission. We also detect other animal viruses that infect raccoon dogs, civets, and bamboo rats. Combining metagenomic and phylogenetic approaches, we recover genotypes of market animals and compare them to those from other markets. This analysis provides the genetic basis for a short list of potential intermediate hosts of SARS-CoV-2 to prioritize for retrospective serological testing and viral sampling.

7.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37369033

ABSTRACT

MOTIVATION: Driven by technological advances, the throughput and cost of mass spectrometry (MS) proteomics experiments have improved by orders of magnitude in recent decades. Spectral library searching is a common approach to annotating experimental mass spectra by matching them against large libraries of reference spectra corresponding to known peptides. An important disadvantage, however, is that only peptides included in the spectral library can be found, whereas novel peptides, such as those with unexpected post-translational modifications (PTMs), will remain unknown. Open modification searching (OMS) is an increasingly popular approach to annotate modified peptides based on partial matches against their unmodified counterparts. Unfortunately, this leads to very large search spaces and excessive runtimes, which is especially problematic considering the continuously increasing sizes of MS proteomics datasets. RESULTS: We propose an OMS algorithm, called HOMS-TC, that fully exploits parallelism in the entire pipeline of spectral library searching. We designed a new highly parallel encoding method based on the principle of hyperdimensional computing to encode mass spectral data to hypervectors while minimizing information loss. This process can be easily parallelized since each dimension is calculated independently. HOMS-TC processes two stages of existing cascade search in parallel and selects the most similar spectra while considering PTMs. We accelerate HOMS-TC on NVIDIA's tensor core units, which are emerging and readily available in recent graphics processing units (GPUs). Our evaluation shows that HOMS-TC is 31× faster on average than alternative search engines and provides comparable accuracy to competing search tools. AVAILABILITY AND IMPLEMENTATION: HOMS-TC is freely available under the Apache 2.0 license as an open-source software project at https://github.com/tycheyoung/homs-tc.


Subjects
Software, Tandem Mass Spectrometry, Tandem Mass Spectrometry/methods, Protein Databases, Peptides/chemistry, Search Engine, Algorithms, Peptide Library
8.
J Proteome Res ; 22(6): 1639-1648, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37166120

ABSTRACT

As current shotgun proteomics experiments can produce gigabytes of mass spectrometry data per hour, processing these massive data volumes has become progressively more challenging. Spectral clustering is an effective approach to speed up downstream data processing by merging highly similar spectra to minimize data redundancy. However, because state-of-the-art spectral clustering tools fail to achieve optimal runtimes, this simply moves the processing bottleneck. In this work, we present a fast spectral clustering tool, HyperSpec, based on hyperdimensional computing (HDC). HDC shows promising clustering capability while only requiring lightweight binary operations with high parallelism that can be optimized using low-level hardware architectures, making it possible to run HyperSpec on graphics processing units to achieve extremely efficient spectral clustering performance. Additionally, HyperSpec includes optimized data preprocessing modules to reduce the spectrum preprocessing time, which is a critical bottleneck during spectral clustering. Based on experiments using various mass spectrometry data sets, HyperSpec produces results with comparable clustering quality as state-of-the-art spectral clustering tools while achieving speedups by orders of magnitude, shortening the clustering runtime of over 21 million spectra from 4 h to only 24 min.


Subjects
Algorithms, Peptides, Peptides/analysis, Mass Spectrometry/methods, Proteomics/methods, Cluster Analysis
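
The "lightweight binary operations" mentioned in the HyperSpec abstract can be sketched in a few lines of Python. This is a generic ID-level hyperdimensional encoding, assumed here for illustration rather than taken from HyperSpec's code: the vector width, bin width, number of intensity levels, and seeds are hypothetical. Each peak binds an m/z ID vector to an intensity level vector with XOR, the bound vectors are bundled by per-bit majority vote, and spectra are compared by Hamming distance (XOR plus popcount).

```python
import random

DIM = 2048      # hypothetical hypervector width in bits
LEVELS = 8      # hypothetical number of intensity levels
MZ_BIN = 1.0    # hypothetical m/z bin width


def id_hv(bin_index):
    """Pseudo-random binary ID vector for one m/z bin (deterministic per bin)."""
    return random.Random(bin_index).getrandbits(DIM)


def level_hvs(seed=12345):
    """Level vectors: adjacent intensity levels differ in only a few bits."""
    rng = random.Random(seed)
    order = list(range(DIM))
    rng.shuffle(order)
    step = DIM // (2 * (LEVELS - 1))
    hvs = [rng.getrandbits(DIM)]
    for lvl in range(1, LEVELS):
        hv = hvs[-1]
        for bit in order[(lvl - 1) * step: lvl * step]:
            hv ^= 1 << bit
        hvs.append(hv)
    return hvs


LEVEL_HVS = level_hvs()


def encode(peaks):
    """Bind each peak's ID and level vectors with XOR, then bundle by majority vote."""
    top = max(intensity for _, intensity in peaks)
    counts = [0] * DIM
    for mz, intensity in peaks:
        level = round(intensity / top * (LEVELS - 1))
        bound = id_hv(int(mz / MZ_BIN)) ^ LEVEL_HVS[level]
        for bit in range(DIM):
            counts[bit] += (bound >> bit) & 1
    half = len(peaks) / 2
    # Ties (possible with an even peak count) resolve to 0 in this sketch.
    return sum(1 << bit for bit in range(DIM) if counts[bit] > half)


def hamming(a, b):
    """Number of differing bits between two binary hypervectors."""
    return bin(a ^ b).count("1")
```

A clustering step would then group spectra whose pairwise `hamming` distance falls below a chosen threshold; because every bit is processed independently, the encoding and the distance computation parallelize naturally on GPUs.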
9.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37171896

ABSTRACT

MOTIVATION: In viral molecular epidemiology, reconstruction of consensus genomes from sequence data is critical for tracking mutations and variants of concern. However, as the number of samples that are sequenced grows rapidly, compute resources needed to reconstruct consensus genomes can become prohibitively large. RESULTS: ViralConsensus is a fast and memory-efficient tool for calling viral consensus genome sequences directly from read alignment data. ViralConsensus is orders of magnitude faster and more memory-efficient than existing methods. Further, unlike existing methods, ViralConsensus can pipe data directly from a read mapper via standard input and performs viral consensus calling on-the-fly, making it an ideal tool for viral sequencing pipelines. AVAILABILITY AND IMPLEMENTATION: ViralConsensus is freely available at https://github.com/niemasd/ViralConsensus as an open-source software project.


Subjects
High-Throughput Nucleotide Sequencing, Software, DNA Sequence Analysis/methods, Consensus, Viral Genome, Algorithms
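
On-the-fly consensus calling of the kind described in the ViralConsensus abstract can be sketched as a single pass over a stream of alignments. The Python below is a simplified illustration, not the tool's algorithm: it assumes gap-free alignments given as hypothetical `(start, sequence)` pairs, ignores insertions and deletions, and uses made-up default thresholds. Only per-position base counts are kept in memory, so the reads themselves are never stored.

```python
from collections import Counter


def call_consensus(alignments, ref_len, min_depth=2, min_freq=0.5, ambig="N"):
    """Call a consensus from an iterable of (0-based start, aligned bases) pairs."""
    counts = [Counter() for _ in range(ref_len)]
    for start, seq in alignments:  # consumed one record at a time
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                counts[pos][base] += 1
    consensus = []
    for column in counts:
        depth = sum(column.values())
        if depth < min_depth:
            consensus.append(ambig)  # too little coverage to call
            continue
        base, n = column.most_common(1)[0]
        # Require a strict majority above min_freq; otherwise emit the ambiguity code.
        consensus.append(base if n / depth > min_freq else ambig)
    return "".join(consensus)
```

Because `alignments` can be any iterator, the function could in principle be fed from a generator that parses a read mapper's output from standard input, which mirrors the piping workflow the abstract describes.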
10.
Lancet Reg Health Am ; 19: 100449, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36844610

ABSTRACT

Background: Schools are high-risk settings for SARS-CoV-2 transmission, but necessary for children's educational and social-emotional wellbeing. Previous research suggests that wastewater monitoring can detect SARS-CoV-2 infections in controlled residential settings with high levels of accuracy. However, its effective accuracy, cost, and feasibility in non-residential community settings are unknown. Methods: The objective of this study was to determine the effectiveness and accuracy of community-based passive wastewater and surface (environmental) surveillance to detect SARS-CoV-2 infection in neighborhood schools compared to weekly diagnostic (PCR) testing. We implemented an environmental surveillance system in nine elementary schools with 1700 regularly present staff and students in southern California. The system was validated from November 2020 to March 2021. Findings: In 447 data collection days across the nine sites, 89 individuals tested positive for COVID-19, and SARS-CoV-2 was detected in 374 surface samples and 133 wastewater samples. Ninety-three percent of identified cases were associated with an environmental sample (95% CI: 88%-98%); 67% were associated with a positive wastewater sample (95% CI: 57%-77%), and 40% were associated with a positive surface sample (95% CI: 29%-52%). The techniques we utilized allowed for near-complete genomic sequencing of wastewater and surface samples. Interpretation: Passive environmental surveillance can detect the presence of COVID-19 cases in non-residential community school settings with a high degree of accuracy. Funding: County of San Diego, Health and Human Services Agency, National Institutes of Health, National Science Foundation, Centers for Disease Control.

11.
medRxiv ; 2023 Jan 25.
Article in English | MEDLINE | ID: mdl-34704096

ABSTRACT

Background: Schools are high-risk settings for SARS-CoV-2 transmission, but necessary for children's educational and social-emotional wellbeing. Previous research suggests that wastewater monitoring can detect SARS-CoV-2 infections in controlled residential settings with high levels of accuracy. However, its effective accuracy, cost, and feasibility in non-residential community settings are unknown. Methods: The objective of this study was to determine the effectiveness and accuracy of community-based passive wastewater and surface (environmental) surveillance to detect SARS-CoV-2 infection in neighborhood schools compared to weekly diagnostic (PCR) testing. We implemented an environmental surveillance system in nine elementary schools with 1700 regularly present staff and students in southern California. The system was validated from November 2020 to March 2021. Findings: In 447 data collection days across the nine sites, 89 individuals tested positive for COVID-19, and SARS-CoV-2 was detected in 374 surface samples and 133 wastewater samples. Ninety-three percent of identified cases were associated with an environmental sample (95% CI: 88%-98%); 67% were associated with a positive wastewater sample (95% CI: 57%-77%), and 40% were associated with a positive surface sample (95% CI: 29%-52%). The techniques we utilized allowed for near-complete genomic sequencing of wastewater and surface samples. Interpretation: Passive environmental surveillance can detect the presence of COVID-19 cases in non-residential community school settings with a high degree of accuracy. Funding: County of San Diego, Health and Human Services Agency, National Institutes of Health, National Science Foundation, Centers for Disease Control.

12.
Science ; 377(6609): 960-966, 2022 08 26.
Article in English | MEDLINE | ID: mdl-35881005

ABSTRACT

Understanding the circumstances that lead to pandemics is important for their prevention. We analyzed the genomic diversity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) early in the coronavirus disease 2019 (COVID-19) pandemic. We show that SARS-CoV-2 genomic diversity before February 2020 likely comprised only two distinct viral lineages, denoted "A" and "B." Phylodynamic rooting methods, coupled with epidemic simulations, reveal that these lineages were the result of at least two separate cross-species transmission events into humans. The first zoonotic transmission likely involved lineage B viruses around 18 November 2019 (23 October to 8 December), and the separate introduction of lineage A likely occurred within weeks of this event. These findings indicate that it is unlikely that SARS-CoV-2 circulated widely in humans before November 2019 and define the narrow window between when SARS-CoV-2 first jumped into humans and when the first cases of COVID-19 were reported. As with other coronaviruses, SARS-CoV-2 emergence likely resulted from multiple zoonotic events.


Subjects
COVID-19, Pandemics, SARS-CoV-2, Viral Zoonoses, Animals, COVID-19/epidemiology, COVID-19/transmission, COVID-19/virology, Computer Simulation, Genetic Variation, Genomics/methods, Humans, Molecular Epidemiology, Phylogeny, SARS-CoV-2/classification, SARS-CoV-2/genetics, SARS-CoV-2/isolation & purification, Viral Zoonoses/epidemiology, Viral Zoonoses/virology
13.
Nature ; 609(7925): 101-108, 2022 09.
Article in English | MEDLINE | ID: mdl-35798029

ABSTRACT

As SARS-CoV-2 continues to spread and evolve, detecting emerging variants early is critical for public health interventions. Inferring lineage prevalence by clinical testing is infeasible at scale, especially in areas with limited resources, participation, or testing and/or sequencing capacity, which can also introduce biases [1-3]. SARS-CoV-2 RNA concentration in wastewater successfully tracks regional infection dynamics and provides less biased abundance estimates than clinical testing [4,5]. Tracking virus genomic sequences in wastewater would improve community prevalence estimates and detect emerging variants. However, two factors limit wastewater-based genomic surveillance: low-quality sequence data and inability to estimate relative lineage abundance in mixed samples. Here we resolve these critical issues to perform a high-resolution, 295-day wastewater and clinical sequencing effort, in the controlled environment of a large university campus and the broader context of the surrounding county. We developed and deployed improved virus concentration protocols and deconvolution software that fully resolve multiple virus strains from wastewater. We detected emerging variants of concern up to 14 days earlier in wastewater samples, and identified multiple instances of virus spread not captured by clinical genomic surveillance. Our study provides a scalable solution for wastewater genomic surveillance that allows early detection of SARS-CoV-2 variants and identification of cryptic transmission.


Subjects
COVID-19, SARS-CoV-2, Wastewater-Based Epidemiological Monitoring, Wastewater, COVID-19/epidemiology, COVID-19/transmission, COVID-19/virology, Humans, Viral RNA/analysis, Viral RNA/genetics, SARS-CoV-2/classification, SARS-CoV-2/genetics, SARS-CoV-2/isolation & purification, RNA Sequence Analysis, Wastewater/virology
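
Estimating relative lineage abundance in a mixed sample, as mentioned in the abstract above, can be framed as a constrained regression. The Python sketch below is one generic formulation and not necessarily what the authors' deconvolution software does: it assumes a hypothetical 0/1 matrix `A` (rows are mutations, columns are lineages) and a vector `b` of observed mutation frequencies, and it minimizes the squared error of `A x - b` over non-negative abundances `x` that sum to 1, using projected gradient descent.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) == 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def estimate_abundances(A, b, iters=2000):
    """Least-squares lineage abundances under simplex constraints.

    A[i][j] is 1 if lineage j carries mutation i (hypothetical input layout),
    b[i] is the observed frequency of mutation i in the mixed sample.
    """
    m, n = len(A), len(A[0])
    # The squared Frobenius norm bounds the gradient's Lipschitz constant,
    # so its inverse is a safe step size (A must not be all zeros).
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [1.0 / n] * n
    for _ in range(iters):
        resid = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * resid[i] for i in range(m)) for j in range(n)]
        x = project_to_simplex([x[j] - step * grad[j] for j in range(n)])
    return x
```

With three hypothetical lineages and four mutations, `estimate_abundances([[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]], [0.6, 0.9, 0.4, 0.1])` converges toward abundances of roughly 0.6, 0.3, and 0.1. Real data would also need weighting by sequencing depth and handling of noisy or missing sites, which this sketch omits.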
14.
J Chem Theory Comput ; 18(7): 4047-4069, 2022 Jul 12.
Article in English | MEDLINE | ID: mdl-35710099

ABSTRACT

Atomistic Molecular Dynamics (MD) simulations provide researchers the ability to model biomolecular structures such as proteins and their interactions with drug-like small molecules with greater spatiotemporal resolution than is otherwise possible using experimental methods. MD simulations are notoriously expensive computational endeavors that have traditionally required massive investment in specialized hardware to access biologically relevant spatiotemporal scales. Our goal is to summarize the fundamental algorithms that are employed in the literature to then highlight the challenges that have affected accelerator implementations in practice. We consider three broad categories of accelerators: Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), and Application Specific Integrated Circuits (ASICs). These categories are comparatively studied to facilitate discussion of their relative trade-offs and to gain context for the current state of the art. We conclude by providing insights into the potential of emerging hardware platforms and algorithms for MD.


Subjects
Algorithms, Molecular Dynamics Simulation, Computers
15.
Viruses ; 14(4)2022 04 08.
Article in English | MEDLINE | ID: mdl-35458504

ABSTRACT

The use of viral sequence data to inform public health intervention has become increasingly common in the realm of epidemiology. Such methods typically utilize multiple sequence alignments and phylogenies estimated from the sequence data. Like all estimation techniques, they are error-prone, yet the impacts of such imperfections on downstream epidemiological inferences are poorly understood. To address this, we executed multiple commonly used viral phylogenetic analysis workflows on simulated viral sequence data, modeling Human Immunodeficiency Virus (HIV), Hepatitis C Virus (HCV), and Ebolavirus, and we computed multiple measures of accuracy, motivated by transmission-clustering techniques. For multiple sequence alignment, MAFFT consistently outperformed MUSCLE and Clustal Omega, in both accuracy and runtime. For phylogenetic inference, FastTree 2, IQ-TREE, RAxML-NG, and PhyML had similar topological accuracies, but branch lengths and pairwise distances were consistently most accurate in phylogenies inferred by RAxML-NG. However, FastTree 2 was the fastest, by orders of magnitude, and when the other tools were used to optimize branch lengths along a fixed FastTree 2 topology, the resulting phylogenies had accuracies that were indistinguishable from their original counterparts, but with a fraction of the runtime.


Subjects
Phylogeny, Humans, Molecular Epidemiology, Sequence Alignment, Workflow
16.
medRxiv ; 2022 Apr 04.
Article in English | MEDLINE | ID: mdl-35411350

ABSTRACT

As SARS-CoV-2 continues to spread and evolve, detecting emerging variants early is critical for public health interventions. Inferring lineage prevalence by clinical testing is infeasible at scale, especially in areas with limited resources, participation, or testing/sequencing capacity, which can also introduce biases. SARS-CoV-2 RNA concentration in wastewater successfully tracks regional infection dynamics and provides less biased abundance estimates than clinical testing. Tracking virus genomic sequences in wastewater would improve community prevalence estimates and detect emerging variants. However, two factors limit wastewater-based genomic surveillance: low-quality sequence data and inability to estimate relative lineage abundance in mixed samples. Here, we resolve these critical issues to perform a high-resolution, 295-day wastewater and clinical sequencing effort, in the controlled environment of a large university campus and the broader context of the surrounding county. We develop and deploy improved virus concentration protocols and deconvolution software that fully resolve multiple virus strains from wastewater. We detect emerging variants of concern up to 14 days earlier in wastewater samples, and identify multiple instances of virus spread not captured by clinical genomic surveillance. Our study provides a scalable solution for wastewater genomic surveillance that allows early detection of SARS-CoV-2 variants and identification of cryptic transmission.

17.
Sci Rep ; 12(1): 5077, 2022 03 24.
Article in English | MEDLINE | ID: mdl-35332213

ABSTRACT

Throughout the COVID-19 pandemic, massive sequencing and data sharing efforts enabled the real-time surveillance of novel SARS-CoV-2 strains throughout the world, the results of which provided public health officials with actionable information to prevent the spread of the virus. However, with great sequencing comes great computation, and while cloud computing platforms bring high-performance computing directly into the hands of all who seek it, optimal design and configuration of a cloud compute cluster requires significant system administration expertise. We developed ViReflow, a user-friendly viral consensus sequence reconstruction pipeline enabling rapid analysis of viral sequence datasets leveraging Amazon Web Services (AWS) cloud compute resources and the Reflow system. ViReflow was developed specifically in response to the COVID-19 pandemic, but it is general to any viral pathogen. Importantly, when utilized with sufficient compute resources, ViReflow can trim, map, call variants, and call consensus sequences from amplicon sequence data from 1000 SARS-CoV-2 samples at 1000X depth in < 10 min, with no user intervention. ViReflow's simplicity, flexibility, and scalability make it an ideal tool for viral molecular epidemiological efforts.


Subjects
COVID-19, Software, COVID-19/epidemiology, Viral Genome/genetics, Humans, Pandemics, SARS-CoV-2/genetics
18.
mSystems ; 7(2): e0137821, 2022 04 26.
Article in English | MEDLINE | ID: mdl-35293792

ABSTRACT

Increasing data volumes on high-throughput sequencing instruments such as the NovaSeq 6000 lead to long computational bottlenecks for common metagenomics data preprocessing tasks such as adaptor and primer trimming and host removal. Here, we test whether faster recently developed computational tools (Fastp and Minimap2) can replace widely used choices (Atropos and Bowtie2), obtaining dramatic accelerations with additional sensitivity and minimal loss of specificity for these tasks. Furthermore, the taxonomic tables resulting from downstream processing provide biologically comparable results. However, we demonstrate that for taxonomic assignment, Bowtie2's specificity is still required. We suggest that periodic reevaluation of pipeline components, together with improvements to standardized APIs to chain them together, will greatly enhance the efficiency of common bioinformatics tasks while also facilitating incorporation of further optimized steps running on GPUs, FPGAs, or other architectures. We also note that a detailed exploration of available algorithms and pipeline components is an important step that should be taken before optimization of less efficient algorithms on advanced or nonstandard hardware. IMPORTANCE: In shotgun metagenomics studies that seek to relate changes in microbial DNA across samples, processing the data on a computer often takes longer than obtaining the data from the sequencing instrument. Recently developed software packages that perform individual steps in the pipeline of data processing in principle offer speed advantages, but in practice they may contain pitfalls that prevent their use; for example, they may make approximations that introduce unacceptable errors in the data. Here, we show that differences in choices of these components can speed up overall data processing by 5-fold or more on the same hardware while maintaining a high degree of correctness, greatly reducing the time taken to interpret results. This is an important step for using the data in clinical settings, where the time taken to obtain the results may be critical for guiding treatment.


Subjects
Metagenomics, Software, Metagenomics/methods, Algorithms, High-Throughput Nucleotide Sequencing/methods, Computational Biology/methods
19.
GigaByte ; 2022: gigabyte37, 2022.
Article in English | MEDLINE | ID: mdl-36968795

ABSTRACT

Epidemic simulations require the ability to sample contact networks from various random graph models. Existing methods can simulate city-scale or even country-scale contact networks, but they are unable to feasibly simulate global-scale contact networks due to high memory consumption. NiemaGraphGen (NGG) is a memory-efficient graph generation tool that enables the simulation of global-scale contact networks. NGG avoids storing the entire graph in memory and is instead intended to be used in a data streaming pipeline, resulting in memory consumption that is orders of magnitude smaller than existing tools. NGG provides a massively-scalable solution for simulating social contact networks, enabling global-scale epidemic simulation studies.
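
The streaming approach described in the NGG abstract, emitting edges without holding the graph in memory, can be illustrated with a Python generator. This is not NGG's code: it is a sketch of the well-known geometric skipping method for Erdős–Rényi G(n, p) graphs (Batagelj and Brandes), and the function name and parameters are hypothetical. Memory use is constant regardless of the number of edges produced.

```python
import math
import random


def stream_gnp_edges(n, p, seed=None):
    """Yield the edges (v, w), with w < v, of a G(n, p) random graph one at a time."""
    rng = random.Random(seed)
    if n < 2 or p <= 0.0:
        return
    if p >= 1.0:
        for v in range(1, n):
            for w in range(v):
                yield (v, w)
        return
    log_q = math.log1p(-p)  # log(1 - p), negative for 0 < p < 1
    v, w = 1, -1
    while v < n:
        r = rng.random()
        # Skip ahead by a geometrically distributed number of non-edges.
        w += 1 + int(math.log(1.0 - r) / log_q)
        while w >= v and v < n:
            w -= v
            v += 1
        if v < n:
            yield (v, w)
```

In a streaming pipeline the edges would be written out as they are generated, for example `for v, w in stream_gnp_edges(10**6, 1e-5): print(v, w, sep="\t")`, so a downstream simulator can consume them without either side storing the full edge list.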
