Results 1 - 17 of 17
1.
Genome Res ; 33(7): 1175-1187, 2023 07.
Article in English | MEDLINE | ID: mdl-36990779

ABSTRACT

Seed-chain-extend with k-mer seeds is a powerful heuristic technique for sequence alignment used by modern sequence aligners. Although effective in practice for both runtime and accuracy, theoretical guarantees on the resulting alignment do not exist for seed-chain-extend. In this work, we give the first rigorous bounds for the efficacy of seed-chain-extend with k-mers in expectation. Assume we are given a random nucleotide sequence of length ∼n that is indexed (or seeded) and a mutated substring of length ∼m ≤ n with mutation rate θ < 0.206. We prove that we can find a k = Θ(log n) for the k-mer size such that the expected runtime of seed-chain-extend under optimal linear-gap cost chaining and quadratic time gap extension is O(m n^{f(θ)} log n), where f(θ) < 2.43 · θ holds as a loose bound. The alignment also turns out to be good; we prove that more than [Formula: see text] fraction of the homologous bases is recoverable under an optimal chain. We also show that our bounds work when k-mers are sketched, that is, only a subset of all k-mers is selected, and that sketching reduces chaining time without increasing alignment time or decreasing accuracy too much, justifying the effectiveness of sketching as a practical speedup in sequence alignment. We verify our results in simulation and on real noisy long-read data and show that our theoretical runtimes can predict real runtimes accurately. We conjecture that our bounds can be improved further, and in particular, f(θ) can be further reduced.


Subjects
Algorithms; Heuristics; Computer Simulation; Sequence Alignment; Sequence Analysis, DNA/methods
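
The seeding and chaining stages named in the abstract of entry 1 can be illustrated with a minimal Python sketch. It is not the paper's algorithm or analysis: the function names `kmer_anchors` and `chain`, the scoring (one point per anchor minus `gap_cost` times the diagonal drift between consecutive anchors), and the quadratic-time dynamic program are all assumptions chosen for brevity. The final extension step, which aligns the gaps between chained anchors, is omitted.

```python
from collections import defaultdict


def kmer_anchors(ref, query, k):
    """Return sorted (ref_pos, query_pos) pairs where ref and query share a k-mer."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    anchors = []
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], ()):
            anchors.append((i, j))
    anchors.sort()
    return anchors


def chain(anchors, gap_cost=1.0):
    """Pick a best-scoring colinear chain of anchors.

    Illustrative scoring (an assumption, not the paper's objective):
    each anchor adds 1, and moving from anchor a to anchor b costs
    gap_cost * |(ref gap) - (query gap)|.
    """
    n = len(anchors)
    if n == 0:
        return []
    score = [1.0] * n
    prev = [-1] * n
    for b in range(n):
        ib, jb = anchors[b]
        for a in range(b):
            ia, ja = anchors[a]
            # Anchors must advance in both sequences to be chained.
            if ia < ib and ja < jb:
                drift = abs((ib - ia) - (jb - ja))
                cand = score[a] + 1.0 - gap_cost * drift
                if cand > score[b]:
                    score[b], prev[b] = cand, a
    # Backtrack from the highest-scoring anchor.
    best = max(range(n), key=score.__getitem__)
    out = []
    while best != -1:
        out.append(anchors[best])
        best = prev[best]
    return out[::-1]
```

When the query is an exact substring of the reference and all k-mers are unique, every anchor lies on one diagonal and the chain keeps all of them. Mutations would remove some anchors, and the chain would then bridge the resulting gaps.
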
2.
Nat Methods ; 20(11): 1661-1665, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37735570

ABSTRACT

Sequence comparison tools for metagenome-assembled genomes (MAGs) struggle with high-volume or low-quality data. We present skani ( https://github.com/bluenote-1577/skani ), a method for determining average nucleotide identity (ANI) via sparse approximate alignments. skani outperforms FastANI in accuracy and speed (>20× faster) for fragmented, incomplete MAGs. skani can query genomes against >65,000 prokaryotic genomes in seconds and 6 GB memory. skani unlocks higher-resolution insights for extensive, noisy metagenomic datasets.


Subjects
Metagenome; Prokaryotic Cells; Metagenomics/methods
3.
Bioinformatics ; 40(Supplement_1): i30-i38, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940183

ABSTRACT

SUMMARY: Shotgun metagenomics allows for direct analysis of microbial community genetics, but scalable computational methods for the recovery of bacterial strain genomes from microbiomes remain a key challenge. We introduce Floria, a novel method designed for rapid and accurate recovery of strain haplotypes from short and long-read metagenome sequencing data, based on minimum error correction (MEC) read clustering and a strain-preserving network flow model. Floria can function as a standalone haplotyping method, outputting alleles and reads that co-occur on the same strain, as well as an end-to-end read-to-assembly pipeline (Floria-PL) for strain-level assembly. Benchmarking evaluations on synthetic metagenomes show that Floria is >3× faster and recovers 21% more strain content than base-level assembly methods (Strainberry) while being over an order of magnitude faster when only phasing is required. Applying Floria to a set of 109 deeply sequenced nanopore metagenomes took <20 min on average per sample and identified several species that have consistent strain heterogeneity. Applying Floria's short-read haplotyping to a longitudinal gut metagenomics dataset revealed a dynamic multi-strain Anaerostipes hadrus community with frequent strain loss and emergence events over 636 days. With Floria, accurate haplotyping of metagenomic datasets takes mere minutes on standard workstations, paving the way for extensive strain-level metagenomic analyses. AVAILABILITY AND IMPLEMENTATION: Floria is available at https://github.com/bluenote-1577/floria, and the Floria-PL pipeline is available at https://github.com/jsgounot/Floria_analysis_workflow along with code for reproducing the benchmarks.


Subjects
Metagenome; Metagenomics; Metagenomics/methods; Haplotypes; Software; Humans; Genome, Bacterial; Microbiota/genetics; Bacteria/genetics; Bacteria/classification; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods
4.
BMC Bioinformatics ; 25(1): 161, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38649836

ABSTRACT

BACKGROUND: Taxonomic classification of reads obtained by metagenomic sequencing is often a first step for understanding a microbial community, but correctly assigning sequencing reads to the strain or sub-species level has remained a challenging computational problem. RESULTS: We introduce Mora, a MetagenOmic read Re-Assignment algorithm capable of assigning short and long metagenomic reads with high precision, even at the strain level. Mora is able to accurately re-assign reads by first estimating abundances through an expectation-maximization algorithm and then utilizing abundance information to re-assign query reads. The key idea behind Mora is to maximize read re-assignment qualities while simultaneously minimizing the difference from estimated abundance levels, allowing Mora to avoid over-assigning reads to the same genomes. On simulated diverse reads, this allows Mora to achieve F1 scores comparable to other algorithms with a shorter runtime. However, Mora significantly outshines other algorithms on very similar reads. We show that the high penalty for over-assigning reads to a common reference genome allows Mora to accurately infer correct strains for real data in the form of E. coli reads. CONCLUSIONS: Mora is a fast and accurate read re-assignment algorithm that is modularized, allowing it to be incorporated into general metagenomics and genomics workflows. It is freely available at https://github.com/AfZheng126/MORA .


Subjects
Algorithms; Metagenomics; Metagenomics/methods; Escherichia coli/genetics; Sequence Analysis, DNA/methods; Software; Metagenome/genetics; Genome, Bacterial
5.
Bioinformatics ; 39(2)2023 02 03.
Article in English | MEDLINE | ID: mdl-36702468

ABSTRACT

MOTIVATION: We face an increasing flood of genetic sequence data, from diverse sources, requiring rapid computational analysis. Rapid analysis can be achieved by sampling a subset of positions in each sequence. Previous sequence-sampling methods, such as minimizers, syncmers and minimally overlapping words, were developed by heuristic intuition, and are not optimal. RESULTS: We present a sequence-sampling approach that provably optimizes sensitivity for a whole class of sequence comparison methods, for randomly evolving sequences. It is likely near-optimal for a wide range of alignment-based and alignment-free analyses. For real biological DNA, it increases specificity by avoiding simple repeats. Our approach generalizes universal hitting sets (which guarantee to sample a sequence at least once) and polar sets (which guarantee to sample a sequence at most once). This helps us understand how to do rapid sequence analysis as accurately as possible. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://gitlab.com/mcfrith/noverlap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Software; Sequence Analysis, DNA/methods
6.
Bioinformatics ; 38(20): 4659-4669, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36124869

ABSTRACT

MOTIVATION: Selecting a subset of k-mers in a string in a local manner is a common task in bioinformatics tools for speeding up computation. Arguably the most well-known and common method is the minimizer technique, which selects the 'lowest-ordered' k-mer in a sliding window. Recently, it has been shown that minimizers may be a sub-optimal method for selecting subsets of k-mers when mutations are present. There is, however, a lack of understanding behind the theory of why certain methods perform well. RESULTS: We first theoretically investigate the conservation metric for k-mer selection methods. We derive an exact expression for calculating the conservation of a k-mer selection method. This turns out to be tractable enough for us to prove closed-form expressions for a variety of methods, including (open and closed) syncmers, (a, b, n)-words, and an upper bound for minimizers. As a demonstration of our results, we modified the minimap2 read aligner to use a more conserved k-mer selection method and demonstrate that there is up to an 8.2% relative increase in number of mapped reads. However, we found that the k-mers selected by more conserved methods are also more repetitive, leading to a runtime increase during alignment. We give new insight into how one might use new k-mer selection methods as a reparameterization to optimize for speed and alignment quality. AVAILABILITY AND IMPLEMENTATION: Simulations and supplementary methods are available at https://github.com/bluenote-1577/local-kmer-selection-results. os-minimap2 is a modified version of minimap2 and available at https://github.com/bluenote-1577/os-minimap2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Software; Mutation; Sequence Analysis, DNA/methods
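
The minimizer technique described in the abstract of entry 6, which keeps the lowest-ordered k-mer in each sliding window, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `minimizers`, the parameter `w` (the number of consecutive k-mers per window), the use of plain lexicographic order, and leftmost tie-breaking are choices made here for clarity. Practical tools commonly order k-mers by a hash value instead.

```python
def minimizers(seq, k, w):
    """Return sorted start positions of the k-mers chosen as window minimizers.

    Each window covers w consecutive k-mers. The smallest k-mer in the
    window is selected (lexicographic order, leftmost on ties; both are
    assumptions of this sketch).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        picked.add(min(window, key=lambda i: (kmers[i], i)))
    return sorted(picked)
```

Adjacent windows often share the same minimizer, so the selected set is much smaller than the set of all k-mers. That subsampling is the source of the speedup, and a mutation inside a selected k-mer can change which k-mers are chosen nearby, which is the conservation question the abstract studies.
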
7.
Value Health ; 26(10): 1543-1548, 2023 10.
Article in English | MEDLINE | ID: mdl-37422075

ABSTRACT

OBJECTIVES: Patient-reported outcome (PRO) data are critical in understanding treatments from the patient perspective in cancer clinical trials. The potential benefits and methodological approaches to the collection of PRO data after treatment discontinuation (eg, because of progressive disease or unacceptable drug toxicity) are less clear. The purpose of this article is to describe the Food and Drug Administration's Oncology Center of Excellence and the Critical Path Institute cosponsored 2-hour virtual roundtable, held in 2020, to discuss this specific issue. METHODS: We summarize key points from this discussion with 16 stakeholders representing academia, clinical practice, patients, international regulatory agencies, health technology assessment bodies/payers, industry, and PRO instrument development. RESULTS: Stakeholders recognized that any PRO data collection after treatment discontinuation should have clearly defined objectives to ensure that data can be analyzed and reported. CONCLUSIONS: Data collection after discontinuation without a justification for its use wastes patients' time and effort and is unethical.


Subjects
Drug-Related Side Effects and Adverse Reactions; Neoplasms; Humans; Neoplasms/drug therapy; Medical Oncology; Data Collection; Patient Reported Outcome Measures
9.
Ergonomics ; 61(10): 1299-1310, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29637835

ABSTRACT

Physical employment standards evaluate whether a worker possesses the physical abilities to safely and efficiently perform all critical on-the-job tasks. Initial Attack (IA) wildland fire fighters (WFF) must perform such critical tasks in all terrains. Following a physical demands analysis, IA WFF (n = 946 out of a possible 965) from all fire jurisdictions ranked the most demanding tasks and identified mountains, muskeg and rolling hills as the most challenging terrains. Experimental trials found the oxygen cost (mean ± SD V̇O2, mL·kg⁻¹·min⁻¹) while performing the hose pack back carry to be 40 ± 7 in steep mountains, 34 ± 5 in muskeg and 34 ± 2 in rolling hills (n = 168). Back-carrying and hand-carrying a 28.5 kg pump, back-carrying a 25 kg hose pack and advancing charged hose were the most demanding tasks. Performing the same emergency IA WFF tasks was significantly more demanding in mountains (p ≤ 0.05), and these higher demands must be taken into account when developing a physical employment standard for Canadian wildland fire fighters. Practitioner Summary: Physical employment standards evaluate whether an applicant or incumbent possesses the physical and physiological abilities to safely and efficiently perform the critical on-the-job tasks. This paper details the process used to undertake a physical demands analysis and characterise tasks for the development of a circuit test and fitness employment standard for IA WFF.


Subjects
Employment/standards; Firefighters; Physical Fitness; Wildfires; Canada; Humans; Surveys and Questionnaires
10.
J Comput Biol ; 29(2): 195-211, 2022 02.
Article in English | MEDLINE | ID: mdl-35041529

ABSTRACT

Resolving haplotypes in polyploid genomes using phase information from sequencing reads is an important and challenging problem. We introduce two new mathematical formulations of polyploid haplotype phasing: (1) the min-sum max tree partition problem, which is a more flexible graphical metric compared with the standard minimum error correction (MEC) model in the polyploid setting, and (2) the uniform probabilistic error minimization model, which is a probabilistic analogue of the MEC model. We incorporate both formulations into a long-read based polyploid haplotype phasing method called flopp. We show that flopp compares favorably with state-of-the-art algorithms: up to 30 times faster with 2 times fewer switch errors on 6× ploidy simulated data. Further, we show using real nanopore data that flopp can quickly reveal reasonable haplotype structures from the autotetraploid Solanum tuberosum (potato).


Subjects
Algorithms; Haplotypes; Polyploidy; Computational Biology; Computer Simulation; Databases, Genetic/statistics & numerical data; Genome, Plant; Models, Genetic; Models, Statistical; Multigene Family; Polymorphism, Single Nucleotide; Sequence Analysis, DNA/statistics & numerical data; Software; Solanum tuberosum/genetics
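
The standard minimum error correction (MEC) model that the abstract of entry 10 takes as its point of comparison can be illustrated with a short Python sketch. It shows only the usual MEC objective, not flopp's two new formulations. The name `mec_score`, the representation of each read as a dict from variant site to observed allele, and the per-read cluster assignment list are assumptions made for this example.

```python
from collections import Counter


def mec_score(reads, assignment, ploidy):
    """Count allele calls that disagree with their cluster's consensus.

    reads:      list of dicts {site: allele}, one per read (assumed format)
    assignment: list giving the haplotype cluster (0..ploidy-1) of each read
    For every cluster and site, the consensus is the most common allele;
    every other call at that site counts as one error to be corrected.
    """
    total = 0
    for h in range(ploidy):
        counts = {}  # site -> Counter of alleles seen in cluster h
        for read, cluster in zip(reads, assignment):
            if cluster != h:
                continue
            for site, allele in read.items():
                counts.setdefault(site, Counter())[allele] += 1
        for alleles in counts.values():
            total += sum(alleles.values()) - max(alleles.values())
    return total
```

A phasing method that uses this objective searches for the read partition with the lowest score. In the small case below, grouping reads that agree gives a score of 0, while mixing conflicting reads raises it.

```python
reads = [{0: 0, 1: 0}, {0: 0, 1: 0}, {0: 1, 1: 1}]
print(mec_score(reads, [0, 0, 1], ploidy=2))  # consistent clusters
print(mec_score(reads, [0, 1, 1], ploidy=2))  # conflicting reads share a cluster
```
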
11.
Nat Biotechnol ; 37(8): 937-944, 2019 08.
Article in English | MEDLINE | ID: mdl-31359005

ABSTRACT

Characterization of microbiomes has been enabled by high-throughput metagenomic sequencing. However, existing methods are not designed to combine reads from short- and long-read technologies. We present a hybrid metagenomic assembler named OPERA-MS that integrates assembly-based metagenome clustering with repeat-aware, exact scaffolding to accurately assemble complex communities. Evaluation using defined in vitro and virtual gut microbiomes revealed that OPERA-MS assembles metagenomes with greater base pair accuracy than long-read (>5×; Canu), higher contiguity than short-read (~10× NGA50; MEGAHIT, IDBA-UD, metaSPAdes) and fewer assembly errors than non-metagenomic hybrid assemblers (2×; hybridSPAdes). OPERA-MS provides strain-resolved assembly in the presence of multiple genomes of the same species, high-quality reference genomes for rare species (<1%) with ~9× long-read coverage and near-complete genomes with higher coverage. We used OPERA-MS to assemble 28 gut metagenomes of antibiotic-treated patients, and showed that the inclusion of long nanopore reads produces more contiguous assemblies (200× improvement over short-read assemblies), including more than 80 closed plasmid or phage sequences and a new 263 kbp jumbo phage. High-quality hybrid assemblies enable an exquisitely detailed view of the gut resistome in human patients.


Subjects
Bacteria/drug effects; Bacteria/genetics; Metagenomics/methods; Microbiota/drug effects; Sequence Analysis, DNA/methods; Anti-Bacterial Agents/pharmacology; Drug Resistance, Bacterial; Feces/microbiology; High-Throughput Nucleotide Sequencing/methods; Humans; Metagenome; Nanopores; Software
14.
Med Sci Sports Exerc ; 42(7): 1345-54, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20019629

ABSTRACT

INTRODUCTION: The purpose of this study was to characterize the physiological demands of recreational off-road vehicle riding under typical riding conditions using habitual recreational off-road vehicle riders (n = 128). METHODS: The physical demands of off-road vehicle riding were compared between vehicle types (all-terrain vehicle (ATV) and off-road motorcycle (ORM)) and against the demands of common recreational activities. Habitual riders (ATV = 56, ORM = 72) performed strength assessments before and after a representative trail ride (48 ± 24.2 min), and ambulatory oxygen consumption was measured during one lap (24.2 ± 11.8 min) of the ride. RESULTS: The mean VO2 requirement (mL·kg⁻¹·min⁻¹) while riding an off-road vehicle was 12.1 ± 4.9 for ATV and 21.3 ± 7.1 for ORM (P = 0.002), which is comparable to the VO2 required of many common recreational activities. Temporal analysis of activity intensity revealed approximately 14% of an ATV ride and 38% of an ORM ride are within the intensity range (>40% VO2 reserve) required to achieve changes in aerobic fitness. Riding on a representative course also led to muscular fatigue, particularly in the upper body. CONCLUSIONS: On the basis of the measured metabolic demands, evidence of muscular strength requirements, and the associated caloric expenditures with off-road vehicle riding, this alternative form of activity conforms to the recommended physical activity guidelines and can be effective for achieving beneficial changes in health and fitness.


Subjects
Exercise/physiology; Off-Road Motor Vehicles; Physical Exertion/physiology; Recreation/physiology; Adolescent; Adult; Female; Hand Strength; Humans; Male; Middle Aged; Muscle Fatigue; Oxygen Consumption; Young Adult
15.
Appl Physiol Nutr Metab ; 35(1): 45-58, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20130666

ABSTRACT

The purpose of this investigation was to identify the critical tasks encountered by correctional officers (COs) on the job and to conduct a comprehensive assessment and characterization of the physical demands of these tasks. These are the first steps in developing a fitness screening test for COs in compliance with recent legislation. The most important, physically demanding, and frequently occurring tasks were identified using Delphi methodology, focus groups, and questionnaire responses from 190 experienced front-line COs. These tasks were structured into emergency response scenarios for which a physical and physiological characterization was conducted to verify their relative physical demands analysis. Oxygen consumption and the forces exerted by COs were quantified while they were responding and then controlling and restraining inmates. The female COs used less force than the male COs did to control and restrain the same inmates (body control = 46 vs. 60 kg, wrist hold = 32 vs. 49 kg, and arm retraction = 37 vs. 47 kg) and did not exert their maximal strength during their control and restraint activities. The mean oxygen consumption of the female and male COs while performing the on-the-job tasks was similar (39.5 vs. 38.5 mL·kg⁻¹·min⁻¹). We concluded that the essential components of a fitness screening protocol for CO applicants are cell search, expeditious response, body control, arm restraint, inmate relocation, and an assessment of aerobic fitness. The criterion performance standards for completing these tasks in a circuit were set at the job performance level of safe and efficient female COs.


Subjects
Physical Exertion/physiology; Physical Fitness/physiology; Prisons; Professional Competence/statistics & numerical data; Adult; Analysis of Variance; Canada; Female; Focus Groups/methods; Humans; Male; Middle Aged; Oxygen Consumption/physiology; Restraint, Physical/methods; Restraint, Physical/statistics & numerical data; Sex Factors; Surveys and Questionnaires; Task Performance and Analysis; Workforce; Young Adult
17.
Genet Epidemiol ; 31 Suppl 1: S12-21, 2007.
Article in English | MEDLINE | ID: mdl-18046771

ABSTRACT

The papers in presentation group 2 of Genetic Analysis Workshop 15 (GAW15) conducted association analyses of rheumatoid arthritis data. The analyses were carried out primarily in the data provided by the North American Rheumatoid Arthritis Consortium (NARAC). One group conducted analyses in the data provided by the Canadian Rheumatoid Arthritis Genetics Study (CRAGS). Analysis strategies included genome-wide scans, the examination of candidate genes, and investigations of a region of interest on chromosome 18q21. Most authors employed relatively new methods, proposed extensions of existing methods, or introduced completely novel methods for aspects of association analysis. There were several common observations: a group of papers using a variety of methods found stronger association on chromosomes 6 and 18 and in candidate gene PTPN22 among women with early onset. Generally, models that considered haplotypes or multiple markers showed stronger evidence for association than did single marker analyses.


Subjects
Arthritis, Rheumatoid/genetics; Algorithms; Chromosomes, Human, Pair 18; Chromosomes, Human, Pair 6; Genome, Human; Haplotypes; Humans; Phenotype; Protein Tyrosine Phosphatase, Non-Receptor Type 22/genetics