1.
Pattern Recognit Lett ; 168: 1-7, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37034964

ABSTRACT

Region expansion (the growth of regions to include all points within a certain distance of their perimeters) is a basic, widely applicable operation, but is expensive to perform exactly. It has been shown that, if the solution is approximated by relaxing the distance metric to the L∞-norm, efficiency can be greatly improved using properties of quadtrees. The method as described, however, requires the quadtrees to be square, both for the metric and the particular details of the algorithm. In some cases, such as spherical surface approximation, it is desirable for the quadtree nodes to be triangular instead. In this work, we thus describe an adaptation of the L∞-norm metric and the previously described algorithm to allow efficient approximation of region expansion in images represented as regular triangulated meshes. Like the original method for square quadtrees, our algorithm achieves sublinear time with respect to expansion radius.
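
To make the L∞ relaxation concrete, the following is a minimal brute-force sketch on a square pixel grid. It is not the quadtree or triangulated-mesh algorithm the abstract describes, and the function name `expand_region_linf` is hypothetical. Under the L∞ (Chebyshev) metric, every point within distance r of a pixel forms a square of side 2r + 1, so expansion reduces to stamping squares.

```python
def expand_region_linf(region, radius):
    """Return every grid cell within L-infinity (Chebyshev) distance
    `radius` of any cell in `region`.

    Brute force: cost grows with region size times radius squared,
    unlike the sublinear quadtree method described in the abstract.
    """
    expanded = set()
    for (x, y) in region:
        # Under L-infinity, the ball around a cell is an axis-aligned square.
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                expanded.add((x + dx, y + dy))
    return expanded


seed = {(0, 0)}
grown = expand_region_linf(seed, 2)
print(len(grown))  # a 5 x 5 square of cells
```

The diagonal cell (2, 2) is included even though its Euclidean distance from the seed exceeds 2, which is exactly the approximation the relaxed metric introduces.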

2.
Sci Data ; 10(1): 8, 2023 01 04.
Article in English | MEDLINE | ID: mdl-36599892

ABSTRACT

Though exponentially growing health-related literature has been made available to a broad audience online, the language of scientific articles can be difficult for the general public to understand. Therefore, adapting this expert-level language into plain language versions is necessary for the public to reliably comprehend the vast health-related literature. Deep Learning algorithms for automatic adaptation are a possible solution; however, gold standard datasets are needed for proper evaluation. Proposed datasets thus far consist of either pairs of comparable professional- and general public-facing documents or pairs of semantically similar sentences mined from such documents. This leads to a trade-off between imperfect alignments and small test sets. To address this issue, we created the Plain Language Adaptation of Biomedical Abstracts dataset. This dataset is the first manually adapted dataset that is both document- and sentence-aligned. The dataset contains 750 adapted abstracts, totaling 7643 sentence pairs. Along with describing the dataset, we benchmark automatic adaptation on the dataset with state-of-the-art Deep Learning approaches, setting baselines for future research.

3.
J Am Med Inform Assoc ; 29(11): 1976-1988, 2022 10 07.
Article in English | MEDLINE | ID: mdl-36083212

ABSTRACT

OBJECTIVE: Plain language in medicine has long been advocated as a way to improve patient understanding and engagement. As the field of Natural Language Processing has progressed, increasingly sophisticated methods have been explored for the automatic simplification of existing biomedical text for consumers. We survey the literature in this area with the goals of characterizing approaches and applications, summarizing existing resources, and identifying remaining challenges. MATERIALS AND METHODS: We search English language literature using lists of synonyms for both the task (eg, "text simplification") and the domain (eg, "biomedical"), and searching for all pairs of these synonyms using Google Scholar, Semantic Scholar, PubMed, ACL Anthology, and DBLP. We expand search terms based on results and further include any pertinent papers not in the search results but cited by those that are. RESULTS: We find 45 papers that we deem relevant to the automatic simplification of biomedical text, with data spanning 7 natural languages. Of these (nonexclusively), 32 describe tools or methods, 13 present data sets or resources, and 9 describe impacts on human comprehension. Of the tools or methods, 22 are chiefly procedural and 10 are chiefly neural. CONCLUSIONS: Though neural methods hold promise for this task, scarcity of parallel data has led to continued development of procedural methods. Various low-resource mitigations have been proposed to advance neural methods, including paragraph-level and unsupervised models and augmentation of neural models with procedural elements drawing from knowledge bases. However, high-quality parallel data will likely be crucial for developing fully automated biomedical text simplification.


Subjects
Natural Language Processing, Unified Medical Language System, Humans, Language, PubMed, Semantics
4.
IEEE Trans Vis Comput Graph ; 27(2): 1073-1083, 2021 02.
Article in English | MEDLINE | ID: mdl-33095716

ABSTRACT

Data visualizations convert numbers into visual marks so that our visual system can extract data from an image instead of raw numbers. Clearly, the visual system does not compute these values as a computer would, as an arithmetic mean or a correlation. Instead, it extracts these patterns using perceptual proxies: heuristic shortcuts of the visual marks, such as a center of mass or a shape envelope. Understanding which proxies people use would lead to more effective visualizations. We present the results of a series of crowdsourced experiments that measure how powerfully a set of candidate proxies can explain human performance when comparing the mean and range of pairs of data series presented as bar charts. We generated datasets where the correct answer (the series with the larger arithmetic mean or range) was pitted against an "adversarial" series that should be seen as larger if the viewer uses a particular candidate proxy. We used both Bayesian logistic regression models and a robust Bayesian mixed-effects linear model to measure how strongly each adversarial proxy could drive viewers to answer incorrectly and whether different individuals may use different proxies. Finally, we attempt to construct adversarial datasets from scratch, using an iterative crowdsourcing procedure to perform black-box optimization.

5.
IEEE Trans Vis Comput Graph ; 26(1): 1012-1021, 2020 01.
Article in English | MEDLINE | ID: mdl-31443016

ABSTRACT

Perceptual tasks in visualizations often involve comparisons. Of two sets of values depicted in two charts, which set had values that were the highest overall? Which had the widest range? Prior empirical work found that the performance on different visual comparison tasks (e.g., "biggest delta", "biggest correlation") varied widely across different combinations of marks and spatial arrangements. In this paper, we expand upon these combinations in an empirical evaluation of two new comparison tasks: the "biggest mean" and "biggest range" between two sets of values. We used a staircase procedure to titrate the difficulty of the data comparison to assess which arrangements produced the most precise comparisons for each task. We find visual comparisons of biggest mean and biggest range are supported by some chart arrangements more than others, and that this pattern is substantially different from the pattern for other tasks. To synthesize these dissonant findings, we argue that we must understand which features of a visualization are actually used by the human visual system to solve a given task. We call these perceptual proxies. For example, when comparing the means of two bar charts, the visual system might use a "Mean length" proxy that isolates the actual lengths of the bars and then constructs a true average across these lengths. Alternatively, it might use a "Hull Area" proxy that perceives an implied hull bounded by the bars of each chart and then compares the areas of these hulls. We propose a series of potential proxies across different tasks, marks, and spatial arrangements. Simple models of these proxies can be empirically evaluated for their explanatory power by matching their performance to human performance across these marks, arrangements, and tasks. We use this process to highlight candidates for perceptual proxies that might scale more broadly to explain performance in visual comparison.
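
The two proxies named in the abstract can be illustrated with a small sketch. The abstract does not define them precisely, so this code rests on assumptions: bars are unit-width and adjacent, "Mean length" is the arithmetic mean of bar heights, and "Hull Area" is the area of the convex hull around all bar corners. The function names are hypothetical.

```python
def mean_length(bars):
    """'Mean length' proxy: the true average of the bar heights."""
    return sum(bars) / len(bars)


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula for a simple polygon."""
    total = 0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def hull_area(bars):
    """'Hull Area' proxy (assumed definition): convex hull of all bar corners."""
    corners = []
    for i, h in enumerate(bars):
        corners += [(i, 0), (i + 1, 0), (i, h), (i + 1, h)]
    return polygon_area(convex_hull(corners))


flat = [3, 3, 3]    # larger mean, smaller hull
spiky = [5, 0, 3]   # smaller mean, larger hull
print(mean_length(flat), mean_length(spiky))
print(hull_area(flat), hull_area(spiky))
```

Here the two proxies disagree: `flat` has the larger mean, but `spiky` has the larger hull, so a viewer relying on hull area would pick the wrong chart when asked which mean is bigger.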


Subjects
Computer Graphics, Visual Perception/physiology, Crowdsourcing, Humans, Biological Models, Task Performance and Analysis
6.
Genome Biol ; 20(1): 232, 2019 11 05.
Article in English | MEDLINE | ID: mdl-31690338

ABSTRACT

The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.
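
The difference between resemblance and containment can be sketched in a few lines. This is a simplified illustration, not the online algorithm the abstract describes. The k-mer size, sketch size, SHA-1 hashing, and all function names are assumptions chosen for the example.

```python
import hashlib


def kmers(seq, k):
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (SHA-1 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def jaccard(a, b, k=5):
    """Exact resemblance: shared k-mers over all k-mers in either sequence."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)


def containment(genome, mixture, k=5, sketch_size=50):
    """Fraction of the genome's bottom-`sketch_size` hashes seen in the mixture."""
    sketch = set(sorted(kmer_hash(x) for x in kmers(genome, k))[:sketch_size])
    mixture_hashes = {kmer_hash(x) for x in kmers(mixture, k)}
    return len(sketch & mixture_hashes) / len(sketch)


genome = "ACGTACGGTCATGCATTGCA"
mixture = "TTTTTTTT" + genome + "GGGGGGGG"  # genome embedded in other sequence

print(containment(genome, mixture))  # genome is fully present
print(jaccard(genome, mixture))      # resemblance is diluted by the extra k-mers
```

The genome is entirely present in the mixture, so its containment is 1.0, while the Jaccard resemblance drops because the mixture contributes k-mers the genome lacks. Only the genome is sketched here; the mixture's hashes are kept in full for simplicity.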


Subjects
DNA Contamination, High-Throughput Screening Assays, Metagenomics/methods, Algorithms, Humans, Polyomavirus/isolation & purification, Proteome
7.
Microbiome ; 6(1): 197, 2018 11 05.
Article in English | MEDLINE | ID: mdl-30396371

ABSTRACT

The Mid-Atlantic Microbiome Meet-up (M3) organization brings together academic, government, and industry groups to share ideas and develop best practices for microbiome research. In January of 2018, M3 held its fourth meeting, which focused on recent advances in biodefense, specifically those relating to infectious disease, and the use of metagenomic methods for pathogen detection. Presentations highlighted the utility of next-generation sequencing technologies for identifying and tracking microbial community members across space and time. However, they also stressed the current limitations of genomic approaches for biodefense, including insufficient sensitivity to detect low-abundance pathogens and the inability to quantify viable organisms. Participants discussed ways in which the community can improve software usability and shared new computational tools for metagenomic processing, assembly, annotation, and visualization. Looking to the future, they identified the need for better bioinformatics toolkits for longitudinal analyses, improved sample processing approaches for characterizing viruses and fungi, and more consistent maintenance of database resources. Finally, they addressed the necessity of improving data standards to incentivize data sharing. Here, we summarize the presentations and discussions from the meeting, identifying the areas where microbiome analyses have improved our ability to detect and manage biological threats and infectious disease, as well as gaps of knowledge in the field that require future funding and focus.


Subjects
Biological Warfare Agents, Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods, Metagenomics/methods, Humans, Microbiota/physiology, DNA Sequence Analysis/methods
8.
Article in English | MEDLINE | ID: mdl-30136952

ABSTRACT

Data are often viewed as a single set of values, but those values frequently must be compared with another set. The existing evaluations of designs that facilitate these comparisons tend to be based on intuitive reasoning, rather than quantifiable measures. We build on this work with a series of crowdsourced experiments that use low-level perceptual comparison tasks that arise frequently in comparisons within data visualizations (e.g., which value changes the most between the two sets of data?). Participants completed these tasks across a variety of layouts: overlaid, two arrangements of juxtaposed small multiples, mirror-symmetric small multiples, and animated transitions. A staircase procedure sought the difficulty level (e.g., value change delta) that led to equivalent accuracy for each layout. Confirming prior intuition, we observe high levels of performance for overlaid versus standard small multiples. However, we also find performance improvements for both mirror symmetric small multiples and animated transitions. While some results are incongruent with common wisdom in data visualization, they align with previous work in perceptual psychology, and thus have potentially strong implications for visual comparison designs.
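
A staircase procedure of the kind mentioned above can be sketched as follows. The abstract does not state the exact rule used, so this assumes a simple one-up/one-down staircase: the task gets harder (smaller delta) after a correct answer and easier after an error. The simulated observer and all names are hypothetical.

```python
def run_staircase(respond, start, step, n_trials, floor=0.0):
    """Adapt the stimulus difference `delta` trial by trial.

    `respond(delta)` returns True for a correct answer. A correct answer
    lowers delta (harder); an incorrect answer raises it (easier).
    """
    delta = start
    history = []
    for _ in range(n_trials):
        correct = respond(delta)
        history.append((delta, correct))
        if correct:
            delta = max(floor, delta - step)
        else:
            delta = delta + step
    return history


def observer(delta):
    """Idealized observer: always correct at or above a threshold of 3.0."""
    return delta >= 3.0


history = run_staircase(observer, start=10.0, step=1.0, n_trials=30)

# Estimate the threshold from the last few trials, where the staircase
# oscillates around the level the observer can just manage.
tail = [delta for delta, _ in history[-10:]]
estimate = sum(tail) / len(tail)
print(estimate)
```

With a real, noisy observer the staircase would converge on the delta giving a fixed accuracy level, which is what allows layouts to be compared at equivalent accuracy.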

9.
PeerJ ; 6: e4892, 2018.
Article in English | MEDLINE | ID: mdl-29868286

ABSTRACT

When performing bioforensic casework, it is important to be able to reliably detect the presence of a particular organism in a metagenomic sample, even if the organism is only present in a trace amount. For this task, it is common to use a sequence classification program that determines the taxonomic affiliation of individual sequence reads by comparing them to reference database sequences. As metagenomic data sets often consist of millions or billions of reads that need to be compared to reference databases containing millions of sequences, such sequence classification programs typically use search heuristics and databases with reduced sequence diversity to speed up the analysis, which can lead to incorrect assignments. Thus, in a bioforensic setting where correct assignments are paramount, assignments of interest made by "first-pass" classifiers should be confirmed using the most precise methods and comprehensive databases available. In this study we present a BLAST-based method for validating the assignments made by less precise sequence classification programs, with optimal parameters for filtering of BLAST results determined via simulation of sequence reads from genomes of interest, and we apply the method to the detection of four pathogenic organisms. The software implementing the method is open source and freely available.

10.
Genome Biol ; 17(1): 132, 2016 06 20.
Article in English | MEDLINE | ID: mdl-27323842

ABSTRACT

Mash extends the MinHash dimensionality-reduction technique to include a pairwise mutation distance and P value significance test, enabling the efficient clustering and search of massive sequence collections. Mash reduces large sequences and sequence sets to small, representative sketches, from which global mutation distances can be rapidly estimated. We demonstrate several use cases, including the clustering of all 54,118 NCBI RefSeq genomes in 33 CPU h; real-time database search using assembled or unassembled Illumina, Pacific Biosciences, and Oxford Nanopore data; and the scalable clustering of hundreds of metagenomic samples by composition. Mash is freely released under a BSD license ( https://github.com/marbl/mash ).
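
The sketch-then-estimate idea can be illustrated with a toy bottom-s MinHash. This is not the Mash implementation: it uses SHA-1 in place of Mash's hash function, ignores reverse complements, and uses a tiny k and sketch size. The distance formula, D = -(1/k) ln(2j / (1 + j)) for Jaccard estimate j, is the Mash distance as published. Function names are hypothetical.

```python
import hashlib
import math


def kmer_hashes(seq, k):
    """64-bit hashes of all distinct k-mers in `seq`."""
    return {
        int.from_bytes(hashlib.sha1(seq[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(seq) - k + 1)
    }


def sketch(seq, k, s):
    """Bottom-s sketch: the s smallest k-mer hashes."""
    return sorted(kmer_hashes(seq, k))[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate Jaccard from the s smallest hashes of the merged sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k):
    """Mash distance from a Jaccard estimate; 1.0 when nothing is shared."""
    if j == 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


a = "ACGTACGGTCATGCATTGCAGGCTA"
k, s = 5, 10
j_same = jaccard_estimate(sketch(a, k, s), sketch(a, k, s), s)
print(j_same, mash_distance(j_same, k))
print(mash_distance(0.5, 21))
```

Because only the sketches are compared, the cost of a distance estimate depends on the sketch size rather than on genome length, which is what makes clustering very large collections feasible.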


Subjects
Molecular Evolution, Genome, Genomics/methods, Metagenome, Metagenomics/methods, Software, Cluster Analysis, Nucleic Acid Databases, Phylogeny
11.
Viruses ; 7(11): 5875-88, 2015 Nov 12.
Article in English | MEDLINE | ID: mdl-26569291

ABSTRACT

High consequence human pathogenic viruses must be handled at biosafety level 2, 3 or 4 and must be rendered non-infectious before they can be utilized for molecular or immunological applications at lower biosafety levels. Here we evaluate psoralen-inactivated Arena-, Bunya-, Corona-, Filo-, Flavi- and Orthomyxoviruses for their suitability as antigen in immunological processes and as template for reverse transcription PCR and sequencing. The method of virus inactivation using a psoralen molecule appears to have broad applicability to RNA viruses and to leave both the particle and RNA of the treated virus intact, while rendering the virus non-infectious.


Subjects
Antiviral Agents/metabolism, Ficusin/metabolism, Microbial Viability/drug effects, RNA Viruses/drug effects, RNA Viruses/physiology, Virus Inactivation, Animals, Viral Antigens/immunology, Cell Line, Humans, RNA Viruses/genetics, RNA Viruses/immunology, Viral RNA/genetics
12.
Genome Announc ; 2(6), 2014 Nov 06.
Article in English | MEDLINE | ID: mdl-25377701

ABSTRACT

Staphylococcus aureus subsp. aureus ATCC 25923 is commonly used as a control strain for susceptibility testing to antibiotics and as a quality control strain for commercial products. We present the completed genome sequence for the strain, consisting of the chromosome and a 27.5-kb plasmid.

13.
Genome Biol ; 15(11): 524, 2014.
Article in English | MEDLINE | ID: mdl-25410596

ABSTRACT

Whole-genome sequences are now available for many microbial species and clades; however, existing whole-genome alignment methods are limited in their ability to perform sequence comparisons of multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.


Subjects
Bacteria/genetics, Bacterial Genome/genetics, Phylogeny, Sequence Alignment, Algorithms, Single Nucleotide Polymorphism, DNA Sequence Analysis, Software
14.
Genome Announc ; 1(1), 2013 Jan.
Article in English | MEDLINE | ID: mdl-23405332

ABSTRACT

The Bacillus anthracis Carbosap genome, which includes the pXO1 and pXO2 plasmids, has been shown to encode the major B. anthracis virulence factors, yet this strain's attenuation has not yet been explained. Here we report the draft genome sequence of this strain, and a comparison to fully virulent B. anthracis.

15.
Genome Biol ; 14(1): R2, 2013 Jan 15.
Article in English | MEDLINE | ID: mdl-23320958

ABSTRACT

We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS.


Subjects
Contig Mapping/methods, Metagenomics/methods, DNA Sequence Analysis/methods, Software
16.
PLoS One ; 7(8): e43350, 2012.
Article in English | MEDLINE | ID: mdl-22937038

ABSTRACT

BACKGROUND: Although genome-wide transcriptional analysis has been used for many years to study bacterial gene expression, many aspects of the bacterial transcriptome remain undefined. One example is antisense transcription, which has been observed in a number of bacteria, though the function of antisense transcripts, and their distribution across the bacterial genome, is still unclear. METHODOLOGY/PRINCIPAL FINDINGS: Single-stranded RNA-seq results revealed a widespread and non-random pattern of antisense transcription covering more than two thirds of the B. anthracis genome. Our analysis revealed a variety of antisense structural patterns, suggesting multiple mechanisms of antisense transcription. The data revealed several instances of sense and antisense expression changes in different growth conditions, suggesting that antisense transcription may play a role in the ways in which B. anthracis responds to its environment. Significantly, genome-wide antisense expression occurred at consistently higher levels on the lagging strand, while the leading strand showed very little antisense activity. Intrasample gene expression comparisons revealed a gene dosage effect in all growth conditions, where genes farthest from the origin showed the lowest overall range of expression for both sense and antisense directed transcription. Additionally, transcription from both strands was verified using a novel strand-specific assay. The variety of structural patterns we observed in antisense transcription suggests multiple mechanisms for this phenomenon, suggesting that some antisense transcription may play a role in regulating the expression of key genes, while some may be due to chromosome replication dynamics and transcriptional noise. 
CONCLUSIONS/SIGNIFICANCE: Although the variety of structural patterns we observed in antisense transcription suggests multiple mechanisms for antisense expression, our data also clearly indicate that antisense transcription may play a genome-wide role in regulating the expression of key genes in Bacillus species. This study illustrates the surprising complexity of prokaryotic RNA abundance for both strands of a bacterial chromosome.


Subjects
Bacillus anthracis/genetics, Antisense RNA/genetics, RNA/genetics, Oligonucleotide Array Sequence Analysis
17.
BMC Bioinformatics ; 12: 385, 2011 Sep 30.
Article in English | MEDLINE | ID: mdl-21961884

ABSTRACT

BACKGROUND: A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. RESULTS: Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. CONCLUSIONS: Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net.


Subjects
Internet, Metagenomics/methods, Software, Computational Biology, Gastrointestinal Tract/microbiology, Humans
18.
Bioinformatics ; 26(15): 1901-2, 2010 Aug 01.
Article in English | MEDLINE | ID: mdl-20562417

ABSTRACT

SUMMARY: Bisulfite sequencing allows cytosine methylation, an important epigenetic marker, to be detected via nucleotide substitutions. Since the Applied Biosystems SOLiD System uses a unique di-base encoding that increases confidence in the detection of nucleotide substitutions, it is a potentially advantageous platform for this application. However, the di-base encoding also makes reads with many nucleotide substitutions difficult to align to a reference sequence with existing tools, preventing the platform's potential utility for bisulfite sequencing from being realized. Here, we present SOCS-B, a reference-based, un-gapped alignment algorithm for the SOLiD System that is tolerant of both bisulfite-induced nucleotide substitutions and a parametric number of sequencing errors, facilitating bisulfite sequencing on this platform. An implementation of the algorithm has been integrated with the previously reported SOCS alignment tool, and was used to align CpG methylation-enriched Arabidopsis thaliana bisulfite sequence data, exhibiting a 2-fold increase in sensitivity compared to existing methods for aligning SOLiD bisulfite data. AVAILABILITY: Executables, source code, and sample data are available at http://solidsoftwaretools.com/gf/project/socs/


Subjects
Algorithms, Sequence Alignment/methods, DNA Sequence Analysis/methods, Sulfites, Arabidopsis/genetics, Sequence Alignment/instrumentation, DNA Sequence Analysis/instrumentation
19.
J Bacteriol ; 191(10): 3203-11, 2009 May.
Article in English | MEDLINE | ID: mdl-19304856

ABSTRACT

Although gene expression has been studied in bacteria for decades, many aspects of the bacterial transcriptome remain poorly understood. Transcript structure, operon linkages, and information on absolute abundance all provide valuable insights into gene function and regulation, but none has ever been determined on a genome-wide scale for any bacterium. Indeed, these aspects of the prokaryotic transcriptome have been explored on a large scale in only a few instances, and consequently little is known about the absolute composition of the mRNA population within a bacterial cell. Here we report the use of a high-throughput sequencing-based approach in assembling the first comprehensive, single-nucleotide resolution view of a bacterial transcriptome. We sampled the Bacillus anthracis transcriptome under a variety of growth conditions and showed that the data provide an accurate and high-resolution map of transcript start sites and operon structure throughout the genome. Further, the sequence data identified previously nonannotated regions with significant transcriptional activity and enhanced the accuracy of existing genome annotations. Finally, our data provide estimates of absolute transcript abundance and suggest that there is significant transcriptional heterogeneity within a clonal, synchronized bacterial population. Overall, our results offer an unprecedented view of gene expression and regulation in a bacterial cell.


Subjects
Bacillus anthracis/genetics, Computational Biology, Gene Expression Profiling/methods, Bacterial Gene Expression Regulation/genetics, Molecular Sequence Data, Operon/genetics, Messenger RNA/genetics, Reverse Transcriptase Polymerase Chain Reaction, DNA Sequence Analysis
20.
Bioinformatics ; 24(23): 2776-7, 2008 Dec 01.
Article in English | MEDLINE | ID: mdl-18842598

ABSTRACT

UNLABELLED: Here, we report the development of SOCS (short oligonucleotide color space), a program designed for efficient and flexible mapping of Applied Biosystems SOLiD sequence data onto a reference genome. SOCS performs its mapping within the context of 'color space', and it maximizes usable data by allowing a user-specified number of mismatches. Sequence census functions facilitate a variety of functional genomics applications, including transcriptome mapping and profiling, as well as ChIP-Seq. AVAILABILITY: Executables, source code, and sample data are available at http://socs.biology.gatech.edu/
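
For readers unfamiliar with color space, here is a small sketch of the di-base encoding and of counting mismatches between color-space reads. It is an illustration only, not code from SOCS. It assumes the standard two-bit scheme (A=0, C=1, G=2, T=3) in which each color is the XOR of adjacent base codes, and all function names are hypothetical.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode(seq):
    """Base space to color space: leading base, then one color per base pair."""
    colors = [
        str(BASE_TO_BITS[a] ^ BASE_TO_BITS[b]) for a, b in zip(seq, seq[1:])
    ]
    return seq[0] + "".join(colors)


def decode(color_read):
    """Color space back to base space, walking forward from the leading base."""
    bases = [color_read[0]]
    for color in color_read[1:]:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ int(color)])
    return "".join(bases)


def color_mismatches(read_a, read_b):
    """Number of differing colors between two equal-length color-space reads."""
    return sum(x != y for x, y in zip(read_a[1:], read_b[1:]))


reference = encode("ATGC")
variant = encode("ATTC")  # single-base substitution G -> T
print(reference, variant, color_mismatches(reference, variant))
```

A single-base substitution changes two adjacent colors, whereas an isolated sequencing error changes only one. This is why a mapper working in color space needs a tunable mismatch allowance.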


Subjects
Genome, Genomics/methods, Oligonucleotide Array Sequence Analysis/methods, Genetic Databases, DNA Sequence Analysis