Search | VHL Regional Portal

1.

COVID-19 Health Beliefs Regarding Mask Wearing and Vaccinations on Twitter: Deep Learning Approach.

Ke, Si Yang; Neeley-Tass, E Shannon; Barnes, Michael; Hanson, Carl L; Giraud-Carrier, Christophe; Snell, Quinn.

JMIR Infodemiology ; 2(2): e37861, 2022.

Article in English | MEDLINE | ID: mdl-36348979

ABSTRACT

Background: Amid the global COVID-19 pandemic, a worldwide infodemic also emerged with large amounts of COVID-19-related information and misinformation spreading through social media channels. Various organizations, including the World Health Organization (WHO) and the Centers for Disease Control and Prevention (CDC), and other prominent individuals issued high-profile advice on preventing the further spread of COVID-19. Objective: The purpose of this study is to leverage machine learning and Twitter data from the pandemic period to explore health beliefs regarding mask wearing and vaccines and the influence of high-profile cues to action. Methods: A total of 646,885,238 COVID-19-related English tweets were filtered, creating a mask-wearing data set and a vaccine data set. Researchers manually categorized a training sample of 3500 tweets for each data set according to their relevance to Health Belief Model (HBM) constructs and used coded tweets to train machine learning models for classifying each tweet in the data sets. Results: In total, 5 models were trained for both the mask-related and vaccine-related data sets using the XLNet transformer model, with each model achieving at least 81% classification accuracy. Health beliefs regarding perceived benefits and barriers were most pronounced for both mask wearing and immunization; however, the strength of those beliefs appeared to vary in response to high-profile cues to action. Conclusions: During both the COVID-19 pandemic and the infodemic, health beliefs related to perceived benefits and barriers observed through Twitter using a big data machine learning approach varied over time and in response to high-profile cues to action from prominent organizations and individuals.

2.

Predicting suicidal thoughts and behavior among adolescents using the risk and protective factor framework: A large-scale machine learning approach.

Weller, Orion; Sagers, Luke; Hanson, Carl; Barnes, Michael; Snell, Quinn; Tass, E Shannon.

PLoS One ; 16(11): e0258535, 2021.

Article in English | MEDLINE | ID: mdl-34731169

ABSTRACT

INTRODUCTION: Addressing the problem of suicidal thoughts and behavior (STB) in adolescents requires understanding the associated risk factors. While previous research has identified individual risk and protective factors associated with many adolescent social morbidities, modern machine learning approaches can help identify risk and protective factors that interact (group) to provide predictive power for STB. This study aims to develop a prediction algorithm for STB among adolescents using the risk and protective factor framework and social determinants of health. METHODS: The sample population consisted of more than 179,000 high school students living in Utah and participating in the Communities That Care (CTC) Youth Survey from 2011-2017. The dataset includes responses to 300+ questions from the CTC and 8000+ demographic factors from the American Census Survey for a total of 1.2 billion values. Machine learning techniques were employed to extract the survey questions that were best able to predict answers indicative of STB, using recent work in interpretable machine learning. RESULTS: Analysis showed strong predictive power, with the ability to predict individuals with STB with 91% accuracy. After extracting the top ten questions that most affected model predictions, questions fell into four main categories: familial life, drug consumption, demographics, and peer acceptance at school. CONCLUSIONS: Modern machine learning approaches provide new methods for understanding the interaction between root causes and outcomes, such as STB. The model developed in this study showed significant improvement in predictive accuracy compared to previous research. Results indicate that certain risk and protective factors, such as adolescents being threatened or harassed through digital media or bullied at school, and exposure to or involvement in serious arguments and yelling at home, are the leading predictors of STB and can help narrow and reaffirm priority prevention programming and areas of focused policymaking.

Subjects

Machine Learning , Suicidal Ideation , Suicide, Attempted/psychology , Suicide/psychology , Adolescent , Bullying/psychology , Cannabis/adverse effects , Female , Forecasting , Humans , Internet , Male , Risk Factors , Schools , Students/psychology , Suicide, Attempted/prevention & control , Surveys and Questionnaires , Utah , Young Adult , Suicide Prevention

3.

The OGCleaner: filtering false-positive homology clusters.

Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Snell, Quinn; Bybee, Seth M.

Bioinformatics ; 33(1): 125-127, 2017 01 01.

Article in English | MEDLINE | ID: mdl-27614349

ABSTRACT

Detecting homologous sequences in organisms is an essential step in protein structure and function prediction, gene annotation and phylogenetic tree construction. Heuristic methods are often employed for quality control of putative homology clusters. These heuristics, however, usually only apply to pairwise sequence comparison and do not examine clusters as a whole. We present the Orthology Group Cleaner (the OGCleaner), a tool designed for filtering putative orthology groups as homology or non-homology clusters by considering all sequences in a cluster. The OGCleaner relies on high-quality orthologous groups identified in OrthoDB to train machine learning algorithms that are able to distinguish between true-positive and false-positive homology groups. This package aims to improve the quality of phylogenetic tree construction, especially in instances of lower-quality transcriptome assemblies. AVAILABILITY AND IMPLEMENTATION: https://github.com/byucsl/ogcleaner CONTACT: sfujimoto@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Proteins/chemistry , Proteomics/methods , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Molecular Sequence Annotation , Phylogeny , Protein Conformation , Proteins/genetics , Proteins/metabolism

4.

ScaffoldScaffolder: solving contig orientation via bidirected to directed graph reduction.

Bodily, Paul M; Fujimoto, M Stanley; Snell, Quinn; Ventura, Dan; Clement, Mark J.

Bioinformatics ; 32(1): 17-24, 2016 Jan 01.

Article in English | MEDLINE | ID: mdl-26382194

ABSTRACT

MOTIVATION: The contig orientation problem, which we formally define as the MAX-DIR problem, has at times been addressed cursorily and at times using various heuristics. In setting forth a linear-time reduction from the MAX-CUT problem to the MAX-DIR problem, we prove the latter is NP-complete. We compare the relative performance of a novel greedy approach with several other heuristic solutions. RESULTS: Our results suggest that our greedy heuristic algorithm not only works well but also outperforms the other algorithms due to the nature of scaffold graphs. Our results also demonstrate a novel method for identifying inverted repeats and inversion variants, both of which contradict the basic single-orientation assumption. Such inversions have previously been noted as being difficult to detect and are directly involved in the genetic mechanisms of several diseases. AVAILABILITY AND IMPLEMENTATION: http://bioresearch.byu.edu/scaffoldscaffolder. CONTACT: paulmbodily@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Contig Mapping/methods
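Illustrative note: the abstract above names a greedy heuristic for contig orientation but does not describe it. The sketch below is therefore not the authors' algorithm. It shows one plausible greedy strategy under stated assumptions: each scaffold-graph edge is reduced to a hypothetical tuple `(u, v, flip, weight)`, where `flip = 1` means the evidence wants the two contigs in opposite orientations, and edges are accepted in descending weight order using a union-find structure that tracks relative orientation.

```python
def greedy_orient(contigs, edges):
    """Greedily assign an orientation (0 or 1) to every contig.

    edges: iterable of (u, v, flip, weight); flip=1 means u and v should
    have opposite orientations, flip=0 means the same orientation.
    Returns (orientations, satisfied_weight).
    NOTE: hypothetical formulation for illustration only.
    """
    parent = {c: c for c in contigs}
    parity = {c: 0 for c in contigs}  # orientation of c relative to parent[c]

    def find(c):
        # Returns (root, orientation of c relative to root), compressing paths.
        if parent[c] == c:
            return c, 0
        root, parent_parity = find(parent[c])
        parity[c] ^= parent_parity
        parent[c] = root
        return root, parity[c]

    satisfied = 0
    # Heaviest (best supported) edges get to fix orientations first.
    for u, v, flip, weight in sorted(edges, key=lambda e: -e[3]):
        root_u, par_u = find(u)
        root_v, par_v = find(v)
        if root_u != root_v:
            # Merge components so that orient(u) XOR orient(v) == flip.
            parent[root_u] = root_v
            parity[root_u] = par_u ^ par_v ^ flip
            satisfied += weight
        elif par_u ^ par_v == flip:
            # Edge agrees with orientations that are already fixed.
            satisfied += weight
        # Otherwise the edge contradicts earlier choices and is left unsatisfied.

    orientations = {c: find(c)[1] for c in contigs}
    return orientations, satisfied


example_edges = [("A", "B", 0, 5), ("B", "C", 1, 4), ("A", "C", 0, 1)]
orientations, satisfied_weight = greedy_orient(["A", "B", "C"], example_edges)
```

In this toy input the lightest edge contradicts the two heavier ones, so it is the one left unsatisfied. Contradicted edges of this kind are where one might look for the inversions the abstract mentions, although that link is an interpretation, not something the abstract spells out.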

5.

Heterozygous genome assembly via binary classification of homologous sequence.

Bodily, Paul M; Fujimoto, M; Ortega, Cameron; Okuda, Nozomu; Price, Jared C; Clement, Mark J; Snell, Quinn.

BMC Bioinformatics ; 16 Suppl 7: S5, 2015.

Article in English | MEDLINE | ID: mdl-25952609

ABSTRACT

BACKGROUND: Genome assemblers to date have predominantly targeted haploid reference reconstruction from homozygous data. When applied to diploid genome assembly, these assemblers perform poorly, owing to the violation of assumptions during both the contigging and scaffolding phases. Effective tools to overcome these problems are in growing demand. Increasing parameter stringency during contigging is an effective solution to obtaining haplotype-specific contigs; however, effective algorithms for scaffolding such contigs are lacking. METHODS: We present a stand-alone scaffolding algorithm, ScaffoldScaffolder, designed specifically for scaffolding diploid genomes. The algorithm identifies homologous sequences as found in "bubble" structures in scaffold graphs. Machine learning classification is used to then classify sequences in partial bubbles as homologous or non-homologous sequences prior to reconstructing haplotype-specific scaffolds. We define four new metrics for assessing diploid scaffolding accuracy: contig sequencing depth, contig homogeneity, phase group homogeneity, and heterogeneity between phase groups. RESULTS: We demonstrate the viability of using bubbles to identify heterozygous homologous contigs, which we term homolotigs. We show that machine learning classification trained on these homolotig pairs can be used effectively for identifying homologous sequences elsewhere in the data with high precision (assuming error-free reads). CONCLUSION: More work is required to comparatively analyze this approach on real data with various parameters and classifiers against other diploid genome assembly methods. However, the initial results of ScaffoldScaffolder supply validity to the idea of employing machine learning in the difficult task of diploid genome assembly. Software is available at http://bioresearch.byu.edu/scaffoldscaffolder.

Subjects

Contig Mapping/methods , Diploidy , Genome, Human , Heterozygote , Sequence Analysis, DNA/methods , Sequence Homology , Software , Algorithms , Artificial Intelligence , High-Throughput Nucleotide Sequencing , Humans

6.

Effects of error-correction of heterozygous next-generation sequencing data.

Fujimoto, M; Bodily, Paul M; Okuda, Nozomu; Clement, Mark J; Snell, Quinn.

BMC Bioinformatics ; 15 Suppl 7: S3, 2014.

Article in English | MEDLINE | ID: mdl-25077414

ABSTRACT

BACKGROUND: Error correction is an important step in increasing the quality of next-generation sequencing data for downstream analysis and use. Polymorphic datasets are a challenge for many bioinformatic software packages that are designed for or assume homozygosity of an input dataset. This assumption ignores the true genomic composition of many organisms that are diploid or polyploid. In this survey, two different error correction packages, Quake and ECHO, are examined to see how they perform on next-generation sequence data from heterozygous genomes. RESULTS: Quake and ECHO perform well and were able to correct many errors found within the data. However, errors that occur at heterozygous positions had unique trends. Errors at these positions were sometimes corrected incorrectly, introducing errors into the dataset with the possibility of creating a chimeric read. Quake was much less likely to create chimeric reads. Quake's read trimming removed a large portion of the original data and often left reads with few heterozygous markers. ECHO resulted in more chimeric reads and introduced more errors than Quake but preserved heterozygous markers. CONCLUSIONS: These findings suggest that Quake and ECHO both have strengths and weaknesses when applied to heterozygous data. With the increased interest in haplotype specific analysis, new tools that are designed to be haplotype-aware are necessary that do not have the weaknesses of Quake and ECHO.

Subjects

Genomics/methods , Heterozygote , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Diploidy , Genome , Haplotypes , Humans

7.

Probabilistic alignment leads to improved accuracy and read coverage for bisulfite sequencing data.

Hong, Changjin; Clement, Nathan L; Clement, Spencer; Hammoud, Saher Sue; Carrell, Douglas T; Cairns, Bradley R; Snell, Quinn; Clement, Mark J; Johnson, William Evan.

BMC Bioinformatics ; 14: 337, 2013 Nov 21.

Article in English | MEDLINE | ID: mdl-24261665

ABSTRACT

BACKGROUND: DNA methylation has been linked to many important biological phenomena. Researchers have recently begun to sequence bisulfite treated DNA to determine its pattern of methylation. However, sequencing reads from bisulfite-converted DNA can vary significantly from the reference genome because of incomplete bisulfite conversion, genome variation, sequencing errors, and poor quality bases. Therefore, it is often difficult to align reads to the correct locations in the reference genome. Furthermore, bisulfite sequencing experiments have the additional complexity of having to estimate the DNA methylation levels within the sample. RESULTS: Here, we present a highly accurate probabilistic algorithm, which is an extension of the Genomic Next-generation Universal MAPper to accommodate bisulfite sequencing data (GNUMAP-bs), that addresses the computational problems associated with aligning bisulfite sequencing data to a reference genome. GNUMAP-bs integrates uncertainty from read and mapping qualities to help resolve the difference between poor quality bases and the ambiguity inherent in bisulfite conversion. We tested GNUMAP-bs and other commonly-used bisulfite alignment methods using both simulated and real bisulfite reads and found that GNUMAP-bs and other dynamic programming methods were more accurate than the more heuristic methods. CONCLUSIONS: The GNUMAP-bs aligner is a highly accurate alignment approach for processing the data from bisulfite sequencing experiments. The GNUMAP-bs algorithm is freely available for download at: http://dna.cs.byu.edu/gnumap. The software runs on multiple threads and multiple processors to increase the alignment speed.

Subjects

Sequence Alignment/standards , Sequence Analysis, DNA , Sulfites/chemistry , Algorithms , Artificial Intelligence , Base Sequence , Computer Simulation , DNA Methylation , Genome, Human , Humans , Probability , Software , Sulfites/standards

8.

Probabilistic inference and ranking of gene regulatory pathways as a shortest-path problem.

Jensen, James D; Jensen, Daniel M; Clement, Mark J; Snell, Quinn O.

BMC Bioinformatics ; 14 Suppl 13: S5, 2013.

Article in English | MEDLINE | ID: mdl-24266986

ABSTRACT

BACKGROUND: Since the advent of microarray technology, numerous methods have been devised to infer gene regulatory relationships from gene expression data. Many approaches infer entire regulatory networks. This produces results that are rich in information and yet so complex that they are often of limited usefulness for researchers. One alternative unit of regulatory interactions is a linear path between genes. Linear paths are more comprehensible than networks and still contain important information. Such paths can be extracted from inferred regulatory networks or inferred directly. Since criteria for inferring networks generally differ from criteria for inferring paths, indirect and direct inference of paths may achieve different results. RESULTS: This paper explores a strategy to infer linear pathways by converting the path inference problem into a shortest-path problem. The edge weights used are the negative log-transformed probabilities of directness derived from the posterior joint distributions of pairwise mutual information between gene expression levels. Directness is inferred using the data processing inequality. The method was designed with two goals. One is to achieve better accuracy in path inference than extraction of paths from inferred networks. The other is to facilitate prioritization of interactions for laboratory validation. A method is proposed for achieving this by ranking paths according to the joint probability of directness of each path's edges. The algorithm is evaluated using simulated expression data and is compared to extraction of shortest paths from networks inferred by two alternative methods, ARACNe and a minimum spanning tree algorithm. CONCLUSIONS: Direct path inference appears to achieve accuracy competitive with that obtained by extracting paths from networks inferred by the other methods. Preliminary exploration of the use of joint edge probabilities to rank paths is largely inconclusive. Suggestions for a better framework for such comparisons are discussed.

Subjects

Computational Biology/methods , Decision Trees , Gene Expression Regulation , Gene Regulatory Networks , Linear Models , Algorithms , Gene Expression , Humans , Species Specificity
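Illustrative note: the abstract above turns path inference into a shortest-path problem by using negative log-transformed probabilities as edge weights. The sketch below shows why that works: minimizing a sum of `-log(p)` weights is the same as maximizing the product of the edge probabilities along the path. It is a minimal sketch, not the authors' code. The function name `most_probable_path`, the edge dictionary and the probabilities are invented for illustration, and the step that derives "probabilities of directness" from mutual information is not modeled at all.

```python
import heapq
import math


def most_probable_path(edge_probs, source, target):
    """Dijkstra over -log(p) weights; returns (path, joint_probability).

    edge_probs: dict mapping directed edge (u, v) -> probability in (0, 1].
    Illustrative helper; names and data layout are assumptions.
    """
    graph = {}
    for (u, v), p in edge_probs.items():
        if p > 0:
            # -log(p) is non-negative for p <= 1, as Dijkstra requires.
            graph.setdefault(u, []).append((v, -math.log(p)))

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return None, 0.0

    # Walk predecessors back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    # exp(-sum of weights) recovers the product of the edge probabilities.
    return path, math.exp(-dist[target])


toy_edges = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.5}
best_path, joint_prob = most_probable_path(toy_edges, "A", "C")
```

With these toy numbers the two-edge route A, B, C has joint probability 0.9 × 0.8 = 0.72 and beats the direct edge at 0.5. The same joint probability is the quantity the abstract proposes for ranking paths.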

9.

Pathoscope: species identification and strain attribution with unassembled sequencing data.

Francis, Owen E; Bendall, Matthew; Manimaran, Solaiappan; Hong, Changjin; Clement, Nathan L; Castro-Nallar, Eduardo; Snell, Quinn; Schaalje, G Bruce; Clement, Mark J; Crandall, Keith A; Johnson, W Evan.

Genome Res ; 23(10): 1721-9, 2013 Oct.

Article in English | MEDLINE | ID: mdl-23843222

ABSTRACT

Emerging next-generation sequencing technologies have revolutionized the collection of genomic data for applications in bioforensics, biosurveillance, and for use in clinical settings. However, to make the most of these new data, new methodology needs to be developed that can accommodate large volumes of genetic data in a computationally efficient manner. We present a statistical framework to analyze raw next-generation sequence reads from purified or mixed environmental or targeted infected tissue samples for rapid species identification and strain attribution against a robust database of known biological agents. Our method, Pathoscope, capitalizes on a Bayesian statistical framework that accommodates information on sequence quality, mapping quality, and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species can be present in the sample and considers cases when the sample species/strain is not in the reference database. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for multiple alignment steps, extensive homology searches, or genome assembly--which are time-consuming and labor-intensive steps. We demonstrate the utility of our approach on genomic data from purified and in silico "environmental" samples from known bacterial agents impacting human health for accuracy assessment and comparison with other approaches.

Subjects

Bacteria/classification , Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Genome, Bacterial , Sequence Analysis, DNA , Software , Algorithms , Bacillus anthracis/genetics , Bayes Theorem , Bioterrorism , Burkholderia mallei/genetics , Burkholderia pseudomallei/genetics , Clostridium botulinum/genetics , Escherichia coli/genetics , Escherichia coli Infections/microbiology , Europe , Francisella tularensis/genetics , Genomics , High-Throughput Nucleotide Sequencing , Humans , Species Specificity , Yersinia pestis/genetics

10.

Phylogenetic search through partial tree mixing.

Sundberg, Kenneth; Clement, Mark; Snell, Quinn; Ventura, Dan; Whiting, Michael; Crandall, Keith.

BMC Bioinformatics ; 13 Suppl 13: S8, 2012.

Article in English | MEDLINE | ID: mdl-23320449

ABSTRACT

BACKGROUND: Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time necessary to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a reasonable amount of time through several innovative search techniques. RESULTS: When compared to popular phylogenetic search algorithms, better trees are found much more quickly for large data sets. These algorithms are incorporated in the PSODA application available at http://dna.cs.byu.edu/psoda. CONCLUSIONS: The use of Partial Tree Mixing in a partition-based tree space allows the algorithm to quickly converge on near-optimal tree regions. These regions can then be searched in a methodical way to determine the overall optimal phylogenetic solution.

Subjects

Algorithms , Phylogeny , Humans , Software

11.

Accelerated large-scale multiple sequence alignment.

Lloyd, Scott; Snell, Quinn O.

BMC Bioinformatics ; 12: 466, 2011 Dec 07.

Article in English | MEDLINE | ID: mdl-22151470

ABSTRACT

BACKGROUND: Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the first known to accelerate the third stage of progressive alignment on reconfigurable hardware. RESULTS: We reduce subgroups of aligned sequences into discrete profiles before they are pairwise aligned on the accelerator. Using an FPGA accelerator, an overall speedup of up to 150 has been demonstrated on a large data set when compared to a 2.4 GHz Core2 processor. CONCLUSIONS: Our parallel algorithm and architecture accelerates large-scale MSA with reconfigurable computing and allows researchers to solve the larger problems that confront biologists today. Program source is available from http://dna.cs.byu.edu/msa/.

Subjects

Algorithms , Sequence Alignment/methods , Sequence Analysis/methods , DNA/chemistry , Genomics , Nucleotide Motifs , Proteins/chemistry , RNA/chemistry
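Illustrative note: the abstract above attributes the performance limits of earlier accelerators to Amdahl's Law, which bounds overall speedup when only part of a computation is accelerated. The sketch below evaluates the standard formula, speedup = 1 / ((1 - f) + f / s), where f is the fraction of run time that is accelerated and s is the speedup of that fraction. The function names and the sample fractions are assumptions chosen for illustration and are not measurements from the paper.

```python
def amdahl_speedup(accelerated_fraction, stage_speedup):
    """Overall speedup when a fraction f of run time is sped up by factor s."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / stage_speedup)


def amdahl_limit(accelerated_fraction):
    """Upper bound on overall speedup as the stage speedup grows without limit."""
    if accelerated_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - accelerated_fraction)


# Hypothetical case: one stage takes half of the run time and is made 2x faster.
half_stage = amdahl_speedup(0.5, 2.0)
# Hypothetical case: even an infinitely fast accelerator for 90% of the work
# cannot exceed this overall speedup.
ceiling_90 = amdahl_limit(0.9)
```

The bound explains the abstract's argument: accelerating only the first stage of progressive alignment leaves the remaining stages as the serial part, so the overall gain stays limited however fast that first stage becomes.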

12.

Parallel Mapping Approaches for GNUMAP.

Clement, Nathan L; Clement, Mark J; Snell, Quinn; Johnson, W Evan.

Proc IPDPS (Conf) ; 2011: 435-443, 2011.

Article in English | MEDLINE | ID: mdl-23396612

ABSTRACT

Mapping short next-generation reads to reference genomes is an important element in SNP calling and expression studies. A major limitation to large-scale whole-genome mapping is the large memory requirements for the algorithm and the long run-time necessary for accurate studies. Several parallel implementations have been performed to distribute memory on different processors and to equally share the processing requirements. These approaches are compared with respect to their memory footprint, load balancing, and accuracy. When using MPI with multi-threading, linear speedup can be achieved for up to 256 processors.

13.

Analysis of long branch extraction and long branch shortening.

O'Connor, Timothy; Sundberg, Kenneth; Carroll, Hyrum; Clement, Mark; Snell, Quinn.

BMC Genomics ; 11 Suppl 2: S14, 2010 Nov 02.

Article in English | MEDLINE | ID: mdl-21047381

ABSTRACT

BACKGROUND: Long branch attraction (LBA) is a problem that afflicts both the parsimony and maximum likelihood phylogenetic analysis techniques. Research has shown that parsimony is particularly vulnerable to inferring the wrong tree in Felsenstein topologies. The long branch extraction method is a procedure to detect a data set suffering from this problem so that Maximum Likelihood could be used instead of Maximum Parsimony. RESULTS: The long branch extraction method has been well cited and used by many authors in their analysis but no strong validation has been performed as to its accuracy. We performed such an analysis by an extensive search of the branch length search space under two topologies of six taxa, a Felsenstein-like topology and Farris-like topology. We also examine a long branch shortening method. CONCLUSIONS: The long branch extraction method seems to mask the majority of the search space rendering it ineffective as a detection method of LBA. A proposed alternative, the long branch shortening method, is also ineffective in predicting long branch attraction for all tree topologies.

Subjects

Computational Biology/methods , Genomics/methods , Likelihood Functions , Phylogeny

14.

Inferring gene regulatory networks from asynchronous microarray data with AIRnet.

Oviatt, David; Clement, Mark; Snell, Quinn; Sundberg, Kenneth; Lai, Chun Wan J; Allen, Jared; Roper, Randall.

BMC Genomics ; 11 Suppl 2: S6, 2010 Nov 02.

Article in English | MEDLINE | ID: mdl-21047387

ABSTRACT

BACKGROUND: Modern approaches to treating genetic disorders, cancers and even epidemics rely on a detailed understanding of the underlying gene signaling network. Previous work has used time series microarray data to infer gene signaling networks given a large number of accurate time series samples. Microarray data available for many biological experiments is limited to a small number of arrays with little or no time series guarantees. When several samples are averaged to examine differences in mean value between a diseased and normal state, information from individual samples that could indicate a gene relationship can be lost. RESULTS: Asynchronous Inference of Regulatory Networks (AIRnet) provides gene signaling network inference using more practical assumptions about the microarray data. By learning correlation patterns for the changes in microarray values from all pairs of samples, accurate network reconstructions can be performed with data that is normally available in microarray experiments. CONCLUSIONS: By focussing on the changes between microarray samples, instead of absolute values, increased information can be gleaned from expression data.

Subjects

Computational Biology/methods , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis , Algorithms , Animals , Gene Expression Profiling , Mice

15.

On the use of cartographic projections in visualizing phylogenetic tree space.

Sundberg, Kenneth; Clement, Mark; Snell, Quinn.

Algorithms Mol Biol ; 5(1): 26, 2010 Jun 08.

Article in English | MEDLINE | ID: mdl-20529355

ABSTRACT

Phylogenetic analysis is becoming an increasingly important tool for biological research. Applications include epidemiological studies, drug development, and evolutionary analysis. Phylogenetic search is a known NP-Hard problem. The size of the data sets which can be analyzed is limited by the exponential growth in the number of trees that must be considered as the problem size increases. A better understanding of the problem space could lead to better methods, which in turn could lead to the feasible analysis of more data sets. We present a definition of phylogenetic tree space and a visualization of this space that shows significant exploitable structure. This structure can be used to develop search methods capable of handling much larger data sets.
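Illustrative note: the abstract above says the number of candidate trees grows explosively with problem size. A standard way to make that concrete, assuming unrooted, fully bifurcating trees with labeled leaves, is the count (2n - 5)!! for n taxa. The sketch below computes it. The function name is invented for illustration, and the tree-space definition used in the paper itself is not reproduced here.

```python
def count_unrooted_binary_trees(n_taxa):
    """Number of distinct unrooted bifurcating trees on n labeled taxa.

    Uses the double factorial (2n - 5)!! = 3 * 5 * ... * (2n - 5), valid for n >= 3.
    """
    if n_taxa < 3:
        raise ValueError("n_taxa must be at least 3")
    count = 1
    # Multiply the odd numbers from 3 up to 2n - 5 inclusive.
    for k in range(3, 2 * n_taxa - 5 + 1, 2):
        count *= k
    return count


tree_counts = {n: count_unrooted_binary_trees(n) for n in (3, 4, 5, 10)}
```

Ten taxa already give just over two million trees, which is why exhaustive search stops being practical at modest data set sizes.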

16.

PathGen: a transitive gene pathway generator.

Clement, Kendell; Gustafson, Nathaniel; Berbert, Amanda; Carroll, Hyrum; Merris, Christopher; Olsen, Ammon; Clement, Mark; Snell, Quinn; Allen, Jared; Roper, Randall J.

Bioinformatics ; 26(3): 423-5, 2010 Feb 01.

Article in English | MEDLINE | ID: mdl-19965882

ABSTRACT

SUMMARY: Many online sources of gene interaction networks supply rich visual data regarding gene pathways that can aid in the study of biological processes, disease research and drug discovery. PathGen incorporates data from several sources to create transitive connections that span multiple gene interaction databases. Results are displayed in a comprehensible graphical format, showing gene interaction type and strength, database source and microarray expression data. These features make PathGen a valuable tool for in silico discovery of novel gene interaction pathways, which can be experimentally tested and verified. The usefulness of PathGen interaction analyses was validated using genes connected to the altered facial development related to Down syndrome. AVAILABILITY: http://dna.cs.byu.edu/pathgen. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Further information is available at http://dna.cs.byu.edu/pathgen/PathGenSupplemental.pdf.

Subjects

Computational Biology/methods , Gene Regulatory Networks/genetics , Software , Databases, Genetic , Gene Expression , Gene Expression Profiling/methods , Genes , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae/genetics

17.

The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing.

Clement, Nathan L; Snell, Quinn; Clement, Mark J; Hollenhorst, Peter C; Purwar, Jahnvi; Graves, Barbara J; Cairns, Bradley R; Johnson, W Evan.

Bioinformatics ; 26(1): 38-45, 2010 Jan 01.

Article in English | MEDLINE | ID: mdl-19861355

ABSTRACT

MOTIVATION: The advent of next-generation sequencing technologies has increased the accuracy and quantity of sequence data, opening the door to greater opportunities in genomic research. RESULTS: In this article, we present GNUMAP (Genomic Next-generation Universal MAPper), a program capable of overcoming two major obstacles in the mapping of reads from next-generation sequencing runs. First, we have created an algorithm that probabilistically maps reads to repeat regions in the genome on a quantitative basis. Second, we have developed a probabilistic Needleman-Wunsch algorithm which utilizes _prb.txt and _int.txt files produced in the Solexa/Illumina pipeline to improve the mapping accuracy for lower quality reads and increase the amount of usable data produced in a given experiment. AVAILABILITY: The source code for the software can be downloaded from http://dna.cs.byu.edu/gnumap.

Subjects

Algorithms , Chromosome Mapping/methods , DNA/genetics , Sequence Analysis, DNA/methods , Software , Base Sequence , Data Interpretation, Statistical , Molecular Sequence Data
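Illustrative note: the abstract above mentions a probabilistic Needleman-Wunsch algorithm that uses per-base probability files from the sequencer. The sketch below shows the general idea under simplifying assumptions and is not GNUMAP's implementation. Each read position is a dictionary of base probabilities, the substitution score is replaced by its expected value under those probabilities, and gaps use a simple linear penalty. The scoring values and function names are hypothetical.

```python
def expected_score(base_probs, ref_base, match=1.0, mismatch=-1.0):
    """Expected match/mismatch score of an uncertain read base against ref_base."""
    return sum(
        p * (match if base == ref_base else mismatch)
        for base, p in base_probs.items()
    )


def probabilistic_nw(read_probs, ref, gap=-2.0):
    """Global alignment score of a probabilistic read against a reference string.

    read_probs: list of dicts, one per read position, mapping base -> probability.
    Sketch only: linear gap penalty, score returned without traceback.
    """
    n, m = len(read_probs), len(ref)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + expected_score(read_probs[i - 1], ref[j - 1])
            score[i][j] = max(
                diagonal,
                score[i - 1][j] + gap,  # read base aligned to a gap
                score[i][j - 1] + gap,  # reference base aligned to a gap
            )
    return score[n][m]


certain_read = [{"A": 1.0}, {"C": 1.0}, {"G": 1.0}, {"T": 1.0}]
uncertain_read = [{"A": 1.0}, {"C": 0.5, "T": 0.5}, {"G": 1.0}, {"T": 1.0}]
certain_score = probabilistic_nw(certain_read, "ACGT")
uncertain_score = probabilistic_nw(uncertain_read, "ACGT")
```

A low-quality base contributes a score between a full match and a full mismatch instead of being forced to one call, which is one way lower-quality reads can remain usable for mapping.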

18.

An open source phylogenetic search and alignment package.

Carroll, Hyrum; Teichert, Adam R; Krein, Jonathan; Sundberg, Kenneth; Snell, Quinn; Clement, Mark.

Int J Bioinform Res Appl ; 5(3): 349-64, 2009.

Article in English | MEDLINE | ID: mdl-19525205

ABSTRACT

PSODA is a comprehensive phylogenetics package, including alignment, phylogenetic search under both parsimony and maximum likelihood, and visualisation and analysis tools. PSODA offers performance comparable to PAUP* in an open source package that aims to provide a foundation for researchers examining new phylogenetic algorithms. A key new feature is PsodaScript, an extension to the nearly ubiquitous NEXUS format, that includes conditional and loop constructs; thereby allowing complex meta-search techniques like the parsimony ratchet to be easily and compactly implemented. PSODA promises to be a valuable tool in the future development of novel phylogenetic techniques. This paper seeks to familiarise researchers with PSODA and its features, in particular the internal scripting language, PsodaScript. PSODA is freely available from the PSODA.

Subjects

Algorithms , Computational Biology/methods , Phylogeny , Sequence Alignment , Software , User-Computer Interface

19.

Parsimony accelerated maximum likelihood searches.

Sundberg, Kenneth; O'Connor, Timothy; Carroll, Hyrum; Clement, Mark; Snell, Quinn.

Int J Comput Biol Drug Des ; 1(1): 74-87, 2008.

Article in English | MEDLINE | ID: mdl-20055002

ABSTRACT

Phylogenetic search is a key tool used in a variety of biological research endeavours. However, this search problem is known to be computationally difficult, due to the astronomically large search space, making the use of heuristic methods necessary. The performance of heuristic methods for finding Maximum Likelihood (ML) trees can be improved by using parsimony as an initial estimator for ML. The time spent in performing the parsimony search to boost performance is insignificant compared to the time spent in the ML search, leading to an overall gain in search time. These parsimony boosted ML searches lead to topologies with scores statistically similar to the unboosted searches, but in less time.

Subjects

Likelihood Functions , Phylogeny , Computational Biology , Computer Simulation , Databases, Genetic , Models, Genetic , Sequence Alignment/statistics & numerical data , Software

20.

Phylogenies scores for exhaustive searches and parsimony scores searches.

Carroll, Hyrum D; Ridge, Perry G; Clement, Mark J; Snell, Quinn O.

Int J Bioinform Res Appl ; 3(4): 493-503, 2007.

Article in English | MEDLINE | ID: mdl-18048315

ABSTRACT

Fundamental to Multiple Sequence Alignment (MSA) algorithms is modelling insertions and deletions (gaps). The most prevalent model is to use Gap Open Penalties (GOP) and Gap Extension Penalties (GEP). While GOP and GEP are well understood conceptually, their effects on MSA and consequently on phylogeny scores are not as well understood. We use exhaustive phylogeny searching to explore the effects of varying the GOP and GEP for three nuclear ribosomal data sets. Particular attention is given to optimal maximum likelihood and parsimony phylogeny scores for various alignments of a range of GOP and GEP and their respective distribution of phylogeny scores.

Subjects

Computational Biology/methods , Proteomics/methods , Cell Nucleus/metabolism , Cluster Analysis , DNA/chemistry , Gene Deletion , Likelihood Functions , Models, Genetic , Models, Statistical , Models, Theoretical , Phylogeny , Reproducibility of Results , Ribosomes/metabolism , Sequence Analysis, DNA
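Illustrative note: the abstract above studies Gap Open Penalties (GOP) and Gap Extension Penalties (GEP). The sketch below shows how the two combine in an affine gap model. It assumes one common convention, in which a gap of length k costs GOP + (k - 1) × GEP. Some alignment programs instead charge GOP + k × GEP, and the abstract does not say which convention the study used. The function names and penalty values are hypothetical.

```python
import re


def gap_run_cost(length, gop, gep):
    """Affine cost of one gap of the given length (assumed convention:
    the opening penalty covers the first position, extensions cover the rest)."""
    if length <= 0:
        return 0.0
    return gop + (length - 1) * gep


def total_gap_penalty(aligned_row, gop, gep):
    """Sum affine gap costs over every run of '-' in one aligned sequence."""
    return sum(
        gap_run_cost(len(run), gop, gep)
        for run in re.findall(r"-+", aligned_row)
    )


# Hypothetical penalties: GOP = 10, GEP = 0.5.
single_gap = gap_run_cost(1, 10.0, 0.5)
long_gap = gap_run_cost(3, 10.0, 0.5)
row_penalty = total_gap_penalty("AC--G-T", 10.0, 0.5)
```

Under these values one gap of length three costs 11.0 while three separate single gaps would cost 30.0, so raising GOP relative to GEP pushes alignments toward fewer, longer gaps.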
