Results 1 - 20 of 32
1.
Syst Biol ; 71(3): 526-546, 2022 04 19.
Article in English | MEDLINE | ID: mdl-34324671

ABSTRACT

Introgression is an important biological process affecting at least 10% of the extant species in the animal kingdom. Introgression significantly impacts inference of phylogenetic species relationships where a strictly binary tree model cannot adequately explain reticulate net-like species relationships. Here, we use phylogenomic approaches to understand patterns of introgression along the evolutionary history of a unique, nonmodel insect system: dragonflies and damselflies (Odonata). We demonstrate that introgression is a pervasive evolutionary force across various taxonomic levels within Odonata. In particular, we show that the morphologically "intermediate" species of Anisozygoptera (one of the three primary suborders within Odonata besides Zygoptera and Anisoptera), which retain phenotypic characteristics of the other two suborders, experienced high levels of introgression likely coming from zygopteran genomes. Additionally, we find evidence for multiple cases of deep inter-superfamilial ancestral introgression. [Gene flow; Odonata; phylogenomics; reticulate evolution.].


Subject(s)
Odonata , Animals , Genome , Insects/anatomy & histology , Odonata/anatomy & histology , Odonata/genetics , Phylogeny
2.
Front Microbiol ; 11: 257, 2020.
Article in English | MEDLINE | ID: mdl-32153541

ABSTRACT

Bacterial antibiotic resistance is becoming a significant health threat, and rapid identification of antibiotic-resistant bacteria is essential to save lives and reduce the spread of antibiotic resistance. This paper analyzes the ability of machine learning algorithms (MLAs) to process data from a novel spectroscopic diagnostic device to identify antibiotic resistance genes and bacterial species by comparison with available bacterial DNA sequences. Simulation results show that the algorithms achieve accuracies ranging from 92% (for genes) to 99% (for species). This novel approach identifies genes and species by optically reading the percentage of A, C, G, and T bases in thousands of short 10-base DNA oligomers, instead of relying on conventional DNA sequencing, in which the order of bases in long oligomers provides the genetic information. The identification algorithms are robust to simulated random genetic mutations and simulated random experimental errors. Thus, these algorithms can be used to identify bacterial species, to reveal antibiotic resistance genes, and to perform other genomic analyses. Some of the MLAs evaluated here perform better than others at accurate gene identification and at avoiding false-negative identification of antibiotic resistance.
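The idea of identifying sequences from base composition alone, rather than base order, can be illustrated with a small sketch. It assumes that each overlapping 10-mer is summarized by its (A, C, G, T) counts (equivalent to percentages for a fixed length) and that a sample is matched to the reference with the most similar composition profile. The cosine-similarity matcher and the function names are hypothetical stand-ins for the machine learning algorithms evaluated in the paper, not the authors' method.

```python
from collections import Counter


def oligo_compositions(seq: str, k: int = 10) -> Counter:
    """Count the base-composition signature (#A, #C, #G, #T) of every overlapping k-mer."""
    seq = seq.upper()
    profile = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            signature = (kmer.count("A"), kmer.count("C"), kmer.count("G"), kmer.count("T"))
            profile[signature] += 1
    return profile


def profile_similarity(p: Counter, q: Counter) -> float:
    """Cosine similarity between two composition profiles (hypothetical scoring choice)."""
    dot = sum(p[key] * q[key] for key in p.keys() & q.keys())
    norm_p = sum(v * v for v in p.values()) ** 0.5
    norm_q = sum(v * v for v in q.values()) ** 0.5
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def best_match(sample: Counter, references: dict) -> str:
    """Return the name of the reference profile most similar to the sample profile."""
    return max(references, key=lambda name: profile_similarity(sample, references[name]))


# Toy usage with made-up sequences.
references = {
    "geneA": oligo_compositions("ACGT" * 10),
    "geneB": oligo_compositions("AAAAATTTTTGGGGGCCCCC" * 2),
}
sample = oligo_compositions("ACGT" * 8)
print(best_match(sample, references))
```

Note that two different 10-mers with the same base counts collapse to one signature, which is exactly the information loss this kind of composition-based reading accepts in exchange for not sequencing.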

3.
Bioinformatics ; 33(1): 125-127, 2017 01 01.
Article in English | MEDLINE | ID: mdl-27614349

ABSTRACT

Detecting homologous sequences in organisms is an essential step in protein structure and function prediction, gene annotation and phylogenetic tree construction. Heuristic methods are often employed for quality control of putative homology clusters. These heuristics, however, usually apply only to pairwise sequence comparison and do not examine clusters as a whole. We present the Orthology Group Cleaner (the OGCleaner), a tool designed to filter putative orthology groups by classifying them as homology or non-homology clusters while considering all sequences in a cluster. The OGCleaner relies on high-quality orthologous groups identified in OrthoDB to train machine learning algorithms that are able to distinguish between true-positive and false-positive homology groups. This package aims to improve the quality of phylogenetic tree construction, especially in instances of lower-quality transcriptome assemblies. AVAILABILITY AND IMPLEMENTATION: https://github.com/byucsl/ogcleaner CONTACT: sfujimoto@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Proteins/chemistry , Proteomics/methods , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Molecular Sequence Annotation , Phylogeny , Protein Conformation , Proteins/genetics , Proteins/metabolism
4.
Mol Ecol ; 26(5): 1306-1322, 2017 Mar.
Article in English | MEDLINE | ID: mdl-27758014

ABSTRACT

Gene duplication plays a central role in adaptation to novel environments by providing new genetic material for functional divergence and evolution of biological complexity. Several evolutionary models have been proposed for gene duplication to explain how new gene copies are preserved by natural selection, but these models have rarely been tested using empirical data. Opsin proteins, when combined with a chromophore, form a photopigment that is responsible for the absorption of light, the first step in the phototransduction cascade. Adaptive gene duplications have occurred many times within the animal opsin gene family, leading to novel wavelength sensitivities. Consequently, opsins are an attractive choice for the study of evolutionary models of gene duplication. Odonata (dragonflies and damselflies) have the largest opsin repertoire of any insect currently known. Additionally, there is tremendous variation in opsin copy number between species, particularly in the long-wavelength-sensitive (LWS) class. Using comprehensive phylotranscriptomic and statistical approaches, we tested various evolutionary models of gene duplication. Our results suggest that both the blue-sensitive (BS) and LWS opsin classes were subjected to strong positive selection that greatly weakens after multiple duplication events, a pattern that is consistent with the permanent heterozygote model. Due to the immense interspecific variation and duplicability potential of opsin genes among odonates, they represent a unique model system to test hypotheses regarding opsin gene duplication and diversification at the molecular level.


Subject(s)
Evolution, Molecular , Gene Duplication , Odonata/genetics , Opsins/genetics , Animals , Genes, Insect , Heterozygote , Phylogeny
5.
BMC Bioinformatics ; 17 Suppl 7: 268, 2016 Jul 25.
Article in English | MEDLINE | ID: mdl-27453991

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) have effectively identified genetic factors for many diseases. Many diseases, including Alzheimer's disease (AD), have epistatic causes, requiring more sophisticated analyses to identify groups of variants which together affect phenotype. RESULTS: Based on the GWAS statistical model, we developed a multi-SNP GWAS analysis to identify pairs of variants whose common occurrence signaled the Alzheimer's disease phenotype. CONCLUSIONS: Despite not having sufficient data to demonstrate significance, our preliminary experimentation identified a high correlation between GRIA3 and HLA-DRB5 (an AD gene). GRIA3 has not been previously reported in association with AD, but is known to play a role in learning and memory.


Subject(s)
Alzheimer Disease/genetics , Computational Biology/methods , Epistasis, Genetic , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Alzheimer Disease/metabolism , Female , Genetic Predisposition to Disease , HLA-DRB5 Chains/genetics , Humans , Male , Models, Statistical , Receptors, AMPA/genetics
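A pairwise scan for variants whose joint occurrence tracks the phenotype can be sketched as follows. This is a generic illustration, assuming carrier status is coded 0/1 per individual and each SNP pair is scored with a Pearson chi-square test on a 2x2 table (carries both variants vs. not, case vs. control). The test statistic and function names are assumptions for illustration, not the multi-SNP model used in the paper.

```python
from itertools import combinations


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def pair_scan(genotypes: dict, is_case: list) -> list:
    """genotypes: SNP name -> list of 0/1 carrier flags, one per individual.
    is_case: list of booleans (True = affected). Returns (statistic, snp1, snp2), best first."""
    results = []
    for s1, s2 in combinations(sorted(genotypes), 2):
        both = [bool(g1 and g2) for g1, g2 in zip(genotypes[s1], genotypes[s2])]
        a = sum(1 for x, case in zip(both, is_case) if x and case)          # both variants, cases
        b = sum(1 for x, case in zip(both, is_case) if x and not case)      # both variants, controls
        c = sum(1 for x, case in zip(both, is_case) if not x and case)      # otherwise, cases
        d = sum(1 for x, case in zip(both, is_case) if not x and not case)  # otherwise, controls
        results.append((chi_square_2x2(a, b, c, d), s1, s2))
    return sorted(results, reverse=True)


# Toy data: four cases followed by four controls (made-up genotypes).
genotypes = {
    "snp1": [1, 1, 1, 1, 0, 0, 1, 0],
    "snp2": [1, 1, 1, 1, 1, 0, 0, 0],
    "snp3": [0, 1, 0, 1, 0, 1, 0, 1],
}
is_case = [True] * 4 + [False] * 4
for stat, s1, s2 in pair_scan(genotypes, is_case):
    print(f"{s1}-{s2}: {stat:.3f}")
```

An exhaustive pair scan grows quadratically with the number of SNPs and needs a multiple-testing correction, which is one reason such analyses can lack the data to demonstrate significance.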
6.
BMC Bioinformatics ; 17: 101, 2016 Feb 24.
Article in English | MEDLINE | ID: mdl-26911862

ABSTRACT

BACKGROUND: Accurate detection of homologous relationships among biological sequences (DNA or amino acid) from different organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. Many existing heuristic tools, most commonly based on bidirectional BLAST searches, are used to identify homologous genes and group them into two fundamentally distinct classes: orthologs and paralogs. Because these methods rely only on heuristic filtering based on significance-score cutoffs and have no cluster post-processing tools available, they often produce clusters containing unrelated (non-homologous) sequences. Therefore, sequencing data extracted from incomplete genome/transcriptome assemblies, whether originating from low-coverage sequencing or produced by de novo processes without a reference genome, are susceptible to high false-positive rates of homology detection. RESULTS: In this paper, we develop biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further used in the context of guided experimentation to verify false-positive outcomes. We demonstrate that our machine learning method, trained on both known homology clusters obtained from OrthoDB and randomly generated sequence alignments (non-homologs), successfully identifies apparent false positives inferred by heuristic algorithms, especially among proteomes recovered from low-coverage RNA-seq data. Approximately 42% and 25% of the putative homologies predicted by InParanoid and HaMStR, respectively, were classified as false positives on the experimental data set. CONCLUSIONS: Our process increases the quality of output from other clustering algorithms by providing a novel post-processing method that is both fast and efficient at removing low-quality clusters of putative homologous genes recovered by heuristic-based approaches.


Subject(s)
Machine Learning , Sequence Homology , False Positive Reactions , Sequence Alignment
7.
Bioinformatics ; 32(1): 17-24, 2016 Jan 01.
Article in English | MEDLINE | ID: mdl-26382194

ABSTRACT

MOTIVATION: The contig orientation problem, which we formally define as the MAX-DIR problem, has at times been addressed cursorily and at times using various heuristics. In setting forth a linear-time reduction from the MAX-CUT problem to the MAX-DIR problem, we prove the latter is NP-complete. We compare the relative performance of a novel greedy approach with several other heuristic solutions. RESULTS: Our results suggest that our greedy heuristic algorithm not only works well but also outperforms the other algorithms due to the nature of scaffold graphs. Our results also demonstrate a novel method for identifying inverted repeats and inversion variants, both of which contradict the basic single-orientation assumption. Such inversions have previously been noted as being difficult to detect and are directly involved in the genetic mechanisms of several diseases. AVAILABILITY AND IMPLEMENTATION: http://bioresearch.byu.edu/scaffoldscaffolder. CONTACT: paulmbodily@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Contig Mapping/methods
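The contig orientation problem can be pictured as assigning each contig a strand (+1 or -1) so that the total weight of scaffold-graph edges whose "same" or "opposite" orientation constraint is satisfied is maximized. The sketch below is a generic greedy heuristic under that assumed edge model: contigs are visited in order of total incident weight, and each takes the orientation favored by its already-oriented neighbors. It is not the greedy algorithm from the paper, and the edge representation is hypothetical.

```python
from collections import defaultdict


def greedy_orient(edges):
    """edges: list of (u, v, same, weight); same=True means u and v should share orientation.
    Returns dict contig -> +1 or -1 chosen greedily (not guaranteed optimal; the problem is NP-complete)."""
    adj = defaultdict(list)
    for u, v, same, w in edges:
        adj[u].append((v, same, w))
        adj[v].append((u, same, w))
    # Visit the most heavily connected contigs first.
    order = sorted(adj, key=lambda node: -sum(w for _, _, w in adj[node]))
    orient = {}
    for node in order:
        score = 0.0  # positive favours +1, negative favours -1
        for nbr, same, w in adj[node]:
            if nbr in orient:
                wanted = orient[nbr] if same else -orient[nbr]  # orientation this neighbour votes for
                score += w * wanted
        orient[node] = 1 if score >= 0 else -1
    return orient


def satisfied_weight(edges, orient):
    """Total weight of edges whose orientation constraint is met."""
    return sum(w for u, v, same, w in edges if (orient[u] == orient[v]) == same)


# Toy scaffold graph; the c2-c3-c4 cycle is inconsistent, so at most 13 of 14 units can be satisfied.
edges = [
    ("c1", "c2", True, 5),
    ("c2", "c3", False, 4),
    ("c1", "c3", False, 3),
    ("c3", "c4", True, 1),
    ("c2", "c4", True, 1),
]
orient = greedy_orient(edges)
print(orient, satisfied_weight(edges, orient))
```

An inconsistent cycle like c2-c3-c4 is the kind of signal that can point to an inverted repeat or inversion variant, since no single orientation per contig satisfies every edge.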
8.
BMC Genomics ; 16: 353, 2015 May 06.
Article in English | MEDLINE | ID: mdl-25943316

ABSTRACT

BACKGROUND: Improvement of crop production is needed to feed the growing world population as the amount and quality of agricultural land decreases and soil salinity increases. This has stimulated research on salt tolerance in plants. Most crops tolerate only a limited amount of salt while surviving and producing biomass, whereas halophytes (salt-tolerant plants) can grow with saline water by using specific biochemical mechanisms. However, little is known about the genes involved in salt tolerance. We have characterized the transcriptome of Suaeda fruticosa, a halophyte that has the ability to sequester salts in its leaves. Suaeda fruticosa is an annual shrub in the family Chenopodiaceae found in coastal and inland regions of Pakistan and on Mediterranean shores. This plant is an obligate halophyte that grows optimally at 200-400 mM NaCl and can grow at up to 1000 mM NaCl. High-throughput sequencing was performed to identify genes involved in the salt tolerance mechanism. De novo assembly and analysis of the transcriptome allowed identification of differentially expressed and unique genes present in this non-conventional crop. RESULTS: Twelve sequencing libraries prepared from control (0 mM NaCl treated) and optimum (300 mM NaCl treated) plants were sequenced using the Illumina HiSeq 2000 to investigate differential gene expression between shoots and roots of Suaeda fruticosa. The transcriptome was assembled de novo using Velvet and Oases (k = 45) and clustered using CDHIT-EST. There are 54,526 unigenes; among these, 475 genes are downregulated and 44 are upregulated when samples from plants grown under optimal salt are compared with those grown without salt. BLAST analysis identified the differentially expressed genes, which were categorized by Gene Ontology terms and pathways. CONCLUSIONS: This work has identified potential genes involved in salt tolerance in Suaeda fruticosa and has provided an outline of tools to use for de novo transcriptome analysis. The assemblies that were used provide coverage of a considerable proportion of the transcriptome, which allows analysis of differential gene expression and identification of genes that may be involved in salt tolerance. The transcriptome may serve as a reference sequence for the study of other succulent halophytes.


Subject(s)
Chenopodiaceae/genetics , Chenopodiaceae/physiology , Gene Expression Profiling , Salinity , Sodium Chloride/pharmacology , Chenopodiaceae/drug effects , Chenopodiaceae/metabolism , Expressed Sequence Tags/metabolism , Gene Ontology , Molecular Sequence Annotation , Plant Proteins/genetics , Plant Proteins/metabolism , RNA, Plant/genetics
9.
BMC Bioinformatics ; 16 Suppl 7: S5, 2015.
Article in English | MEDLINE | ID: mdl-25952609

ABSTRACT

BACKGROUND: Genome assemblers to date have predominantly targeted haploid reference reconstruction from homozygous data. When applied to diploid genome assembly, these assemblers perform poorly, owing to the violation of assumptions during both the contigging and scaffolding phases. Effective tools to overcome these problems are in growing demand. Increasing parameter stringency during contigging is an effective way to obtain haplotype-specific contigs; however, effective algorithms for scaffolding such contigs are lacking. METHODS: We present a stand-alone scaffolding algorithm, ScaffoldScaffolder, designed specifically for scaffolding diploid genomes. The algorithm identifies homologous sequences as found in "bubble" structures in scaffold graphs. Machine learning classification is then used to classify sequences in partial bubbles as homologous or non-homologous before haplotype-specific scaffolds are reconstructed. We define four new metrics for assessing diploid scaffolding accuracy: contig sequencing depth, contig homogeneity, phase group homogeneity, and heterogeneity between phase groups. RESULTS: We demonstrate the viability of using bubbles to identify heterozygous homologous contigs, which we term homolotigs. We show that machine learning classification trained on these homolotig pairs can be used effectively to identify homologous sequences elsewhere in the data with high precision (assuming error-free reads). CONCLUSION: More work is required to compare this approach with other diploid genome assembly methods on real data with various parameters and classifiers. However, the initial results of ScaffoldScaffolder lend support to the idea of employing machine learning in the difficult task of diploid genome assembly. Software is available at http://bioresearch.byu.edu/scaffoldscaffolder.


Subject(s)
Contig Mapping/methods , Diploidy , Genome, Human , Heterozygote , Sequence Analysis, DNA/methods , Sequence Homology , Software , Algorithms , Artificial Intelligence , High-Throughput Nucleotide Sequencing , Humans
10.
BMC Bioinformatics ; 15 Suppl 7: S3, 2014.
Article in English | MEDLINE | ID: mdl-25077414

ABSTRACT

BACKGROUND: Error correction is an important step in increasing the quality of next-generation sequencing data for downstream analysis and use. Polymorphic datasets are a challenge for many bioinformatic software packages that are designed for, or assume, homozygosity of the input dataset. This assumption ignores the true genomic composition of the many organisms that are diploid or polyploid. In this survey, two error correction packages, Quake and ECHO, are examined to assess how they perform on next-generation sequence data from heterozygous genomes. RESULTS: Quake and ECHO performed well and corrected many errors found within the data. However, errors that occurred at heterozygous positions showed distinct trends. Errors at these positions were sometimes corrected incorrectly, introducing new errors into the dataset and possibly creating chimeric reads. Quake was much less likely to create chimeric reads. Quake's read trimming removed a large portion of the original data and often left reads with few heterozygous markers. ECHO produced more chimeric reads and introduced more errors than Quake but preserved heterozygous markers. CONCLUSIONS: These findings suggest that Quake and ECHO both have strengths and weaknesses when applied to heterozygous data. With the increased interest in haplotype-specific analysis, new haplotype-aware tools that do not share the weaknesses of Quake and ECHO are needed.


Subject(s)
Genomics/methods , Heterozygote , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Diploidy , Genome , Haplotypes , Humans
11.
BMC Bioinformatics ; 14: 337, 2013 Nov 21.
Article in English | MEDLINE | ID: mdl-24261665

ABSTRACT

BACKGROUND: DNA methylation has been linked to many important biological phenomena. Researchers have recently begun to sequence bisulfite-treated DNA to determine its pattern of methylation. However, sequencing reads from bisulfite-converted DNA can vary significantly from the reference genome because of incomplete bisulfite conversion, genome variation, sequencing errors, and poor-quality bases. Therefore, it is often difficult to align reads to the correct locations in the reference genome. Furthermore, bisulfite sequencing experiments have the additional complexity of having to estimate the DNA methylation levels within the sample. RESULTS: Here, we present a highly accurate probabilistic algorithm, an extension of the Genomic Next-generation Universal MAPper that accommodates bisulfite sequencing data (GNUMAP-bs), which addresses the computational problems associated with aligning bisulfite sequencing data to a reference genome. GNUMAP-bs integrates uncertainty from read and mapping qualities to help distinguish poor-quality bases from the ambiguity inherent in bisulfite conversion. We tested GNUMAP-bs and other commonly used bisulfite alignment methods using both simulated and real bisulfite reads and found that GNUMAP-bs and other dynamic programming methods were more accurate than the more heuristic methods. CONCLUSIONS: The GNUMAP-bs aligner is a highly accurate alignment approach for processing the data from bisulfite sequencing experiments. The GNUMAP-bs algorithm is freely available for download at: http://dna.cs.byu.edu/gnumap. The software runs on multiple threads and multiple processors to increase the alignment speed.


Subject(s)
Sequence Alignment/standards , Sequence Analysis, DNA , Sulfites/chemistry , Algorithms , Artificial Intelligence , Base Sequence , Computer Simulation , DNA Methylation , Genome, Human , Humans , Probability , Software , Sulfites/standards
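The ambiguity that bisulfite conversion introduces into alignment can be shown with a deliberately simple, deterministic sketch. It assumes an ungapped alignment to the top (forward) strand only, where unmethylated C is read as T, so a read T over a reference C is not counted as an error; methylation level at a site is estimated as the fraction of informative reads that retain C. This is not GNUMAP-bs's probabilistic model (which weighs base and mapping qualities), and the function names are hypothetical.

```python
def bisulfite_mismatches(read: str, ref: str) -> int:
    """Mismatches in an ungapped top-strand alignment, treating read T over
    reference C as a possible bisulfite conversion rather than an error."""
    if len(read) != len(ref):
        raise ValueError("ungapped alignment requires equal lengths")
    mismatches = 0
    for r, g in zip(read.upper(), ref.upper()):
        if r == g or (g == "C" and r == "T"):
            continue  # exact match, or C->T consistent with conversion of unmethylated C
        mismatches += 1
    return mismatches


def methylation_level(bases_at_site: str):
    """Fraction of reads retaining C at a reference C (protected from conversion).
    Bases other than C/T are uninformative and ignored; returns None if none remain."""
    informative = [b for b in bases_at_site.upper() if b in "CT"]
    return informative.count("C") / len(informative) if informative else None


print(bisulfite_mismatches("TTGAT", "CCGAC"))  # every C->T is tolerated
print(methylation_level("CCTT"))
```

The cost of this tolerance is reduced specificity: a T in a read over a reference C could be a conversion, a sequencing error, or a genuine variant, which is the uncertainty a probabilistic aligner tries to resolve.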
12.
BMC Bioinformatics ; 14 Suppl 13: S5, 2013.
Article in English | MEDLINE | ID: mdl-24266986

ABSTRACT

BACKGROUND: Since the advent of microarray technology, numerous methods have been devised to infer gene regulatory relationships from gene expression data. Many approaches infer entire regulatory networks. This produces results that are rich in information and yet so complex that they are often of limited usefulness to researchers. One alternative unit of regulatory interaction is a linear path between genes. Linear paths are more comprehensible than networks and still contain important information. Such paths can be extracted from inferred regulatory networks or inferred directly. Since criteria for inferring networks generally differ from criteria for inferring paths, indirect and direct inference of paths may achieve different results. RESULTS: This paper explores a strategy to infer linear pathways by converting the path inference problem into a shortest-path problem. The edge weights used are the negative log-transformed probabilities of directness derived from the posterior joint distributions of pairwise mutual information between gene expression levels. Directness is inferred using the data processing inequality. The method was designed with two goals. One is to achieve better accuracy in path inference than extracting paths from inferred networks. The other is to facilitate prioritization of interactions for laboratory validation. A method is proposed for achieving this by ranking paths according to the joint probability of directness of each path's edges. The algorithm is evaluated using simulated expression data and is compared to extraction of shortest paths from networks inferred by two alternative methods, ARACNe and a minimum spanning tree algorithm. CONCLUSIONS: Direct path inference appears to achieve accuracy competitive with that obtained by extracting paths from networks inferred by the other methods. Preliminary exploration of the use of joint edge probabilities to rank paths is largely inconclusive. Suggestions for a better framework for such comparisons are discussed.


Subject(s)
Computational Biology/methods , Decision Trees , Gene Expression Regulation , Gene Regulatory Networks , Linear Models , Algorithms , Gene Expression , Humans , Species Specificity
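Using negative log-transformed probabilities as edge weights turns "most probable path" into "shortest path": minimizing the sum of -log(p) over a path's edges maximizes the product of the edge probabilities, i.e. their joint probability under an independence assumption. The sketch below applies Dijkstra's algorithm to such weights. The edge probabilities are made-up inputs standing in for the directness probabilities described in the paper, the graph is assumed undirected (pairwise mutual information is symmetric), and the function name is hypothetical.

```python
import heapq
import math


def most_probable_path(edge_probs: dict, source: str, target: str):
    """edge_probs: (gene_u, gene_v) -> probability of a direct interaction, 0 < p <= 1.
    Returns (path, joint probability) or (None, 0.0) if target is unreachable."""
    graph = {}
    for (u, v), p in edge_probs.items():
        w = -math.log(p)  # non-negative weight, so Dijkstra applies
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))  # undirected: mutual information is symmetric
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], math.exp(-dist[target])


# Toy example: the two-step route A-B-C (0.9 * 0.9 = 0.81) beats the direct edge A-C (0.5).
edge_probs = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.5, ("C", "D"): 0.8}
print(most_probable_path(edge_probs, "A", "D"))
```

The returned joint probability can also serve as a ranking score for candidate paths, in the spirit of the prioritization goal described above.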
13.
Genome Res ; 23(10): 1721-9, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23843222

ABSTRACT

Emerging next-generation sequencing technologies have revolutionized the collection of genomic data for applications in bioforensics, biosurveillance, and clinical settings. However, to make the most of these new data, new methodology is needed that can accommodate large volumes of genetic data in a computationally efficient manner. We present a statistical framework to analyze raw next-generation sequence reads from purified or mixed environmental samples, or from targeted infected tissue samples, for rapid species identification and strain attribution against a robust database of known biological agents. Our method, Pathoscope, capitalizes on a Bayesian statistical framework that accommodates sequence quality and mapping quality and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species are present in the sample and considers cases in which the sample species/strain is not in the reference database. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for multiple alignment steps, extensive homology searches, or genome assembly, which are time-consuming and labor-intensive steps. We demonstrate the utility of our approach on genomic data from purified and in silico "environmental" samples from known bacterial agents impacting human health for accuracy assessment and comparison with other approaches.


Subject(s)
Bacteria/classification , Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Genome, Bacterial , Sequence Analysis, DNA , Software , Algorithms , Bacillus anthracis/genetics , Bayes Theorem , Bioterrorism , Burkholderia mallei/genetics , Burkholderia pseudomallei/genetics , Clostridium botulinum/genetics , Escherichia coli/genetics , Escherichia coli Infections/microbiology , Europe , Francisella tularensis/genetics , Genomics , High-Throughput Nucleotide Sequencing , Humans , Species Specificity , Yersinia pestis/genetics
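A core idea behind resolving reads that map to several closely related genomes is to estimate genome proportions and read origins jointly. The sketch below is a standard expectation-maximization (EM) estimate of mixture proportions, assuming each read comes with a likelihood P(read | genome) for every genome it maps to. It is a simplified illustration only; Pathoscope's actual Bayesian model (its priors, the handling of genomes absent from the database, and how quality scores enter the likelihoods) is not reproduced here, and the function name is hypothetical.

```python
def em_proportions(likelihoods, n_iter=200, tol=1e-10):
    """likelihoods: list with one dict per read, mapping genome -> P(read | genome).
    Returns estimated genome proportions from a standard mixture-model EM."""
    genomes = sorted({g for read in likelihoods for g in read})
    pi = {g: 1.0 / len(genomes) for g in genomes}  # start from uniform proportions
    for _ in range(n_iter):
        totals = {g: 0.0 for g in genomes}
        for read in likelihoods:
            # E-step: posterior probability that this read came from each genome.
            z = sum(pi[g] * lik for g, lik in read.items())
            if z == 0:
                continue  # read unexplained under the current estimates
            for g, lik in read.items():
                totals[g] += pi[g] * lik / z
        n = sum(totals.values())
        if n == 0:
            break
        new_pi = {g: totals[g] / n for g in genomes}  # M-step: expected read share
        delta = max(abs(new_pi[g] - pi[g]) for g in genomes)
        pi = new_pi
        if delta < tol:
            break
    return pi


# Toy data: 6 reads unique to strain A, 2 unique to strain B, 4 equally likely from either.
reads = [{"A": 1.0}] * 6 + [{"B": 1.0}] * 2 + [{"A": 1.0, "B": 1.0}] * 4
print(em_proportions(reads))
```

Here the ambiguous reads are gradually attributed in proportion to the evidence from unique reads, converging to A = 0.75 and B = 0.25 rather than splitting them evenly.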
14.
Am J Med Genet A ; 161A(8): 1866-74, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23843306

ABSTRACT

Trisomy 21 in humans causes cognitive impairment, craniofacial dysmorphology, and heart defects, collectively referred to as Down syndrome (DS). Yet the pathophysiology of these phenotypes is not well understood. Craniofacial alterations may lead to complications in breathing, eating, and communication. Ts65Dn mice exhibit craniofacial alterations that model Down syndrome, including a small mandible. We show that Ts65Dn embryos at 13.5 days of gestation (E13.5) have a smaller mandibular precursor but a normal-sized tongue compared with euploid embryos, suggesting that a relative, rather than actual, macroglossia originates during development. Neurological tissues were also altered in E13.5 trisomic embryos. Our array analysis found 155 differentially expressed non-trisomic genes in the trisomic E13.5 mandible, including 20 genes containing a homeobox DNA-binding domain. Additionally, Sox9, important in skeletal formation and cell proliferation, was upregulated in Ts65Dn mandible precursors. Our results suggest that trisomy causes altered expression of non-trisomic genes during development, leading to structural changes associated with DS. Identification of genetic pathways disrupted by trisomy is an important step in proposing rational therapies at relevant time points to ameliorate craniofacial abnormalities in DS and other congenital disorders.


Subject(s)
Craniofacial Abnormalities/genetics , Disease Models, Animal , Down Syndrome/genetics , Embryo, Mammalian/metabolism , Trisomy/genetics , Animals , Biomarkers/metabolism , Cell Proliferation , Craniofacial Abnormalities/metabolism , Craniofacial Abnormalities/pathology , Embryo, Mammalian/pathology , Female , Gene Expression Profiling , Mandible/abnormalities , Mandible/metabolism , Mandible/pathology , Mice , Oligonucleotide Array Sequence Analysis , Phenotype , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , SOX9 Transcription Factor
15.
Mol Plant Microbe Interact ; 25(8): 1026-33, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22746823

ABSTRACT

The genetic rules that dictate legume-rhizobium compatibility have been investigated for decades, but the causes of incompatibility occurring at late stages of the nodulation process are not well understood. An evaluation of naturally diverse legume (genus Medicago) and rhizobium (genus Sinorhizobium) isolates has revealed numerous instances in which Sinorhizobium strains induce and occupy nodules that are only minimally beneficial to certain Medicago hosts. Using these ineffective strain-host pairs, we identified gain-of-compatibility (GOC) rhizobial variants. We show that GOC variants arise by loss of specific large accessory plasmids, which we call HR plasmids due to their effect on symbiotic host range. Transfer of HR plasmids to a symbiotically effective rhizobium strain can convert it to incompatibility, indicating that HR plasmids can act autonomously in diverse strain backgrounds. We provide evidence that HR plasmids may encode machinery for their horizontal transfer. On hosts in which HR plasmids impair N fixation, the plasmids also enhance competitiveness for nodule occupancy, showing that naturally occurring, transferrable accessory genes can convert beneficial rhizobia to a more exploitative lifestyle. This observation raises important questions about agricultural management, the ecological stability of mutualisms, and the genetic factors that distinguish beneficial symbionts from parasites.


Subject(s)
Medicago/microbiology , Nitrogen Fixation/genetics , Rhizobium/genetics , Symbiosis/genetics , Gene Transfer, Horizontal , Molecular Sequence Data , Phenotype , Plasmids , Root Nodules, Plant/microbiology , Sinorhizobium/genetics
16.
BMC Bioinformatics ; 13 Suppl 13: S8, 2012.
Article in English | MEDLINE | ID: mdl-23320449

ABSTRACT

BACKGROUND: Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time needed to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a reasonable amount of time through several innovative search techniques. RESULTS: When compared with popular phylogenetic search algorithms, better trees are found much more quickly for large data sets. These algorithms are incorporated in the PSODA application, available at http://dna.cs.byu.edu/psoda. CONCLUSIONS: The use of Partial Tree Mixing in a partition-based tree space allows the algorithm to converge quickly on near-optimal tree regions. These regions can then be searched methodically to determine the overall optimal phylogenetic solution.


Subject(s)
Algorithms , Phylogeny , Humans , Software
17.
Genome Biol Evol ; 3: 1312-23, 2011.
Article in English | MEDLINE | ID: mdl-22002916

ABSTRACT

Next-gen sequencing technologies have revolutionized data collection in genetic studies and advanced genome biology to novel frontiers. However, to date, next-gen technologies have been used principally for whole-genome sequencing and transcriptome sequencing. Yet many questions in population genetics and systematics rely on sequencing specific genes of known function or diversity levels. Here, we describe a targeted amplicon sequencing (TAS) approach that capitalizes on next-gen capacity to sequence large numbers of targeted gene regions from a large number of samples. Our TAS approach is easily scalable, simple in execution, neither time- nor labor-intensive, relatively inexpensive, and can be applied to a broad diversity of organisms and/or genes. Our TAS approach includes a bioinformatic application, BarcodeCrucher, that takes raw next-gen sequence reads, performs quality control checks, and converts the data into FASTA format organized by gene and sample, ready for phylogenetic analyses. We demonstrate our approach by sequencing targeted genes of known phylogenetic utility to estimate a phylogeny for the Pancrustacea. We generated data from 44 taxa using 68 different 10-bp multiplexing identifiers. The overall quality of the data produced was robust and informative for phylogeny estimation. The potential for this method to produce copious amounts of data from a single 454 plate (e.g., 325 taxa for 24 loci) significantly reduces the sequencing expenses incurred with traditional Sanger sequencing. We further discuss the advantages and disadvantages of this method and offer suggestions to enhance the approach.


Subject(s)
Phylogeny , Sequence Analysis, DNA/methods , Animals , Computational Biology , Crustacea/genetics , Gene Expression Profiling/methods , Genome , Transcriptome
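Sorting pooled reads back to samples by their 10-bp multiplexing identifiers can be sketched as follows. This is a minimal illustration under stated assumptions: all identifiers have the same length and sit at the 5' end of each read, a read is assigned only if exactly one identifier is within a small Hamming distance, and the identifier is trimmed before FASTA output. The barcode sequences and sample names are hypothetical, and the sketch sorts by sample only; it does not reproduce BarcodeCrucher's quality control or its per-gene organization.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes: dict, max_mismatches: int = 1) -> dict:
    """reads: iterable of (read_id, sequence); barcodes: identifier -> sample name.
    Returns sample -> list of (read_id, sequence with identifier trimmed)."""
    bc_len = len(next(iter(barcodes)))
    bins = {sample: [] for sample in barcodes.values()}
    bins["unassigned"] = []
    for read_id, seq in reads:
        prefix = seq[:bc_len]
        hits = [bc for bc in barcodes if hamming(prefix, bc) <= max_mismatches]
        if len(prefix) == bc_len and len(hits) == 1:  # require one unambiguous match
            bins[barcodes[hits[0]]].append((read_id, seq[bc_len:]))
        else:
            bins["unassigned"].append((read_id, seq))
    return bins


def to_fasta(records) -> str:
    """Render (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)


# Hypothetical identifiers and reads (r3 carries one sequencing error in its identifier).
barcodes = {"ACGAGTGCGT": "taxon1", "ACGCTCGACA": "taxon2"}
reads = [
    ("r1", "ACGAGTGCGTTTGACC"),
    ("r2", "ACGCTCGACAGGA"),
    ("r3", "ACGCTCGACTGGA"),
    ("r4", "TTTTTTTTTTAAA"),
]
bins = demultiplex(reads, barcodes)
print(to_fasta(bins["taxon2"]))
```

Allowing one mismatch recovers reads with a single identifier error, which is only safe when the identifiers in a set differ from each other at several positions, as they do here.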
18.
Discov Med ; 12(62): 41-55, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21794208

ABSTRACT

Exome sequencing has identified the causes of several Mendelian diseases, although it has rarely been used in a clinical setting to diagnose the genetic cause of an idiopathic disorder in a single patient. We performed exome sequencing on a pedigree with several members affected by attention deficit/hyperactivity disorder (ADHD), in an effort to identify candidate variants predisposing to this complex disease. While we did identify some rare variants that might predispose to ADHD, we have not yet proven causality for any of them. However, over the course of the study, one subject was discovered to have idiopathic hemolytic anemia (IHA), which was suspected to be genetic in origin. Analysis of this subject's exome readily identified two rare non-synonymous mutations in the PKLR gene as the most likely cause of the IHA, although these two mutations had not previously been documented together in a single individual. We further confirmed the deficiency by functional biochemical testing, consistent with a diagnosis of red blood cell pyruvate kinase deficiency. Our study implies that exome and genome sequencing will certainly reveal additional rare variation causative for even well-studied classical Mendelian diseases, while also revealing variants that might play a role in complex diseases. Furthermore, our study has clinical and ethical implications for exome and genome sequencing in a research setting; how to handle unrelated findings of clinical significance, in the context of originally planned complex disease research, remains a largely uncharted area for clinicians and researchers.


Subject(s)
Attention Deficit Disorder with Hyperactivity/genetics , Ethics, Research , Exons/genetics , Sequence Analysis, DNA/methods , Amino Acid Sequence , Anemia, Hemolytic, Autoimmune/complications , Anemia, Hemolytic, Autoimmune/enzymology , Anemia, Hemolytic, Autoimmune/genetics , Attention Deficit Disorder with Hyperactivity/complications , DNA Copy Number Variations/genetics , Female , Genome, Human/genetics , Humans , Male , Molecular Sequence Data , Pedigree , Pyruvate Kinase/chemistry , Pyruvate Kinase/genetics , Reproducibility of Results , Software
19.
Proc IPDPS (Conf) ; 2011: 435-443, 2011.
Article in English | MEDLINE | ID: mdl-23396612

ABSTRACT

Mapping short next-generation reads to reference genomes is an important element in SNP calling and expression studies. A major limitation to large-scale whole-genome mapping is the large memory requirement of the algorithm and the long run time necessary for accurate studies. Several parallel implementations have been developed to distribute memory across different processors and to share the processing load equally. These approaches are compared with respect to their memory footprint, load balancing, and accuracy. When using MPI with multi-threading, linear speedup can be achieved for up to 256 processors.

20.
BMC Genomics ; 11 Suppl 2: S14, 2010 Nov 02.
Article in English | MEDLINE | ID: mdl-21047381

ABSTRACT

BACKGROUND: Long branch attraction (LBA) is a problem that afflicts both parsimony and maximum likelihood phylogenetic analysis techniques. Research has shown that parsimony is particularly vulnerable to inferring the wrong tree in Felsenstein topologies. The long branch extraction method is a procedure for detecting whether a data set suffers from this problem, so that maximum likelihood can be used instead of maximum parsimony. RESULTS: The long branch extraction method has been widely cited and used by many authors in their analyses, but no strong validation of its accuracy has been performed. We performed such an analysis through an extensive search of the branch-length space under two six-taxon topologies, a Felsenstein-like topology and a Farris-like topology. We also examine a long branch shortening method. CONCLUSIONS: The long branch extraction method appears to mask the majority of the search space, rendering it ineffective as a method for detecting LBA. A proposed alternative, the long branch shortening method, is also ineffective in predicting long branch attraction for all tree topologies.


Subject(s)
Computational Biology/methods , Genomics/methods , Likelihood Functions , Phylogeny