1.
Appl Plant Sci ; 10(1): e11455, 2022.
Article in English | MEDLINE | ID: mdl-35228913

ABSTRACT

PREMISE: DNA-based species identification is critical when morphological identification is restricted, but DNA-based identification pipelines typically rely on the ability to compare homologous sequence data across species. Because many clades lack robust genomic resources, we present here a bioinformatics pipeline capable of generating genome-wide single-nucleotide polymorphism (SNP) data while circumventing the need for any reference genome or annotation data. METHODS: Using the SISRS bioinformatics pipeline, we generated de novo ortholog data for the genus Carya, isolating sites where genetic variation was restricted to a single Carya species (i.e., species-informative SNPs). We leveraged these SNPs to identify both full-species and hybrid Carya specimens, even at very low sequencing depths. RESULTS: We identified between 46,000 and 476,000 species-identifying SNPs for each of eight diploid Carya species, and all species identifications were concordant with the species of record. For all putative F1 hybrid specimens, both parental species were correctly identified in all cases, and more punctate patterns of introgression were detectable in more cryptic crosses. DISCUSSION: Bioinformatics pipelines that use only short-read sequencing data provide vital new tools enabling rapid expansion of DNA identification assays for model and non-model clades alike.
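The idea of a species-informative site (variation restricted to a single species) can be illustrated with a minimal sketch. This is not the SISRS implementation: the function name `species_informative_sites`, the toy alignment, and the species labels are all hypothetical, and the sketch assumes one already-aligned sequence per species.

```python
from collections import Counter


def species_informative_sites(alignment):
    """Map each species to the sites where it alone carries a distinct base.

    `alignment` maps a species label to a string of bases; all strings must
    have the same length. Columns containing anything other than A, C, G, T
    (for example gaps or N) are skipped.
    """
    species = list(alignment)
    if len(species) < 3:
        raise ValueError("need at least three species to single one out")
    length = len(alignment[species[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")

    informative = {name: [] for name in species}
    for site in range(length):
        column = {name: alignment[name][site].upper() for name in species}
        if any(base not in "ACGT" for base in column.values()):
            continue
        counts = Counter(column.values())
        singletons = [base for base, n in counts.items() if n == 1]
        # Keep columns with exactly two alleles, one carried by one species.
        if len(counts) == 2 and len(singletons) == 1:
            odd_base = singletons[0]
            owner = next(name for name in species if column[name] == odd_base)
            informative[owner].append(site)
    return informative


# Hypothetical toy alignment; labels and bases are made up for illustration.
toy = {
    "species_A": "ACGTA",
    "species_B": "ACGTT",
    "species_C": "ATGTA",
    "species_D": "ACGTA",
}
result = species_informative_sites(toy)
```

In this toy case, site 1 singles out `species_C` and site 4 singles out `species_B`. Reads from an unknown specimen could then, in principle, be scored by how often they carry each species' distinguishing base at such sites.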

2.
Philos Trans R Soc Lond B Biol Sci ; 376(1825): 20200164, 2021 05 24.
Article in English | MEDLINE | ID: mdl-33813893

ABSTRACT

Genomic structural variation is an important source of genetic and phenotypic diversity, playing a critical role in evolution. The recent availability of a high-quality reference genome for the eastern oyster, Crassostrea virginica, and whole-genome sequence data of samples from across the species range in the USA, provides an opportunity to explore structural variation across the genome of this species. Our analysis shows significantly greater individual-level duplications of regions across the genome than that of most model vertebrate species. Duplications are widespread across all ten chromosomes with variation in frequency per chromosome. The eastern oyster shows a large interindividual variation in duplications as well as particular chromosomal regions with a higher density of duplications. A high percentage of duplications seen in C. virginica lie completely within genes and exons, suggesting the potential for impacts on gene function. These results support the hypothesis that structural changes may play a significant role in standing genetic variation in C. virginica, and potentially have a role in their adaptive and evolutionary success. Altogether, these results suggest that copy number variation plays an important role in the genomic variation of C. virginica. This article is part of the Theo Murphy meeting issue 'Molluscan genomics: broad insights and future directions for a neglected phylum'.


Subjects
Crassostrea/genetics, DNA Copy Number Variations, Gene Duplication, Genome, Animals, Chromosomes
3.
F1000Res ; 8: 1854, 2019.
Article in English | MEDLINE | ID: mdl-32025290

ABSTRACT

Many biologists are interested in teaching computing skills or using computing in the classroom, despite not being formally trained in these skills themselves. As a result, many biologists independently research how to teach these skills and search for resources and methods on their own. Recent years have seen an expansion of new technologies to assist in delivering course content interactively. Educational research provides insights into how learners absorb and process information during interactive learning. In this review, we discuss the value of teaching foundational computing skills to biologists, and strategies and tools to do so. Additionally, we review the literature on teaching practices to support the development of these skills. We pay special attention to meeting the needs of diverse learners, and consider how different ways of delivering course content can be leveraged to provide a more inclusive classroom experience. Our goal is to enable biologists to teach computational skills and use computing in the classroom successfully.


Subjects
Biology, Computing Methodologies, Biology/education, Computer Systems
4.
Bioinformatics ; 33(15): 2322-2329, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28334373

ABSTRACT

MOTIVATION: Accurate identification of genotypes is an essential part of the analysis of genomic data, including in identification of sequence polymorphisms, linking mutations with disease and determining mutation rates. Biological and technical processes that adversely affect genotyping include copy-number-variation, paralogous sequences, library preparation, sequencing error and reference-mapping biases, among others. RESULTS: We modeled the read depth for all data as a mixture of Dirichlet-multinomial distributions, resulting in significant improvements over previously used models. In most cases the best model consisted of two distributions. The major-component distribution is similar to a binomial distribution with low error and low reference bias. The minor-component distribution is overdispersed with higher error and reference bias. We also found that sites fitting the minor component are enriched for copy number variants and low complexity regions, which can produce erroneous genotype calls. By removing sites that do not fit the major component, we can improve the accuracy of genotype calls. AVAILABILITY AND IMPLEMENTATION: Methods and data files are available at https://github.com/CartwrightLab/WuEtAl2017/ (doi:10.5281/zenodo.256858). CONTACT: cartwright@asu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
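To make the mixture idea concrete, the following sketch evaluates a two-component Dirichlet-multinomial mixture on read counts at a single site and reports how strongly each component explains the site. It is not the code from the linked repository: the function names, the two-category (reference, alternate) counts, and all parameter values are assumptions chosen only for illustration.

```python
import math


def dm_logpmf(counts, alphas):
    """Log-probability of `counts` under a Dirichlet-multinomial(alphas)."""
    n = sum(counts)
    a = sum(alphas)
    # Multinomial coefficient.
    out = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Ratio of gamma functions from integrating out the Dirichlet.
    out += math.lgamma(a) - math.lgamma(n + a)
    out += sum(math.lgamma(x + al) - math.lgamma(al)
               for x, al in zip(counts, alphas))
    return out


def responsibilities(counts, weights, alpha_sets):
    """Posterior probability of each mixture component for one site."""
    logs = [math.log(w) + dm_logpmf(counts, al)
            for w, al in zip(weights, alpha_sets)]
    top = max(logs)  # subtract the maximum for numerical stability
    scaled = [math.exp(v - top) for v in logs]
    total = sum(scaled)
    return [v / total for v in scaled]


# Illustrative parameters (assumed, not estimated from any data):
# component 0 is tight around a 50:50 heterozygote, component 1 is
# overdispersed.
weights = [0.9, 0.1]
alpha_sets = [(50.0, 50.0), (0.5, 0.5)]

balanced = responsibilities((10, 10), weights, alpha_sets)
skewed = responsibilities((18, 2), weights, alpha_sets)
```

Under these assumed parameters, the balanced site is attributed mostly to the tight component and the skewed site mostly to the overdispersed one, which mirrors the filtering idea of dropping sites that do not fit the major component.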


Subjects
DNA Copy Number Variations, Human Genome, Statistical Models, Whole Genome Sequencing/methods, Genomics/methods, Genomics/standards, Genotyping Techniques/methods, Genotyping Techniques/standards, Humans, Sensitivity and Specificity, Statistical Distributions, Whole Genome Sequencing/standards
5.
BMC Evol Biol ; 17(1): 45, 2017 02 07.
Article in English | MEDLINE | ID: mdl-28173751

ABSTRACT

BACKGROUND: Blindness has evolved repeatedly in cave-dwelling organisms, and many hypotheses have been proposed to explain this observation, including both accumulation of neutral loss-of-function mutations and adaptation to darkness. Investigating the loss of sight in cave dwellers presents an opportunity to understand the operation of fundamental evolutionary processes, including drift, selection, mutation, and migration. RESULTS: Here we model the evolution of blindness in caves. This model captures the interaction of three forces: (1) selection favoring alleles causing blindness, (2) immigration of sightedness alleles from a surface population, and (3) mutations creating blindness alleles. We investigated the dynamics of this model and determined selection-strength thresholds that result in blindness evolving in caves despite immigration of sightedness alleles from the surface. We estimate that the selection coefficient for blindness would need to be at least 0.005 (and maybe as high as 0.5) for blindness to evolve in the model cave-organism, Astyanax mexicanus. CONCLUSIONS: Our results indicate that strong selection is required for the evolution of blindness in cave-dwelling organisms, which is consistent with recent work suggesting a high metabolic cost of eye development.
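The interplay of the three forces named above (selection, immigration, mutation) can be sketched with a simple recursion on the frequency of a blindness allele. This is an illustrative simplification, not the authors' model: it assumes a haploid, infinite population, a fixed order of events per generation, and a surface population with no blindness alleles; the function names and parameter values are hypothetical.

```python
def next_generation(q, s, m, u, q_surface=0.0):
    """Advance the blindness-allele frequency q by one generation.

    s: selection coefficient favoring blindness in the cave
    m: fraction of the cave population replaced by surface migrants
    u: mutation rate from sightedness to blindness alleles
    """
    q_sel = q * (1.0 + s) / (1.0 + s * q)          # selection
    q_mut = q_sel + u * (1.0 - q_sel)              # one-way mutation
    return (1.0 - m) * q_mut + m * q_surface       # immigration


def blindness_frequency(s, m, u, generations=5000, q0=0.0):
    """Iterate the recursion and return the final allele frequency."""
    q = q0
    for _ in range(generations):
        q = next_generation(q, s, m, u)
    return q


# Illustrative values only; they are not taken from the study.
strong = blindness_frequency(s=0.05, m=0.01, u=1e-6)
weak = blindness_frequency(s=0.005, m=0.01, u=1e-6)
```

In this toy recursion, ignoring mutation, the blindness allele settles near 1 - m(1 + s)/s when that quantity is positive, so selection must outweigh immigration for blindness to reach high frequency; otherwise the allele stays rare and is maintained only by mutation.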


Subjects
Biological Adaptation, Biological Evolution, Characidae/physiology, Ocular Vision, Alleles, Animals, Caves, Characidae/genetics, Darkness, Genetic Models
6.
Infect Genet Evol ; 38: 101-109, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26708057

ABSTRACT

Leishmania, a genus of parasites transmitted to human hosts and mammalian/reptilian reservoirs by an insect vector, is the causative agent of the human disease complex leishmaniasis. The evolutionary relationships within the genus Leishmania and its origins are the source of ongoing debate, reflected in conflicting phylogenetic and biogeographic reconstructions. This study employs a recently described bioinformatics method, SISRS, to identify over 200,000 informative sites across the genome from newly sequenced and publicly available Leishmania data. This dataset is used to reconstruct the evolutionary relationships of this genus. Additionally, we constructed a large multi-gene dataset, using it to reconstruct the phylogeny and estimate divergence dates for species. We conclude that the genus Leishmania evolved at least 90-100 million years ago, supporting a modified version of the Multiple Origins hypothesis that we call the Supercontinent hypothesis. According to this scenario, separate Leishmania clades emerged prior to, and during, the breakup of Gondwana. Additionally, we confirm that reptile-infecting Leishmania are derived from mammalian forms and that the species that infect porcupines and sloths form a clade long separated from other species. Finally, we firmly place the guinea-pig infecting species, Leishmania enriettii, the globally dispersed Leishmania siamensis, and the newly identified Australian species from a kangaroo, as sibling species whose distribution arises from the ancient connection between Australia, Antarctica, and South America.


Subjects
Helminth Genome, Genomics, Leishmania/classification, Leishmania/genetics, Phylogeny, Molecular Evolution, Helminth Genes, Genetic Variation, High-Throughput Nucleotide Sequencing, Humans, Leishmaniasis/parasitology
7.
PLoS Negl Trop Dis ; 9(12): e0004252, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26709695

ABSTRACT

Plasmodium vivax is the most prevalent malarial species in South America and exerts a substantial burden on the populations it affects. The control and eventual elimination of P. vivax are global health priorities. Genomic research contributes to this objective by improving our understanding of the biology of P. vivax and through the development of new genetic markers that can be used to monitor efforts to reduce malaria transmission. Here we analyze whole-genome data from eight field samples from a region in Córdoba, Colombia where malaria is endemic. We find considerable genetic diversity within this population, a result that contrasts with earlier studies suggesting that P. vivax had limited diversity in the Americas. We also identify a selective sweep around a substitution known to confer resistance to sulphadoxine-pyrimethamine (SP). This is the first observation of a selective sweep for SP resistance in this species. These results indicate that P. vivax has been exposed to SP pressure even when the drug is not in use as a first line treatment for patients afflicted by this parasite. We identify multiple non-synonymous substitutions in three other genes known to be involved with drug resistance in Plasmodium species. Finally, we found extensive microsatellite polymorphisms. Using this information we developed 18 polymorphic and easy to score microsatellite loci that can be used in epidemiological investigations in South America.


Subjects
Genetic Variation, Protozoan Genome/genetics, Vivax Malaria/parasitology, Microsatellite Repeats/genetics, Plasmodium vivax/genetics, Adolescent, Antimalarials/therapeutic use, Base Sequence, Child, Colombia/epidemiology, Drug Combinations, Drug Resistance, Female, Genome-Wide Association Study, Humans, Vivax Malaria/drug therapy, Vivax Malaria/epidemiology, Molecular Sequence Data, Plasmodium vivax/drug effects, Plasmodium vivax/isolation & purification, Pyrimethamine/therapeutic use, DNA Sequence Analysis, Sulfadoxine/therapeutic use
8.
BMC Bioinformatics ; 16: 193, 2015 Jun 11.
Article in English | MEDLINE | ID: mdl-26062548

ABSTRACT

BACKGROUND: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation. RESULTS: For simulations SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets. CONCLUSIONS: SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.


Subjects
High-Throughput Nucleotide Sequencing/methods, Hominidae/genetics, Mammals/genetics, Phylogeny, DNA Sequence Analysis/methods, Software, Animals, Genome, Genomics/methods
9.
Nat Methods ; 10(10): 985-7, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23975140

ABSTRACT

We present DeNovoGear software for analyzing de novo mutations from familial and somatic tissue sequencing data. DeNovoGear uses likelihood-based error modeling to reduce the false positive rate of mutation discovery in exome analysis and fragment information to identify the parental origin of germ-line mutations. We used DeNovoGear on human whole-genome sequencing data to produce a set of predicted de novo insertion and/or deletion (indel) mutations with a 95% validation rate.


Subjects
Human Genome/genetics, INDEL Mutation, Genetic Models, Point Mutation, Software, Exome, Gene Deletion, Human Genome Project, Humans, Likelihood Functions, Insertional Mutagenesis
10.
PLoS One ; 5(3): e9649, 2010 Mar 11.
Article in English | MEDLINE | ID: mdl-20300176

ABSTRACT

BACKGROUND: The observation of variation in substitution rates among lineages has led to (1) a general rejection of the molecular clock model, and (2) the suggestion that a number of biological characteristics of organisms can cause rate variation. Accurate estimates of rate variation, and thus accurate inferences regarding the causes of rate variation, depend on accurate estimates of substitution rates. However, theory suggests that even when the substitution process is clock-like, variable numbers of substitutions can occur among lineages because the substitution process is stochastic. Furthermore, substitution rates along lineages can be misestimated, particularly when multiple substitutions occur at some sites. Although these potential causes of error in rate estimation are well understood in theory, such error has not been examined in detail; consequently, empirical studies that estimate rate variation among lineages have been unable to determine whether their results could be impacted by estimation error. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate the extent to which error in rate estimation could erroneously suggest rate variation among lineages, we examined rate variation estimated for datasets simulated under a molecular clock on trees with equal and variable branch lengths. Thus, any apparent rate variation in these datasets reflects error in rate estimation rather than true differences in the underlying substitution process. We observed substantial rate variation among lineages in our simulations; however, we did not observe rate variation when average substitution rates were compared between different clades. CONCLUSIONS/SIGNIFICANCE: Our results confirm previous theoretical work suggesting that observations of among lineage rate variation in empirical data may be due to the stochastic substitution process and error in the estimation of substitution rates, rather than true differences in the underlying substitution process among lineages. However, conclusions regarding rate variation drawn from rates averaged across multiple branches are likely due to real, systematic variation in rates between groups.
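The point that a clock-like but stochastic process can look like rate variation can be sketched in a few lines. The simulation below is a simplified stand-in, not the study's procedure: it assumes the Jukes-Cantor (JC69) model, treats each lineage as an independent branch of identical true length, and uses hypothetical function names and parameter values.

```python
import math
import random


def jc69_expected_difference(d):
    """Expected proportion of differing sites after d substitutions/site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p_hat):
    """JC69 correction: estimated substitutions/site from observed p_hat."""
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)


def simulate_clock_estimates(true_length, n_sites, n_lineages, seed=1):
    """Estimate branch length on lineages that share one true length."""
    rng = random.Random(seed)
    p = jc69_expected_difference(true_length)
    estimates = []
    for _ in range(n_lineages):
        # Each site independently ends up different with probability p.
        differences = sum(rng.random() < p for _ in range(n_sites))
        estimates.append(jc69_distance(differences / n_sites))
    return estimates


estimates = simulate_clock_estimates(true_length=0.1, n_sites=1000,
                                     n_lineages=20)
spread = max(estimates) - min(estimates)
mean_estimate = sum(estimates) / len(estimates)
```

Every lineage here evolves at the same rate for the same time, yet the per-lineage estimates differ (`spread` is nonzero) while their average stays close to the true value. That is the same qualitative pattern the abstract describes: variation among individual lineages, but not among averages.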


Subjects
DNA/genetics, Mutation, Algorithms, Animals, Bayes Theorem, Biological Clocks, Cell Lineage, Computer Simulation, Genetic Models, Poisson Distribution, Reproducibility of Results, Urodela
11.
Mol Phylogenet Evol ; 54(3): 849-56, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20045073

ABSTRACT

Variation in substitution rates among evolutionary lineages (among-lineage rate variation or ALRV) has been reported to negatively affect the estimation of phylogenies. When the substitution processes underlying ALRV are modeled inadequately, non-sister taxa with similar substitution rates are estimated incorrectly as sister species due to long-branch attraction. Recent advances in modeling site-specific rate variation (heterotachy) have reduced the impacts of ALRV on phylogeny estimation in several empirical and simulated datasets. However, the addition of parameters to the substitution model reduces power to estimate each parameter correctly, which can also lead to incorrect phylogeny estimation. A potential solution to this problem is to identify the levels of ALRV that negatively impact phylogeny estimation such that molecular markers with non-deleterious levels of ALRV can be identified. To this end, we used analyses of empirical and simulated gene datasets to evaluate whether levels of ALRV identified in a mitochondrial genomic dataset for salamanders negatively impacted phylogeny estimation. We simulated data with and without ALRV, holding all other evolutionary parameters constant, and compared the phylogenetic performance of both simulated and empirical datasets. Overall, we found limited, positive effects of ALRV on phylogeny estimation in this dataset, the majority of which resulted from an increase in substitution rate on short branches. We conclude that ALRV does not always negatively impact phylogeny estimation. Therefore, ALRV can likely be disregarded as a criterion for marker selection in comparable phylogenetic studies.


Subjects
Computer Simulation, Molecular Evolution, Genetic Models, Phylogeny, Mitochondrial DNA/genetics, Genetic Markers, DNA Sequence Analysis
12.
BMC Evol Biol ; 10: 5, 2010 Jan 11.
Article in English | MEDLINE | ID: mdl-20064267

ABSTRACT

BACKGROUND: Estimates of divergence dates between species improve our understanding of processes ranging from nucleotide substitution to speciation. Such estimates are frequently based on molecular genetic differences between species; therefore, they rely on accurate estimates of the number of such differences (i.e. substitutions per site, measured as branch length on phylogenies). We used simulations to determine the effects of dataset size, branch length heterogeneity, branch depth, and analytical framework on branch length estimation across a range of branch lengths. We then reanalyzed an empirical dataset for plethodontid salamanders to determine how inaccurate branch length estimation can affect estimates of divergence dates. RESULTS: The accuracy of branch length estimation varied with branch length, dataset size (both number of taxa and sites), branch length heterogeneity, branch depth, dataset complexity, and analytical framework. For simple phylogenies analyzed in a Bayesian framework, branches were increasingly underestimated as branch length increased; in a maximum likelihood framework, longer branch lengths were somewhat overestimated. Longer datasets improved estimates in both frameworks; however, when the number of taxa was increased, estimation accuracy for deeper branches was less than for tip branches. Increasing the complexity of the dataset produced more misestimated branches in a Bayesian framework; however, in an ML framework, more branches were estimated more accurately. Using ML branch length estimates to re-estimate plethodontid salamander divergence dates generally resulted in an increase in the estimated age of older nodes and a decrease in the estimated age of younger nodes. CONCLUSIONS: Branch lengths are misestimated in both statistical frameworks for simulations of simple datasets. However, for complex datasets, length estimates are quite accurate in ML (even for short datasets), whereas few branches are estimated accurately in a Bayesian framework. Our reanalysis of empirical data demonstrates the magnitude of effects of Bayesian branch length misestimation on divergence date estimates. Because the length of branches for empirical datasets can be estimated most reliably in an ML framework when branches are <1 substitution/site and datasets are ≥1 kb, we suggest that divergence date estimates using datasets, branch lengths, and/or analytical techniques that fall outside of these parameters should be interpreted with caution.


Subjects
Bayes Theorem, Molecular Evolution, Genetic Models, Statistical Models, Animals, Computer Simulation, Likelihood Functions, Phylogeny, Urodela/genetics