Results 1 - 20 of 28
1.
Front Plant Sci ; 15: 1342739, 2024.
Article in English | MEDLINE | ID: mdl-38525148

ABSTRACT

Introduction: Solanum chilense is a wild relative of tomato reported to exhibit resistance to biotic and abiotic stresses. There is potential to improve tomato cultivars via breeding with wild relatives, a process greatly accelerated by suitable genomic and genetic resources. Methods: In this study we generated a high-quality, chromosome-level, de novo assembly for the S. chilense accession LA1972 using a hybrid assembly strategy with ~180 Gbp of Illumina short reads and ~50 Gbp of long PacBio reads. Further scaffolding was performed using Bionano optical maps and 10x Chromium reads. Results: The resulting sequences were arranged into 12 pseudomolecules using Hi-C sequencing. This resulted in a 901 Mbp assembly, with a completeness of 95%, as determined by Benchmarking Universal Single-Copy Orthologs (BUSCO). Sequencing of RNA from multiple tissues, resulting in ~219 Gbp of reads, was used to annotate the genome assembly with an RNA-Seq guided gene prediction, and for a de novo transcriptome assembly. This chromosome-level, high-quality reference genome for S. chilense accession LA1972 will support future breeding efforts for more sustainable tomato production. Discussion: Gene sequences related to drought and salt resistance were compared between S. chilense and S. lycopersicum to identify amino acid variations with high potential for functional impact. These variants were subsequently analysed in 84 resequenced tomato lines across 12 different related species to explore the variant distributions. We identified a set of 7 putative impactful amino acid variants, some of which may also affect fruit development, for example those in the ethylene-responsive transcription factor WIN1 and ethylene-insensitive protein 2. These variants could be tested for their ability to confer functional phenotypes to cultivars that have lost these variants.

2.
Curr Genomics ; 23(6): 400-411, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-37920557

ABSTRACT

Background: The white-backed planthopper (WBPH), Sogatella furcifera, causes great damage to many crops (mainly rice) by direct feeding or transmitting plant viruses. The previous genome assembly was generated by second-generation sequencing technologies, with a contig N50 of only 51.5 kb, and contained a lot of heterozygous sequences. Methods: We utilized third-generation sequencing technologies and Hi-C data to generate a high-quality chromosome-level assembly. We also provide a large amount of transcriptome data for full-length transcriptome analysis and gender differential expression analysis. Results: The final assembly comprised 56.38 Mb, with a contig N50 of 2.20 Mb and a scaffold N50 of 45.25 Mb. Fourteen autosomes and one X chromosome were identified. More than 99.5% of the assembled bases were located on the 15 chromosomes. 95.9% of the complete BUSCO Hemiptera genes were detected in the final assembly and 16,880 genes were annotated. 722 genes were relatively highly expressed in males, while 60 were relatively highly expressed in females. Conclusion: The integrated genome, definite sex chromosomes, comprehensive transcriptome profiles, high efficiency of RNA interference and short life cycle make WBPH an efficient research object for functional genomics.

3.
G3 (Bethesda) ; 13(8), 2023 08 09.
Article in English | MEDLINE | ID: mdl-37310934

ABSTRACT

DNA is compacted into individual particles or chromosomes that form the basic units of inheritance. However, different animals and plants have widely different numbers of chromosomes. This means that we cannot readily tell which chromosomes are related to which. Here, we describe a simple technique that looks at the similarity of genes on each chromosome and thus gives us a true picture of their homology or similarity through evolutionary time. We use this new system to look at the chromosomes of butterflies and moths or Lepidoptera. We term the associated synteny units, Lepidopteran Synteny Units (LSUs). Using a sample of butterfly and moth genomes from across evolutionary time, we show that LSUs form a simple and reliable method of tracing chromosomal homology back through time. Surprisingly, this technique reveals that butterfly and moth chromosomes show conserved blocks dating back to their sister group the Trichoptera. As Lepidoptera have holocentric chromosomes, it will be interesting to see if similar levels of synteny are shown in groups of animals with monocentric chromosomes. The ability to define homology via LSU analysis makes it considerably easier to approach many questions in chromosomal evolution.


Subjects
Butterflies , Moths , Animals , Butterflies/genetics , Synteny , Moths/genetics , Chromosomes , Genome , Molecular Evolution
4.
Int J Mol Sci ; 24(8), 2023 Apr 15.
Article in English | MEDLINE | ID: mdl-37108472

ABSTRACT

Root-lesion nematodes (genus Pratylenchus) belong to a diverse group of plant-parasitic nematodes (PPN) with a worldwide distribution. Despite being an economically important PPN group of more than 100 species, genome information related to Pratylenchus genus is scarcely available. Here, we report the draft genome assembly of Pratylenchus scribneri generated on the PacBio Sequel IIe System using the ultra-low DNA input HiFi sequencing workflow. The final assembly created using 500 nematodes consisted of 276 decontaminated contigs, with an average contig N50 of 1.72 Mb and an assembled draft genome size of 227.24 Mb consisting of 51,146 predicted protein sequences. The benchmarking universal single-copy ortholog (BUSCO) analysis with 3131 nematode BUSCO groups indicated that 65.4% of the BUSCOs were complete, whereas 24.0%, 41.4%, and 1.8% were single-copy, duplicated, and fragmented, respectively, and 32.8% were missing. The outputs from GenomeScope2 and Smudgeplots converged towards a diploid genome for P. scribneri. The data provided here will facilitate future studies on host plant-nematode interactions and crop protection at the molecular level.
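
The BUSCO percentages quoted in this abstract can be checked for internal consistency. What follows is a minimal Python sketch and is not part of the cited work; the function name `busco_summary` and the tolerance value are illustrative assumptions. It relies only on the usual BUSCO convention that complete = single-copy + duplicated, and that complete, fragmented, and missing together account for 100% of the searched groups.

```python
def busco_summary(single, duplicated, fragmented, missing, tol=0.15):
    """Check that BUSCO percentages are internally consistent.

    All arguments are percentages of the searched BUSCO groups.
    `tol` (an assumed value) absorbs rounding to one decimal place.
    Returns a dict with the complete percentage and the share of
    complete BUSCOs that are duplicated.
    """
    complete = round(single + duplicated, 1)
    total = complete + fragmented + missing
    if abs(total - 100.0) > tol:
        raise ValueError(f"categories sum to {total:.1f}%, expected 100%")
    # Share of complete BUSCOs found in more than one copy.
    dup_share = round(duplicated / complete, 3) if complete else 0.0
    return {"complete": complete, "duplicated_share": dup_share}


# Values reported in the abstract above for P. scribneri.
summary = busco_summary(single=24.0, duplicated=41.4,
                        fragmented=1.8, missing=32.8)
print(summary)
```

A high duplicated share, as reported here, is commonly read as a cue to look at haplotype redundancy or ploidy, which is consistent with the abstract's use of GenomeScope2 and Smudgeplot.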


Subjects
Parasites , Tylenchoidea , Animals , Molecular Sequence Annotation , DNA Sequence Analysis , Genome , Base Sequence , Tylenchoidea/genetics , Parasites/genetics
5.
Mol Plant Microbe Interact ; 36(7): 393-396, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36947747

ABSTRACT

When comparing the requirements of diverse journals to publish microbial 'Genome Reports,' we noticed that some mostly focus on benchmarking universal single-copy orthologs scores as a quality measure, while the exclusion of possible contaminating sequences from genomic resources and the possible misidentification of the target microbes receive less attention. To deal with these quality issues, we suggest that DNA barcodes that are widely accepted for the identification of the target microbe species should be extracted from newly reported genome resources and included in phylogenetic analyses to confirm the identity of the sequenced microorganisms before Genome Reports are published. This approach, applied, for example, by the journal IMA Fungus, largely prevents the misidentification of the microbes that are targeted for whole-genome sequencing (WGS). In addition, contig similarity values, including GC content, remapping coverage of WGS reads, and BLASTN searches against the National Center for Biotechnology Information nucleotide database, would also reveal contamination issues. The values of these two recommendations to improve the publication criteria for microbial Genome Reports in diverse journals are demonstrated here through analyses of a draft genome published in Molecular Plant-Microbe Interactions and then retracted due to contaminations. Copyright © 2023 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
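
One of the contamination signals named in this abstract, per-contig GC content, is simple to compute. The Python sketch below is an illustration only and is not the authors' procedure; the function names, the toy contigs, and the 0.10 deviation threshold are all assumptions. It flags contigs whose GC fraction lies far from the assembly median, which is at most a hint and would need to be backed by the read-coverage and BLASTN evidence the abstract also mentions.

```python
from statistics import median


def gc_content(seq):
    """Return the GC fraction of a sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / counted


def flag_gc_outliers(contigs, max_deviation=0.10):
    """Return sorted names of contigs whose GC deviates from the median.

    `contigs` maps contig name -> sequence. `max_deviation` is an
    assumed, arbitrary cut-off on the absolute GC difference.
    """
    gc = {name: gc_content(seq) for name, seq in contigs.items()}
    centre = median(gc.values())
    return sorted(name for name, value in gc.items()
                  if abs(value - centre) > max_deviation)


# Toy data: two contigs at 50% GC and one at 100% GC.
toy = {"c1": "ATGCATGC", "c2": "ATGGCATC", "c3": "GGGCCCGC"}
print(flag_gc_outliers(toy))
```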


Subjects
Genome , Genomics , Phylogeny , Whole Genome Sequencing , DNA
6.
Genome Biol Evol ; 15(1), 2023 01 04.
Article in English | MEDLINE | ID: mdl-36582124

ABSTRACT

Mycoheterotrophy is an alternative nutritional strategy whereby plants obtain sugars and other nutrients from soil fungi. Mycoheterotrophy and associated loss of photosynthesis have evolved repeatedly in plants, particularly in monocots. Although reductive evolution of plastomes in mycoheterotrophs is well documented, the dynamics of nuclear genome evolution remains largely unknown. Transcriptome datasets were generated from four mycoheterotrophs in three families (Orchidaceae, Burmanniaceae, Triuridaceae) and related green plants and used for phylogenomic analyses to resolve relationships among the mycoheterotrophs, their relatives, and representatives across the monocots. Phylogenetic trees based on 602 genes were mostly congruent with plastome phylogenies, except for an Asparagales + Liliales clade inferred in the nuclear trees. Reduction and loss of chlorophyll synthesis and photosynthetic gene expression and relaxation of purifying selection on retained genes were progressive, with greater loss in older nonphotosynthetic lineages. One hundred seventy-four of 1375 plant benchmark universally conserved orthologous genes were undetected in any mycoheterotroph transcriptome or the genome of the mycoheterotrophic orchid Gastrodia but were expressed in green relatives, providing evidence for massively convergent gene loss in nonphotosynthetic lineages. We designate this set of deleted or undetected genes Missing in Mycoheterotrophs (MIM). MIM genes encode not only mainly photosynthetic or plastid membrane proteins but also a diverse set of plastid processes, genes of unknown function, mitochondrial, and cellular processes. Transcription of a photosystem II gene (psb29) in all lineages implies a nonphotosynthetic function for this and other genes retained in mycoheterotrophs. Nonphotosynthetic plants enable novel insights into gene function as well as gene expression shifts, gene loss, and convergence in nuclear genomes.


Subjects
Plastid Genomes , Orchidaceae , Humans , Aged , Phylogeny , Plant Genes , Plant Proteins/genetics , Orchidaceae/genetics
7.
Front Plant Sci ; 13: 876779, 2022.
Article in English | MEDLINE | ID: mdl-36483967

ABSTRACT

We assess relationships among 192 species in all 12 monocot orders and 72 of 77 families, using 602 conserved single-copy (CSC) genes and 1375 benchmarking single-copy ortholog (BUSCO) genes extracted from genomic and transcriptomic datasets. Phylogenomic inferences based on these data, using both coalescent-based and supermatrix analyses, are largely congruent with the most comprehensive plastome-based analysis, and nuclear-gene phylogenomic analyses with less comprehensive taxon sampling. The strongest discordance between the plastome and nuclear gene analyses is the monophyly of a clade comprising Asparagales and Liliales in our nuclear gene analyses, versus the placement of Asparagales and Liliales as successive sister clades to the commelinids in the plastome tree. Within orders, around six of 72 families shifted positions relative to the recent plastome analysis, but four of these involve poorly supported inferred relationships in the plastome-based tree. In Poales, the nuclear data place a clade comprising Ecdeiocoleaceae+Joinvilleaceae as sister to the grasses (Poaceae); Typhaceae (rather than Bromeliaceae) are resolved as sister to all other Poales. In Commelinales, nuclear data place Philydraceae sister to all other families rather than to a clade comprising Haemodoraceae+Pontederiaceae as seen in the plastome tree. In Liliales, nuclear data place Liliaceae sister to Smilacaceae, and Melanthiaceae are placed sister to all other Liliales except Campynemataceae. Finally, in Alismatales, nuclear data strongly place Tofieldiaceae, rather than Araceae, as sister to all the other families, providing an alternative resolution of what has been the most problematic node to resolve using plastid data, outside of those involving achlorophyllous mycoheterotrophs. As seen in numerous prior studies, orders Acorales and Alismatales are placed as successive sister lineages to all other extant monocots.
Only 21.2% of BUSCO genes were demonstrably single-copy, yet phylogenomic inferences based on BUSCO and CSC genes did not differ, and overall functional annotations of the two sets were very similar. Our analyses also reveal significant gene tree-species tree discordance despite high support values, as expected given incomplete lineage sorting (ILS) related to rapid diversification. Our study advances understanding of monocot relationships and the robustness of phylogenetic inferences based on large numbers of nuclear single-copy genes that can be obtained from transcriptomes and genomes.

8.
Gigascience ; 11, 2022 02 25.
Article in English | MEDLINE | ID: mdl-35217859

ABSTRACT

BACKGROUND: Ambitious initiatives to coordinate genome sequencing of Earth's biodiversity mean that the accumulation of genomic data is growing rapidly. In addition to cataloguing biodiversity, these data provide the basis for understanding biological function and evolution. Accurate and complete genome assemblies offer a comprehensive and reliable foundation upon which to advance our understanding of organismal biology at genetic, species, and ecosystem levels. However, ever-changing sequencing technologies and analysis methods mean that available data are often heterogeneous in quality. To guide forthcoming genome generation efforts and promote efficient prioritization of resources, it is thus essential to define and monitor taxonomic coverage and quality of the data. FINDINGS: Here we present an automated analysis workflow that surveys genome assemblies from the United States NCBI, assesses their completeness using the relevant BUSCO datasets, and collates the results into an interactively browsable resource. We apply our workflow to produce a community resource of available assemblies from the phylum Arthropoda, the Arthropoda Assembly Assessment Catalogue. Using this resource, we survey current taxonomic coverage and assembly quality at the NCBI, examine how key assembly metrics relate to gene content completeness, and compare results from using different BUSCO lineage datasets. CONCLUSIONS: These results demonstrate how the workflow can be used to build a community resource that enables large-scale assessments to survey species coverage and data quality of available genome assemblies, and to guide prioritizations for ongoing and future sampling, sequencing, and genome generation initiatives.


Subjects
Ecosystem , Genome , Base Sequence , Chromosome Mapping , Genomics/methods , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis
9.
IMA Fungus ; 13(1): 2, 2022 Feb 03.
Article in English | MEDLINE | ID: mdl-35109929

ABSTRACT

Here we describe a new, haploid and stroma-forming species within the genus Epichloë, as Epichloë scottii sp. nov. The fungus was isolated from Melica uniflora growing in Bad Harzburg, Germany. Phylogenetic reconstruction using a combined dataset of the tubB and tefA genes strongly supports that E. scottii is a distinct species and the so far unknown ancestor species of the hybrid E. disjuncta. A distribution analysis showed a high infection rate in close vicinity of the initial sampling site and only two more spots with low infection rates. Genetic variations in key genes required for alkaloid production suggested that E. scottii sp. nov. might not be capable of producing any of the major alkaloids including ergot alkaloid, loline, indole-diterpene and peramine. All isolates and individuals found in the distribution analysis were identified as mating-type B, explaining the lack of mature stromata during this study. We further release a telomere-to-telomere de novo assembly of all seven chromosomes and the mitogenome of E. scottii sp. nov.

10.
Methods Mol Biol ; 2443: 211-232, 2022.
Article in English | MEDLINE | ID: mdl-35037208

ABSTRACT

Next-generation sequencing (NGS) technologies can generate billions of reads in a single sequencing run. However, with such high throughput come quality issues that have to be addressed before undertaking downstream analysis. Quality control on short reads is usually performed at default settings due to a lack of in-depth understanding of a particular software's parameters and of how changing them affects the output. Here we demonstrate how to optimize read trimming using Trimmomatic. We highlight the benefits of trimming by comparing the quality of transcripts assembled using trimmed and untrimmed reads.


Subjects
Computational Biology , High-Throughput Nucleotide Sequencing , Quality Control , RNA-Seq , Exome Sequencing
11.
Data Brief ; 40: 107800, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35059482

ABSTRACT

The sago palm (Metroxylon sagu Rottboll) is a tropical, halophytic, starch-producing and economically important crop palm mainly located in Southeast Asian countries. Recently, a genome survey was conducted on this palm using the Illumina sequencing platform, with a very low (21.5%) BUSCO genome completeness score and most of the BUSCOs (∼78%) either fragmented or missing. Thus, in this study, the sago palm genome completeness was further improved with the utilization of the Nanopore sequencing platform, which produced longer reads. A hybrid genome assembly was conducted, and the outcome was a much more complete sago palm genome, with BUSCO completeness as high as 97.9% and only ∼2% of the BUSCOs either fragmented or missing. The estimated genome size of the sago palm is 509,812,790 bp in this study. A total of 33,242 protein-coding genes was revealed from the sago palm genome and around 96.39% of them had been functionally annotated. An investigation of the carbohydrate metabolism KEGG pathways also revealed that starch synthesis was one of the major sago palm activities. The genome data obtained from this work are indispensable for future molecular evolutionary and genome-wide association studies on the economically important sago palm.

13.
Front Plant Sci ; 12: 667678, 2021.
Article in English | MEDLINE | ID: mdl-34354718

ABSTRACT

Chia (Salvia hispanica L.), now a popular superfood and a pseudocereal, is one of the richest sources of dietary nutrients such as protein, fiber, and polyunsaturated fatty acids (PUFAs). At present, the genomic and genetic information available in the public domain for this crop is scanty, which hinders an understanding of its growth and development and genetic improvement. We report an RNA-sequencing (RNA-Seq)-based comprehensive transcriptome atlas of Chia sampled from 13 tissue types covering vegetative and reproductive growth stages. We used ~355 million high-quality reads of a total of ~394 million raw reads from transcriptome sequencing to generate a de novo reference transcriptome assembly and the tissue-specific transcript assemblies. After the quality assessment of the merged assemblies and implementing redundancy reduction methods, 82,663 reference transcripts were identified. About 65,587 of 82,663 transcripts were translated into 99,307 peptides, and we were successful in assigning InterPro annotations to 45,209 peptides and gene ontology (GO) terms to 32,638 peptides. The assembled transcriptome is estimated to have the complete sequence information for ~86% of the genes found in the Chia genome. Furthermore, the analysis of 53,200 differentially expressed transcripts (DETs) revealed their distinct expression patterns in Chia's vegetative and reproductive tissues; tissue-specific networks and developmental stage-specific networks of transcription factors (TFs); and the regulation of the expression of enzyme-coding genes associated with important metabolic pathways. In addition, we identified 2,411 simple sequence repeats (SSRs) as potential genetic markers from the transcripts. Overall, this study provides a comprehensive transcriptome atlas, and SSRs, contributing to building essential genomic resources to support basic research, genome annotation, functional genomics, and molecular breeding of Chia.

14.
Mol Ecol ; 30(23): 5923-5934, 2021 12.
Article in English | MEDLINE | ID: mdl-34432923

ABSTRACT

The recent development of ecological studies has been fueled by the introduction of massive information based on chromosome-scale genome sequences, even for species for which genetic linkage is not accessible. This was enabled mainly by the application of Hi-C, a method for genome-wide chromosome conformation capture that was originally developed for investigating the long-range interaction of chromatins. Performing genomic scaffolding using Hi-C data is highly resource-demanding and employs elaborate laboratory steps for sample preparation. It starts with building a primary genome sequence assembly as an input, which is followed by computation for genome scaffolding using Hi-C data, requiring careful validation. This article presents technical considerations for obtaining optimal Hi-C scaffolding results and provides a test case of its application to a reptile species, the Madagascar ground gecko (Paroedura picta). Among the metrics that are frequently used for evaluating scaffolding results, we investigate the validity of the completeness assessment of chromosome-scale genome assemblies using single-copy reference orthologues.


Subjects
Chromosomes , Genome , Animals , Chromatin , Chromosomes/genetics , Genome/genetics , Genomics , Madagascar
15.
Insect Biochem Mol Biol ; 138: 103622, 2021 11.
Article in English | MEDLINE | ID: mdl-34252570

ABSTRACT

The diamondback moth, Plutella xylostella (L.), is a highly mobile brassica crop pest with worldwide distribution and can rapidly evolve resistance to insecticides, including group 28 diamides. Reference genomes assembled using Illumina sequencing technology have provided valuable resources to advance our knowledge regarding the biology, origin and movement of diamondback moth, and more recently with its sister species, Plutella australiana. Here we apply a trio binning approach to sequence and annotate a chromosome level reference genome of P. xylostella using PacBio Sequel and Dovetail Hi-C sequencing technology and identify a point mutation that causes resistance to commercial diamides. A P. xylostella population collected from brassica crops in the Lockyer Valley, Australia (LV-R), was reselected for chlorantraniliprole resistance then a single male was crossed to a P. australiana female and a hybrid pupa sequenced. A chromosome level 328 Mb P. xylostella genome was assembled with 98.1% assigned to 30 autosomes and the Z chromosome. The genome was highly complete with 98.4% of BUSCO Insecta genes identified and RNAseq informed protein prediction annotated 19,002 coding genes. The LV-R strain survived recommended field application doses of chlorantraniliprole, flubendiamide and cyclaniliprole. Some hybrids also survived these doses, indicating significant departure from recessivity, which has not been previously documented for diamides. Diamide chemicals modulate insect Ryanodine Receptors (RyR), disrupting calcium homeostasis, and we identified an amino acid substitution (I4790K) recently reported to cause diamide resistance in a strain from Japan. This chromosome level assembly provides a new resource for insect comparative genomics and highlights the emergence of diamide resistance in Australia. Resistance management plans need to account for the fact that resistance is not completely recessive.


Subjects
Insect Chromosomes , Diamide/pharmacology , Genome , Insecticide Resistance/genetics , Insecticides/pharmacology , Moths/genetics , Animals , Haploidy , Moths/drug effects , Moths/growth & development , Pupa/drug effects , Pupa/growth & development
16.
Mar Genomics ; 58: 100842, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34217485

ABSTRACT

The genus Procambarus represents a diverse genus of freshwater crayfish that includes epigean species, stygobitic species, and at least one parthenogenic species. Despite its evolutionary, ecological, and economic importance, most genomic and transcriptomic resources for this genus are limited to a couple of model species. We sequenced the transcriptome of a non-model species, P. erythrops, a geographically restricted stygobitic species from Florida. RNA isolated from gill, muscle and eye tissue was pooled to create a de novo transcriptome assembly using Single Molecule Real-Time sequencing (PacBio), resulting in 19,442 full-length isoforms. The assembly has been deposited in the NCBI (BioProject PRJNA657230). These data will make an important contribution to the comparative study of transcriptome evolution in crayfish and crustaceans.


Subjects
Astacoidea/genetics , Transcriptome , Animals , Florida , RNA Sequence Analysis
17.
BMC Genomics ; 22(1): 216, 2021 Mar 25.
Article in English | MEDLINE | ID: mdl-33765927

ABSTRACT

BACKGROUND: The nematode Pristionchus pacificus is an established model organism for comparative studies with Caenorhabditis elegans. Over the past years, it developed into an independent animal model organism for elucidating the genetic basis of phenotypic plasticity. Community-based curations were employed recently to improve the quality of gene annotations of P. pacificus and to more easily facilitate reverse genetic studies using candidate genes from C. elegans. RESULTS: Here, I demonstrate that the reannotation of phylogenomic data from nine related nematode species using the community-curated P. pacificus gene set as homology data substantially improves the quality of gene annotations. Benchmarking of universal single copy orthologs (BUSCO) estimates a median completeness of 84% which corresponds to a 9% increase over previous annotations. Nevertheless, the ability to infer gene models based on homology already drops beyond the genus level reflecting the rapid evolution of nematode lineages. This also indicates that the highly curated C. elegans genome is not optimally suited for annotating non-Caenorhabditis genomes based on homology. Furthermore, comparative genomic analysis of apparently missing BUSCO genes indicates a failure of ortholog detection by the BUSCO pipeline due to the insufficient sample size and phylogenetic breadth of the underlying OrthoDB data set. As a consequence, the quality of multiple divergent nematode genomes might be underestimated. CONCLUSIONS: This study highlights the need for optimizing gene annotation protocols and it demonstrates the benefit of a high quality genome for phylogenomic data of related species.


Subjects
Nematoda , Rhabditida , Animals , Caenorhabditis elegans/genetics , Genome , Molecular Sequence Annotation , Nematoda/genetics , Phylogeny , Rhabditida/genetics
18.
Mol Ecol Resour ; 21(5): 1416-1421, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33629477

ABSTRACT

With the ever-increasing number of publicly available eukaryotic genome assemblies and user-friendly bioinformatics tools, there are increasing opportunities for researchers to use genomic resources in their research. While there are multiple dimensions to genome quality, it is often reduced to a single score that may not be correlated with other metrics, or appropriate for all applications of an assembly. To assess whether the commonly reported N50 value could reliably predict a separate dimension of genome quality, gene space completeness, we performed a meta-analysis of 611 published articles on eukaryotic genomes that used BUSCO scores, in addition to the typical N50 score. We found that although assemblies with relatively high contig and scaffold N50 values consistently had high BUSCO scores, a high BUSCO score could also be obtained from assemblies with a low N50. This reinforces that despite its ubiquity, N50 is not a perfect proxy for all measures of genome accuracy. Our data also suggests that variations in BUSCO scores among assemblies with poor N50 scores may be related to the number of introns in conserved eukaryotic genes. We stress the importance of screening and evaluating assembly quality based on the appropriate tools and urge increased reporting of additional genome assessment metrics in addition to N50. We also discuss the potential limitations of BUSCO and suggest improvements for assessing gene space within genome assemblies.
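
For readers unfamiliar with the N50 statistic discussed in this abstract, the Python sketch below shows the standard definition: the length of the shortest contig (or scaffold) in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and the toy lengths are illustrative and are not taken from the cited study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy assembly of eight contigs with a total length of 32.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 depends only on sequence lengths, it carries no information about which genes are present, which is why the abstract treats gene-space completeness (BUSCO) as a separate dimension of quality.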


Subjects
Computational Biology , Genome , Genomics , Genomics/methods
19.
BMC Evol Biol ; 20(1): 141, 2020 11 02.
Article in English | MEDLINE | ID: mdl-33138771

ABSTRACT

BACKGROUND: The Drosophilidae family is traditionally divided into two subfamilies: Drosophilinae and Steganinae. This division is based on morphological characters, and the two subfamilies have been treated as monophyletic in most of the literature, but some molecular phylogenies have suggested Steganinae to be paraphyletic. To test the paraphyletic-Steganinae hypothesis, here, we used genomic sequences of eight Drosophilidae (three Steganinae and five Drosophilinae) and two Ephydridae (outgroup) species and inferred the phylogeny for the group based on a dataset of 1,028 orthologous genes present in all species (> 1,000,000 bp). This dataset includes three genera that broke the monophyly of the subfamilies in previous works. To investigate possible biases introduced by small sample sizes and automatic gene annotation, we used the same methods to infer species trees from a set of 10 manually annotated genes that are commonly used in phylogenetics. RESULTS: Most of the 1,028 gene trees depicted Steganinae as paraphyletic with distinct topologies, but the most common topology depicted it as monophyletic (43.7% of the gene trees). Despite the high levels of gene tree heterogeneity observed, species tree inference in ASTRAL, in PhyloNet, and with the concatenation approach strongly supported the monophyly of both subfamilies for the 1,028-gene dataset. However, when using the concatenation approach to infer a species tree from the smaller set of 10 genes, we recovered Steganinae as a paraphyletic group. The pattern of gene tree heterogeneity was asymmetrical and thus could not be explained solely by incomplete lineage sorting (ILS). CONCLUSIONS: Steganinae was clearly a monophyletic group in the dataset that we analyzed. In addition to ILS, gene tree discordance was possibly the result of introgression, suggesting complex branching processes during the early evolution of Drosophilidae with short speciation intervals and gene flow. 
Our study highlights the importance of genomic data in elucidating contentious phylogenetic relationships and suggests that phylogenetic inference for drosophilids based on small molecular datasets should be performed cautiously. Finally, we suggest an approach for the correction and cleaning of BUSCO-derived genomic datasets that will be useful to other researchers planning to use this tool for phylogenomic studies.


Subjects
Drosophilidae/genetics , Genetic Speciation , Phylogeny , Animals , Genomics
20.
BMC Evol Biol ; 20(1): 41, 2020 03 30.
Article in English | MEDLINE | ID: mdl-32228442

ABSTRACT

BACKGROUND: Advances in next-generation sequencing technologies have reduced the cost of whole transcriptome analyses, allowing characterization of non-model species at unprecedented levels. The rapid pace of transcriptomic sequencing has driven the public accumulation of a wealth of data for phylogenomic analyses; however, a lack of tools aimed at phylogeneticists to efficiently identify orthologous sequences currently hinders effective harnessing of this resource. RESULTS: We introduce TOAST, an open source R software package that can utilize the ortholog searches based on the software Benchmarking Universal Single-Copy Orthologs (BUSCO) to assemble multiple sequence alignments of orthologous loci from transcriptomes for any group of organisms. By streamlining search, query, and alignment, TOAST automates the generation of locus and concatenated alignments, and also presents a series of outputs from which users can not only explore missing data patterns across their alignments, but also reassemble alignments based on user-defined acceptable missing data levels for a given research question. CONCLUSIONS: TOAST provides a comprehensive set of tools for assembly of sequence alignments of orthologs for comparative transcriptomic and phylogenomic studies. This software empowers easy assembly of public and novel sequences for any target database of candidate orthologs, and fills a critically needed niche for tools that enable quantification and testing of the impact of missing data. As open-source software, TOAST is fully customizable for integration into existing or novel custom informatic pipelines for phylogenomic inference. Software, a detailed manual, and example data files are available through GitHub at carolinafishes.github.io.


Subjects
Datasets as Topic , Sequence Alignment/methods , Software , Transcriptome , Animals , Humans , Phylogeny