ABSTRACT
Chloroplasts are photosynthetic organelles in algal and plant cells that contain their own genome. Chloroplast genomes are commonly used in evolutionary studies and taxonomic identification, and are increasingly becoming a target for crop improvement studies. As DNA sequencing becomes more affordable, researchers are collecting vast amounts of high-quality whole-genome sequence data from laboratory and field settings alike. Whole-tissue read libraries sequenced with the primary goal of understanding the nuclear genome inadvertently contain many reads derived from the chloroplast genome. These whole-genome, whole-tissue read libraries can therefore also be used to assemble chloroplast genomes at little to no extra cost. While several tools exist that use second-generation short-read and third-generation long-read sequencing data for chloroplast genome assembly, these tools may have complex installation steps, inadequate error reporting, poor expandability and/or limited scalability. Here, we present CLAW (Chloroplast Long-read Assembly Workflow), an easy-to-install, customisable and easy-to-use Snakemake tool that assembles chloroplast genomes from the chloroplast-derived long reads found in whole-genome read libraries (https://github.com/aaronphillips7493/CLAW). Using 19 publicly available reference chloroplast genome assemblies and long-read libraries from algal, monocot and eudicot species, we show that CLAW can rapidly produce chloroplast genome assemblies with high similarity to the reference assemblies. CLAW was designed so that users have complete control over parameterisation, allowing individuals to optimise CLAW for their specific use cases. We expect that CLAW will provide researchers, with varying levels of bioinformatics expertise, with an additional resource for contributing to the growing number of publicly available chloroplast genome assemblies.
Subjects
Genome, Chloroplast , Humans , Genome, Chloroplast/genetics , Workflow , Sequence Analysis, DNA , Computational Biology , Chloroplasts/genetics , High-Throughput Nucleotide Sequencing
ABSTRACT
Using microscopy to investigate stomatal behaviour is common in plant physiology research. Manual inspection and measurement of stomatal pore features is low-throughput, relies on expert knowledge to record stomatal features accurately, requires significant researcher time and investment, and can be a major bottleneck in research pipelines. To alleviate this, we introduce StomaAI (SAI): a reliable, user-friendly and adaptable tool for stomatal pore and density measurements based on deep computer vision, which has initially been calibrated and deployed for the model plant Arabidopsis (dicot) and the crop plant barley (monocot grass). SAI produces measurements consistent with those of human experts and successfully reproduced the conclusions of published datasets. Because SAI can evaluate many more images in a fraction of the time, it can obtain a more accurate representation of stomatal traits than is routinely achieved through manual measurement. An online demonstration of SAI is hosted at https://sai.aiml.team, and the full local application is freely available on GitHub at https://github.com/xdynames/sai-app.
Subjects
Arabidopsis , Humans , Phenotype , Computers , Plant Stomata/physiology
ABSTRACT
Six strains, KI11_D11T, KI4_B1, KI11_C11T, KI16_H9T, KI4_A6T and KI3_B9T, were isolated from insects and flowers on Kangaroo Island, South Australia. On the basis of 16S rRNA gene phylogeny, strains KI11_D11T, KI4_B1, KI11_C11T, KI16_H9T and KI4_A6T were found to be closely related to Fructilactobacillus ixorae Ru20-1T. Because no whole-genome sequence was available for this species, the genome of Fructilactobacillus ixorae Ru20-1T was sequenced. KI3_B9T was found to be closely related to Fructobacillus tropaeoli F214-1T. Using core gene phylogenetics and whole-genome analyses, such as determination of AAI, ANI and dDDH, we propose that these six isolates represent five novel species with the names Fructilactobacillus cliffordii (KI11_D11T = LMG 32130T = NBRC 114988T), Fructilactobacillus hinvesii (KI11_C11T = LMG 32129T = NBRC 114987T), Fructilactobacillus myrtifloralis (KI16_H9T = LMG 32131T = NBRC 114989T), Fructilactobacillus carniphilus (KI4_A6T = LMG 32127T = NBRC 114985T) and Fructobacillus americanaquae (KI3_B9T = LMG 32124T = NBRC 114983T). Chemotaxonomic analyses detected no fructophilic characteristics in the strains belonging to the genus Fructilactobacillus. KI3_B9T was found to be obligately fructophilic, like its phylogenetic neighbours in the genus Fructobacillus. This study represents, to our knowledge, the first isolation of novel species of the family Lactobacillaceae from the Australian wild.
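The species-level argument rests on comparing each isolate's genome with that of its closest described type strain. The Python sketch below shows such a decision rule, assuming the commonly cited prokaryotic cut-offs of about 95% ANI and 70% dDDH; the thresholds actually applied in this study are not stated in the abstract, and the function name and example values are hypothetical.

```python
# Hypothetical helper. The cut-offs are commonly cited prokaryotic
# species-boundary values, assumed here for illustration; they are not
# taken from the study itself.
ANI_CUTOFF = 95.0   # % average nucleotide identity (some authors use 96%)
DDDH_CUTOFF = 70.0  # % digital DNA-DNA hybridization


def supports_novel_species(ani: float, dddh: float,
                           ani_cutoff: float = ANI_CUTOFF,
                           dddh_cutoff: float = DDDH_CUTOFF) -> bool:
    """Return True if both genome-relatedness values, measured against the
    closest described type strain, fall below the species cut-offs."""
    if not (0.0 <= ani <= 100.0 and 0.0 <= dddh <= 100.0):
        raise ValueError("ANI and dDDH must be percentages")
    return ani < ani_cutoff and dddh < dddh_cutoff


if __name__ == "__main__":
    # Made-up values, not results from the study.
    print(supports_novel_species(ani=88.5, dddh=35.2))  # True: below both cut-offs
    print(supports_novel_species(ani=98.7, dddh=89.0))  # False: same species
```

Requiring both metrics to fall below their cut-offs is a conservative choice; as the abstract describes, phylogenetic placement and phenotypic data are weighed alongside these numbers.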
Subjects
Lactobacillales , Animals , Lactobacillales/genetics , Phylogeny , RNA, Ribosomal, 16S/genetics , South Australia , Sequence Analysis, DNA , DNA, Bacterial/genetics , Base Composition , Fatty Acids/chemistry , Australia , Bacterial Typing Techniques , Lactobacillus , Insects , Flowers/microbiology
ABSTRACT
Four strains, SG5_A10T, SGEP1_A5T, SG4_D2T, and SG4_A1T, were isolated from the honey or homogenate of the Australian stingless bee species Tetragonula carbonaria and Austroplebeia australis. Based on 16S rRNA gene phylogeny, core gene phylogenetics, whole-genome analyses such as determination of average amino acid identity (AAI), cAAI of conserved genes, average nucleotide identity (ANI) and digital DNA-DNA hybridization (dDDH), chemotaxonomic analyses, and the novel isolation sources and unique geography, we propose four novel species, one of which represents a novel genus, with the names Apilactobacillus apisilvae sp. nov. (SG5_A10T = LMG 32133T = NBRC 114991T), Bombilactobacillus thymidiniphilus sp. nov. (SG4_A1T = LMG 32125T = NBRC 114984T), Bombilactobacillus folatiphilus sp. nov. (SG4_D2T = LMG 32126T = NBRC 115004T) and Nicolia spurrieriana sp. nov. (SGEP1_A5T = LMG 32134T = NBRC 114992T). Three of the four strains were found to be fructophilic: SG5_A10T and SGEP1_A5T are obligately fructophilic lactic acid bacteria, and SG4_D2T represents a new type, denoted here as kinetically fructophilic. This study reports the first published lactic acid bacterial species associated with the unique niche of Australian stingless bees.
Subjects
Lactobacillales , Animals , Australia , Bacterial Typing Techniques , Base Composition , Bees , DNA, Bacterial/genetics , Fatty Acids/chemistry , Lactic Acid , Lactobacillales/genetics , Phylogeny , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA
ABSTRACT
EMBL Australia Bioinformatics Resource (EMBL-ABR) is a developing national research infrastructure, providing bioinformatics resources and support to life science and biomedical researchers in Australia. EMBL-ABR comprises 10 geographically distributed national nodes with one coordinating hub, with current funding provided through Bioplatforms Australia and the University of Melbourne for its initial 2-year development phase. The EMBL-ABR mission is to: (1) increase Australia's capacity in bioinformatics and data sciences; (2) contribute to the development of training in bioinformatics skills; (3) showcase Australian data sets at an international level and (4) enable engagement in international programs. The activities of EMBL-ABR are focussed in six key areas, aligning with comparable international initiatives such as ELIXIR, CyVerse and NIH Commons. These key areas (Tools, Data, Standards, Platforms, Compute and Training) are described in this article.
Subjects
Biological Science Disciplines , Biomedical Research , Computational Biology/education , Computational Biology/methods , Data Curation/methods , Australia , Humans
ABSTRACT
Nuclear male-sterile mutants with non-conditional, recessive and strictly monogenic inheritance are useful for both hybrid and conventional breeding systems, and have long been a research focus for many crops. In allohexaploid wheat, however, genic redundancy makes such mutants rare, and the ethyl methanesulfonate-induced mutant ms5 is among the few reported to date. Here, we identify TaMs5 as a glycosylphosphatidylinositol-anchored lipid transfer protein required for normal pollen exine development, and by transgenic complementation demonstrate that TaMs5-A restores fertility to ms5. We show that ms5 maps to a centromere-proximal interval and has a sterility inheritance pattern modulated by TaMs5-D but not TaMs5-B. We describe two allelic forms of TaMs5-D, one of which is non-functional and confers mono-factorial inheritance of sterility. The second form is functional but shows incomplete dominance. Consistent with reduced functionality, transcript abundance in developing anthers was lower for TaMs5-D than for TaMs5-A. At the 3B homoeolocus, we found only non-functional alleles among 178 diverse hexaploid and tetraploid wheats, including landraces and Triticum dicoccoides. The apparent ubiquity of non-functional TaMs5-B alleles suggests that loss of function arose early in wheat evolution and, therefore, that knockout of at most two homoeoloci is required for sterility. This work provides the genetic information, resources and tools required for successful implementation of ms5 sterility in breeding systems for bread and durum wheats.
Subjects
Plant Proteins/metabolism , Triticum/metabolism , Carrier Proteins/genetics , Carrier Proteins/metabolism , Plant Infertility/genetics , Plant Infertility/physiology , Plant Proteins/genetics , Pollen/metabolism , Pollen/physiology , Triticum/genetics , Triticum/physiology
ABSTRACT
Comparing newly obtained and previously known nucleotide and amino-acid sequences underpins modern biological research. BLAST is a well-established tool for such comparisons but is challenging to use on new data sets. We combined a user-centric design philosophy with sustainable software development approaches to create Sequenceserver, a tool for running BLAST and visually inspecting BLAST results for biological interpretation. Sequenceserver uses simple algorithms to prevent potential analysis errors and provides flexible text-based and visual outputs to support researcher productivity. Our software can be rapidly installed for use by individuals or on shared servers.
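One kind of simple check that can prevent analysis errors is confirming that the query's sequence type suits the selected database before BLAST is run. The Python sketch below illustrates the idea; it is not Sequenceserver's code, and the function names and the 90% nucleotide-letter threshold are assumptions. The program-to-type pairings are the standard BLAST+ ones.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")

# Standard BLAST+ programs for each (query type, database type) pairing.
COMPATIBLE_PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "protein"): ["blastp"],
    ("protein", "nucleotide"): ["tblastn"],
}


def guess_sequence_type(fasta_text: str, threshold: float = 0.9) -> str:
    """Guess 'nucleotide' or 'protein' from the share of A/C/G/T/U/N letters.

    FASTA header lines (starting with '>') are ignored. The 0.9 threshold is
    a hypothetical choice for illustration.
    """
    body = "".join(line for line in fasta_text.splitlines()
                   if not line.startswith(">"))
    residues = [c for c in body.upper() if c.isalpha()]
    if not residues:
        raise ValueError("no sequence letters found")
    share = sum(c in NUCLEOTIDE_LETTERS for c in residues) / len(residues)
    return "nucleotide" if share >= threshold else "protein"


def compatible_programs(fasta_text: str, database_type: str) -> list:
    """List the BLAST+ programs that can search this query against a
    database of the given type ('nucleotide' or 'protein')."""
    query_type = guess_sequence_type(fasta_text)
    return COMPATIBLE_PROGRAMS[(query_type, database_type)]
```

For example, `compatible_programs(">q1\nATGGCGTTAACG", "protein")` returns `["blastx"]`, so an interface built on this check could disable `blastp` for a DNA query rather than let the search run and return misleading results.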
Subjects
Computational Biology/methods , Genetic Techniques , Software
ABSTRACT
BACKGROUND: The CRISPR-Cas9 system is a powerful and versatile tool for crop genome editing. However, achieving highly efficient and specific editing in polyploid species can be a challenge. The efficiency and specificity of the CRISPR-Cas9 system depends critically on the gRNA used. Here, we assessed the activities and specificities of seven gRNAs targeting 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) in hexaploid wheat protoplasts. EPSPS is the biological target of the widely used herbicide glyphosate. RESULTS: The seven gRNAs differed substantially in their on-target activities, with mean indel frequencies ranging from 0% to approximately 20%. There was no obvious correlation between experimentally determined and in silico predicted on-target gRNA activity. The presence of a single mismatch within the seed region of the guide sequence greatly reduced but did not abolish gRNA activity, whereas the presence of an additional mismatch, or the absence of a PAM, all but abolished gRNA activity. Large insertions (≥20 bp) of DNA vector-derived sequence were detected at frequencies up to 8.5% of total indels. One of the gRNAs exhibited several properties that make it potentially suitable for the development of non-transgenic glyphosate-resistant wheat. CONCLUSIONS: We have established a rapid and reliable method for gRNA validation in hexaploid wheat protoplasts. The method can be used to identify gRNAs that have favourable properties. Our approach is particularly suited to polyploid species, but should be applicable to any plant species amenable to protoplast transformation.
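The mismatch and PAM observations can be made concrete by annotating a candidate target site. The Python sketch below assumes SpCas9 with a 20-nt guide, an NGG PAM immediately 3' of the protospacer, and a 12-nt PAM-proximal seed region (published seed lengths are typically given as about 10-12 nt); the function names and sequences are hypothetical, and the sketch describes a site without predicting editing activity.

```python
SEED_LENGTH = 12  # assumed PAM-proximal seed length (nt)


def has_ngg_pam(pam: str) -> bool:
    """True if the 3-nt sequence 3' of the protospacer matches NGG."""
    return len(pam) == 3 and pam[1:].upper() == "GG"


def mismatch_positions(guide: str, protospacer: str) -> list:
    """1-based positions (5'->3' along the guide) where guide and target differ."""
    if len(guide) != len(protospacer):
        raise ValueError("guide and protospacer must be the same length")
    return [i + 1 for i, (g, t) in enumerate(zip(guide.upper(), protospacer.upper()))
            if g != t]


def describe_site(guide: str, protospacer: str, pam: str,
                  seed_length: int = SEED_LENGTH) -> dict:
    """Summarise PAM presence and total/seed mismatches for one target site."""
    if seed_length > len(guide):
        raise ValueError("seed cannot be longer than the guide")
    mismatches = mismatch_positions(guide, protospacer)
    seed_start = len(guide) - seed_length + 1  # first seed position (1-based)
    return {
        "pam_ok": has_ngg_pam(pam),
        "mismatches": len(mismatches),
        "seed_mismatches": sum(p >= seed_start for p in mismatches),
    }
```

Under these assumptions, a homoeologous site differing from the guide only at its PAM-proximal end would be reported with one seed mismatch, the class of site the abstract found to retain reduced activity, whereas a site with `pam_ok` set to `False` corresponds to the case in which activity was all but abolished.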
Subjects
CRISPR-Cas Systems/genetics , Gene Editing/methods , Genome, Plant/genetics , RNA, Guide, Kinetoplastida/genetics , Triticum/genetics , Protoplasts/metabolism
ABSTRACT
The development and adoption of hybrid seed technology have led to dramatic increases in agricultural productivity. However, it has been a challenge to develop a commercially viable platform for the production of hybrid wheat (Triticum aestivum) seed due to wheat's strong inbreeding habit. Recently, a novel platform for commercial hybrid seed production was described. This hybridization platform utilizes nuclear male sterility to force outcrossing and has been applied to maize and rice. With the recent molecular identification of the wheat male fertility gene Ms1, it is now possible to extend the use of this novel hybridization platform to wheat. In this report, we used the CRISPR/Cas9 system to generate heritable, targeted mutations in Ms1. The introduction of biallelic frameshift mutations into Ms1 resulted in complete male sterility in wheat cultivars Fielder and Gladius, and several of the selected male-sterile lines were potentially non-transgenic. Our study demonstrates the utility of the CRISPR/Cas9 system for the rapid generation of male sterility in commercial wheat cultivars. This represents an important step towards capturing heterosis to improve wheat yields, through the production and use of hybrid seed on an industrial scale.
Subjects
CRISPR-Cas Systems , Plant Infertility , Seeds , Triticum/genetics , Frameshift Mutation , Gene Knockout Techniques , Genes, Plant , Polyploidy
ABSTRACT
The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP.
Subjects
Computational Biology , Australia , High-Throughput Nucleotide Sequencing , Universities
ABSTRACT
There is a clear demand for hands-on bioinformatics training. The development of bioinformatics workshop content is both time-consuming and expensive. Therefore, enabling trainers to develop bioinformatics workshops in a way that facilitates reuse is becoming increasingly important. The most widespread practice for sharing workshop content is making PDF, PowerPoint and Word documents available online. While this effort is to be commended, such content is usually not easy to reuse or repurpose and does not capture all the information required for a third party to rerun a workshop. We present an open, collaborative framework for developing and maintaining reusable and shareable hands-on training workshop content.
Subjects
Computational Biology , Cooperative Behavior , Humans
ABSTRACT
KEY MESSAGE: Elite wheat pollinators are critical for successful hybrid breeding. We identified the Rht-B1 and Ppd-D1 loci as affecting multiple pollinator traits; they therefore represent major targets for improving hybrid seed production. Hybrid breeding has great potential to boost wheat yields significantly. Ideal male pollinators would be taller in stature, contain many spikelets well spaced along the spike and exhibit high extrusion of large anthers. Most importantly, their flowering time would match that of the female parent. Available genetic resources for developing an elite wheat pollinator are limited, and the genetic basis of many of these traits is largely unknown. Here, we report on the genetic analysis of pollinator traits using biparental mapping populations. We identified two anther extrusion QTLs of medium effect, one on chromosome 1BL and the other on 4BS, which coincides with the semi-dwarfing Rht-B1 locus. The effect of Rht-B1 alleles on anther extrusion is genotype dependent, whereas the tall-plant Rht-B1a allele is consistently associated with large anthers. Multiple QTLs were identified at the Ppd-D1 locus for anther length, spikelet number and spike length, with the photoperiod-sensitive Ppd-D1b allele associated with favourable pollinator traits in the populations studied. We also demonstrated that the homoeoloci Rht-D1 and Ppd-B1 influence anther length, among other traits. These results suggest that combinations of Rht-B1 and Ppd-D1 alleles control multiple pollinator traits and should be major targets of hybrid wheat breeding programs.
Subjects
Flowers/genetics , Pollination/genetics , Quantitative Trait Loci , Triticum/genetics , Alleles , Chromosome Mapping , Genes, Plant , Genotype , Phenotype , Photoperiod
ABSTRACT
BACKGROUND: Democratising the growing body of whole genome sequencing data available for Triticum aestivum (bread wheat) has been impeded by the lack of a genome reference and the large computational requirements for analysing these data sets. RESULTS: DAWN (Diversity Among Wheat geNomes) integrates data from the T. aestivum Chinese Spring (CS) IWGSC RefSeq v1.0 genome with public WGS and exome data from 17 and 62 accessions respectively, enabling researchers and breeders alike to investigate genotypic differences between wheat accessions at the level of whole chromosomes down to individual genes. CONCLUSIONS: Using DAWN we show that it is possible to visualise small and large chromosomal deletions, identify haplotypes at a glance and spot the consequences of selective breeding. DAWN allows us to detect the break points of alien introgression segments brought into an accession when transferring desired genes. Furthermore, we can find possible explanations for reduced recombination in parts of a chromosome, we can predict regions with linkage drag, and also look at diversity in centromeric regions.
Subjects
Databases, Genetic , Genome, Plant , Triticum/genetics , Centromere/genetics , Genotype , Haplotypes , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Exome Sequencing
ABSTRACT
Calcineurin B-like protein-interacting protein kinases (CIPKs) are key regulators of pre-transcriptional and post-translational responses to abiotic stress. Arabidopsis thaliana CIPK16 (AtCIPK16) was identified in a forward genetic screen as a gene that mediates lower shoot salt accumulation and improved salinity tolerance in Arabidopsis and transgenic barley. Here, we aimed to understand the evolution of AtCIPK16, and of CIPK16 orthologues in other plant species including barley, by conducting a phylogenetic analysis of terrestrial plant species. The resulting protein-sequence-based phylogenetic trees revealed a single clade that included AtCIPK16 along with two segmentally duplicated CIPKs, AtCIPK5 and AtCIPK25. No monocot proteins fell into this clade; instead, the most closely related monocot proteins formed a group basal to the entire CIPK16, 5 and 25 clade. We also found that AtCIPK16 contains a core Brassicales-specific indel and a putative nuclear localisation signal, which are synapomorphic characters of CIPK16 genes. In addition, we present a model for the evolution of the CIPK16, 5 and 25 clade.
Subjects
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Evolution, Molecular , Protein Serine-Threonine Kinases/genetics , Amino Acid Sequence , Arabidopsis/metabolism , Arabidopsis Proteins/classification , Arabidopsis Proteins/metabolism , Exons , Hordeum/genetics , Introns , Phylogeny , Protein Serine-Threonine Kinases/classification , Protein Serine-Threonine Kinases/metabolism , Salt Tolerance/genetics , Sequence Alignment
ABSTRACT
Bread wheat (Triticum aestivum L.) has a major salt tolerance locus, Kna1, responsible for the maintenance of a high cytosolic K+/Na+ ratio in the leaves of salt-stressed plants. The Kna1 locus encompasses a large DNA fragment, the distal 14% of chromosome 4DL. Limited recombination has been observed at this locus, making it difficult to map genetically and to identify the causal gene. Here, we decipher the function of TaHKT1;5-D, a candidate gene underlying the Kna1 locus. Transport studies using the heterologous expression systems Saccharomyces cerevisiae and Xenopus laevis oocytes indicated that TaHKT1;5-D is a Na+-selective transporter. Transient expression in Arabidopsis thaliana mesophyll protoplasts and in situ polymerase chain reaction indicated that TaHKT1;5-D is localised on the plasma membrane in the wheat root stele. RNA interference-induced silencing decreased the expression of TaHKT1;5-D in transgenic bread wheat lines, which led to an increase in the Na+ concentration in the leaves. This indicates that TaHKT1;5-D retrieves Na+ from the xylem vessels in the root and has an important role in restricting the transport of Na+ from the root to the leaves in bread wheat. Thus, TaHKT1;5-D confers the essential salinity tolerance mechanism in bread wheat associated with the Kna1 locus via shoot Na+ exclusion and is critical in maintaining a high K+/Na+ ratio in the leaves. These findings show there is potential to increase the salinity tolerance of bread wheat by manipulation of HKT1;5 genes.
Subjects
Cation Transport Proteins/genetics , Gene Expression Regulation, Plant , Plant Proteins/genetics , Sodium/metabolism , Symporters/genetics , Triticum/genetics , Animals , Arabidopsis/genetics , Arabidopsis/metabolism , Base Sequence , Cation Transport Proteins/metabolism , Gene Expression , Molecular Sequence Data , Oocytes , Plant Leaves/genetics , Plant Leaves/metabolism , Plant Proteins/metabolism , Plant Roots/genetics , Plant Roots/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Salt Tolerance , Sequence Analysis, DNA , Symporters/metabolism , Transgenes , Triticum/cytology , Triticum/metabolism , Xenopus laevis , Xylem/metabolism
ABSTRACT
The widespread adoption of high-throughput next-generation sequencing (NGS) technology among the Australian life science research community is highlighting an urgent need to up-skill biologists in the tools required for handling and analysing their NGS data. There is currently a shortage of cutting-edge bioinformatics training courses in Australia as a consequence of a scarcity of skilled trainers with the time and funding to develop and deliver training courses. To address this, a consortium of Australian research organizations, including Bioplatforms Australia, the Commonwealth Scientific and Industrial Research Organisation and the Australian Bioinformatics Network, has been collaborating with the EMBL-EBI training team. A group of Australian bioinformaticians attended the train-the-trainer workshop to improve their skills in developing and delivering bioinformatics workshop curricula. A 2-day NGS workshop was jointly developed to provide hands-on knowledge and understanding of typical NGS data analysis workflows. The road show-style workshop was successfully delivered at five geographically distant venues in Australia using the newly established Australian NeCTAR Research Cloud. We highlight the challenges we had to overcome at different stages from design to delivery, including the establishment of an Australian bioinformatics training network and the computing infrastructure and resource development. A virtual machine image, workshop materials and scripts for configuring a machine with workshop contents have all been made available under a Creative Commons Attribution 3.0 Unported License. This means participants continue to have convenient access to an environment they had become familiar with, and bioinformatics trainers are able to access and reuse these resources.
Subjects
Computational Biology/education , High-Throughput Nucleotide Sequencing/statistics & numerical data , Australia , Computer-Assisted Instruction/methods , Cooperative Behavior , Curriculum , Teaching
ABSTRACT
Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.
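Two of the alignment operations described above, locating indel regions and flagging variable (fast-evolving) sites by entropy, can be sketched in a few lines of Python. This is an illustrative simplification, not SeqFIRE's algorithm: it treats any column in which some but not all sequences are gapped as part of an indel region, and it flags columns whose plain Shannon entropy meets a user-chosen cut-off. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(residues).values())


def indel_regions(alignment: list) -> list:
    """0-based inclusive (start, end) column ranges in which at least one,
    but not every, sequence carries a gap."""
    regions, start = [], None
    width = len(alignment[0])
    for i in range(width):
        gaps = sum(seq[i] == "-" for seq in alignment)
        in_indel = 0 < gaps < len(alignment)
        if in_indel and start is None:
            start = i
        elif not in_indel and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, width - 1))
    return regions


def variable_columns(alignment: list, min_entropy: float = 1.0) -> list:
    """Indices of columns whose entropy meets the (user-adjustable) cut-off."""
    return [i for i in range(len(alignment[0]))
            if column_entropy("".join(seq[i] for seq in alignment)) >= min_entropy]
```

On the toy alignment `["MKV--LA", "MKVGALA", "MRVGALA"]`, `indel_regions` returns `[(3, 4)]`, and with `min_entropy=0.5` only column 1 (K/K/R) is flagged as variable. Exposing the cut-off as a parameter mirrors the abstract's point that all major variables are user-adjustable.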
Subjects
INDEL Mutation , Sequence Alignment/methods , Sequence Analysis, Protein , Software , Algorithms , Internet , Proteins/genetics
ABSTRACT
BACKGROUND: The annotation of many genomes is limited, with a large proportion of identified genes lacking functional assignments. The construction of gene co-expression networks is a powerful approach for integrating information from diverse gene expression datasets into a unified analysis, which allows inferences to be drawn about the role of previously uncharacterised genes. Using this approach, we generated a condition-free gene co-expression network for the chicken using data from 1,043 publicly available Affymetrix GeneChip Chicken Genome Arrays. These data were generated from a diverse range of experiments, including different tissues and experimental conditions. Our aim was to identify gene co-expression modules and generate a tool to facilitate exploration of the functional chicken genome. RESULTS: Fifteen modules, containing between 24 and 473 genes, were identified in the condition-free network. Most of the modules showed strong functional enrichment for particular Gene Ontology categories; however, a few showed no enrichment. Enrichment of transcription factor binding sites was also noted. CONCLUSIONS: We have demonstrated that this chicken gene co-expression network is a useful tool for gene function prediction and for the identification of putative novel transcription factors and binding sites. This work highlights the relevance of this methodology for functional prediction in poorly annotated genomes such as that of the chicken.
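The core idea of a co-expression network (genes as nodes, strongly correlated expression profiles as edges, and connected groups of genes as modules) can be sketched in pure Python. This is a simplified illustration under stated assumptions: Pearson correlation, a hard cut-off of 0.8 on |r|, and modules taken as connected components. The study's actual correlation measure, threshold and module-detection method are not given in the abstract, and the gene names and values are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expression, threshold=0.8):
    """Connect gene pairs whose |r| meets the (assumed) hard threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


def modules(genes, edges):
    """Group genes into modules as connected components (union-find)."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b, _ in edges:
        parent[find(a)] = find(b)
    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return [group for group in groups.values() if len(group) > 1]
```

With the made-up profiles `{"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8], "geneC": [4, 1, 3, 2]}`, only geneA and geneB (r = 1.0) are linked, giving one module `{"geneA", "geneB"}`. An uncharacterised gene that falls into a module dominated by genes of known function would be the kind of candidate for function prediction the abstract describes.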
Subjects
Chickens/genetics , Gene Expression Profiling , Genomics , Animals , Databases, Genetic , Gene Expression Regulation , Gene Regulatory Networks/genetics , Multigene Family/genetics , Nucleotide Motifs/genetics , Software
ABSTRACT
Plantago ovata is cultivated for production of its seed husk (psyllium). When wet, the husk transforms into a mucilage with properties suitable for the pharmaceutical industry, where it is used in supplements for controlling blood cholesterol levels, and for the food industry, where it is used to make gluten-free products. There has been limited success in improving husk quantity and quality through breeding approaches, partly due to the lack of a reference genome. Here we constructed the first chromosome-scale reference assembly of P. ovata using a combination of 5.98 million PacBio and 636.5 million Hi-C reads. We also used corrected PacBio reads to estimate genome size and transcripts to generate gene models. The final assembly covers ~500 Mb with 99.3% gene set completeness. A total of 97% of the sequences are anchored to four chromosomes, with an N50 of ~128.87 Mb. The P. ovata genome contains 61.90% repeats, where 40.04% are long terminal repeats. We identified 41,820 protein-coding genes, 411 non-coding RNAs, 108 ribosomal RNAs, and 1295 transfer RNAs. This genome will provide a resource for plant breeding programs to, for example, reduce agronomic constraints such as seed shattering, increase psyllium yield and quality, and overcome crop disease susceptibility.
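The N50 quoted above is the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that definition follows; the lengths in the usage note are made up.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):  # longest sequences first
        running += length
        if 2 * running >= total:  # at least half of all bases reached
            return length
```

For example, `n50([2, 3, 4, 5, 6])` returns `5`, because 6 + 5 = 11 of the 20 bases are in sequences of length 5 or more. When most of an assembly is anchored in a few pseudomolecules, as for P. ovata, the N50 is simply the length of one of those chromosomes.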