Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes.

Zhang, Fan; Xue, Hongzhang; Dong, Xiaorui; Li, Min; Zheng, Xiaoming; Li, Zhikang; Xu, Jianlong; Wang, Wensheng; Wei, Chaochun

Affiliation

Zhang F; Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
Xue H; College of Agronomy, Anhui Agricultural University, Hefei 230036, China.
Dong X; Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.
Li M; Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.
Zheng X; College of Agronomy, Anhui Agricultural University, Hefei 230036, China.
Li Z; Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
Xu J; Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
Wang W; College of Agronomy, Anhui Agricultural University, Hefei 230036, China.
Wei C; Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing 100081, China.

Genome Res; 32(5): 853-863, 2022 May.

Article in English | MEDLINE | ID: mdl-35396275

ABSTRACT

The concept of the pan-genome, which is the collection of all genomes from a population, has shown great potential in genomics studies, especially in crop sciences. The rice pan-genome constructed from second-generation sequencing (SGS) data is about 270 Mb larger than Nipponbare, the rice reference genome (NipRG), but it is still limited by incompleteness and loss of genomic context. Third-generation sequencing (TGS) with long reads can help to construct better pan-genomes. In this paper, we report a method for constructing a high-quality rice pan-genome that introduces a series of new steps to handle long-read data, including unmapped sequence block filtering, redundancy removal, and sequence block elongation. The long-read sequencing-based pan-genome constructed from 105 rice accessions contains 604 Mb of novel sequence relative to NipRG, making it much more comprehensive than the one constructed from ~3,000 rice genomes sequenced with short reads. Repetitive sequences are the main component of these novel sequences, which partially explains the differences between the TGS- and SGS-based pan-genomes. With six wild rice accessions added, the rice pan-genome contains about 879 Mb of novel sequence and 19,000 novel genes in total. In addition, we have created high-quality reference genomes for all representative rice populations, including five gapless reference genomes. This study makes significant progress in our understanding of the rice pan-genome, and this pan-genome construction method for long-read data can be applied to accelerate a broad range of genomics studies.

Subject(s)

Oryza; Genome; Genomics/methods; High-Throughput Nucleotide Sequencing; Oryza/genetics; Sequence Analysis, DNA

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Oryza | Language: English | Journal: Genome Res | Journal subject: MOLECULAR BIOLOGY / GENETICS | Year: 2022 | Document type: Article | Country of affiliation: China | Country of publication: United States