Efficient COI barcoding using high throughput single-end 400 bp sequencing.

Yang, Chentao; Zheng, Yuxuan; Tan, Shangjin; Meng, Guanliang; Rao, Wei; Yang, Caiqing; Bourne, David G; O'Brien, Paul A; Xu, Junqiang; Liao, Sha; Chen, Ao; Chen, Xiaowei; Jia, Xinrui; Zhang, Ai-Bing; Liu, Shanlin


Affiliation

Yang C; BGI-Shenzhen, Shenzhen, 518083, China.
Zheng Y; College of Life Sciences, Capital Normal University, Beijing, 100048, China.
Tan S; BGI-Shenzhen, Shenzhen, 518083, China.
Meng G; BGI-Shenzhen, Shenzhen, 518083, China.
Rao W; BGI-Shenzhen, Shenzhen, 518083, China.
Yang C; College of Life Sciences, Capital Normal University, Beijing, 100048, China.
Bourne DG; College of Science and Engineering, James Cook University, Townsville, QLD, Australia.
O'Brien PA; Australian Institute of Marine Science, Townsville, QLD, Australia.
Xu J; AIMS@JCU, Townsville, QLD, Australia.
Liao S; College of Science and Engineering, James Cook University, Townsville, QLD, Australia.
Chen A; Australian Institute of Marine Science, Townsville, QLD, Australia.
Chen X; AIMS@JCU, Townsville, QLD, Australia.
Jia X; BGI-Shenzhen, Shenzhen, 518083, China.
Zhang AB; BGI-Shenzhen, Shenzhen, 518083, China.
Liu S; BGI-Shenzhen, Shenzhen, 518083, China.

BMC Genomics; 21(1): 862, 2020 Dec 04.

Article in En | MEDLINE | ID: mdl-33276723

ABSTRACT

BACKGROUND: Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. However, current high-throughput DNA barcoding methods either cannot obtain full-length barcode sequences because of read length limitations (e.g. a maximum read length of 300 bp for Illumina's MiSeq system), or are hindered by relatively high cost or low sequencing output (e.g. a maximum of eight million reads per cell for PacBio's SEQUEL II system).

RESULTS: Pooled cytochrome c oxidase subunit I (COI) barcodes from individual specimens were sequenced on the MGISEQ-2000 platform using the single-end 400 bp (SE400) module. We present a bioinformatic pipeline, HIFI-SE, that takes reads generated from the 5' and 3' ends of the COI barcode region and assembles them into full-length barcodes. HIFI-SE is written in Python and includes four functional modules: filter, assign, assembly and taxonomy. We applied HIFI-SE to a set of 845 samples (30 marine invertebrates, 815 insects) and obtained a total of 747 fully assembled COI barcodes as well as 70 Wolbachia and fungal symbionts. Of the 72 samples with corresponding Sanger sequences available, nearly all (71/72) were correctly and accurately assembled, including 46 samples with a similarity score of 100% and 25 with ca. 99%.

CONCLUSIONS: The HIFI-SE pipeline is an efficient way to produce standard full-length barcodes, and the reasonable cost and high sensitivity of the method allow considerably more DNA barcodes to be produced under the same budget. The method thereby advances DNA-based species identification across diverse ecosystems and increases the number of relevant applications.
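The idea of joining a 5'-end read and a 3'-end read into one full-length barcode can be illustrated with a small overlap merge. The following is only a minimal sketch and not the HIFI-SE assembly module: the function names, the `min_overlap` and `max_mismatch_rate` parameters, the assumption that the 3'-end read is sequenced on the reverse strand, and the 658 bp barcode length used in the example (a commonly cited length for the standard COI barcode region, not stated in the abstract) are all assumptions made for illustration.

```python
import random


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def merge_end_reads(forward: str, reverse: str, min_overlap: int = 20,
                    max_mismatch_rate: float = 0.02):
    """Merge a 5'-end read with a 3'-end read into a single sequence.

    Assumption: `reverse` was read inwards from the 3' end on the opposite
    strand, so it is reverse-complemented before merging. Overlap sizes are
    tried from longest to shortest, and the first one whose mismatch rate
    is within `max_mismatch_rate` is accepted. Returns None when no
    overlap of at least `min_overlap` bases qualifies.
    """
    tail = reverse_complement(reverse)
    longest = min(len(forward), len(tail))
    for size in range(longest, min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(forward[-size:], tail[:size]))
        if mismatches <= max_mismatch_rate * size:
            # Keep the forward read whole and append the non-overlapping tail.
            return forward + tail[size:]
    return None


# Hypothetical example: a random 658 bp "barcode" read 400 bp from each end.
rng = random.Random(0)
barcode = "".join(rng.choice("ACGT") for _ in range(658))
read_5p = barcode[:400]                       # SE400 read from the 5' end
read_3p = reverse_complement(barcode[-400:])  # SE400 read from the 3' end
merged = merge_end_reads(read_5p, read_3p)
print(len(merged), merged == barcode)
```

Under these assumptions, two 400 bp reads over a 658 bp amplicon overlap by 2 × 400 − 658 = 142 bases, which is what lets a simple suffix-prefix comparison find the join. A production pipeline would also need to use base qualities and handle many reads per sample, which this sketch does not attempt.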

Key words

Biodiversity; COI; DNA barcode; High-throughput sequencing; MGISEQ-2000; SE400


Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Ecosystem / DNA Barcoding, Taxonomic
Type of study: Prognostic_studies
Limits: Animals
Language: En
Journal: BMC Genomics
Journal subject: GENETICA
Year: 2020
Document type: Article
