FSBC: fast string-based clustering for HT-SELEX data.
Kato, Shintaro; Ono, Takayoshi; Minagawa, Hirotaka; Horii, Katsunori; Shiratori, Ikuo; Waga, Iwao; Ito, Koichi; Aoki, Takafumi.
Affiliation
  • Kato S; NEC Solution Innovators, Ltd, 1-18-7 Shinkiba, Koto-ku, Tokyo, 136-8627, Japan. katou-s-mxn@nec.com.
  • Ono T; Graduate School of Information Sciences, Tohoku University, 6-6-05 Aramaki Aza Aoba, Aoba-ku, Sendai-shi, Miyagi, 980-8579, Japan. katou-s-mxn@nec.com.
  • Minagawa H; Graduate School of Information Sciences, Tohoku University, 6-6-05 Aramaki Aza Aoba, Aoba-ku, Sendai-shi, Miyagi, 980-8579, Japan.
  • Horii K; NEC Solution Innovators, Ltd, 1-18-7 Shinkiba, Koto-ku, Tokyo, 136-8627, Japan.
  • Shiratori I; NEC Solution Innovators, Ltd, 1-18-7 Shinkiba, Koto-ku, Tokyo, 136-8627, Japan.
  • Waga I; NEC Solution Innovators, Ltd, 1-18-7 Shinkiba, Koto-ku, Tokyo, 136-8627, Japan.
  • Ito K; NEC Solution Innovators, Ltd, 1-18-7 Shinkiba, Koto-ku, Tokyo, 136-8627, Japan.
  • Aoki T; Graduate School of Information Sciences, Tohoku University, 6-6-05 Aramaki Aza Aoba, Aoba-ku, Sendai-shi, Miyagi, 980-8579, Japan.
BMC Bioinformatics ; 21(1): 263, 2020 Jun 24.
Article in En | MEDLINE | ID: mdl-32580745
ABSTRACT

BACKGROUND:

The combination of systematic evolution of ligands by exponential enrichment (SELEX) and deep sequencing, termed high-throughput SELEX (HT-SELEX), enables the search for aptamer candidates among a massive number of oligonucleotide sequences. Clustering is an important step for identifying sequence groups that include aptamer candidates for evaluation by experimental analysis. In general, an aptamer includes a specific target-binding region, which is necessary for binding to the target molecule. The length of this region varies depending on the target molecule and/or the binding mode. Currently available clustering methods for HT-SELEX estimate clusters based only on the similarity of full-length sequences or on motifs of limited length treated as target-binding regions. Hence, a clustering method that accounts for target-binding regions of different lengths is required. Moreover, to handle such large datasets and to reduce sequencing cost, a clustering method that computes quickly from a single round of HT-SELEX data, rather than from multiple rounds, is also preferred.

RESULTS:

We developed fast string-based clustering (FSBC) for HT-SELEX data. FSBC was designed to estimate clusters by searching for over-represented strings of various lengths as target-binding regions. FSBC was also designed for fast calculation: it reduces the search space by taking into account the imbalanced nucleobase composition produced by the aptamer selection process, and it requires only a single round, typically the final round, of HT-SELEX data. The calculation time and clustering accuracy of FSBC were compared with those of four conventional clustering methods, FASTAptamer, AptaCluster, APTANI, and AptaTRACE, using HT-SELEX data (>15 million oligonucleotide sequences). FSBC, AptaCluster, and AptaTRACE completed clustering of the entire dataset, and of these, FSBC and AptaTRACE achieved higher clustering accuracy. FSBC showed the highest clustering accuracy and the second-fastest calculation speed among all compared methods.
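The general idea of clustering a single round by over-represented strings of several lengths can be illustrated with a minimal Python sketch. It is not the FSBC implementation. The scoring rule (a z-score of each substring's count against an expectation built from the round's own, possibly imbalanced, base composition), the parameters `k_min`, `k_max` and `top`, and the function names `base_composition`, `string_z_scores` and `cluster_by_strings` are all hypothetical choices made for illustration. The sketch enumerates every substring exhaustively, so it does not reproduce FSBC's search-space reduction.

```python
from collections import Counter
from math import sqrt


def base_composition(reads):
    """Per-character frequency over all reads of one round.

    Using the round's own composition lets the null model reflect
    nucleobase imbalance introduced by selection (an assumption here).
    """
    counts = Counter()
    for r in reads:
        counts.update(r)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def string_z_scores(reads, k_min, k_max):
    """Hypothetical score: z-score of each substring's total count
    versus a binomial approximation under independent bases
    (overlaps between windows are ignored)."""
    freq = base_composition(reads)
    scores = {}
    for k in range(k_min, k_max + 1):
        observed = Counter()
        n_positions = 0
        for r in reads:
            windows = len(r) - k + 1
            if windows <= 0:
                continue
            n_positions += windows
            observed.update(r[i:i + k] for i in range(windows))
        for s, obs in observed.items():
            p = 1.0
            for b in s:
                p *= freq[b]  # probability of s at one position
            expected = n_positions * p
            variance = n_positions * p * (1.0 - p)
            if variance > 0:
                scores[s] = (obs - expected) / sqrt(variance)
    return scores


def cluster_by_strings(reads, k_min=6, k_max=9, top=10):
    """Use the `top` highest-scoring strings as cluster seeds and assign
    each read to the best-scoring seed it contains."""
    scores = string_z_scores(reads, k_min, k_max)
    # Highest z-score first; ties broken alphabetically for determinism.
    seeds = sorted(scores, key=lambda s: (-scores[s], s))[:top]
    clusters = {s: [] for s in seeds}
    unassigned = []
    for r in reads:
        for s in seeds:
            if s in r:
                clusters[s].append(r)
                break
        else:
            unassigned.append(r)
    return clusters, unassigned


if __name__ == "__main__":
    import random

    random.seed(1)
    # Toy single-round data: random 30-mers, 50 of which carry a made-up motif.
    reads = ["".join(random.choice("ACGT") for _ in range(30)) for _ in range(200)]
    for i in range(50):
        reads[i] = reads[i][:10] + "GATTACAGG" + reads[i][19:]
    clusters, unassigned = cluster_by_strings(reads, k_min=6, k_max=9, top=3)
    for seed, members in clusters.items():
        print(seed, len(members))
    print("unassigned", len(unassigned))
```

Assigning each read to the highest-scoring string it contains is one simple design choice. Nested strings (a short string inside a longer one) both score highly in this sketch, so a fuller method would need to merge or prune them.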

CONCLUSION:

FSBC is applicable to large HT-SELEX datasets and can facilitate the accurate identification of sequence groups that include aptamer candidates.

AVAILABILITY OF DATA AND MATERIALS:

FSBC is available at http://www.aoki.ecei.tohoku.ac.jp/fsbc/.

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: SELEX Aptamer Technique / High-Throughput Nucleotide Sequencing Language: En Journal: BMC Bioinformatics Journal subject: Medical Informatics Year: 2020 Document type: Article Affiliation country: Japan