Results 1 - 20 of 30
1.
Bioinformatics ; 38(16): 4039-4041, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35771653

ABSTRACT

SUMMARY: We present Mirage 2.0, which accurately estimates gene-content evolutionary history by considering heterogeneous evolutionary patterns among gene families. Notably, we introduce a deterministic pattern mixture model, which makes Mirage substantially faster and more memory-efficient and therefore applicable to large datasets with thousands of genomes. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at https://github.com/fukunagatsu/Mirage. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome, Software, Molecular Evolution, Biological Evolution, Dental Porcelain
2.
Bioinformatics ; 38(7): 1794-1800, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35060594

ABSTRACT

MOTIVATION: Phylogenetic profiling is a powerful computational method for revealing the functions of function-unknown genes. Although conventional similarity metrics in phylogenetic profiling achieved high prediction accuracy, they have two estimation biases: an evolutionary bias and a spurious correlation bias. While previous studies reduced the evolutionary bias by considering a phylogenetic tree, few studies have analyzed the spurious correlation bias. RESULTS: To reduce the spurious correlation bias, we developed metrics based on the inverse Potts model (IPM) for phylogenetic profiling. We also developed a metric based on both the IPM and a phylogenetic tree. In an empirical dataset analysis, we demonstrated that these IPM-based metrics improved the prediction performance of phylogenetic profiling. In addition, we found that the integration of several metrics, including the IPM-based metrics, had superior performance to a single metric. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at https://github.com/fukunagatsu/Ipm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software, Phylogeny
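
To make the notion of a "conventional similarity metric" in the abstract above concrete, here is a minimal sketch that scores two genes by their presence/absence profiles across genomes. It shows two commonly used baseline metrics (Jaccard similarity and mutual information), not the IPM-based metrics the paper proposes; the function names and the toy profiles are assumptions made for illustration.

```python
from math import log2


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two binary profiles."""
    n = len(profile_a)
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            # Empirical joint and marginal frequencies over genomes.
            p_xy = sum(1 for a, b in zip(profile_a, profile_b)
                       if a == x and b == y) / n
            p_x = sum(1 for a in profile_a if a == x) / n
            p_y = sum(1 for b in profile_b if b == y) / n
            if p_xy > 0:
                mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


# Hypothetical profiles: one entry per genome, 1 = gene present.
gene_a = [1, 1, 0, 0]
gene_b = [1, 0, 0, 0]
print(jaccard(gene_a, gene_b))
print(mutual_information(gene_a, gene_a))
```

Both scores treat genomes as independent samples and compare only one gene pair at a time, which is exactly where the evolutionary bias and the spurious correlation bias mentioned in the abstract can enter.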
3.
Bioinformatics ; 37(Suppl_1): i16-i24, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252954

ABSTRACT

MOTIVATION: Accumulating evidence has highlighted the importance of microbial interaction networks. Methods have been developed for estimating microbial interaction networks, of which the generalized Lotka-Volterra equation (gLVE)-based method can estimate a directed interaction network. The previous gLVE-based method for estimating microbial interaction networks did not consider time-varying interactions. RESULTS: In this study, we developed the unsupervised learning-based microbial interaction inference method using Bayesian estimation (Umibato), a method for estimating time-varying microbial interactions. The Umibato algorithm comprises Gaussian process regression (GPR) and a new Bayesian probabilistic model, the continuous-time regression hidden Markov model (CTRHMM). Growth rates are estimated by GPR, and interaction networks are estimated by CTRHMM. CTRHMM can estimate time-varying interaction networks using interaction states, which are defined as hidden variables. Umibato outperformed the existing methods on synthetic datasets. In addition, it yielded reasonable estimations in experiments on a mouse gut microbiota dataset, thus providing novel insights into the relationship between consumed diets and the gut microbiota. AVAILABILITY AND IMPLEMENTATION: The C++ and Python source code of the Umibato software is available at https://github.com/shion-h/Umibato. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Software, Animals, Bayes Theorem, Mice, Microbial Interactions, Normal Distribution
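
For readers unfamiliar with the generalized Lotka-Volterra equation named in the abstract above, the following is a minimal forward-simulation sketch of its standard form, dx_i/dt = x_i (r_i + sum_j a_ij x_j), using a plain Euler step. It only illustrates the model with a fixed interaction matrix; it is not the Umibato inference procedure, and the function and parameter names are hypothetical.

```python
def glv_step(x, growth, interactions, dt):
    """One explicit Euler step of dx_i/dt = x_i * (r_i + sum_j a_ij * x_j)."""
    n = len(x)
    new_x = []
    for i in range(n):
        per_capita = growth[i] + sum(interactions[i][j] * x[j] for j in range(n))
        # Clamp at zero so numerical error cannot produce negative abundance.
        new_x.append(max(x[i] + dt * x[i] * per_capita, 0.0))
    return new_x


def simulate_glv(x0, growth, interactions, dt=0.01, steps=1000):
    """Return the list of abundance vectors visited, including the start."""
    trajectory = [list(x0)]
    for _ in range(steps):
        trajectory.append(glv_step(trajectory[-1], growth, interactions, dt))
    return trajectory


# Toy two-species system (assumed values): species 0 inhibits species 1.
growth = [1.0, 0.8]
interactions = [[-1.0, 0.0],
                [-0.5, -1.0]]
final = simulate_glv([0.1, 0.1], growth, interactions, steps=3000)[-1]
print(final)
```

The off-diagonal entries of `interactions` are the directed edges of the interaction network; a time-varying method would allow that matrix to change between hidden states rather than stay constant as it does here.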
4.
Bioinformatics ; 35(22): 4543-4552, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30993319

ABSTRACT

MOTIVATION: A cancer genome includes many mutations derived from various mutagens and mutational processes, leading to specific mutation patterns. It is known that each mutational process leads to characteristic mutations; the pattern of mutations that a process preferentially produces is called a 'mutation signature.' Identification of mutation signatures is an important task for elucidation of carcinogenic mechanisms. In previous studies, analyses with statistical approaches (e.g. non-negative matrix factorization and latent Dirichlet allocation) revealed a number of mutation signatures. Nonetheless, strictly speaking, these existing approaches employ an ad hoc method or incorrect approximation to estimate the number of mutation signatures, and the whole picture of mutation signatures is unclear. RESULTS: In this study, we present a novel method for estimating the number of mutation signatures, latent Dirichlet allocation with variational Bayes inference (VB-LDA), in which variational lower bounds are used to find a plausible number of mutation patterns. In addition, we performed cluster analyses for estimated mutation signatures to extract novel mutation signatures that appear in multiple primary lesions. In a simulation with artificial data, we confirmed that our method estimated the correct number of mutation signatures. Furthermore, applying our method in combination with clustering procedures for real mutation data revealed many interesting mutation signatures that have not been previously reported. AVAILABILITY AND IMPLEMENTATION: All the predicted mutation signatures with clustering results are freely available at http://www.f.waseda.jp/mhamada/MS/index.html. All the C++ source code and Python scripts used in this study can be downloaded from https://github.com/qkirikigaku/MS_LDA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Mutation, Software, Bayes Theorem, Cluster Analysis
5.
Mol Biol Evol ; 35(6): 1553-1555, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29668970

ABSTRACT

Fish mitochondrial genome (mitogenome) data form a fundamental basis for revealing vertebrate evolution and hydrosphere ecology. Here, we report recent functional updates of MitoFish, which is a database of fish mitogenomes with a precise annotation pipeline, MitoAnnotator. Most importantly, we describe the implementation of the MiFish pipeline for metabarcoding analysis of fish mitochondrial environmental DNA, which is a fast-emerging and powerful technology in fish studies. MitoFish, MitoAnnotator, and the MiFish pipeline constitute a key platform for studies of fish evolution, ecology, and conservation, and are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed April 7th, 2018).


Subject(s)
Taxonomic DNA Barcoding, Fishes/genetics, Mitochondrial Genome, Animals
6.
BMC Genomics ; 19(1): 414, 2018 May 29.
Article in English | MEDLINE | ID: mdl-29843593

ABSTRACT

BACKGROUND: Although the number of discovered long non-coding RNAs (lncRNAs) has increased dramatically, their biological roles have not been established. Many recent studies have used ribosome profiling data to assess the protein-coding capacity of lncRNAs. However, very little work has been done to identify ribosome-associated lncRNAs, here defined as lncRNAs interacting with ribosomes related to protein synthesis as well as other unclear biological functions. RESULTS: On average, 39.17% of expressed lncRNAs were observed to interact with ribosomes in human and 48.16% in mouse. We developed the ribosomal association index (RAI), which quantifies the evidence for ribosomal associability of lncRNAs over various tissues and cell types, to catalog 691 and 409 lncRNAs that are robustly associated with ribosomes in human and mouse, respectively. Moreover, we identified 78 and 42 lncRNAs with a high probability of coding peptides in human and mouse, respectively. Compared with ribosome-free lncRNAs, ribosome-associated lncRNAs were observed to be more likely to be located in the cytoplasm and more sensitive to nonsense-mediated decay. CONCLUSION: Our results suggest that RAI can be used as an integrative and evidence-based tool for distinguishing between ribosome-associated and free lncRNAs, providing a valuable resource for the study of lncRNA functions.


Subject(s)
Long Noncoding RNA/genetics, Ribosomes/genetics, RNA Sequence Analysis, Gene Expression Profiling, HeLa Cells, Humans
7.
Bioinformatics ; 33(17): 2666-2674, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28459942

ABSTRACT

MOTIVATION: LncRNAs play important roles in various biological processes. Although more than 58 000 human lncRNA genes have been discovered, most known lncRNAs are still poorly characterized. One approach to understanding the functions of lncRNAs is the detection of the interacting RNA target of each lncRNA. Because comprehensive experimental detection of lncRNA-RNA interactions is difficult, computational prediction of lncRNA-RNA interactions is an indispensable technique. However, the high computational costs of existing RNA-RNA interaction prediction tools prevent their application to large-scale lncRNA datasets. RESULTS: Here, we present 'RIblast', an ultrafast RNA-RNA interaction prediction method based on the seed-and-extension approach. RIblast discovers seed regions using suffix arrays and subsequently extends seed regions based on an RNA secondary structure energy model. Computational experiments indicate that RIblast achieves a level of prediction accuracy similar to those of existing programs while running over 64 times faster. AVAILABILITY AND IMPLEMENTATION: The source code of RIblast is freely available at https://github.com/fukunagatsu/RIblast. CONTACT: t.fukunaga@kurenai.waseda.jp or mhamada@waseda.jp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods, Molecular Sequence Annotation/methods, Long Noncoding RNA/metabolism, Software, Humans, Long Noncoding RNA/genetics, Messenger RNA/metabolism, RNA Sequence Analysis/methods
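
The seed-and-extension idea described in the abstract above can be sketched in a few lines. This toy version is deliberately simplified and is not RIblast itself: it finds seeds with a hash index of k-mers instead of suffix arrays, and it extends them by exact Watson-Crick complementarity (no G-U wobble pairs, no gaps) instead of an RNA secondary structure energy model. All function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def build_kmer_index(target, k):
    """Map each k-mer of the target to the list of its start positions."""
    index = {}
    for start in range(len(target) - k + 1):
        index.setdefault(target[start:start + k], []).append(start)
    return index


def find_seeds(query, target, k):
    """Return (query_start, target_start) pairs of perfectly pairing k-mers."""
    index = build_kmer_index(target, k)
    seeds = []
    for i in range(len(query) - k + 1):
        for s in index.get(reverse_complement(query[i:i + k]), []):
            seeds.append((i, s))
    return seeds


def extend_seed(query, target, i, s, k):
    """Grow a seed without gaps; returns (q_start, q_end, t_start, t_end).

    Strands pair antiparallel, so moving 5' on the query moves 3' on the
    target and vice versa. Intervals are half-open.
    """
    q_start, q_end, t_start, t_end = i, i + k, s, s + k
    while (q_start > 0 and t_end < len(target)
           and COMPLEMENT[query[q_start - 1]] == target[t_end]):
        q_start -= 1
        t_end += 1
    while (q_end < len(query) and t_start > 0
           and COMPLEMENT[query[q_end]] == target[t_start - 1]):
        q_end += 1
        t_start -= 1
    return q_start, q_end, t_start, t_end


query, target = "CCGGCUCCC", "AAGAGCCAA"
for i, s in find_seeds(query, target, 4):
    print(extend_seed(query, target, i, s, 4))
```

The speed-up of this family of methods comes from the first stage: only positions that share an exact seed are ever passed to the more expensive extension stage.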
8.
Nucleic Acids Res ; 44(W1): W302-7, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27131356

ABSTRACT

Secondary structures, as well as nucleotide sequences, are important features of RNA molecules for characterizing their functions. According to the thermodynamic model, however, the probability of any secondary structure is very small. As a consequence, any tool to predict the secondary structures of RNAs has limited accuracy. On the other hand, there are a few tools that compensate for the imperfect predictions by calculating and visualizing secondary structural information from RNA sequences. It is desirable to obtain the rich information from those tools through a friendly interface. We implemented a web server of the tools to predict secondary structures and to calculate various structural features based on the energy models of secondary structures. Simply by submitting an RNA sequence to the web server, the user can get different types of solutions of the secondary structures; marginal probabilities such as base-pairing probabilities, loop probabilities and accessibilities of the local bases; the energy changes caused by arbitrary base mutations; and measures for validating the predicted secondary structures. The web server is available at http://rtools.cbrc.jp and integrates the software tools CentroidFold, CentroidHomfold, IPKnot, CapR, Raccess, Rchange and RintD.


Subject(s)
Nucleic Acid Conformation, RNA Folding, RNA/chemistry, Software, Algorithms, Base Pairing, Base Sequence, Computer Graphics, Internet, Mutation, RNA/genetics, RNA Sequence Analysis, Thermodynamics
9.
BMC Bioinformatics ; 18(1): 46, 2017 Jan 19.
Article in English | MEDLINE | ID: mdl-28103804

ABSTRACT

BACKGROUND: With rapid advances in genome sequencing and editing technologies, systematic and quantitative analysis of animal behavior is expected to be another key to facilitating data-driven behavioral genetics. The nematode Caenorhabditis elegans is a model organism in this field. Several video-tracking systems are available for automatically recording behavioral data for the nematode, but computational methods for analyzing these data are still under development. RESULTS: In this study, we applied the Gaussian mixture model-based binning method to time-series postural data for 322 C. elegans strains. We revealed that the occurrence patterns of the postural states and the transition patterns among these states are related, as expected, and that this relationship must be taken into account to identify strains with atypical behaviors that differ from those of the wild type. Based on this observation, we identified several strains that exhibit atypical transition patterns that cannot be fully explained by their occurrence patterns of postural states. Surprisingly, we found that two simple factors (overall acceleration of postural movement and elimination of inactivity periods) explained the behavioral characteristics of strains with very atypical transition patterns; therefore, computational analysis of animal behavior must be accompanied by evaluation of the effects of these simple factors. Finally, we found that the npr-1 and npr-3 mutants have similar behavioral patterns that were not predictable by sequence homology, proving that our data-driven approach can reveal the functions of genes that have not yet been characterized. CONCLUSION: We propose that elimination of inactivity periods and overall acceleration of postural change speed can explain behavioral phenotypes of strains with very atypical postural transition patterns. Our methods and results constitute guidelines for effectively finding strains that show "truly" interesting behaviors and systematically uncovering novel gene functions by bioimage-informatic approaches.


Subject(s)
Caenorhabditis elegans Proteins/physiology, Caenorhabditis elegans/physiology, Animals, Animal Behavior, Caenorhabditis elegans/genetics, Caenorhabditis elegans Proteins/genetics, Theoretical Models, Mutation, Neuropeptide Y Receptors/genetics, Neuropeptide Y Receptors/physiology
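
The distinction the abstract above draws between occurrence patterns and transition patterns of postural states can be illustrated with a small sketch. It assumes the postures have already been binned into discrete state labels (the paper does this with a Gaussian mixture model, which is not reproduced here); the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def occurrence_frequencies(states):
    """Fraction of time points spent in each state."""
    counts = Counter(states)
    return {state: count / len(states) for state, count in counts.items()}


def transition_probabilities(states):
    """Empirical P(next = b | current = a) for every observed pair (a, b)."""
    pair_counts = Counter(zip(states, states[1:]))
    # Only states that have a successor contribute to the denominators.
    out_counts = Counter(states[:-1])
    return {(a, b): count / out_counts[a]
            for (a, b), count in pair_counts.items()}


# Toy sequence of binned postural states for one strain (assumed data).
sequence = ["A", "A", "B", "A", "B", "B"]
print(occurrence_frequencies(sequence))
print(transition_probabilities(sequence))
```

Because a state that occurs often will also appear often as a transition source or destination, the two summaries are not independent, which is why the abstract argues that transition patterns should be judged relative to occurrence patterns.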
10.
Mol Biol Evol ; 30(11): 2531-40, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23955518

ABSTRACT

MitoFish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface.


Subject(s)
Genetic Databases, Fishes/genetics, Mitochondrial Genome, Molecular Sequence Annotation/methods, Animals, Molecular Evolution, Genomics, High-Throughput Nucleotide Sequencing, Phylogeny, Transfer RNA/genetics, Software
11.
Methods Mol Biol ; 2586: 163-173, 2023.
Article in English | MEDLINE | ID: mdl-36705904

ABSTRACT

The computational prediction of RNA-RNA interactions has long been studied in RNA informatics. Most of the existing approaches focused on the interaction prediction of short RNAs in small datasets. However, in recent years, two fast prediction methods, RIsearch2 and RIblast, have been developed to predict transcriptome-scale interactions or long RNA interactions. The key idea of the software acceleration of these tools was the integration of a seed-and-extend method, which is used in fast sequence alignment tools, into RNA-RNA interaction prediction. As a result, the two software programs were ten to a thousand times faster than the existing tools; because of this acceleration, detection of genome-wide microRNA target sites or interaction partners of function-unknown long noncoding RNAs has become possible. In this review, we describe the basic concept of the algorithm, its applications, and the future perspectives of the fast RNA-RNA interaction prediction tools.


Subject(s)
MicroRNAs, Long Noncoding RNA, Transcriptome, Software, MicroRNAs/genetics, Algorithms, Long Noncoding RNA/genetics, Computational Biology/methods
12.
Methods Mol Biol ; 2586: 175-195, 2023.
Article in English | MEDLINE | ID: mdl-36705905

ABSTRACT

Non-coding RNAs have various biological functions such as translational regulation, and RNA-RNA interactions play essential roles in the mechanisms of action of these RNAs. Therefore, RNA-RNA interaction prediction is an important problem in bioinformatics, and many tools have been developed for the computational prediction of RNA-RNA interactions. In addition to the development of novel algorithms with high accuracy, the development and maintenance of web services is essential for enhancing usability by experimental biologists. In this review, we survey web services for RNA-RNA interaction predictions and introduce how to use primary web services. We present various prediction tools, including general interaction prediction tools, prediction tools for specific RNA classes, and RNA-RNA interaction-based RNA design tools. Additionally, we discuss the future perspectives of the development of RNA-RNA interaction prediction tools and the sustainability of web services.


Subject(s)
MicroRNAs, RNA, RNA/genetics, Algorithms, Computational Biology, MicroRNAs/genetics
13.
Front Bioinform ; 3: 1275787, 2023.
Article in English | MEDLINE | ID: mdl-37881622

ABSTRACT

RNA accessibility is a useful RNA secondary structural feature for predicting RNA-RNA interactions and translation efficiency in prokaryotes. However, conventional accessibility calculation tools, such as Raccess, are computationally expensive and require considerable computational time to perform transcriptome-scale analysis. In this study, we developed DeepRaccess, which predicts RNA accessibility based on deep learning methods. DeepRaccess was trained to take artificial RNA sequences as input and to predict the accessibility of these sequences as calculated by Raccess. Simulation and empirical dataset analyses showed that the accessibility predicted by DeepRaccess was highly correlated with the accessibility calculated by Raccess. In addition, we confirmed that DeepRaccess could predict protein abundance in E. coli with moderate accuracy from the sequences around the start codon. We also demonstrated that DeepRaccess achieved a speed-up of tens to hundreds of times in a GPU environment. The source code and the trained models of DeepRaccess are freely available at https://github.com/hmdlab/DeepRaccess.

14.
Front Immunol ; 14: 1185322, 2023.
Article in English | MEDLINE | ID: mdl-37614230

ABSTRACT

Primary sensory neurons regulate inflammatory processes in innervated regions through neuro-immune communication. However, how their immune-modulating functions are regulated in concert remains largely unknown. Here, we show that Neat1 long non-coding RNA (lncRNA) organizes the proinflammatory gene expressions in the dorsal root ganglion (DRG) in chronic intractable neuropathic pain in rats. Neat1 was abundantly expressed in the DRG and was upregulated after peripheral nerve injury. Neat1 overexpression in primary sensory neurons caused mechanical and thermal hypersensitivity, whereas its knockdown alleviated neuropathic pain. Bioinformatics analysis of comprehensive transcriptome changes indicated the inflammatory response was the most relevant function of genes upregulated through Neat1. Consistent with this, upregulation of proinflammatory genes in the DRG following nerve injury was suppressed by Neat1 knockdown. Expression changes of these proinflammatory genes were regulated through Neat1-mRNA interaction-dependent and -independent mechanisms. Notably, Neat1 increased proinflammatory genes by stabilizing its interacting mRNAs in neuropathic pain. Finally, Neat1 in primary sensory neurons contributed to spinal inflammatory processes that mediated peripheral neuropathic pain. These findings demonstrate that Neat1 lncRNA is a key regulator of neuro-immune communication in neuropathic pain.


Subject(s)
Neuralgia, Long Noncoding RNA, Nervous System Trauma, Animals, Rats, Long Noncoding RNA/genetics, Spinal Ganglia, Neuralgia/genetics, Messenger RNA, Transcriptome
15.
Comput Struct Biotechnol J ; 21: 1774-1784, 2023.
Article in English | MEDLINE | ID: mdl-36874163

ABSTRACT

The coronavirus disease-2019 (COVID-19) pandemic has exposed major limitations in the capacity of medical and research institutions to appropriately manage emerging infectious diseases. We can improve our understanding of infectious diseases by unveiling virus-host interactions through host range prediction and protein-protein interaction prediction. Although many algorithms have been developed to predict virus-host interactions, numerous issues remain to be solved, and the entire network remains veiled. In this review, we comprehensively survey algorithms used to predict virus-host interactions. We also discuss the current challenges, such as dataset biases toward highly pathogenic viruses, and the potential solutions. The complete prediction of virus-host interactions remains difficult; however, bioinformatics can contribute to progress in research on infectious diseases and human health.

16.
Bioinform Adv ; 2(1): vbac078, 2022.
Article in English | MEDLINE | ID: mdl-36699418

ABSTRACT

Motivation: RNA consensus secondary structure prediction from aligned sequences is a powerful approach for improving the secondary structure prediction accuracy. However, because the computational complexities of conventional prediction tools scale with the cube of the alignment lengths, their application to long RNA sequences, such as viral RNAs or long non-coding RNAs, requires significant computational time. Results: In this study, we developed LinAliFold and CentroidLinAliFold, fast RNA consensus secondary structure prediction tools based on minimum free energy and maximum expected accuracy principles, respectively. We achieved software acceleration using beam search methods that were successfully used for fast secondary structure prediction from a single RNA sequence. Benchmark analyses showed that LinAliFold and CentroidLinAliFold were much faster than the existing methods while preserving the prediction accuracy. As an empirical application, we predicted the consensus secondary structure of coronaviruses with approximately 30 000 nt in 5 and 79 min by LinAliFold and CentroidLinAliFold, respectively. We confirmed that the predicted consensus secondary structure of coronaviruses was consistent with the experimental results. Availability and implementation: The source codes of LinAliFold and CentroidLinAliFold are freely available at https://github.com/fukunagatsu/LinAliFold-CentroidLinAliFold. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

17.
Methods Mol Biol ; 2509: 315-340, 2022.
Article in English | MEDLINE | ID: mdl-35796972

ABSTRACT

With a large number of annotated non-coding RNAs (ncRNAs), repetitive sequences are found to constitute functional components (termed as repetitive elements) in ncRNAs that perform specific biological functions. Bioinformatics analysis is a powerful tool for improving our understanding of the role of repetitive elements in ncRNAs. This chapter summarizes recent findings that reveal the role of repetitive elements in ncRNAs. Furthermore, relevant bioinformatics approaches are systematically reviewed, which promises to provide valuable resources for studying the functional impact of repetitive elements on ncRNAs.


Subject(s)
Computational Biology, Untranslated RNA, Untranslated RNA/genetics, Nucleic Acid Repetitive Sequences/genetics
18.
Bioinform Adv ; 1(1): vbab014, 2021.
Article in English | MEDLINE | ID: mdl-36700099

ABSTRACT

Motivation: Reconstruction of gene copy number evolution is an essential approach for understanding how complex biological systems have been organized. Although various models have been proposed for gene copy number evolution, existing evolutionary models have not appropriately addressed the fact that different gene families can have very different gene gain/loss rates. Results: In this study, we developed Mirage (MIxtuRe model for Ancestral Genome Estimation), which allows different gene families to have flexible gene gain/loss rates. Mirage can use three models for formulating heterogeneous evolution among gene families: the discretized Γ model, probability distribution-free model and pattern mixture (PM) model. Simulation analysis showed that Mirage can accurately estimate heterogeneous gene gain/loss rates and reconstruct gene-content evolutionary history. Application to empirical datasets demonstrated that the PM model fits genome data from various taxonomic groups better than the other heterogeneous models. Using Mirage, we revealed that metabolic function-related gene families displayed frequent gene gains and losses in all taxa investigated. Availability and implementation: The source code of Mirage is freely available at https://github.com/fukunagatsu/Mirage. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

19.
Biol Methods Protoc ; 6(1): bpab006, 2021.
Article in English | MEDLINE | ID: mdl-33928190

ABSTRACT

Advances in experimental technologies, such as DNA sequencing, have opened up new avenues for the applications of phylogenetic methods to various fields beyond their traditional application in evolutionary investigations, extending to the fields of development, differentiation, cancer genomics, and immunogenomics. Thus, the importance of phylogenetic methods is increasingly being recognized, and the development of a novel phylogenetic approach can contribute to several areas of research. Recently, the use of hyperbolic geometry has attracted attention in artificial intelligence research. Hyperbolic space can better represent a hierarchical structure compared to Euclidean space, and can therefore be useful for describing and analyzing a phylogenetic tree. In this study, we developed a novel metric that considers the characteristics of a phylogenetic tree for representation in hyperbolic space. We compared the performance of the proposed hyperbolic embeddings, general hyperbolic embeddings, and Euclidean embeddings, and confirmed that our method could be used to more precisely reconstruct evolutionary distance. We also demonstrate that our approach is useful for predicting the nearest-neighbor node in a partial phylogenetic tree with missing nodes. Furthermore, we proposed a novel approach based on our metric to integrate multiple trees for analyzing tree nodes or imputing missing distances. This study highlights the utility of adopting a geometric approach for further advancing the applications of phylogenetic methods.
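
As background for the claim in the abstract above that hyperbolic space represents hierarchies better than Euclidean space, the sketch below computes the standard distance in the Poincaré ball model, d(u, v) = arcosh(1 + 2 |u - v|^2 / ((1 - |u|^2)(1 - |v|^2))). This is the general hyperbolic distance used by generic hyperbolic embeddings, not the tree-aware metric the study proposes; the function name is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)


# Two points near the boundary are far apart in hyperbolic terms even
# though their Euclidean distance is below 2.
print(poincare_distance([0.9, 0.0], [-0.9, 0.0]))
print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
```

Distances grow rapidly toward the boundary of the ball, which leaves room for the exponentially increasing number of leaves in a tree; this is the property that makes the geometry attractive for embedding phylogenies.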

20.
Comput Struct Biotechnol J ; 19: 3198-3208, 2021.
Article in English | MEDLINE | ID: mdl-34141139

ABSTRACT

Although remarkable advances have been reported in high-throughput sequencing, the ability to aptly analyze a substantial amount of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this method, biological sequences are regarded as sentences while the single nucleic acids/amino acids or k-mers in these sequences represent the words. Embedding is an essential step in NLP, which performs the conversion of these words into vectors. Specifically, representation learning is an approach used for this transformation process, which can be applied to biological sequences. Vectorized biological sequences can then be applied for function and structure estimation, or as input for other probabilistic models. Considering the importance and growing trend for the application of representation learning to biological research, in the present study, we have reviewed the existing knowledge in representation learning for biological sequence analysis.
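
The "sequences as sentences, k-mers as words" framing in the abstract above can be sketched as a tokenization step that precedes any embedding model. The example below only turns a sequence into k-mer tokens and integer IDs; learning the vectors themselves is left to whichever representation learning method is chosen. The function names and the example sequence are assumptions for illustration.

```python
def kmer_tokens(sequence, k=3, stride=1):
    """Split a sequence into k-mer 'words'; stride=1 gives overlapping k-mers."""
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def build_vocabulary(sentences):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for token in sentence:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Replace tokens by their IDs, ready to index an embedding table."""
    return [vocab[token] for token in sentence]


sentence = kmer_tokens("ATGCGT", k=3)
vocab = build_vocabulary([sentence])
print(sentence)
print(encode(sentence, vocab))
```

Choosing `k` and `stride` is a modelling decision: single residues give a tiny vocabulary with little context per word, while larger k-mers give richer words at the cost of a vocabulary that grows exponentially with `k`.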
