Pesquisa | Portal de Pesquisa da BVS

METASEED: a novel approach to full-length 16S rRNA gene reconstruction from short read data.

Philip, Melcy; Rudi, Knut; Ormaasen, Ida; Angell, Inga Leena; Pettersen, Ragnhild; Keeley, Nigel B; Snipen, Lars-Gustav.

BMC Bioinformatics ; 25(1): 237, 2024 Jul 12.

Artigo em Inglês | MEDLINE | ID: mdl-38997633

RESUMO

BACKGROUND: With the emergence of Oxford Nanopore technology, now the on-site sequencing of 16S rRNA from environments is available. Due to the error level and structure, the analysis of such data demands some database of reference sequences. However, many taxa from complex and diverse environments, have poor representation in publicly available databases. In this paper, we propose the METASEED pipeline for the reconstruction of full-length 16S sequences from such environments, in order to improve the reference for the subsequent use of on-site sequencing. RESULTS: We show that combining high-precision short-read sequencing of both 16S and full metagenome from the same samples allow us to reconstruct high-quality 16S sequences from the more abundant taxa. A significant novelty is the carefully designed collection of metagenome reads that matches the 16S amplicons, based on a combination of uniqueness and abundance. Compared to alternative approaches this produces superior results. CONCLUSION: Our pipeline will facilitate numerous studies associated with various unknown microorganisms, thus allowing the comprehension of the diverse environments. The pipeline is a potential tool in generating a full length 16S rRNA gene database for any environment.

Assuntos

Metagenoma , RNA Ribossômico 16S , RNA Ribossômico 16S/genética , Metagenoma/genética , Análise de Sequência de DNA/métodos , Bases de Dados Genéticas

A multi-modal data harmonisation approach for discovery of COVID-19 drug targets.

Chen, Tyrone; Philip, Melcy; Lê Cao, Kim-Anh; Tyagi, Sonika.

Brief Bioinform ; 22(6)2021 11 05.

Artigo em Inglês | MEDLINE | ID: mdl-34036326

RESUMO

Despite the volume of experiments performed and data available, the complex biology of coronavirus SARS-COV-2 is not yet fully understood. Existing molecular profiling studies have focused on analysing functional omics data of a single type, which captures changes in a small subset of the molecular perturbations caused by the virus. As the logical next step, results from multiple such omics analysis may be aggregated to comprehensively interpret the molecular mechanisms of SARS-CoV-2. An alternative approach is to integrate data simultaneously in a parallel fashion to highlight the inter-relationships of disease-driving biomolecules, in contrast to comparing processed information from each omics level separately. We demonstrate that valuable information may be masked by using the former fragmented views in analysis, and biomarkers resulting from such an approach cannot provide a systematic understanding of the disease aetiology. Hence, we present a generic, reproducible and flexible open-access data harmonisation framework that can be scaled out to future multi-omics analysis to study a phenotype in a holistic manner. The pipeline source code, detailed documentation and automated version as a R package are accessible. To demonstrate the effectiveness of our pipeline, we applied it to a drug screening task. We integrated multi-omics data to find the lowest level of statistical associations between data features in two case studies. Strongly correlated features within each of these two datasets were used for drug-target analysis, resulting in a list of 84 drug-target candidates. Further computational docking and toxicity analyses revealed seven high-confidence targets, amsacrine, bosutinib, ceritinib, crizotinib, nintedanib and sunitinib as potential starting points for drug therapy and development.

Assuntos

Tratamento Farmacológico da COVID-19 , Genômica , Terapia de Alvo Molecular , SARS-CoV-2/efeitos dos fármacos , Algoritmos , Biomarcadores/química , COVID-19/genética , COVID-19/patologia , COVID-19/virologia , Biologia Computacional , Bases de Dados Genéticas , Humanos , SARS-CoV-2/química , SARS-CoV-2/genética , Software

Linc2function: A Comprehensive Pipeline and Webserver for Long Non-Coding RNA (lncRNA) Identification and Functional Predictions Using Deep Learning Approaches.

Ramakrishnaiah, Yashpal; Morris, Adam P; Dhaliwal, Jasbir; Philip, Melcy; Kuhlmann, Levin; Tyagi, Sonika.

Epigenomes ; 7(3)2023 Sep 15.

Artigo em Inglês | MEDLINE | ID: mdl-37754274

RESUMO

Long non-coding RNAs (lncRNAs), comprising a significant portion of the human transcriptome, serve as vital regulators of cellular processes and potential disease biomarkers. However, the function of most lncRNAs remains unknown, and furthermore, existing approaches have focused on gene-level investigation. Our work emphasizes the importance of transcript-level annotation to uncover the roles of specific transcript isoforms. We propose that understanding the mechanisms of lncRNA in pathological processes requires solving their structural motifs and interactomes. A complete lncRNA annotation first involves discriminating them from their coding counterparts and then predicting their functional motifs and target bio-molecules. Current in silico methods mainly perform primary-sequence-based discrimination using a reference model, limiting their comprehensiveness and generalizability. We demonstrate that integrating secondary structure and interactome information, in addition to using transcript sequence, enables a comprehensive functional annotation. Annotating lncRNA for newly sequenced species is challenging due to inconsistencies in functional annotations, specialized computational techniques, limited accessibility to source code, and the shortcomings of reference-based methods for cross-species predictions. To address these challenges, we developed a pipeline for identifying and annotating transcript sequences at the isoform level. We demonstrate the effectiveness of the pipeline by comprehensively annotating the lncRNA associated with two specific disease groups. The source code of our pipeline is available under the MIT licensefor local use by researchers to make new predictions using the pre-trained models or to re-train models on new sequence datasets. Non-technical users can access the pipeline through a web server setup.

A Survey of Current Resources to Study lncRNA-Protein Interactions.

Philip, Melcy; Chen, Tyrone; Tyagi, Sonika.

Noncoding RNA ; 7(2)2021 Jun 08.

Artigo em Inglês | MEDLINE | ID: mdl-34201302

RESUMO

Phenotypes are driven by regulated gene expression, which in turn are mediated by complex interactions between diverse biological molecules. Protein-DNA interactions such as histone and transcription factor binding are well studied, along with RNA-RNA interactions in short RNA silencing of genes. In contrast, lncRNA-protein interaction (LPI) mechanisms are comparatively unknown, likely directed by the difficulties in studying LPI. However, LPI are emerging as key interactions in epigenetic mechanisms, playing a role in development and disease. Their importance is further highlighted by their conservation across kingdoms. Hence, interest in LPI research is increasing. We therefore review the current state of the art in lncRNA-protein interactions. We specifically surveyed recent computational methods and databases which researchers can exploit for LPI investigation. We discovered that algorithm development is heavily reliant on a few generic databases containing curated LPI information. Additionally, these databases house information at gene-level as opposed to transcript-level annotations. We show that early methods predict LPI using molecular docking, have limited scope and are slow, creating a data processing bottleneck. Recently, machine learning has become the strategy of choice in LPI prediction, likely due to the rapid growth in machine learning infrastructure and expertise. While many of these methods have notable limitations, machine learning is expected to be the basis of modern LPI prediction algorithms.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA