Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 652
Filtrar
Mais filtros

Tipo de documento
Intervalo de ano de publicação
1.
Trends Biochem Sci ; 48(6): 527-538, 2023 06.
Artigo em Inglês | MEDLINE | ID: mdl-37061423

RESUMO

Protein-protein interactions (PPIs) drive biological processes, and disruption of PPIs can cause disease. With recent breakthroughs in structure prediction and a deluge of genomic sequence data, computational methods to predict PPIs and model spatial structures of protein complexes are now approaching the accuracy of experimental approaches for permanent interactions and show promise for elucidating transient interactions. As we describe here, the key to this success is rich evolutionary information deciphered from thousands of homologous sequences that coevolve in interacting partners. This covariation signal, revealed by sophisticated statistical and machine learning (ML) algorithms, predicts physiological interactions. Accurate artificial intelligence (AI)-based modeling of protein structures promises to provide accurate 3D models of PPIs at a proteome-wide scale.


Assuntos
Inteligência Artificial , Mapeamento de Interação de Proteínas , Mapeamento de Interação de Proteínas/métodos , Algoritmos , Aprendizado de Máquina , Proteoma , Biologia Computacional/métodos
2.
Proc Natl Acad Sci U S A ; 121(35): e2410662121, 2024 Aug 27.
Artigo em Inglês | MEDLINE | ID: mdl-39163334

RESUMO

Proteins perform their biological functions through motion. Although high throughput prediction of the three-dimensional static structures of proteins has proved feasible using deep-learning-based methods, predicting the conformational motions remains a challenge. Purely data-driven machine learning methods encounter difficulty for addressing such motions because available laboratory data on conformational motions are still limited. In this work, we develop a method for generating protein allosteric motions by integrating physical energy landscape information into deep-learning-based methods. We show that local energetic frustration, which represents a quantification of the local features of the energy landscape governing protein allosteric dynamics, can be utilized to empower AlphaFold2 (AF2) to predict protein conformational motions. Starting from ground state static structures, this integrative method generates alternative structures as well as pathways of protein conformational motions, using a progressive enhancement of the energetic frustration features in the input multiple sequence alignment sequences. For a model protein adenylate kinase, we show that the generated conformational motions are consistent with available experimental and molecular dynamics simulation data. Applying the method to another two proteins KaiB and ribose-binding protein, which involve large-amplitude conformational changes, can also successfully generate the alternative conformations. We also show how to extract overall features of the AF2 energy landscape topography, which has been considered by many to be black box. Incorporating physical knowledge into deep-learning-based structure prediction algorithms provides a useful strategy to address the challenges of dynamic structure prediction of allosteric proteins.


Assuntos
Simulação de Dinâmica Molecular , Conformação Proteica , Proteínas/química , Adenilato Quinase/química , Adenilato Quinase/metabolismo , Regulação Alostérica , Aprendizado Profundo
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Artigo em Inglês | MEDLINE | ID: mdl-38695119

RESUMO

Sequence similarity is of paramount importance in biology, as similar sequences tend to have similar function and share common ancestry. Scoring matrices, such as PAM or BLOSUM, play a crucial role in all bioinformatics algorithms for identifying similarities, but have the drawback that they are fixed, independent of context. We propose a new scoring method for amino acid similarity that remedies this weakness, being contextually dependent. It relies on recent advances in deep learning architectures that employ self-supervised learning in order to leverage the power of enormous amounts of unlabelled data to generate contextual embeddings, which are vector representations for words. These ideas have been applied to protein sequences, producing embedding vectors for protein residues. We propose the E-score between two residues as the cosine similarity between their embedding vector representations. Thorough testing on a wide variety of reference multiple sequence alignments indicate that the alignments produced using the new $E$-score method, especially ProtT5-score, are significantly better than those obtained using BLOSUM matrices. The new method proposes to change the way alignments are computed, with far-reaching implications in all areas of textual data that use sequence similarity. The program to compute alignments based on various $E$-scores is available as a web server at e-score.csd.uwo.ca. The source code is freely available for download from github.com/lucian-ilie/E-score.


Assuntos
Algoritmos , Biologia Computacional , Alinhamento de Sequência , Alinhamento de Sequência/métodos , Biologia Computacional/métodos , Software , Análise de Sequência de Proteína/métodos , Sequência de Aminoácidos , Proteínas/química , Proteínas/genética , Aprendizado Profundo , Bases de Dados de Proteínas
4.
Brief Bioinform ; 25(2)2024 Jan 22.
Artigo em Inglês | MEDLINE | ID: mdl-38517696

RESUMO

With the rapid development of single-molecule sequencing (SMS) technologies, the output read length is continuously increasing. Mapping such reads onto a reference genome is one of the most fundamental tasks in sequence analysis. Mapping sensitivity is becoming a major concern since high sensitivity can detect more aligned regions on the reference and obtain more aligned bases, which are useful for downstream analysis. In this study, we present pathMap, a novel k-mer graph-based mapper that is specifically designed for mapping SMS reads with high sensitivity. By viewing the alignment chain as a path containing as many anchors as possible in the matched k-mer graph, pathMap treats chaining as a path selection problem in the directed graph. pathMap iteratively searches the longest path in the remaining nodes; more candidate chains with high quality can be effectively detected and aligned. Compared to other state-of-the-art mapping methods such as minimap2 and Winnowmap2, experiment results on simulated and real-life datasets demonstrate that pathMap obtains the number of mapped chains at least 11.50% more than its closest competitor and increases the mapping sensitivity by 17.28% and 13.84% of bases over the next-best mapper for Pacific Biosciences and Oxford Nanopore sequencing data, respectively. In addition, pathMap is more robust to sequence errors and more sensitive to species- and strain-specific identification of pathogens using MinION reads.


Assuntos
Sequenciamento de Nucleotídeos em Larga Escala , Sequenciamento por Nanoporos , Análise de Sequência de DNA/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Genoma , Software , Algoritmos
5.
Brief Bioinform ; 25(6)2024 Sep 23.
Artigo em Inglês | MEDLINE | ID: mdl-39489605

RESUMO

Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is critical for our understanding of the adaptive immune system's dynamics in health and disease. Reliable analysis of AIRR-seq data depends on accurate rearranged immunoglobulin (Ig) sequence alignment. Various Ig sequence aligners exist, but there is no unified benchmarking standard representing the complexities of AIRR-seq data, obscuring objective comparisons of aligners across tasks. Here, we introduce GenAIRR, a modular simulation framework for generating Ig sequences alongside their ground truths. GenAIRR realistically simulates the intricacies of V(D)J recombination, somatic hypermutation, and an array of sequence corruptions. We comprehensively assessed prominent Ig sequence aligners across various metrics, unveiling unique performance characteristics for each aligner. The GenAIRR-produced datasets, combined with the proposed rigorous evaluation criteria, establish a solid basis for unbiased benchmarking of immunogenetics computational tools. It sets up the ground for further improving the crucial task of Ig sequence alignment, ultimately enhancing our understanding of adaptive immunity.


Assuntos
Imunoglobulinas , Alinhamento de Sequência , Alinhamento de Sequência/métodos , Humanos , Imunoglobulinas/genética , Imunoglobulinas/imunologia , Imunoglobulinas/química , Biologia Computacional/métodos , Recombinação V(D)J , Imunidade Adaptativa/genética , Software , Algoritmos
6.
Semin Cell Dev Biol ; 150-151: 28-34, 2023 12.
Artigo em Inglês | MEDLINE | ID: mdl-37095033

RESUMO

Mutations in the gene encoding the Adenomatous polyposis coli protein (APC) were discovered as driver mutations in colorectal cancers almost 30 years ago. Since then, the importance of APC in normal tissue homeostasis has been confirmed in a plethora of other (model) organisms spanning a large evolutionary space. APC is a multifunctional protein, with roles as a key scaffold protein in complexes involved in diverse signalling pathways, most prominently the Wnt signalling pathway. APC is also a cytoskeletal regulator with direct and indirect links to and impacts on all three major cytoskeletal networks. Correspondingly, a wide range of APC binding partners have been identified. Mutations in APC are extremely strongly associated with colorectal cancers, particularly those that result in the production of truncated proteins and the loss of significant regions from the remaining protein. Understanding the complement of its role in health and disease requires knowing the relationship between and regulation of its diverse functions and interactions. This in turn requires understanding its structural and biochemical features. Here we set out to provide a brief overview of the roles and function of APC and then explore its conservation and structure using the extensive sequence data, which is now available, and spans a broad range of taxonomy. This revealed conservation of APC across taxonomy and new relationships between different APC protein families.


Assuntos
Proteína da Polipose Adenomatosa do Colo , Polipose Adenomatosa do Colo , Humanos , Proteína da Polipose Adenomatosa do Colo/genética , Proteína da Polipose Adenomatosa do Colo/metabolismo , Polipose Adenomatosa do Colo/genética , Polipose Adenomatosa do Colo/metabolismo , Mutação , Citoesqueleto/metabolismo , Via de Sinalização Wnt/genética
7.
Mol Biol Evol ; 41(7)2024 Jul 03.
Artigo em Inglês | MEDLINE | ID: mdl-38842253

RESUMO

Despite having important biological implications, insertion, and deletion (indel) events are often disregarded or mishandled during phylogenetic inference. In multiple sequence alignment, indels are represented as gaps and are estimated without considering the distinct evolutionary history of insertions and deletions. Consequently, indels are usually excluded from subsequent inference steps, such as ancestral sequence reconstruction and phylogenetic tree search. Here, we introduce indel-aware parsimony (indelMaP), a novel way to treat gaps under the parsimony criterion by considering insertions and deletions as separate evolutionary events and accounting for long indels. By identifying the precise location of an evolutionary event on the tree, we can separate overlapping indel events and use affine gap penalties for long indel modeling. Our indel-aware approach harnesses the phylogenetic signal from indels, including them into all inference stages. Validation and comparison to state-of-the-art inference tools on simulated data show that indelMaP is most suitable for densely sampled datasets with closely to moderately related sequences, where it can reach alignment quality comparable to probabilistic methods and accurately infer ancestral sequences, including indel patterns. Due to its remarkable speed, our method is well suited for epidemiological datasets, eliminating the need for downsampling and enabling the exploitation of the additional information provided by dense taxonomic sampling. Moreover, indelMaP offers new insights into the indel patterns of biologically significant sequences and advances our understanding of genetic variability by considering gaps as crucial evolutionary signals rather than mere artefacts.


Assuntos
Mutação INDEL , Filogenia , Alinhamento de Sequência , Alinhamento de Sequência/métodos , Evolução Molecular , Modelos Genéticos , Humanos
8.
Mol Biol Evol ; 41(7)2024 Jul 03.
Artigo em Inglês | MEDLINE | ID: mdl-39041199

RESUMO

The current trend in phylogenetic and evolutionary analyses predominantly relies on omic data. However, prior to core analyses, traditional methods typically involve intricate and time-consuming procedures, including assembly from high-throughput reads, decontamination, gene prediction, homology search, orthology assignment, multiple sequence alignment, and matrix trimming. Such processes significantly impede the efficiency of research when dealing with extensive data sets. In this study, we develop PhyloAln, a convenient reference-based tool capable of directly aligning high-throughput reads or complete sequences with existing alignments as a reference for phylogenetic and evolutionary analyses. Through testing with simulated data sets of species spanning the tree of life, PhyloAln demonstrates consistently robust performance compared with other reference-based tools across different data types, sequencing technologies, coverages, and species, with percent completeness and identity at least 50 percentage points higher in the alignments. Additionally, we validate the efficacy of PhyloAln in removing a minimum of 90% foreign and 70% cross-contamination issues, which are prevalent in sequencing data but often overlooked by other tools. Moreover, we showcase the broad applicability of PhyloAln by generating alignments (completeness mostly larger than 80%, identity larger than 90%) and reconstructing robust phylogenies using real data sets of transcriptomes of ladybird beetles, plastid genes of peppers, or ultraconserved elements of turtles. With these advantages, PhyloAln is expected to facilitate phylogenetic and evolutionary analyses in the omic era. The tool is accessible at https://github.com/huangyh45/PhyloAln.


Assuntos
Filogenia , Alinhamento de Sequência , Software , Alinhamento de Sequência/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Animais , Evolução Molecular
9.
Brief Bioinform ; 24(5)2023 09 20.
Artigo em Inglês | MEDLINE | ID: mdl-37668049

RESUMO

The Sequence Alignment/Map (SAM) format file is the text file used to record alignment information. Alignment is the core of sequencing analysis, and downstream tasks accept mapping results for further processing. Given the rapid development of the sequencing industry today, a comprehensive understanding of the SAM format and related tools is necessary to meet the challenges of data processing and analysis. This paper is devoted to retrieving knowledge in the broad field of SAM. First, the format of SAM is introduced to understand the overall process of the sequencing analysis. Then, existing work is systematically classified in accordance with generation, compression and application, and the involved SAM tools are specifically mined. Lastly, a summary and some thoughts on future directions are provided.


Assuntos
Alinhamento de Sequência
10.
Brief Bioinform ; 24(1)2023 01 19.
Artigo em Inglês | MEDLINE | ID: mdl-36460624

RESUMO

Protein model quality assessment plays an important role in protein structure prediction, protein design and drug discovery. In this work, DeepUMQA2, a substantially improved version of DeepUMQA for protein model quality assessment, is proposed. First, sequence features containing protein co-evolution information and structural features reflecting family information are extracted to complement model-dependent features. Second, a novel backbone network based on triangular multiplication update and axial attention mechanism is designed to enhance information exchange between inter-residue pairs. On CASP13 and CASP14 datasets, the performance of DeepUMQA2 increases by 20.5 and 20.4% compared with DeepUMQA, respectively (measured by top 1 loss). Moreover, on the three-month CAMEO dataset (11 March to 04 June 2022), DeepUMQA2 outperforms DeepUMQA by 15.5% (measured by local AUC0,0.2) and ranks first among all competing server methods in CAMEO blind test. Experimental results show that DeepUMQA2 outperforms state-of-the-art model quality assessment methods, such as ProQ3D-LDDT, ModFOLD8, and DeepAccNet and DeepUMQA2 can select more suitable best models than state-of-the-art protein structure methods, such as AlphaFold2, RoseTTAFold and I-TASSER, provided themselves.


Assuntos
Algoritmos , Biologia Computacional , Biologia Computacional/métodos , Modelos Moleculares , Redes Neurais de Computação , Proteínas/química , Conformação Proteica
11.
Brief Bioinform ; 24(4)2023 07 20.
Artigo em Inglês | MEDLINE | ID: mdl-37200156

RESUMO

Multiple sequence alignment is widely used for sequence analysis, such as identifying important sites and phylogenetic analysis. Traditional methods, such as progressive alignment, are time-consuming. To address this issue, we introduce StarTree, a novel method to fast construct a guide tree by combining sequence clustering and hierarchical clustering. Furthermore, we develop a new heuristic similar region detection algorithm using the FM-index and apply the k-banded dynamic program to the profile alignment. We also introduce a win-win alignment algorithm that applies the central star strategy within the clusters to fast the alignment process, then uses the progressive strategy to align the central-aligned profiles, guaranteeing the final alignment's accuracy. We present WMSA 2 based on these improvements and compare the speed and accuracy with other popular methods. The results show that the guide tree made by the StarTree clustering method can lead to better accuracy than that of PartTree while consuming less time and memory than that of UPGMA and mBed methods on datasets with thousands of sequences. During the alignment of simulated data sets, WMSA 2 can consume less time and memory while ranking at the top of Q and TC scores. The WMSA 2 is still better at the time, and memory efficiency on the real datasets and ranks at the top on the average sum of pairs score. For the alignment of 1 million SARS-CoV-2 genomes, the win-win mode of WMSA 2 significantly decreased the consumption time than the former version. The source code and data are available at https://github.com/malabz/WMSA2.


Assuntos
COVID-19 , RNA , Humanos , Alinhamento de Sequência , Filogenia , SARS-CoV-2/genética , Software , Algoritmos , DNA/genética
12.
Brief Bioinform ; 24(4)2023 07 20.
Artigo em Inglês | MEDLINE | ID: mdl-37321965

RESUMO

In recent years, protein structure problems have become a hotspot for understanding protein folding and function mechanisms. It has been observed that most of the protein structure works rely on and benefit from co-evolutionary information obtained by multiple sequence alignment (MSA). As an example, AlphaFold2 (AF2) is a typical MSA-based protein structure tool which is famous for its high accuracy. As a consequence, these MSA-based methods are limited by the quality of the MSAs. Especially for orphan proteins that have no homologous sequence, AlphaFold2 performs unsatisfactorily as MSA depth decreases, which may pose a barrier to its widespread application in protein mutation and design problems in which there are no rich homologous sequences and rapid prediction is needed. In this paper, we constructed two standard datasets for orphan and de novo proteins which have insufficient/none homology information, called Orphan62 and Design204, respectively, to fairly evaluate the performance of the various methods in this case. Then, depending on whether or not utilizing scarce MSA information, we summarized two approaches, MSA-enhanced and MSA-free methods, to effectively solve the issue without sufficient MSAs. MSA-enhanced model aims to improve poor MSA quality from the data source by knowledge distillation and generation models. MSA-free model directly learns the relationship between residues on enormous protein sequences from pre-trained models, bypassing the step of extracting the residue pair representation from MSA. Next, we evaluated the performance of four MSA-free methods (trRosettaX-Single, TRFold, ESMFold and ProtT5) and MSA-enhanced (Bagging MSA) method compared with a traditional MSA-based method AlphaFold2, in two protein structure-related prediction tasks, respectively. Comparison analyses show that trRosettaX-Single and ESMFold which belong to MSA-free method can achieve fast prediction ($\sim\! 40$s) and comparable performance compared with AF2 in tertiary structure prediction, especially for short peptides, $\alpha $-helical segments and targets with few homologous sequences. Bagging MSA utilizing MSA enhancement improves the accuracy of our trained base model which is an MSA-based method when poor homology information exists in secondary structure prediction. Our study provides biologists an insight of how to select rapid and appropriate prediction tools for enzyme engineering and peptide drug development. CONTACT: guofei@csu.edu.cn, jj.tang@siat.ac.cn.


Assuntos
Algoritmos , Furilfuramida , Alinhamento de Sequência , Proteínas/química , Sequência de Aminoácidos
13.
Brief Bioinform ; 24(3)2023 05 19.
Artigo em Inglês | MEDLINE | ID: mdl-36946414

RESUMO

In the era of constantly increasing amounts of the available protein data, a relevant and interpretable visualization becomes crucial, especially for tasks requiring human expertise. Poincaré disk projection has previously demonstrated its important efficiency for visualization of biological data such as single-cell RNAseq data. Here, we develop a new method PoincaréMSA for visual representation of complex relationships between protein sequences based on Poincaré maps embedding. We demonstrate its efficiency and potential for visualization of protein family topology as well as evolutionary and functional annotation of uncharacterized sequences. PoincaréMSA is implemented in open source Python code with available interactive Google Colab notebooks as described at https://www.dsimb.inserm.fr/POINCARE_MSA.


Assuntos
Proteínas , Software , Humanos , Sequência de Aminoácidos , Evolução Biológica
14.
BMC Bioinformatics ; 25(1): 109, 2024 Mar 12.
Artigo em Inglês | MEDLINE | ID: mdl-38475727

RESUMO

BACKGROUND: Parent-of-origin allele-specific gene expression (ASE) can be detected in interspecies hybrids by virtue of RNA sequence variants between the parental haplotypes. ASE is detectable by differential expression analysis (DEA) applied to the counts of RNA-seq read pairs aligned to parental references, but aligners do not always choose the correct parental reference. RESULTS: We used public data for species that are known to hybridize. We measured our ability to assign RNA-seq read pairs to their proper transcriptome or genome references. We tested software packages that assign each read pair to a reference position and found that they often favored the incorrect species reference. To address this problem, we introduce a post process that extracts alignment features and trains a random forest classifier to choose the better alignment. On each simulated hybrid dataset tested, our machine-learning post-processor achieved higher accuracy than the aligner by itself at choosing the correct parent-of-origin per RNA-seq read pair. CONCLUSIONS: For the parent-of-origin classification of RNA-seq, machine learning can improve the accuracy of alignment-based methods. This approach could be useful for enhancing ASE detection in interspecies hybrids, though RNA-seq from real hybrids may present challenges not captured by our simulations. We believe this is the first application of machine learning to this problem domain.


Assuntos
Software , Transcriptoma , RNA-Seq , Análise de Sequência de RNA/métodos , Aprendizado de Máquina
15.
BMC Bioinformatics ; 25(1): 247, 2024 Jul 29.
Artigo em Inglês | MEDLINE | ID: mdl-39075359

RESUMO

BACKGROUND: Sequence alignment lies at the heart of genome sequence annotation. While the BLAST suite of alignment tools has long held an important role in alignment-based sequence database search, greater sensitivity is achieved through the use of profile hidden Markov models (pHMMs). Here, we describe an FPGA hardware accelerator, called HAVAC, that targets a key bottleneck step (SSV) in the analysis pipeline of the popular pHMM alignment tool, HMMER. RESULTS: The HAVAC kernel calculates the SSV matrix at 1739 GCUPS on a ∼  $3000 Xilinx Alveo U50 FPGA accelerator card, ∼  227× faster than the optimized SSV implementation in nhmmer. Accounting for PCI-e data transfer data processing, HAVAC is 65× faster than nhmmer's SSV with one thread and 35× faster than nhmmer with four threads, and uses ∼  31% the energy of a traditional high end Intel CPU. CONCLUSIONS: HAVAC demonstrates the potential offered by FPGA hardware accelerators to produce dramatic speed gains in sequence annotation and related bioinformatics applications. Because these computations are performed on a co-processor, the host CPU remains free to simultaneously compute other aspects of the analysis pipeline.


Assuntos
Cadeias de Markov , Alinhamento de Sequência , Alinhamento de Sequência/métodos , Biologia Computacional/métodos , Homologia de Sequência , Algoritmos , Software
16.
BMC Bioinformatics ; 25(1): 85, 2024 Feb 28.
Artigo em Inglês | MEDLINE | ID: mdl-38413857

RESUMO

PURPOSE: Despite the many progresses with alignment algorithms, aligning divergent protein sequences with less than 20-35% pairwise identity (so called "twilight zone") remains a difficult problem. Many alignment algorithms have been using substitution matrices since their creation in the 1970's to generate alignments, however, these matrices do not work well to score alignments within the twilight zone. We developed Protein Embedding based Alignments, or PEbA, to better align sequences with low pairwise identity. Similar to the traditional Smith-Waterman algorithm, PEbA uses a dynamic programming algorithm but the matching score of amino acids is based on the similarity of their embeddings from a protein language model. METHODS: We tested PEbA on over twelve thousand benchmark pairwise alignments from BAliBASE, each one extracted from one of their multiple sequence alignments. Five different BAliBASE references were used, each with different sequence identities, motifs, and lengths, allowing PEbA to showcase how well it aligns under different circumstances. RESULTS: PEbA greatly outperformed BLOSUM substitution matrix-based pairwise alignments, achieving different levels of improvements of the alignment quality for pairs of sequences with different levels of similarity (over four times as well for pairs of sequences with <10% identity). We also compared PEbA with embeddings generated by different protein language models (ProtT5 and ESM-2) and found that ProtT5-XL-U50 produced the most useful embeddings for aligning protein sequences. PEbA also outperformed DEDAL and vcMSA, two recently developed protein language model embedding-based alignment methods. CONCLUSION: Our results suggested that general purpose protein language models provide useful contextual information for generating more accurate protein alignments than typically used methods.


Assuntos
Ácidos Borônicos , Proteínas , Proteínas/química , Sequência de Aminoácidos , Alinhamento de Sequência , Algoritmos
17.
BMC Genomics ; 25(1): 973, 2024 Oct 16.
Artigo em Inglês | MEDLINE | ID: mdl-39415087

RESUMO

BACKGROUND: Multiple sequence alignment (MSA) has proven extremely useful in computational biology, especially in inferring evolutionary relationships via phylogenetic analysis and providing insight into protein structure and function. An alternative to the standard MSA model is partial order alignment (POA), in which aligned sequences are represented as paths in a graph rather than rows in a matrix. While the POA model has proven useful in several applications (e.g. sequencing reads assembly and pangenome structure exploration), we lack efficient visualization tools that could highlight its advantages. RESULTS: We propose Sequence Flow - a web application designed to address the above problem. Sequence Flow presents the POA as a Sankey diagram, a kind of graph visualisation typically used for graphs representing flowcharts. Sequence Flow enables interactive alignment exploration, including fragment selection, highlighting a selected group of sequences, modification of the position of graph nodes, structure simplification etc. After adjustment, the visualization can be saved as a high-quality graphic file. Thanks to the use of SanKEY.js - a JavaScript library for creating Sankey diagrams, designed specifically to visualize POAs, Sequence Flow provides satisfactory performance even with large alignments. CONCLUSIONS: We provide Sankey diagram-based POA visualization tools for both end users (Sequence Flow) and bioinformatic software developers (SanKEY.js). Sequence Flow webservice is available at https://sequenceflow.mimuw.edu.pl/ . The source code for SanKEY.js is available at https://github.com/Krzysiekzd/SanKEY.js and for Sequence Flow at https://github.com/Krzysiekzd/SequenceFlow .


Assuntos
Internet , Alinhamento de Sequência , Software , Alinhamento de Sequência/métodos , Biologia Computacional/métodos , Interface Usuário-Computador , Gráficos por Computador
18.
Biochem Biophys Res Commun ; 690: 149096, 2024 Jan 01.
Artigo em Inglês | MEDLINE | ID: mdl-37988924

RESUMO

Electron-driven process helps the living organism in the generations of energy, biomass production and detoxification of synthetic compounds. Soluble quinone oxidoreductases (QORs) mediate the transfer of an electron from NADPH to various quinone and other compounds, helping in the detoxification of quinones. QORs play a crucial role in cellular metabolism and are thus potential targets for drug development. Here we report the crystal structure of the NADPH-dependent QOR from Leishmania donovani (LdQOR) at 2.05 Å. The enzyme exists as a homo-dimer, with each protomer consisting of two domains, responsible for binding NADPH cofactor and the substrate. Interestingly, the human QOR exists as a tetramer. Comparative analysis of the oligomeric interfaces of LdQOR with HsQOR shows no significant differences in the protomer/dimer assembly. The tetrameric interface of HsQOR is stabilized by salt bridges formed between Arg 169 and Glu 271 which is non-existent in LdQOR, with an Alanine replacing the glutamate. This distinct feature is conserved across other dimeric QORs, indicating the importance of this interaction for tetramer association. Among the homologs, the sequences of the loop region involved in the stabilization and binding of the adenine ring of the NADPH shows significant differences except for an Arginine & glycine residues. In dimer QORs, this Arginine acts as a gate to the co-factor, while the NADPH binding mode in the human homolog is distinct, stabilized by His 200 and Asn 229, which are not conserved in LdQOR. These distinct features have the potential to be utilized for therapeutic interventions.


Assuntos
NAD(P)H Desidrogenase (Quinona) , Quinona Redutases , Humanos , NADP/metabolismo , Subunidades Proteicas , NAD(P)H Desidrogenase (Quinona)/metabolismo , Quinona Redutases/química , Quinona Redutases/metabolismo , Quinonas , Arginina , Sítios de Ligação , Cristalografia por Raios X
19.
Brief Bioinform ; 23(4)2022 07 18.
Artigo em Inglês | MEDLINE | ID: mdl-35671504

RESUMO

The identification of the conserved and variable regions in the multiple sequence alignment (MSA) is critical to accelerating the process of understanding the function of genes. MSA visualizations allow us to transform sequence features into understandable visual representations. As the sequence-structure-function relationship gains increasing attention in molecular biology studies, the simple display of nucleotide or protein sequence alignment is not satisfied. A more scalable visualization is required to broaden the scope of sequence investigation. Here we present ggmsa, an R package for mining comprehensive sequence features and integrating the associated data of MSA by a variety of display methods. To uncover sequence conservation patterns, variations and recombination at the site level, sequence bundles, sequence logos, stacked sequence alignment and comparative plots are implemented. ggmsa supports integrating the correlation of MSA sequences and their phenotypes, as well as other traits such as ancestral sequences, molecular structures, molecular functions and expression levels. We also design a new visualization method for genome alignments in multiple alignment format to explore the pattern of within and between species variation. Combining these visual representations with prime knowledge, ggmsa assists researchers in discovering MSA and making decisions. The ggmsa package is open-source software released under the Artistic-2.0 license, and it is freely available on Bioconductor (https://bioconductor.org/packages/ggmsa) and Github (https://github.com/YuLab-SMU/ggmsa).


Assuntos
Genoma , Software , Sequência de Aminoácidos , Matrizes de Pontuação de Posição Específica , Alinhamento de Sequência
20.
Brief Bioinform ; 23(3)2022 05 13.
Artigo em Inglês | MEDLINE | ID: mdl-35272347

RESUMO

Multiple sequence alignment (MSA) is an essential cornerstone in bioinformatics, which can reveal the potential information in biological sequences, such as function, evolution and structure. MSA is widely used in many bioinformatics scenarios, such as phylogenetic analysis, protein analysis and genomic analysis. However, MSA faces new challenges with the gradual increase in sequence scale and the increasing demand for alignment accuracy. Therefore, developing an efficient and accurate strategy for MSA has become one of the research hotspots in bioinformatics. In this work, we mainly summarize the algorithms for MSA and its applications in bioinformatics. To provide a structured and clear perspective, we systematically introduce MSA's knowledge, including background, database, metric and benchmark. Besides, we list the most common applications of MSA in the field of bioinformatics, including database searching, phylogenetic analysis, genomic analysis, metagenomic analysis and protein analysis. Furthermore, we categorize and analyze classical and state-of-the-art algorithms, divided into progressive alignment, iterative algorithm, heuristics, machine learning and divide-and-conquer. Moreover, we also discuss the challenges and opportunities of MSA in bioinformatics. Our work provides a comprehensive survey of MSA applications and their relevant algorithms. It could bring valuable insights for researchers to contribute their knowledge to MSA and relevant studies.


Assuntos
Algoritmos , Biologia Computacional , Aprendizado de Máquina , Filogenia , Alinhamento de Sequência
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA