Results 1 - 20 of 101,537
1.
Article in English | MEDLINE | ID: mdl-38872612

ABSTRACT

The recent success of AlphaFold2 in protein structure prediction relied heavily on co-evolutionary information derived from homologous protein sequences found in the huge, integrated database of protein sequences (Big Fantastic Database). In contrast, the existing nucleotide databases were not consolidated to facilitate wider and deeper homology search. Here, we built a comprehensive database by incorporating the non-coding RNA (ncRNA) sequences from RNAcentral, the transcriptome assembly and metagenome assembly from metagenomics RAST (MG-RAST), the genomic sequences from Genome Warehouse (GWH), and the genomic sequences from MGnify, in addition to the nucleotide (nt) database and its subsets in the National Center for Biotechnology Information (NCBI). The resulting Master database of All possible RNA sequences (MARS) is 20-fold larger than NCBI's nt database or 60-fold larger than RNAcentral. The new dataset along with a new split-search strategy allows a substantial improvement in homology search over existing state-of-the-art techniques. It also yields more accurate and more sensitive multiple sequence alignments (MSAs) than manually curated MSAs from Rfam for the majority of structured RNAs mapped to Rfam. The results indicate that MARS coupled with the fully automatic homology search tool RNAcmap will be useful for improved structural and functional inference of ncRNAs and RNA language models based on MSAs. MARS is accessible at https://ngdc.cncb.ac.cn/omix/release/OMIX003037, and RNAcmap3 is accessible at http://zhouyq-lab.szbl.ac.cn/download/.


Subject(s)
Databases, Nucleic Acid , Sequence Alignment , RNA, Untranslated/genetics , RNA, Untranslated/chemistry , Sequence Analysis, RNA/methods , RNA/genetics , RNA/chemistry , Software , Databases, Genetic
2.
HLA ; 103(6): e15545, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38880985

ABSTRACT

HLA-C*04:520, a novel HLA-C allele, differs from HLA-C*04:01:01 by one mismatch in exon 5.


Subject(s)
Alleles , Base Sequence , Exons , HLA-C Antigens , Histocompatibility Testing , Humans , HLA-C Antigens/genetics , Sequence Analysis, DNA/methods , Sequence Alignment , Codon , Tissue Donors
3.
HLA ; 103(6): e15558, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38887878

ABSTRACT

The novel KIR2DL3*00111 allele differs from the closest allele KIR2DL3*00101 by a single silent mutation.


Subject(s)
Receptors, KIR2DL3 , Humans , Alleles , Base Sequence , China , East Asian People , Exons , Receptors, KIR2DL3/genetics , Sequence Alignment , Sequence Analysis, DNA/methods
4.
HLA ; 103(6): e15562, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38887867

ABSTRACT

Two nucleotide substitutions in codon 152 of HLA-C*08:01:01:01 result in a novel allele HLA-C*08:66.


Subject(s)
Exons , HLA-C Antigens , Histocompatibility Testing , Humans , Alleles , Base Sequence , Codon , Histocompatibility Testing/methods , HLA-C Antigens/genetics , Sequence Alignment , Sequence Analysis, DNA/methods , Taiwan
5.
HLA ; 103(6): e15546, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38887907

ABSTRACT

A nucleotide deletion at residue 371 of HLA-A*11:01:01:01 results in a novel allele HLA-A*11:466N.


Subject(s)
Exons , HLA-A11 Antigen , Histocompatibility Testing , Humans , Alleles , Base Sequence , Codon , HLA-A11 Antigen/genetics , Sequence Alignment , Sequence Analysis, DNA , Sequence Deletion , Taiwan
8.
Bioinformatics ; 40(Supplement_1): i208-i217, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940166

ABSTRACT

MOTIVATION: Currently used methods for estimating branch support in phylogenetic analyses often rely on the classic Felsenstein's bootstrap, parametric tests, or their approximations. As these branch support scores are widely used in phylogenetic analyses, having accurate, fast, and interpretable scores is of high importance. RESULTS: Here, we employed a data-driven approach to estimate branch support values with a probabilistic interpretation. To this end, we simulated thousands of realistic phylogenetic trees and the corresponding multiple sequence alignments. Each of the obtained alignments was used to infer the phylogeny using state-of-the-art phylogenetic inference software, which was then compared to the true tree. Using these extensive data, we trained machine-learning algorithms to estimate branch support values for each bipartition within the maximum-likelihood trees obtained by each software. Our results demonstrate that our model provides fast and more accurate probability-based branch support values than commonly used procedures. We demonstrate the applicability of our approach on empirical datasets. AVAILABILITY AND IMPLEMENTATION: The data supporting this work are available in the Figshare repository at https://doi.org/10.6084/m9.figshare.25050554.v1, and the underlying code is accessible via GitHub at https://github.com/noaeker/bootstrap_repo.


Subject(s)
Algorithms , Machine Learning , Phylogeny , Software , Sequence Alignment/methods , Computational Biology/methods , Likelihood Functions
9.
Bioinformatics ; 40(Supplement_1): i328-i336, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940160

ABSTRACT

SUMMARY: Multiple sequence alignment is an important problem in computational biology with applications that include phylogeny and the detection of remote homology between protein sequences. UPP is a popular software package that constructs accurate multiple sequence alignments for large datasets based on ensembles of hidden Markov models (HMMs). A computational bottleneck for this method is a sequence-to-HMM assignment step, which relies on the precise computation of probability scores on the HMMs. In this work, we show that we can speed up this assignment step significantly by replacing these HMM probability scores with alternative scores that can be efficiently estimated. Our proposed approach utilizes a multi-armed bandit algorithm to adaptively and efficiently compute estimates of these scores. This allows us to achieve similar alignment accuracy as UPP with a significant reduction in computation time, particularly for datasets with long sequences. AVAILABILITY AND IMPLEMENTATION: The code used to produce the results in this paper is available on GitHub at: https://github.com/ilanshom/adaptiveMSA.


Subject(s)
Algorithms , Markov Chains , Sequence Alignment , Software , Sequence Alignment/methods , Computational Biology/methods , Sequence Analysis, Protein/methods , Phylogeny , Proteins/chemistry
10.
Bioinformatics ; 40(Supplement_1): i337-i346, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940164

ABSTRACT

MOTIVATION: Exponential growth in sequencing databases has motivated scalable De Bruijn graph-based (DBG) indexing for searching these data, using annotations to label nodes with sample IDs. Low-depth sequencing samples correspond to fragmented subgraphs, complicating finding the long contiguous walks required for alignment queries. Aligners that target single-labelled subgraphs reduce alignment lengths due to fragmentation, leading to low recall for long reads. While some (e.g. label-free) aligners partially overcome fragmentation by combining information from multiple samples, biologically irrelevant combinations in such approaches can inflate the search space or reduce accuracy. RESULTS: We introduce a new scoring model, 'multi-label alignment' (MLA), for annotated DBGs. MLA leverages two new operations: To promote biologically relevant sample combinations, 'Label Change' incorporates more informative global sample similarity into local scores. To improve connectivity, 'Node Length Change' dynamically adjusts the DBG node length during traversal. Our fast, approximate, yet accurate MLA implementation has two key steps: a single-label seed-chain-extend aligner (SCA) and a multi-label chainer (MLC). SCA uses a traditional scoring model adapting recent chaining improvements to assembly graphs and provides a curated pool of alignments. MLC extracts seed anchors from SCA's alignments, produces multi-label chains using MLA scoring, then finally forms multi-label alignments. We show via substantial improvements in taxonomic classification accuracy that MLA produces biologically relevant alignments, decreasing average weighted UniFrac errors by 63.1%-66.8% and covering 45.5%-47.4% (median) more long-read query characters than state-of-the-art aligners. MLA's runtimes are competitive with label-combining alignment and substantially faster than single-label alignment. AVAILABILITY AND IMPLEMENTATION: The data, scripts, and instructions for generating our results are available at https://github.com/ratschlab/mla.


Subject(s)
Algorithms , Sequence Alignment , Sequence Alignment/methods , Software , Computational Biology/methods , Sequence Analysis, DNA/methods , Databases, Genetic
11.
Bioinformatics ; 40(6)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38870521

ABSTRACT

MOTIVATION: Tools for pairwise alignments between 3D structures of proteins are of fundamental importance for structural biology and bioinformatics, enabling visual exploration of evolutionary and functional relationships. However, the absence of a user-friendly, browser-based tool for creating alignments and visualizing them at both 1D sequence and 3D structural levels makes this process unnecessarily cumbersome. RESULTS: We introduce a novel pairwise structure alignment tool (rcsb.org/alignment) that seamlessly integrates into the RCSB Protein Data Bank (RCSB PDB) research-focused RCSB.org web portal. Our tool and its underlying application programming interface (alignment.rcsb.org) empower users to align several protein chains with a reference structure by providing access to established alignment algorithms (FATCAT, CE, TM-align, or Smith-Waterman 3D). The user-friendly interface simplifies parameter setup and input selection. Within seconds, our tool enables visualization of results in both sequence (1D) and structural (3D) perspectives through the RCSB PDB RCSB.org Sequence Annotations viewer and Mol* 3D viewer, respectively. Users can effortlessly compare structures deposited in the PDB archive alongside more than a million incorporated Computed Structure Models coming from the ModelArchive and AlphaFold DB. Moreover, this tool can be used to align custom structure data by providing a link/URL or uploading atomic coordinate files directly. Importantly, alignment results can be bookmarked and shared with collaborators. By bridging the gap between 1D sequence and 3D structures of proteins, our tool facilitates deeper understanding of complex evolutionary relationships among proteins through comprehensive sequence and structural analyses. AVAILABILITY AND IMPLEMENTATION: The alignment tool is part of the RCSB PDB research-focused RCSB.org web portal and available at rcsb.org/alignment. Programmatic access is available via alignment.rcsb.org. Frontend code has been published at github.com/rcsb/rcsb-pecos-app. Visualization is powered by the open-source Mol* viewer (github.com/molstar/molstar and github.com/molstar/rcsb-molstar) plus the Sequence Annotations in 3D Viewer (github.com/rcsb/rcsb-saguaro-3d).


Subject(s)
Algorithms , Databases, Protein , Proteins , Sequence Alignment , Software , Proteins/chemistry , Sequence Alignment/methods , Protein Conformation , User-Computer Interface , Computational Biology/methods
12.
HLA ; 103(6): e15552, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38923200

ABSTRACT

The new allele C*05:01:81 differs from C*05:01:01 at position 778 (C→T) in exon 3.


Subject(s)
Alleles , Base Sequence , Exons , HLA-C Antigens , Histocompatibility Testing , Humans , HLA-C Antigens/genetics , Sequence Analysis, DNA/methods , Sequence Alignment , Codon , Polymorphism, Single Nucleotide
15.
HLA ; 103(6)2024 Jun.
Article in English | MEDLINE | ID: mdl-38932664

ABSTRACT

Genomic sequences of the HLA-DPA1*01:03:01:73, -DPA1*01:03:01:80, -DPA1*01:03:01:82, -DPA1*01:155:01:02, and -DPA1*02:02:02:16 alleles in Spanish individuals.


Subject(s)
Alleles , HLA-DP alpha-Chains , Humans , HLA-DP alpha-Chains/genetics , Exons , Spain , Histocompatibility Testing , Sequence Analysis, DNA , Base Sequence , Sequence Alignment
17.
BMC Bioinformatics ; 25(1): 219, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38898394

ABSTRACT

BACKGROUND: With the surge in genomic data driven by advancements in sequencing technologies, the demand for efficient bioinformatics tools for sequence analysis has become paramount. BLAST-like alignment tool (BLAT), a sequence alignment tool, faces limitations in performance efficiency and integration with modern programming environments, particularly Python. This study introduces PxBLAT, a Python-based framework designed to enhance the capabilities of BLAT, focusing on usability, computational efficiency, and seamless integration within the Python ecosystem. RESULTS: PxBLAT demonstrates significant improvements over BLAT in execution speed and data handling, as evidenced by comprehensive benchmarks conducted across various sample groups ranging from 50 to 600 samples. These experiments highlight a notable speedup, reducing execution time compared to BLAT. The framework also introduces user-friendly features such as improved server management, data conversion utilities, and shell completion, enhancing the overall user experience. Additionally, the provision of extensive documentation and comprehensive testing supports community engagement and facilitates the adoption of PxBLAT. CONCLUSIONS: PxBLAT stands out as a robust alternative to BLAT, offering performance and user interaction enhancements. Its development underscores the potential for modern programming languages to improve bioinformatics tools, aligning with the needs of contemporary genomic research. By providing a more efficient, user-friendly tool, PxBLAT has the potential to impact genomic data analysis workflows, supporting faster and more accurate sequence analysis in a Python environment.


Subject(s)
Computational Biology , Sequence Alignment , Software , Computational Biology/methods , Sequence Alignment/methods , Programming Languages , Genomics/methods
20.
Arch Microbiol ; 206(7): 307, 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38884653

ABSTRACT

Xylanase is the most important hydrolase in the xylan hydrolase system, the main function of which is β-1,4-endo-xylanase, which randomly cleaves xylans to xylo-oligosaccharides and xylose. Xylanase has a wide range of applications, but there remains little research on the cold-adapted enzymes required in some low-temperature industries. Glycoside hydrolase family 8 (GH8) xylanases have been reported to have cold-adapted enzyme activity. In this study, the xylanase gene dgeoxyn was excavated from Deinococcus geothermalis through sequence alignment. The recombinant xylanase DgeoXyn encodes 403 amino acids with a theoretical molecular weight of 45.39 kDa. Structural analysis showed that DgeoXyn has a (α/α)6-barrel fold structure typical of GH8 xylanase. At the same time, it has strict substrate specificity, is only active against xylan, and its hydrolysis products include xylobiose, xylotriose, xylotetraose, xylenanose, and a small amount of xylose. DgeoXyn is most active at 70 °C and pH 6.0. It is very stable at 10, 20, and 30 °C, retaining more than 80% of its maximum enzyme activity. The enzyme activity of DgeoXyn increased by 10% after the addition of Mn2+ and decreased by 80% after the addition of Cu2+. The Km and Vmax of dgeox were 42 mg/ml and 20,000 U/mg, respectively, at a temperature of 70 °C and pH of 6.0 using 10 mg/ml beechwood xylan as the substrate. This research on DgeoXyn will provide a theoretical basis for the development and application of low-temperature xylanase.


Subject(s)
Deinococcus , Endo-1,4-beta Xylanases , Enzyme Stability , Xylans , Deinococcus/enzymology , Deinococcus/genetics , Substrate Specificity , Endo-1,4-beta Xylanases/genetics , Endo-1,4-beta Xylanases/chemistry , Endo-1,4-beta Xylanases/metabolism , Xylans/metabolism , Cold Temperature , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Bacterial Proteins/chemistry , Hydrogen-Ion Concentration , Glycoside Hydrolases/genetics , Glycoside Hydrolases/metabolism , Glycoside Hydrolases/chemistry , Amino Acid Sequence , Hydrolysis , Recombinant Proteins/metabolism , Recombinant Proteins/genetics , Recombinant Proteins/chemistry , Recombinant Proteins/isolation & purification , Sequence Alignment , Cloning, Molecular , Kinetics , Molecular Weight , Disaccharides