Results 1 - 20 of 24
1.
Nucleic Acids Res ; 52(1): e3, 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-37941140

ABSTRACT

Compared with proteins, DNA and RNA are more difficult languages to interpret because four-letter coded DNA/RNA sequences have less information content than 20-letter coded protein sequences. While BERT (Bidirectional Encoder Representations from Transformers)-like language models have been developed for RNA, they are ineffective at capturing the evolutionary information from homologous sequences because unlike proteins, RNA sequences are less conserved. Here, we have developed an unsupervised multiple sequence alignment-based RNA language model (RNA-MSM) by utilizing homologous sequences from an automatic pipeline, RNAcmap, as it can provide significantly more homologous sequences than manually annotated Rfam. We demonstrate that the resulting unsupervised, two-dimensional attention maps and one-dimensional embeddings from RNA-MSM contain structural information. In fact, they can be directly mapped with high accuracy to 2D base pairing probabilities and 1D solvent accessibilities, respectively. Further fine-tuning led to significantly improved performance on these two downstream tasks compared with existing state-of-the-art techniques including SPOT-RNA2 and RNAsnap2. By comparison, RNA-FM, a BERT-based RNA language model, performs worse than one-hot encoding with its embedding in base pair and solvent-accessible surface area prediction. We anticipate that the pre-trained RNA-MSM model can be fine-tuned on many other tasks related to RNA structure and function.


Subjects
Machine Learning, RNA, Sequence Alignment, DNA/chemistry, Proteins, RNA/chemistry, Solvents
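The abstract above compares four-letter and 20-letter alphabets and uses one-hot encoding as a baseline. A minimal sketch of both ideas follows, using only the standard library. The function names and the `ACGU` column order are illustrative assumptions, not taken from RNA-MSM or RNA-FM.

```python
import math

RNA_ALPHABET = "ACGU"  # assumed column order; any fixed order works


def one_hot_encode(sequence, alphabet=RNA_ALPHABET):
    """Return one row per residue, with a single 1 marking the letter."""
    index = {letter: i for i, letter in enumerate(alphabet)}
    rows = []
    for letter in sequence.upper():
        row = [0] * len(alphabet)
        if letter in index:  # unknown symbols (e.g. N) stay all-zero
            row[index[letter]] = 1
        rows.append(row)
    return rows


def max_bits_per_symbol(alphabet_size):
    """Upper bound on information per position for a uniform alphabet."""
    return math.log2(alphabet_size)
```

Under this uniform-alphabet assumption, a nucleotide position carries at most 2 bits, while an amino-acid position carries at most about 4.32 bits, which is one way to read the "less information content" remark.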
2.
Anal Chem ; 96(28): 11163-11171, 2024 07 16.
Article in English | MEDLINE | ID: mdl-38953530

ABSTRACT

Glycans on proteins and lipids play important roles in maturation and cellular interactions, contributing to a variety of biological processes. Aberrant glycosylation has been associated with various human diseases including cancer; however, elucidating the distribution and heterogeneity of glycans in complex tissue samples remains a major challenge. Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry imaging (MSI) is routinely used to analyze the spatial distribution of a variety of molecules including N-glycans directly from tissue surfaces. Sialic acids are nine-carbon acidic sugars that often exist as the terminal sugars of glycans and are inherently difficult to analyze using MALDI-MSI because they are unstable and prone to in-source and post-source decay. Here, we report on a rapid and robust method for stabilizing sialic acid on N-glycans in FFPE tissue sections. The established method derivatizes and identifies the spatial distribution of α2,3- and α2,6-linked sialic acids through complete methylamidation using methylamine and PyAOP ((7-azabenzotriazol-1-yloxy)tripyrrolidinophosphonium hexafluorophosphate). Our in situ approach increases the number of glycans detected and enhances the coverage of sialylated species. Using this streamlined, sensitive, and robust workflow, we rapidly characterize and spatially localize N-glycans in human tumor tissue sections. Additionally, we demonstrate this method's applicability in imaging mammalian cell suspensions directly on slides, achieving cellular resolution with minimal sample processing and cell numbers. This workflow reveals the cellular locations of distinct N-glycan species, shedding light on the biological and clinical significance of these biomolecules in human diseases.


Subjects
Glycomics, Polysaccharides, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry, Humans, Glycomics/methods, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods, Polysaccharides/analysis, Polysaccharides/chemistry
3.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35348613

ABSTRACT

Characterization of RNA structures and functions has mostly focused on 2D (secondary) and 3D (tertiary) structures. Recent advances in experimental and computational techniques for probing or predicting RNA solvent accessibility make this 1D representation of tertiary structures an increasingly attractive feature to explore. Here, we provide a survey of these recent developments, which indicate the emergence of solvent accessibility as a simple 1D property that complements secondary and tertiary structures for investigating complex structure-function relations of RNAs.


Subjects
RNA, Nucleic Acid Conformation, RNA/chemistry, Solvents/chemistry
4.
Bioinformatics ; 38(16): 3900-3910, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35751593

ABSTRACT

MOTIVATION: Recently, AlphaFold2 achieved high experimental accuracy for the majority of proteins in Critical Assessment of Structure Prediction (CASP 14). This raises the hope that one day we may achieve the same feat for structured RNAs, whose structure prediction is as fundamentally and practically important as protein structure prediction. One major factor in the recent advancement of protein structure prediction is the highly accurate prediction of distance-based contact maps of proteins. RESULTS: Here, we showed that by integrating deep learning with physics-inferred secondary structures, co-evolutionary information and multiple sequence-alignment sampling, we can achieve RNA contact-map prediction at a level of accuracy similar to that in protein contact-map prediction. More importantly, highly accurate prediction for top L long-range contacts can be assured for those RNAs with a high effective number of homologous sequences (Neff > 50). The initial use of the predicted contact map as distance-based restraints confirmed its usefulness in 3D structure prediction. AVAILABILITY AND IMPLEMENTATION: SPOT-RNA-2D is available as a web server at https://sparks-lab.org/server/spot-rna-2d/ and as a standalone program at https://github.com/jaswindersingh2/SPOT-RNA-2D. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology, Deep Learning, Neural Networks (Computer), RNA, Proteins/chemistry, Physics
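The abstract above uses the effective number of homologous sequences (Neff) as a threshold but does not define it. The sketch below shows one commonly used definition: each aligned sequence is weighted by the inverse of the number of sequences that are at least 80% identical to it, and the weights are summed. Both the definition and the 0.8 threshold are assumptions and may differ from what SPOT-RNA-2D computes. The function names are hypothetical.

```python
def sequence_identity(a, b):
    """Fraction of aligned columns where two equal-length rows agree.

    Simplification: a column where both rows hold a gap counts as a match.
    """
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def effective_sequence_count(msa, identity_threshold=0.8):
    """Sum of per-sequence weights 1 / (number of similar rows, incl. self)."""
    neff = 0.0
    for row in msa:
        similar = sum(
            1 for other in msa
            if sequence_identity(row, other) >= identity_threshold
        )
        neff += 1.0 / similar  # similar >= 1 because a row matches itself
    return neff
```

With this weighting, an alignment of many near-identical sequences yields a Neff close to 1, so the value reflects diversity and not the raw sequence count.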
5.
Bioinformatics ; 38(7): 1888-1894, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35104320

ABSTRACT

MOTIVATION: Accurate prediction of protein contact maps is essential for accurate protein structure and function prediction. As a result, many methods have been developed for protein contact map prediction. However, most methods rely on protein-sequence-evolutionary information, which may not exist for many proteins due to a lack of naturally occurring homologous sequences. Moreover, generating evolutionary profiles is computationally intensive. Here, we developed a contact-map predictor utilizing the output of a pre-trained language model ESM-1b as an input along with a large training set and an ensemble of residual neural networks. RESULTS: We showed that the proposed method makes a significant improvement over a single-sequence-based predictor SSCpred with 15% improvement in the F1-score for the independent CASP14-FM test set. It also outperforms evolutionary-profile-based methods trRosetta and SPOT-Contact with 48.7% and 48.5% respective improvement in the F1-score on the proteins without homologs (Neff = 1) in the independent SPOT-2018 set. The new method provides a much faster and reasonably accurate alternative to evolution-based methods, useful for large-scale prediction. AVAILABILITY AND IMPLEMENTATION: A stand-alone version of SPOT-Contact-LM is available at https://github.com/jas-preet/SPOT-Contact-Single. Direct prediction can also be made at https://sparks-lab.org/server/spot-contact-single. The datasets used in this research can also be downloaded from GitHub. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology, Language, Computational Biology/methods, Proteins/chemistry, Neural Networks (Computer), Amino Acid Sequence
6.
Bioinformatics ; 37(20): 3494-3500, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34021744

ABSTRACT

MOTIVATION: The accuracy of RNA secondary and tertiary structure prediction can be significantly improved by using structural restraints derived from evolutionary coupling or direct coupling analysis. Currently, these coupling analyses rely on manually curated multiple sequence alignments collected in the Rfam database, which contains 3016 families. By comparison, millions of non-coding RNA sequences are known. Here, we established RNAcmap, a fully automatic pipeline that enables evolutionary coupling analysis for any RNA sequence. The homology search was based on the covariance model built by INFERNAL according to two secondary structure predictors: a folding-based algorithm RNAfold and the latest deep-learning method SPOT-RNA. RESULTS: We showed that the performance of RNAcmap is less dependent on the specific evolutionary coupling tool but is more dependent on the accuracy of the secondary structure predictor, with the best performance given by RNAcmap (SPOT-RNA). The performance of RNAcmap (SPOT-RNA) is comparable to that based on Rfam-supplied alignment and consistent for those sequences that are not in Rfam collections. Further improvement can be made with a simple meta predictor RNAcmap (SPOT-RNA/RNAfold) depending on which secondary structure predictor can find more homologous sequences. Reliable base-pairing information generated from RNAcmap, in particular for RNAs with a high effective number of homologous sequences, will be useful for aiding RNA structure prediction. AVAILABILITY AND IMPLEMENTATION: RNAcmap is available as a web server at https://sparks-lab.org/server/rnacmap/ and as a standalone application along with the datasets at https://github.com/sparks-lab-org/RNAcmap_standalone. A platform-independent and fully configured Docker image of RNAcmap is also provided at https://hub.docker.com/r/jaswindersingh2/rnacmap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

7.
Bioinformatics ; 37(20): 3464-3472, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33983382

ABSTRACT

MOTIVATION: Knowing protein secondary structure and other one-dimensional structural properties is essential for accurate protein structure and function prediction. As a result, many methods have been developed for predicting these one-dimensional structural properties. However, most methods relied on evolutionary information that may not exist for many proteins due to a lack of sequence homologs. Moreover, obtaining evolutionary information is computationally intensive as the library of protein sequences continues to expand exponentially. Here, we developed a new single-sequence method called SPOT-1D-Single based on a large training dataset of 39 120 proteins deposited prior to 2016 and an ensemble of hybrid long short-term memory bidirectional neural networks and convolutional neural networks. RESULTS: We showed that SPOT-1D-Single consistently improves over SPIDER3-Single and ProteinUnet for secondary structure, solvent accessibility, contact number and backbone angles prediction for all seven independent test sets (TEST2018, SPOT-2016, SPOT-2016-HQ, SPOT-2018, SPOT-2018-HQ, CASP12 and CASP13 free-modeling targets). For example, the predicted three-state secondary structure's accuracy ranges from 72.12% to 74.28% by SPOT-1D-Single, compared to 69.1-72.6% by SPIDER3-Single and 70.6-73% by ProteinUnet. SPOT-1D-Single also predicts SS3 and SS8 with 6.24% and 6.98% better accuracy than SPOT-1D on SPOT-2018 proteins with no homologs (Neff = 1), respectively. The new method's improvement over existing techniques is due to a larger training set combined with ensembled learning. AVAILABILITY AND IMPLEMENTATION: A standalone version of SPOT-1D-Single is available at https://github.com/jas-preet/SPOT-1D-Single. Direct prediction can also be made at https://sparks-lab.org/server/spot-1d-single. The datasets used in this research can also be downloaded from GitHub. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
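The three-state secondary structure accuracy quoted in the abstract above is, in its usual form, the percentage of residues whose predicted state matches the observed one. The sketch below assumes the common H/E/C labelling (helix, strand, coil). The function name is hypothetical and this is not the evaluation script of SPOT-1D-Single.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)
```

For example, `three_state_accuracy("HHHEECCC", "HHHEECCH")` counts seven matches out of eight residues.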

8.
Bioinformatics ; 37(17): 2589-2600, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33704363

ABSTRACT

MOTIVATION: The recent discovery of numerous non-coding RNAs (long non-coding RNAs, in particular) has transformed our perception about the roles of RNAs in living organisms. Our ability to understand them, however, is hampered by our inability to solve their secondary and tertiary structures in high resolution efficiently by existing experimental techniques. Computational prediction of RNA secondary structure, on the other hand, has recently received much-needed improvement through deep learning on a large set of approximate data, followed by transfer learning with gold-standard base-pairing structures from high-resolution 3-D structures. Here, we expand this single-sequence-based learning to the use of evolutionary profiles and mutational coupling. RESULTS: The new method allows large improvement not only in canonical base-pairs (RNA secondary structures) but more so in base-pairing associated with tertiary interactions such as pseudoknots, non-canonical and lone base-pairs. In particular, it is highly accurate for RNAs with more than 1000 homologous sequences, achieving >0.8 F1-score (harmonic mean of sensitivity and precision) for 14/16 RNAs tested. The method can also significantly improve base-pairing prediction by incorporating artificial but functional homologous sequences generated from deep mutational scanning without any modification. The fully automatic method (publicly available as server and standalone software) should provide the scientific community with a powerful new tool to capture not only the secondary structure but also tertiary base-pairing information for building three-dimensional models. It also highlights the future of accurately solving the base-pairing structure by using a large number of natural and/or artificial homologous sequences. AVAILABILITY AND IMPLEMENTATION: A standalone version of SPOT-RNA2 is available at https://github.com/jaswindersingh2/SPOT-RNA2. Direct prediction can also be made at https://sparks-lab.org/server/spot-rna2/. The datasets used in this research can also be downloaded from GitHub and the web server mentioned above. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
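The abstract above defines the F1-score as the harmonic mean of sensitivity and precision. A minimal sketch for base-pair predictions follows. It assumes that pairs are given as index tuples and that (i, j) and (j, i) denote the same pair. The function name is hypothetical.

```python
def base_pair_f1(predicted_pairs, reference_pairs):
    """F1-score = harmonic mean of precision and sensitivity over base pairs."""
    # Normalise orientation so that (i, j) and (j, i) compare equal.
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    reference = {tuple(sorted(p)) for p in reference_pairs}
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)      # TP / (TP + FP)
    sensitivity = true_positives / len(reference)    # TP / (TP + FN)
    return 2 * precision * sensitivity / (precision + sensitivity)
```

With three correct pairs among four predicted and five reference pairs, precision is 0.75, sensitivity is 0.6, and the F1-score is 2/3.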

9.
RNA Biol ; 19(1): 1179-1189, 2022 01.
Article in English | MEDLINE | ID: mdl-36369947

ABSTRACT

Given the challenges for the experimental determination of RNA tertiary structures, probing solvent accessibility has become increasingly important to gain functional insights. Among various chemical probes developed, backbone-cleaving hydroxyl radical is the only one that can provide unbiased detection of all accessible nucleotides. However, the readouts have been based on reverse transcription (RT) stop at the cleaving sites, which are prone to false positives due to PCR amplification bias, early drop-off of reverse transcriptase, and the use of random primers in RT reaction. Here, we introduced a fixed-primer method called RL-Seq by performing RtcB Ligation (RL) between a fixed 5'-OH-end linker and unique 3'-P-end fragments from hydroxyl radical cleavage prior to high-throughput sequencing. The application of this method to E. coli ribosomes confirmed its ability to accurately probe solvent accessibility with high sensitivity (low required sequencing depth) and accuracy (strong correlation to structure-derived values) at the single-nucleotide resolution. Moreover, a near-perfect correlation was found between the experiments with and without using unique molecular identifiers, indicating negligible PCR biases in RL-Seq. Further improvement of RL-Seq and its potential transcriptome-wide applications are discussed.


Subjects
Amino Acyl-tRNA Synthetases, Escherichia coli Proteins, RNA/genetics, RNA/chemistry, Hydroxyl Radical/chemistry, Nucleic Acid Conformation, Nucleotides, Solvents/chemistry, Escherichia coli/genetics, High-Throughput Nucleotide Sequencing/methods, Amino Acyl-tRNA Synthetases/genetics, Escherichia coli Proteins/genetics
10.
Bioinformatics ; 36(4): 1107-1113, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31504193

ABSTRACT

MOTIVATION: Protein intrinsic disorder describes the tendency of sequence residues to not fold into a rigid three-dimensional shape by themselves. However, some of these disordered regions can transition from disorder to order when interacting with another molecule in segments known as molecular recognition features (MoRFs). Previous analysis has shown that these MoRF regions are indirectly encoded within the prediction of residue disorder as low-confidence predictions [i.e. in a semi-disordered state P(D)≈0.5]. Thus, what has been learned for disorder prediction may be transferable to MoRF prediction. Transferring the internal characterization of protein disorder for the prediction of MoRF residues would allow us to take advantage of the large training set available for disorder prediction, enabling the training of larger analytical models than is currently feasible on the small number of currently available annotated MoRF proteins. In this paper, we propose a new method for MoRF prediction by transfer learning from the SPOT-Disorder2 ensemble models built for disorder prediction. RESULTS: We confirm that directly training on the MoRF set with a randomly initialized model yields substantially poorer performance on independent test sets than using the transfer-learning-based method SPOT-MoRF, for both deep and simple networks. Its comparison to current state-of-the-art techniques reveals its superior performance in identifying MoRF binding regions in proteins across two independent testing sets, including our new dataset of >800 protein chains. These test chains share <30% sequence similarity to all training and validation proteins used in SPOT-Disorder2 and SPOT-MoRF, and provide a much-needed large-scale update on the performance of current MoRF predictors. The method is expected to be useful in locating functional disordered regions in proteins. AVAILABILITY AND IMPLEMENTATION: SPOT-MoRF and its data are available as a web server and as a standalone program at: http://sparks-lab.org/jack/server/SPOT-MoRF/index.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology, Intrinsically Disordered Proteins, Machine Learning
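The abstract above notes that MoRF regions tend to appear as low-confidence disorder predictions, with P(D) near 0.5. The sketch below illustrates that observation only: it flags contiguous runs of residues whose disorder probability lies in a band around 0.5. It is not the SPOT-MoRF method, which uses transfer learning. The band width, the minimum run length and the function name are all assumptions.

```python
def semi_disordered_segments(disorder_probs, center=0.5, tolerance=0.1,
                             min_length=3):
    """Return (start, end) index pairs, inclusive, of runs with P(D) near center."""
    segments = []
    start = None
    for i, p in enumerate(disorder_probs):
        in_band = abs(p - center) <= tolerance
        if in_band and start is None:
            start = i  # a new candidate run begins here
        elif not in_band and start is not None:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(disorder_probs) - start >= min_length:
        segments.append((start, len(disorder_probs) - 1))
    return segments
```

Runs shorter than `min_length` are dropped, so isolated residues with P(D) near 0.5 are not reported as candidate regions.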
11.
J Comput Chem ; 41(8): 745-750, 2020 03 30.
Article in English | MEDLINE | ID: mdl-31845383

ABSTRACT

Protein structure determination has been one of the most challenging problems in molecular biology for the past 60 years. Here we present an ab initio protein tertiary-structure prediction method assisted by predicted contact maps from SPOT-Contact and predicted dihedral angles from SPIDER 3. These predicted properties were then fed to the crystallography and NMR system (CNS) for restrained structure modeling. The resulting structures are first evaluated by the potential energy calculated by CNS, followed by the dDFIRE energy function for model selection. The method, called SPOT-Fold, has been tested on 241 CASP targets between 67 and 670 amino acid residues and 60 randomly selected globular proteins under 100 amino acids. The method has an accuracy comparable to that of other contact-map-based modeling techniques. © 2019 Wiley Periodicals, Inc.


Subjects
Proteins/chemistry, Software, Molecular Models, Protein Conformation
12.
Bioinformatics ; 35(14): 2403-2410, 2019 07 15.
Article in English | MEDLINE | ID: mdl-30535134

ABSTRACT

MOTIVATION: Sequence-based prediction of one-dimensional structural properties of proteins has been a long-standing subproblem of protein structure prediction. Recently, prediction accuracy has been significantly improved due to the rapid expansion of protein sequence and structure libraries and advances in deep learning techniques, such as residual convolutional networks (ResNets) and Long-Short-Term Memory Cells in Bidirectional Recurrent Neural Networks (LSTM-BRNNs). Here we leverage an ensemble of LSTM-BRNN and ResNet models, together with predicted residue-residue contact maps, to continue the push towards the attainable limit of prediction for 3- and 8-state secondary structure, backbone angles (θ, τ, ϕ and ψ), half-sphere exposure, contact numbers and solvent accessible surface area (ASA). RESULTS: The new method, named SPOT-1D, achieves similar, high performance on a large validation set and test set (≈1000 proteins in each set), suggesting robust performance for unseen data. For the large test set, it achieves 87% and 77% in 3- and 8-state secondary structure prediction and 0.82 and 0.86 in correlation coefficients between predicted and measured ASA and contact numbers, respectively. Comparison to current state-of-the-art techniques reveals substantial improvement in secondary structure and backbone angle prediction. In particular, 44% of 40-residue fragment structures constructed from predicted backbone Cα-based θ and τ angles are less than 6 Å root-mean-squared-distance from their native conformations, nearly 20% better than the next best. The method is expected to be useful for advancing protein structure and function prediction. AVAILABILITY AND IMPLEMENTATION: SPOT-1D and its data are available at: http://sparks-lab.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neural Networks (Computer), Amino Acid Sequence, Computational Biology, Protein Secondary Structure, Proteins, Solvents
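The abstract above reports correlation coefficients between predicted and measured ASA and contact numbers. The sketch below assumes these are Pearson correlation coefficients, which the abstract does not state explicitly. It uses only the standard library, and the function name is hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)
```

Here `xs` would hold the predicted per-residue values and `ys` the values measured from the experimental structure.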
13.
Bioinformatics ; 34(23): 4039-4045, 2018 12 01.
Article in English | MEDLINE | ID: mdl-29931279

ABSTRACT

Motivation: Accurate prediction of a protein contact map depends greatly on capturing as much contextual information as possible from surrounding residues for a target residue pair. Recently, ultra-deep residual convolutional networks were found to be state-of-the-art in the latest Critical Assessment of Structure Prediction techniques (CASP12) for protein contact map prediction by attempting to provide a protein-wide context at each residue pair. Recurrent neural networks have seen great success in recent protein residue classification problems due to their ability to propagate information through long protein sequences, especially Long Short-Term Memory (LSTM) cells. Here, we propose a novel protein contact map prediction method by stacking residual convolutional networks with two-dimensional residual bidirectional recurrent LSTM networks, and using both one-dimensional sequence-based and two-dimensional evolutionary coupling-based information. Results: We show that the proposed method achieves a robust performance over validation and independent test sets with the Area Under the receiver operating characteristic Curve (AUC) > 0.95 in all tests. When compared to several state-of-the-art methods for independent testing of 228 proteins, the method yields an AUC value of 0.958, whereas the next-best method obtains an AUC of 0.909. More importantly, the improvement is over contacts at all sequence-position separations. Specifically, increases in precision of 8.95%, 5.65% and 2.84% were observed for the top L/10 predictions over the next best for short, medium and long-range contacts, respectively. This confirms the usefulness of ResNets to aggregate the short-range relations and 2D-BRLSTM to propagate the long-range dependencies throughout the entire protein contact map 'image'. Availability and implementation: SPOT-Contact server url: http://sparks-lab.org/jack/server/SPOT-Contact/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology, Neural Networks (Computer), Protein Interaction Mapping/methods, Proteins/chemistry, Amino Acid Sequence
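The abstract above reports precision for the top L/10 predictions at short, medium and long range. The sketch below shows how such a figure is commonly computed. The separation bands often used in CASP-style evaluation (6 to 11, 12 to 23, and 24 or more residues) are an assumption here, since the abstract does not define them. The function name and the data layout are also hypothetical.

```python
def top_fraction_precision(scores, true_contacts, length,
                           min_sep, max_sep=None, divisor=10):
    """Precision of the top L/divisor scored pairs within a separation band.

    scores: dict mapping (i, j), with i < j, to a predicted contact probability.
    true_contacts: set of (i, j) pairs in contact in the native structure.
    length: sequence length L.
    """
    in_band = [
        (pair, score) for pair, score in scores.items()
        if pair[1] - pair[0] >= min_sep
        and (max_sep is None or pair[1] - pair[0] <= max_sep)
    ]
    # Rank candidate pairs from most to least confident.
    in_band.sort(key=lambda item: item[1], reverse=True)
    top_n = max(1, length // divisor)
    selected = [pair for pair, _ in in_band[:top_n]]
    if not selected:
        return 0.0
    hits = sum(1 for pair in selected if pair in true_contacts)
    return hits / len(selected)
```

Under the assumed bands, long-range precision would be requested with `min_sep=24`, medium-range with `min_sep=12, max_sep=23`, and short-range with `min_sep=6, max_sep=11`.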
14.
J Chem Inf Model ; 59(2): 924-930, 2019 02 25.
Article in English | MEDLINE | ID: mdl-30698427

ABSTRACT

Peptide-binding domains have been successfully targeted in therapeutic applications. However, many peptide-binding proteins (PBPs) remain uncharacterized. Computational prediction of peptide-domain interfaces is challenging due to short lengths, lack of well-defined structures, and limited conservation of peptide motifs. Here we present SPOT-peptide, a template-based protocol for the simultaneous prediction of peptide-binding domains and peptide binding sites independent of specific peptide composition. SPOT-peptide leverages the dogmatic relationship between protein structure and function to predict peptide-binding characteristics for an unknown target based on remote structural homologues. In a leave-homologue-out benchmark evaluation, PBPs are discriminated with a Matthews correlation coefficient (MCC) of 0.420 and the correct binding sites are identified in 80% of the predicted PBPs. Furthermore, replacing the holo target structures with equivalent structures in the apo conformation only marginally diminished PBP recovery. The method is available as a web server at http://sparks-lab.org/tom/SPOT-peptide.


Subjects
Molecular Models, Peptides/metabolism, Proteins/chemistry, Proteins/metabolism, Binding Sites, Peptides/chemistry, Protein Binding, Protein Domains
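The Matthews correlation coefficient (MCC) quoted in the abstract above is computed from the four cells of a binary confusion matrix. A minimal sketch of the standard formula follows. The function name is hypothetical, and returning 0.0 when the denominator vanishes is a common convention, not something the abstract specifies.

```python
import math


def matthews_correlation(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator
```

The value ranges from -1 (complete disagreement) through 0 (no better than chance) to +1 (perfect classification), which makes it suited to unbalanced sets such as peptide-binding versus non-binding proteins.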
15.
Bioinformatics ; 33(8): 1238-1240, 2017 04 15.
Article in English | MEDLINE | ID: mdl-28057679

ABSTRACT

Motivation: The high cost of drug discovery motivates the development of accurate virtual screening tools. Binding-homology, which takes advantage of known protein-ligand binding pairs, has emerged as a powerful discrimination technique. In order to exploit all available binding data, modelled structures of ligand-binding sequences may be used to create an expanded structural binding template library. Results: SPOT-Ligand 2 has demonstrated significantly improved screening performance over its previous version by expanding the template library 15-fold. It also performed better than or similar to other binding-homology approaches on the DUD and DUD-E benchmarks. Availability and Implementation: The server is available online at http://sparks-lab.org. Contacts: yaoqi.zhou@griffith.edu.au or yuedong.yang@griffith.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Drug Discovery, Molecular Models, Software, Ligands, Protein Structural Homology
16.
Ren Fail ; 38(9): 1328-1334, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27494231

ABSTRACT

BACKGROUND: Analgesic nephropathy (AN) is chronic renal impairment as a direct consequence of chronic heavy analgesic ingestion. An association between non-steroidal anti-inflammatory agents and chronic kidney disease (CKD) has long been suspected. Despite ample observational data obtained in recent decades, the relationship remains uncertain. This systematic review intends to summarize the available literature and to define the role of non-steroidal anti-inflammatories in the natural history of AN. METHODS: We conducted a systematic literature search for articles describing the association between non-steroidal anti-inflammatory abuse and renal insufficiency. No restriction was placed on publication date, but papers were limited to English language, observational design, and human studies. RESULTS: Nine articles met our inclusion criteria and were discussed in this review. These include 5 cohort studies and 4 case-control trials, with a combined population of 12,418 study subjects and 23,877 controls. Eight of the nine reports failed to identify any increased risk of chronic renal impairment with heavy non-steroidal anti-inflammatory consumption. Study methods were heterogeneous and the overall quality of data was relatively poor. CONCLUSION: A relationship between non-steroidal anti-inflammatory medicines and the development of CKD has never been proven. Based on the available scientific evidence, non-steroidal anti-inflammatory agents do not appear to be implicated in the pathogenesis of AN.


Subjects
Non-Steroidal Anti-Inflammatory Agents/adverse effects, Chronic Renal Insufficiency/chemically induced, Humans, Risk Factors, Time Factors
17.
Article in English | MEDLINE | ID: mdl-38872612

ABSTRACT

Recent success of AlphaFold2 in protein structure prediction relied heavily on co-evolutionary information derived from homologous protein sequences found in the huge, integrated database of protein sequences (Big Fantastic Database). In contrast, the existing nucleotide databases were not consolidated to facilitate wider and deeper homology search. Here, we built a comprehensive database by incorporating the non-coding RNA (ncRNA) sequences from RNAcentral, the transcriptome assembly and metagenome assembly from metagenomics RAST (MG-RAST), the genomic sequences from Genome Warehouse (GWH), and the genomic sequences from MGnify, in addition to the nucleotide (nt) database and its subsets in National Center of Biotechnology Information (NCBI). The resulting Master database of All possible RNA sequences (MARS) is 20-fold larger than NCBI's nt database or 60-fold larger than RNAcentral. The new dataset along with a new split-search strategy allows a substantial improvement in homology search over existing state-of-the-art techniques. It also yields more accurate and more sensitive multiple sequence alignments (MSAs) than manually curated MSAs from Rfam for the majority of structured RNAs mapped to Rfam. The results indicate that MARS coupled with the fully automatic homology search tool RNAcmap will be useful for improved structural and functional inference of ncRNAs and RNA language models based on MSAs. MARS is accessible at https://ngdc.cncb.ac.cn/omix/release/OMIX003037, and RNAcmap3 is accessible at http://zhouyq-lab.szbl.ac.cn/download/.


Subjects
Nucleic Acid Databases, Sequence Alignment, Untranslated RNA/genetics, Untranslated RNA/chemistry, RNA Sequence Analysis/methods, RNA/genetics, RNA/chemistry, Software, Genetic Databases
18.
Sci Rep ; 12(1): 7607, 2022 05 09.
Article in English | MEDLINE | ID: mdl-35534620

ABSTRACT

Protein language models have emerged as an alternative to multiple sequence alignment for enriching sequence information and improving downstream prediction tasks such as biophysical, structural, and functional properties. Here we show that a method called SPOT-1D-LM combines traditional one-hot encoding with the embeddings from two different language models (ProtTrans and ESM-1b) for the input and yields a leap in accuracy over single-sequence-based techniques in predicting protein 1D secondary and tertiary structural properties, including backbone torsion angles, solvent accessibility and contact numbers for all six test sets (TEST2018, TEST2020, Neff1-2020, CASP12-FM, CASP13-FM and CASP14-FM). More significantly, it has a performance comparable to profile-based methods for those proteins with homologous sequences. For example, the accuracies for three-state secondary structure (SS3) prediction for TEST2018 and TEST2020 proteins are 86.7% and 79.8% by SPOT-1D-LM, compared to 74.3% and 73.4% by the single-sequence-based method SPOT-1D-Single and 86.2% and 80.5% by the profile-based method SPOT-1D, respectively. For proteins without homologous sequences (Neff1-2020), SS3 accuracy is 80.41% by SPOT-1D-LM, which is 3.8% and 8.3% higher than SPOT-1D-Single and SPOT-1D, respectively. SPOT-1D-LM is expected to be useful for genome-wide analysis given its fast performance. Moreover, high-accuracy prediction of both secondary and tertiary structural properties such as backbone angles and solvent accessibility without sequence alignment suggests that highly accurate prediction of protein structures may be made without homologous sequences, the remaining obstacle in the post-AlphaFold2 era.


Subjects
Algorithms, Proteins, Protein Secondary Structure, Proteins/chemistry, Sequence Alignment, Solvents/chemistry
19.
Int J Biol Macromol ; 203: 543-552, 2022 Apr 01.
Article in English | MEDLINE | ID: mdl-35120933

ABSTRACT

Splitting a protein at a position may lead to self- or assisted-complementary fragments depending on whether two resulting fragments can reconstitute to maintain the native function spontaneously or require assistance from two interacting molecules. Assisted-complementary fragments with high contrast are an important tool for probing biological interactions. However, only a small number of assisted-complementary split-variants have been identified due to manual, labour-intensive optimization of a candidate gene. Here, we introduce a technique for high-throughput split-protein profiling (HiTS) that allows fast identification of self- and assisted-complementary positions by transposon mutagenesis, a rapamycin-regulated FRB-FKBP protein interaction pair, and deep sequencing. We test this technique by profiling three antibiotic-resistant genes (fosfomycin-resistant gene, fosA3, erythromycin-resistant gene, ermB, and chloramphenicol-resistant gene, catI). Self- and assisted-complementary fragments discovered by the high-throughput technique were subsequently confirmed by low-throughput testing of individual split positions. Thus, the HiTS technique provides a quicker alternative for discovering proteins with suitable self- and assisted-complementary split positions when combined with a readout such as fluorescence, bioluminescence, cell survival, gene transcription or genome editing.


Subjects
Gene Editing, Proteins, High-Throughput Nucleotide Sequencing/methods, Mutagenesis, Insertional Mutagenesis
20.
J Comput Biol ; 27(5): 796-814, 2020 05.
Article in English | MEDLINE | ID: mdl-31390220

ABSTRACT

The folding of a protein structure is a process governed by both local and nonlocal interactions. While incorporating local dependencies into a machine learning algorithm for protein structure prediction is simple and has been exploited for some time, the modeling of long-range dependences which result from structurally-neighboring residues has only recently begun to be addressed. Structural properties designed to localize the prediction space from direct tertiary structure prediction, such as secondary structure, contact maps, and intrinsic disorder, among others, have begun to greatly benefit from machine learning models capable of modeling a widened, potentially global protein context. This has led to a direct enhancement of the quality of predicted tertiary structures through both the optimization of structural constraints and improved reliability of alignments to structural templates. These improvements have stemmed from the application of recurrent and convolutional neural network architectures effective not only at innate sequential context propagation but also deep feature extraction due to novel skip connections and normalization techniques allowing for greatly enhanced error back-propagation. The recent results from independent blind testing in Critical Assessment of protein Structure Prediction 13 have signaled the beginning of a new generation of protein structure prediction through the utilization of these contextual techniques. The ripples from advancements in the determination of one-dimensional and two-dimensional structural properties have us moving ever closer to the solution of the protein structure prediction problem.


Subjects
Aging/genetics, Machine Learning, Protein Conformation, Proteins/genetics, Aging/pathology, Algorithms, Neural Networks (Computer), Proteins/ultrastructure