Results 1 - 20 of 2,282
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38349057

ABSTRACT

Efficient and accurate recognition of protein-DNA interactions is vital for understanding the molecular mechanisms of related biological processes and further guiding drug discovery. Although the current experimental protocols are the most precise way to determine protein-DNA binding sites, they tend to be labor-intensive and time-consuming. There is an immediate need to design efficient computational approaches for predicting DNA-binding sites. Here, we proposed ULDNA, a new deep-learning model, to deduce DNA-binding sites from protein sequences. This model leverages an LSTM-attention architecture, embedded with three unsupervised language models that are pre-trained on large-scale sequences from multiple database sources. To prove its effectiveness, ULDNA was tested on 229 protein chains with experimental annotation of DNA-binding sites. Results from computational experiments revealed that ULDNA significantly improves the accuracy of DNA-binding site prediction in comparison with 17 state-of-the-art methods. In-depth data analyses showed that the major strength of ULDNA stems from employing three transformer language models. Specifically, these language models capture complementary feature embeddings with evolution diversity, in which the complex DNA-binding patterns are buried. Meanwhile, the specially crafted LSTM-attention network effectively decodes evolution diversity-based embeddings as DNA-binding results at the residue level. Our findings demonstrated a new pipeline for predicting DNA-binding sites on a large scale with high accuracy from protein sequence alone.


Subject(s)
Data Analysis , Language , Binding Sites , Amino Acid Sequence , Factual Databases
2.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38261340

ABSTRACT

The recent advances of single-cell RNA sequencing (scRNA-seq) have enabled reliable profiling of gene expression at the single-cell level, providing opportunities for accurate inference of gene regulatory networks (GRNs) on scRNA-seq data. Most methods for inferring GRNs suffer from the inability to eliminate transitive interactions or necessitate expensive computational resources. To address these, we present a novel method, termed GMFGRN, for accurate graph neural network (GNN)-based GRN inference from scRNA-seq data. GMFGRN employs GNN for matrix factorization and learns representative embeddings for genes. For transcription factor-gene pairs, it utilizes the learned embeddings to determine whether they interact with each other. The extensive suite of benchmarking experiments encompassing eight static scRNA-seq datasets alongside several state-of-the-art methods demonstrated mean improvements of 1.9 and 2.5% over the runner-up in area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). In addition, across four time-series datasets, maximum enhancements of 2.4 and 1.3% in AUROC and AUPRC were observed in comparison to the runner-up. Moreover, GMFGRN requires significantly less training time and memory consumption, with time and memory consumed <10% compared to the second-best method. These findings underscore the substantial potential of GMFGRN in the inference of GRNs. It is publicly available at https://github.com/Lishuoyy/GMFGRN.
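
The two metrics reported above, AUROC and AUPRC, can be computed from a list of true labels and predicted interaction scores. The following is a minimal pure-Python sketch, not the GMFGRN evaluation code: the function names and the toy labels and scores are assumptions made for illustration, and AUPRC is approximated by average precision without special handling of tied scores.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank candidate pairs from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical TF-gene pairs: 1 = true regulatory edge, 0 = no edge.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auroc(toy_labels, toy_scores), 3))
print(round(average_precision(toy_labels, toy_scores), 3))
```

The pairwise AUROC loop is quadratic in the number of samples, which is acceptable for a small illustration but would be replaced by a rank-based computation on real benchmarks.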


Subject(s)
Benchmarking , Gene Regulatory Networks , Area Under Curve , Learning , Computer Neural Networks
3.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37080771

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has significantly accelerated the experimental characterization of distinct cell lineages and types in complex tissues and organisms. Cell-type annotation is of great importance in most of the scRNA-seq analysis pipelines. However, manual cell-type annotation heavily relies on the quality of scRNA-seq data and marker genes, and therefore can be laborious and time-consuming. Furthermore, the heterogeneity of scRNA-seq datasets poses another challenge for accurate cell-type annotation, such as the batch effect induced by different scRNA-seq protocols and samples. To overcome these limitations, here we propose a novel pipeline, termed TripletCell, for cross-species, cross-protocol and cross-sample cell-type annotation. We developed a cell embedding and dimension-reduction module for the feature extraction (FE) in TripletCell, namely TripletCell-FE, to leverage the deep metric learning-based algorithm for the relationships between the reference gene expression matrix and the query cells. Our experimental studies on 21 datasets (covering nine scRNA-seq protocols, two species and three tissues) demonstrate that TripletCell outperformed state-of-the-art approaches for cell-type annotation. More importantly, regardless of protocols or species, TripletCell can deliver outstanding and robust performance in annotating different types of cells. TripletCell is freely available at https://github.com/liuyan3056/TripletCell. We believe that TripletCell is a reliable computational tool for accurately annotating various cell types using scRNA-seq data and will be instrumental in assisting the generation of novel biological hypotheses in cell biology.


Subject(s)
Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , RNA Sequence Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis
4.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36528806

ABSTRACT

Determining the pathogenicity and functional impact (i.e. gain-of-function; GOF or loss-of-function; LOF) of a variant is vital for unraveling the genetic level mechanisms of human diseases. To provide a 'one-stop' framework for the accurate identification of pathogenicity and functional impact of variants, we developed a two-stage deep-learning-based computational solution, termed VPatho, which was trained using a total of 9619 pathogenic GOF/LOF and 138 026 neutral variants curated from various databases. A total number of 138 variant-level, 262 protein-level and 103 genome-level features were extracted for constructing the models of VPatho. The development of VPatho consists of two stages: (i) a random under-sampling multi-scale residual neural network (ResNet) with a newly defined weighted-loss function (RUS-Wg-MSResNet) was proposed to predict variants' pathogenicity on the gnomAD_NV + GOF/LOF dataset; and (ii) an XGBOD model was constructed to predict the functional impact of the given variants. Benchmarking experiments demonstrated that RUS-Wg-MSResNet achieved the highest prediction performance with the weights calculated based on the ratios of neutral versus pathogenic variants. Independent tests showed that both RUS-Wg-MSResNet and XGBOD achieved outstanding performance. Moreover, assessed using variants from the CAGI6 competition, RUS-Wg-MSResNet achieved superior performance compared to state-of-the-art predictors. The fine-trained XGBOD models were further used to blind test the whole LOF data downloaded from gnomAD and accordingly, we identified 31 nonLOF variants that were previously labeled as LOF/uncertain variants. As an implementation of the developed approach, a webserver of VPatho is made publicly available at http://csbio.njust.edu.cn/bioinf/vpatho/ to facilitate community-wide efforts for profiling and prioritizing the query variants with respect to their pathogenicity and functional impact.
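
The abstract mentions random under-sampling and loss weights derived from the ratio of neutral to pathogenic variants. The sketch below shows one plausible reading of that idea using a class-weighted binary cross-entropy. It is not the RUS-Wg-MSResNet loss, whose exact definition is not given here; the function names, the weighting scheme, and the seed are assumptions. Only the two variant counts come from the abstract.

```python
import math
import random
from typing import List, Sequence


def class_ratio_weight(n_neutral: int, n_pathogenic: int) -> float:
    """Weight for the minority (pathogenic) class, assumed to be the class ratio."""
    return n_neutral / n_pathogenic


def weighted_bce(labels: Sequence[int], probs: Sequence[float], pos_weight: float) -> float:
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def random_under_sample(majority: List[int], n_keep: int, seed: int = 0) -> List[int]:
    """Keep a random subset of majority-class items (sampling without replacement)."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


# Counts quoted in the abstract: 138 026 neutral and 9619 pathogenic variants.
weight = class_ratio_weight(138026, 9619)
print(round(weight, 2))
```

With this weighting, a misclassified pathogenic variant costs roughly fourteen times as much as a misclassified neutral one, which is one common way to counter class imbalance.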


Subject(s)
Deep Learning , Humans , Gain of Function Mutation , Genome
5.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38483285

ABSTRACT

MOTIVATION: Drug-target interaction (DTI) prediction refers to the prediction of whether a given drug molecule will bind to a specific target and thus exert a targeted therapeutic effect. Although intelligent computational approaches for drug target prediction have received much attention and made many advances, DTI prediction remains a challenging task that requires further research. The main challenges are as follows: (i) most graph neural network-based methods only consider the information of the first-order neighboring nodes (drug and target) in the graph, without learning deeper and richer structural features from the higher-order neighboring nodes; (ii) existing methods do not consider both the sequence and structural features of drugs and targets; each method works independently of the others and cannot combine the advantages of sequence and structural features to improve the interactive learning effect. RESULTS: To address the above challenges, a Multi-view Integrated learning Network that integrates Deep learning and Graph Learning (MINDG) is proposed in this study, which consists of the following parts: (i) a mixed deep network is used to extract sequence features of drugs and targets, (ii) a higher-order graph attention convolutional network is proposed to better extract and capture structural features, and (iii) a multi-view adaptive integrated decision module is used to improve and complement the initial prediction results of the above two networks to enhance the prediction performance. We evaluated MINDG on two datasets and showed that it improved DTI prediction performance compared to state-of-the-art baselines. AVAILABILITY AND IMPLEMENTATION: https://github.com/jnuaipr/MINDG.


Subject(s)
Algorithms , Computer Neural Networks
6.
Cell Mol Life Sci ; 81(1): 88, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38349408

ABSTRACT

Atrial fibrillation (AF) is the most prevalent sustained cardiac arrhythmia, and recent epidemiological studies suggest that type 2 diabetes mellitus (T2DM) is an independent risk factor for the development of AF. Zinc finger and BTB (broad-complex, tram-track and bric-a-brac) domain containing 16 (Zbtb16) serves as a transcription factor that regulates many biological processes. However, the potential effects of Zbtb16 in AF under T2DM conditions remain unclear. Here, we report that db/db mice displayed higher AF vulnerability and that Zbtb16 was identified as the most significantly enriched gene by RNA sequencing (RNA-seq) analysis in the atrium. In addition, thioredoxin interacting protein (Txnip) was identified as the key downstream gene of Zbtb16 by Cleavage Under Targets and Tagmentation (CUT&Tag) assay. Mechanistically, increased Txnip combined with thioredoxin 2 (Trx2) in mitochondria induced excess reactive oxygen species (ROS) release, calcium/calmodulin-dependent protein kinase II (CaMKII) overactivation, and spontaneous Ca²⁺ waves (SCWs), which could be inhibited through atrial-specific knockdown (KD) of Zbtb16 or Txnip by adeno-associated virus 9 (AAV9) or through Mito-TEMPO treatment. High glucose (HG)-treated HL-1 cells were used to mimic the diabetic setting in vitro. Zbtb16-Txnip-Trx2 signaling-induced excess ROS release and CaMKII activation were also verified in HL-1 cells under HG conditions. Furthermore, atrial-specific Zbtb16 or Txnip KD reduced the incidence and duration of AF in db/db mice. Altogether, we demonstrated that interrupting Zbtb16-Txnip-Trx2 signaling in the atrium could decrease AF susceptibility by reducing ROS release and CaMKII activation in the setting of T2DM.


Subject(s)
Atrial Fibrillation , Experimental Diabetes Mellitus , Type 2 Diabetes Mellitus , Animals , Mice , Calcium-Calmodulin-Dependent Protein Kinase Type 2 , Carrier Proteins/genetics , Experimental Diabetes Mellitus/complications , Experimental Diabetes Mellitus/genetics , Type 2 Diabetes Mellitus/complications , Type 2 Diabetes Mellitus/genetics , Promyelocytic Leukemia Zinc Finger Protein , Reactive Oxygen Species , Thioredoxins/genetics
7.
Proteomics ; 24(12-13): e2300371, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38643379

ABSTRACT

Forecasting alterations in protein stability caused by variations holds immense importance. Improving the thermal stability of proteins is important for biomedical and industrial applications. This review discusses the latest methods for predicting the effects of mutations on protein stability, databases containing protein mutations and thermodynamic parameters, and experimental techniques for efficiently assessing protein stability in high-throughput settings. Various publicly available databases for protein stability prediction are introduced. Furthermore, state-of-the-art computational approaches for anticipating protein stability changes due to variants are reviewed. Each method's types of features, base algorithm, and prediction results are also detailed. Additionally, some experimental approaches for verifying the prediction results of computational methods are introduced. Finally, the review summarizes the progress and challenges of protein stability prediction and discusses potential models for future research directions.


Subject(s)
Protein Stability , Proteins , Thermodynamics , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Protein Databases , Algorithms , Mutation , Humans
8.
Proteomics ; : e2300302, 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38258387

ABSTRACT

Small proteins (SPs) are a unique group of proteins that play crucial roles in many important biological processes. Exploring the biological function of SPs is necessary. In this study, the InterPro tool and the maximum correlation method were utilized to analyze functional domains of SPs. The purpose was to identify important functional domains that can indicate the essential differences between small and large protein sequences. First, the small and large proteins were represented by their functional domains via a one-hot scheme. Then, the MaxRel method was adopted to evaluate the relationships between each domain and the target variable, indicating small or large protein. The top 36 domain features were selected for further investigation. Among them, 14 were deemed to be highly related to SPs because they were annotated to SPs more frequently than large proteins. We found the involvement of functional domains, such as ubiquitin-conjugating enzyme/RWD-like, nuclear transport factor 2 domain, and alpha subunit of guanine nucleotide-binding protein (G-protein) in regulating the biological function of SPs. The involvement of these domains has been confirmed by other recent studies. Our findings indicate that protein functional domains may regulate small protein-related functions and predict their biological activity.
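
The two steps described above, one-hot encoding of functional domains and relevance ranking against the small/large label, can be sketched as follows. This assumes that MaxRel scores each domain by its mutual information with the class label, which is the usual "maximum relevance" criterion but is not stated in the abstract. The domain names and toy proteins are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple


def one_hot(proteins: Sequence[Set[str]], vocabulary: Sequence[str]) -> List[List[int]]:
    """Encode each protein as a 0/1 vector marking which domains it carries."""
    return [[1 if d in domains else 0 for d in vocabulary] for domains in proteins]


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Mutual information (in bits) between two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_by_relevance(
    matrix: Sequence[Sequence[int]], targets: Sequence[int], vocabulary: Sequence[str]
) -> List[Tuple[str, float]]:
    """Sort domains from most to least informative about the target label."""
    scored = []
    for j, name in enumerate(vocabulary):
        column = [row[j] for row in matrix]
        scored.append((name, mutual_information(column, targets)))
    return sorted(scored, key=lambda item: -item[1])


# Hypothetical data: 1 = small protein, 0 = large protein.
vocab = ["domain_A", "domain_B"]
toy_proteins = [{"domain_A"}, {"domain_A", "domain_B"}, {"domain_B"}, set()]
toy_targets = [1, 1, 0, 0]
print(rank_by_relevance(one_hot(toy_proteins, vocab), toy_targets, vocab))
```

In this toy set, `domain_A` separates the two classes perfectly and `domain_B` carries no information, so the ranking puts `domain_A` first.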

9.
J Am Chem Soc ; 146(9): 6225-6230, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38386658

ABSTRACT

Per- and polyfluoroalkyl substances (PFAS) accumulate in water resources and pose serious environmental and health threats due to their nonbiodegradable nature and long environmental persistence times. Strategies for the efficient removal of PFAS from contaminated water are needed to address this concern. Here, we report a fluorinated nonporous adaptive crystalline cage (F-Cage 2) that exploits electrostatic interaction, hydrogen bonding, and F-F interactions to achieve the efficient removal of perfluorooctanoic acid (PFOA) from aqueous source phases. F-Cage 2 exhibits a high second-order kobs value of approximately 441,000 g mg⁻¹ h⁻¹ for PFOA and a maximum PFOA adsorption capacity of 45 mg g⁻¹. F-Cage 2 can decrease PFOA concentrations from 1500 to 6 ng L⁻¹ through three rounds of flow-through purification, conducted at a flow rate of 40 mL h⁻¹. Elimination of PFOA from PFOA-loaded F-Cage 2 is readily achieved by rinsing with a mixture of MeOH and saturated NaCl. Heating at 80 °C under vacuum then makes F-Cage 2 ready for reuse, as demonstrated across five successive uptake and release cycles. This work thus highlights the potential utility of suitably designed nonporous adaptive crystals as platforms for PFAS remediation.
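
The concentration drop quoted above (1500 to 6 ng per liter over three rounds) translates into an overall removal fraction by simple arithmetic. The sketch below also estimates a per-round fraction under the assumption that every round removes the same share of the remaining PFOA. That assumption is made here for illustration only and is not stated in the abstract.

```python
def removal_efficiency(c_initial: float, c_final: float) -> float:
    """Fraction of the solute removed between the initial and final concentration."""
    return (c_initial - c_final) / c_initial


def per_round_efficiency(c_initial: float, c_final: float, rounds: int) -> float:
    """Per-round removal fraction, ASSUMING each round removes the same share."""
    return 1.0 - (c_final / c_initial) ** (1.0 / rounds)


overall = removal_efficiency(1500.0, 6.0)         # concentrations in ng per liter
per_round = per_round_efficiency(1500.0, 6.0, 3)  # three flow-through rounds
print(f"overall removal: {overall:.1%}")
print(f"per-round removal (equal-share assumption): {per_round:.1%}")
```

The overall figure follows directly from the reported concentrations; the per-round value is only as good as the equal-share assumption.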

10.
J Am Chem Soc ; 146(6): 3585-3590, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38316138

ABSTRACT

We report here an expanded porphyrinoid, cyclo[2]pyridine[8]pyrrole, 1, that can exist at three closed-shell oxidation levels. Macrocycle 1 was synthesized via the oxidative coupling of two open chain precursors and fully characterized by means of NMR and UV-vis spectroscopies, MS, and X-ray crystallography. Reduction of the fully oxidized form (1, blue) with NaBH₄ produced either the half-oxidized (2, teal) or fully reduced forms (3, pale yellow), depending on the amount of reducing agent used and the presence or absence of air. Reduced products 2 or 3 can be oxidized to 1 by various oxidants (quinones, FeCl₃, and AgPF₆). Macrocycle 1 also undergoes proton-coupled reductions with I⁻, Br⁻, Cl⁻, SO₃²⁻, or S₂O₃²⁻ in the presence of an acid. Certain thiol-containing compounds likewise reduce 1 to 2 or 3. This conversion is accompanied by a readily discernible color change, making cyclo[2]pyridine[8]pyrrole 1 able to differentiate biothiols, such as cysteine (Cys), homocysteine (Hcy), and glutathione (GSH).

11.
Radiology ; 312(1): e232387, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39012251

ABSTRACT

Background Preoperative local-regional tumor staging of gastric cancer (GC) is critical for appropriate treatment planning. The comparative accuracy of multiparametric MRI (mpMRI) versus dual-energy CT (DECT) for staging of GC is not known. Purpose To compare the diagnostic accuracy of personalized mpMRI with that of DECT for local-regional T and N staging in patients with GC receiving curative surgical intervention. Materials and Methods Patients with GC who underwent gastric mpMRI and DECT before gastrectomy with lymphadenectomy were eligible for this single-center prospective noninferiority study between November 2021 and September 2022. mpMRI comprised T2-weighted imaging, multiorientational zoomed diffusion-weighted imaging, and extradimensional volumetric interpolated breath-hold examination dynamic contrast-enhanced imaging. Dual-phase DECT images were reconstructed at 40 keV and standard 120 kVp-like images. Using gastrectomy specimens as the reference standard, the diagnostic accuracy of mpMRI and DECT for T and N staging was compared by six radiologists in a pairwise blinded manner. Interreader agreement was assessed using the weighted κ and Kendall W statistics. The McNemar test was used for head-to-head accuracy comparisons between DECT and mpMRI. Results This study included 202 participants (mean age, 62 years ± 11 [SD]; 145 male). The interreader agreement of the six readers for T and N staging of GC was excellent for both mpMRI (κ = 0.89 and 0.85, respectively) and DECT (κ = 0.86 and 0.84, respectively). Regardless of reader experience, higher accuracy was achieved with mpMRI than with DECT for both T (61%-77% vs 50%-64%; all P < .05) and N (54%-68% vs 51%-58%; P = .497-.005) staging, specifically T1 (83% vs 65%) and T4a (78% vs 68%) tumors and N1 (41% vs 24%) and N3 (64% vs 45%) nodules (all P < .05). Conclusion Personalized mpMRI was superior in T staging and noninferior or superior in N staging compared with DECT for patients with GC. 
Clinical trial registration no. NCT05508126. © RSNA, 2024. Supplemental material is available for this article. See also the editorial by Méndez and Martín-Garre in this issue.
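
The head-to-head comparison above relies on the McNemar test, which looks only at the discordant cases: patients staged correctly by one modality but not by the other. A minimal sketch of the exact (binomial) form of the test follows. The discordant counts in the example are hypothetical and are not taken from the study, and the study may have used a different variant of the test.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: cases correct with method A only; c: cases correct with method B only.
    Under the null hypothesis the discordant cases split 50/50.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts for two imaging modalities.
print(round(mcnemar_exact(10, 2), 4))
```

Cases where both modalities agree (both right or both wrong) do not enter the calculation, which is why only the two discordant counts are needed.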


Subject(s)
Neoplasm Staging , Stomach Neoplasms , X-Ray Computed Tomography , Humans , Stomach Neoplasms/diagnostic imaging , Stomach Neoplasms/pathology , Stomach Neoplasms/surgery , Male , Female , Middle Aged , Prospective Studies , Aged , X-Ray Computed Tomography/methods , Gastrectomy/methods , Adult , Magnetic Resonance Imaging/methods , Multiparametric Magnetic Resonance Imaging/methods
12.
Small ; 20(26): e2308527, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38221686

ABSTRACT

Flexible hydroelectric generators (HEGs) are promising self-powered devices that spontaneously derive electrical power from moisture. However, achieving the desired compatibility between a continuous operating voltage and superior current density remains a significant challenge. Herein, a textile-based van der Waals heterostructure is rationally designed between conductive 1T phase tungsten disulfide@carbonized silk (1T-WS₂@CSilk) and carbon black@cotton (CB@Cotton) fabrics with an asymmetric distribution of oxygen-containing functional groups, which enhances the proton concentration gradients toward high-performance wearable HEGs. The vertically staggered 1T-WS₂ nanosheet arrays on the CSilk fabric provide abundant hydrophilic nanochannels for rapid carrier transport. Furthermore, the moisture-induced primary battery formed between the active aluminum (Al) electrode and the conductive textiles introduces the desired electric field to facilitate charge separation and compensate for the decreased streaming potential. These devices exhibit a power density of 21.6 µW cm⁻², an open-circuit voltage (Voc) of 0.65 V sustained for over 10 000 s, and a current density of 0.17 mA cm⁻². This performance makes them capable of supplying power to commercial electronics and human respiratory monitoring. This study presents a promising strategy for the refined design of wearable electronics.

13.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34953462

ABSTRACT

More than 6000 human diseases have been recorded to be caused by non-synonymous single nucleotide polymorphisms (nsSNPs). Rapid and accurate prediction of pathogenic nsSNPs can improve our understanding of the principle and design of new drugs, which remains an unresolved challenge. In the present work, a new computational approach, termed MSRes-MutP, is proposed based on ResNet blocks with multi-scale kernel size to predict disease-associated nsSNPs. Feeding MSRes-MutP the serial concatenation of the four extracted feature types does not noticeably improve its performance. To address this, a second model, FFMSRes-MutP, is developed, which utilizes a deep feature fusion strategy and multi-scale 2D-ResNet and 1D-ResNet blocks to extract relevant two-dimensional features and physicochemical properties. FFMSRes-MutP with the concatenated features achieves better performance than with individual features. The performance of FFMSRes-MutP is benchmarked on five different datasets. It achieves Matthews correlation coefficients (MCCs) of 0.593 and 0.618 on the PredictSNP and MMP datasets, which are 0.101 and 0.210 higher than those of the existing best method, PredictSNP1. On the HumDiv and HumVar datasets, it achieves MCCs of 0.9605 and 0.9507 and areas under the curve (AUCs) of 0.9796 and 0.9748; the MCCs are 0.1747 and 0.2669 higher, and the AUCs 0.0853 and 0.1335 higher, than those of the existing best methods PolyPhen-2 and FATHMM (weighted), respectively. In addition, in a blind test using a third-party dataset, FFMSRes-MutP performs as the second-best predictor (with MCC and AUC of 0.5215 and 0.7633, respectively) when compared with the other four predictors. Extensive benchmarking experiments demonstrate that FFMSRes-MutP achieves effective feature fusion and can be explored as a useful approach for predicting disease-associated nsSNPs. The webserver is freely available at http://csbio.njust.edu.cn/bioinf/ffmsresmutp/ for academic use.
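
The Matthews correlation coefficient used throughout the abstract above is computed from the four cells of a binary confusion matrix. A minimal sketch follows. The counts in the example are hypothetical and unrelated to the reported benchmarks, and returning 0.0 for an undefined denominator is a common convention adopted here as an assumption.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention: undefined MCC (an empty row or column) is reported as 0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 90 true positives, 80 true negatives, 20 false positives, 10 false negatives.
print(round(mcc(90, 80, 20, 10), 3))
```

Unlike accuracy, the MCC stays near zero for a predictor that simply favors the majority class, which is why it is often preferred on imbalanced variant datasets.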


Subject(s)
Deep Learning , Disease/genetics , Single Nucleotide Polymorphism , Algorithms , Area Under Curve , Cellular Microenvironment , Computational Biology/methods , Humans , Mutation , Pharmaceutical Preparations
14.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34664074

ABSTRACT

Accurate identification of transcription factor binding sites is of great significance in understanding gene expression, biological development and drug design. Although a variety of methods based on deep-learning models and large-scale data have been developed to predict transcription factor binding sites in DNA sequences, there is room for further improvement in prediction performance. In addition, effective interpretation of deep-learning models is greatly desirable. Here we present MAResNet, a new deep-learning method, for predicting transcription factor binding sites on 690 ChIP-seq datasets. More specifically, MAResNet combines the bottom-up and top-down attention mechanisms and a state-of-the-art feed-forward network (ResNet), which is constructed by stacking attention modules that generate attention-aware features. In particular, the multi-scale attention mechanism is utilized at the first stage to extract rich and representative sequence features. We further discuss the attention-aware features learned from different attention modules in accordance with the changes as the layers go deeper. The features learned by MAResNet are also visualized through the TMAP tool to illustrate that the method can extract the unique characteristics of transcription factor binding sites. The performance of MAResNet is extensively tested on 690 test subsets with an average AUC of 0.927, which is higher than that of the current state-of-the-art methods. Overall, this study provides a new and useful framework for the prediction of transcription factor binding sites by combining the funnel attention modules with the residual network.


Subject(s)
Deep Learning , Binding Sites/genetics , Computer Neural Networks , Protein Binding , Transcription Factors/metabolism
15.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35907779

ABSTRACT

Circular RNA (circRNA) is closely involved in physiological and pathological processes of many diseases. Discovering the associations between circRNAs and diseases is of great significance. Due to the high cost of verifying circRNA-disease associations by wet-lab experiments, computational approaches for predicting the associations have become a promising research direction. In this paper, we propose a method, MDGF-MCEC, based on a multi-view dual attention graph convolution network (GCN) with cooperative ensemble learning to predict circRNA-disease associations. First, MDGF-MCEC constructs two disease relation graphs and two circRNA relation graphs based on different similarities. Then, the relation graphs are fed into a multi-view GCN for representation learning. In order to learn highly discriminative features, a dual-attention mechanism is introduced to adjust the contribution weights of different features at both the channel level and the spatial level. Based on the learned embedding features of diseases and circRNAs, nine different feature combinations between diseases and circRNAs are treated as new multi-view data. Finally, we construct a multi-view cooperative ensemble classifier to predict the associations between circRNAs and diseases. Experiments conducted on the CircR2Disease database demonstrate that the proposed MDGF-MCEC model achieves a high area under curve of 0.9744 and outperforms the state-of-the-art methods. Promising results are also obtained from experiments on the circ2Disease and circRNADisease databases. Furthermore, the predicted associated circRNAs for hepatocellular carcinoma and gastric cancer are supported by the literature. The code and dataset of this study are available at https://github.com/ABard0/MDGF-MCEC.


Subject(s)
Circular RNA , Stomach Neoplasms , Humans , Intercellular Signaling Peptides and Proteins , Machine Learning , Stomach Neoplasms/genetics
16.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36413068

ABSTRACT

MOTIVATION: Over the past decades, a variety of in silico methods have been developed to predict protein subcellular localization within cells. However, a common and major challenge in the design and development of such methods is how to effectively utilize the heterogeneous feature sets extracted from bioimages. In this regard, limited efforts have been made. RESULTS: We propose a new two-level stacked autoencoder network (termed 2L-SAE-SM) to improve prediction performance by integrating the heterogeneous feature sets. In particular, in the first level of 2L-SAE-SM, each optimal heterogeneous feature set is fed to train our designed stacked autoencoder network (SAE-SM). All the trained SAE-SMs in the first level can output the decision sets based on their respective optimal heterogeneous feature sets, known as 'intermediate decision' sets. Such intermediate decision sets are then ensembled using the mean ensemble method to generate the 'intermediate feature' set for the second-level SAE-SM. Using the proposed framework, we further develop a novel predictor, referred to as PScL-2LSAESM, to characterize image-based protein subcellular localization. Extensive benchmarking experiments on the latest benchmark training and independent test datasets collected from the human protein atlas databank demonstrate the effectiveness of the proposed 2L-SAE-SM framework for the integration of heterogeneous feature sets. Moreover, performance comparison of the proposed PScL-2LSAESM with current state-of-the-art methods further illustrates that PScL-2LSAESM clearly outperforms the existing state-of-the-art methods for the task of protein subcellular localization. AVAILABILITY AND IMPLEMENTATION: https://github.com/csbio-njust-edu/PScL-2LSAESM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
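
The mean ensemble step mentioned above, which averages the first-level "intermediate decision" sets into one "intermediate feature" set, can be sketched in a few lines. The data layout (one list of per-class probabilities per sample, per model) and the function name are assumptions for illustration. The PScL-2LSAESM repository should be consulted for the actual implementation.

```python
from typing import List, Sequence


def mean_ensemble(decision_sets: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average class-probability outputs across models.

    decision_sets[m][i][k] is the probability model m assigns to class k for sample i.
    Returns one averaged probability vector per sample.
    """
    if not decision_sets:
        raise ValueError("need at least one model output")
    n_models = len(decision_sets)
    n_samples = len(decision_sets[0])
    averaged = []
    for i in range(n_samples):
        rows = [model[i] for model in decision_sets]  # the same sample across all models
        averaged.append([sum(col) / n_models for col in zip(*rows)])
    return averaged


# Two hypothetical first-level models, two samples, two classes.
model_a = [[0.2, 0.8], [0.9, 0.1]]
model_b = [[0.6, 0.4], [0.7, 0.3]]
print(mean_ensemble([model_a, model_b]))
```

The averaged vectors keep the same width as each model's output, so they can be passed on as input features to a second-level model.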


Subject(s)
Computational Biology , Humans , Protein Transport , Computational Biology/methods
17.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-37995291

ABSTRACT

MOTIVATION: RNA N6-methyladenosine (m6A) in Homo sapiens plays vital roles in a variety of biological functions. Precise identification of m6A modifications is thus essential to elucidation of their biological functions and underlying molecular-level mechanisms. Currently available high-throughput single-nucleotide-resolution m6A modification data have considerably accelerated the identification of RNA modification sites through the development of data-driven computational methods. Nevertheless, existing methods have limitations in terms of the coverage of single-nucleotide-resolution cell lines and have poor capability in model interpretations, thereby having limited applicability. RESULTS: In this study, we present CLSM6A, comprising a set of deep learning-based models designed for predicting single-nucleotide-resolution m6A RNA modification sites across eight different cell lines and three tissues. Extensive benchmarking experiments are conducted on well-curated datasets and accordingly, CLSM6A achieves better performance than current state-of-the-art methods. Furthermore, CLSM6A is capable of interpreting the prediction decision-making process by excavating critical motifs activated by filters and pinpointing highly concerned positions in both forward and backward propagations. CLSM6A exhibits better portability on similar cross-cell line/tissue datasets, reveals a strong association between highly activated motifs and high-impact motifs, and demonstrates complementary attributes of different interpretation strategies. AVAILABILITY AND IMPLEMENTATION: The webserver is available at http://csbio.njust.edu.cn/bioinf/clsm6a. The datasets and code are available at https://github.com/zhangying-njust/CLSM6A/.


Subject(s)
Nucleotides , RNA , Humans , RNA/metabolism , Adenosine/genetics , Adenosine/metabolism , RNA Sequence Analysis/methods
18.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37561093

ABSTRACT

MOTIVATION: CircRNAs play a critical regulatory role in physiological processes, and the abnormal expression of circRNAs can mediate the processes of diseases. Therefore, exploring circRNA-disease associations is gradually becoming an important area of research. Due to the high cost of validating circRNA-disease associations using traditional wet-lab experiments, novel computational methods based on machine learning are gaining more and more attention in this field. However, current computational methods suffer from insufficient consideration of latent features in circRNA-disease interactions. RESULTS: In this study, a multilayer attention neural graph-based collaborative filtering (MLNGCF) is proposed. MLNGCF first enhances multiple biological information with an autoencoder as the initial features of circRNAs and diseases. Then, by constructing a central network of different diseases and circRNAs, a multilayer cooperative attention-based message propagation is performed on the central network to obtain the high-order features of circRNAs and diseases. A neural network-based collaborative filtering is constructed to predict the unknown circRNA-disease associations and update the model parameters. Experiments on the benchmark datasets demonstrate that MLNGCF outperforms state-of-the-art methods, and the prediction results are supported by the literature in the case studies. AVAILABILITY AND IMPLEMENTATION: The source codes and benchmark datasets of MLNGCF are available at https://github.com/ABard0/MLNGCF.


Subject(s)
Neural Networks, Computer , RNA, Circular , Machine Learning , Software , Computational Biology/methods
19.
Plant Physiol ; 192(1): 307-325, 2023 05 02.
Article in English | MEDLINE | ID: mdl-36755501

ABSTRACT

Y900 is one of the top hybrid rice (Oryza sativa) varieties, with its yield exceeding 15 t·hm⁻². To dissect the mechanism of heterosis, we sequenced the male parent line R900 and female parent line Y58S using long-read and Hi-C technology. High-quality reference genomes of 396.41 Mb and 398.24 Mb were obtained for R900 and Y58S, respectively. Genome-wide variations between the parents were systematically identified, including 1,367,758 single-nucleotide polymorphisms, 299,149 insertions/deletions, and 4,757 structural variations. The level of variation between Y58S and R900 was the lowest among the comparisons of Y58S with other rice genomes. More than 75% of genes exhibited variation between the two parents. Compared with other two-line hybrids sharing the same female parent, the portion of Geng/japonica (GJ)-type genetic components from different male parents increased with yield increasing in their corresponding hybrids. Transcriptome analysis revealed that the partial dominance effect was the main genetic effect that constituted the heterosis of Y900. In the hybrid, both alleles from the two parents were expressed, and their expression patterns were dynamically regulated in different tissues. The cis-regulation was dominant for young panicle tissues, while trans-regulation was more common in leaf tissues. Overdominance was surprisingly prevalent in stems and more likely regulated by the trans-regulation mechanism. Additionally, R900 contained many excellent GJ haplotypes, such as NARROW LEAF1, Oryza sativa SQUAMOSA PROMOTER BINDING PROTEIN-LIKE13, and Grain number, plant height, and heading date8, making it a good complement to Y58S. The fine-tuned mechanism of heterosis involves genome-wide variation, GJ introgression, key functional genes, and dynamic gene/allele expression and regulation pattern changes in different tissues and growth stages.


Subject(s)
Hybrid Vigor , Oryza , Hybrid Vigor/genetics , Oryza/genetics , Gene Expression Profiling , Hybridization, Genetic
20.
Opt Express ; 32(4): 6277-6290, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38439335

ABSTRACT

In this study, a novel method is proposed that detects carbon dioxide (CO2) concentration and achieves temperature immunity using only one fiber Bragg grating (FBG). Its main contribution is solving the temperature crosstalk problem of the FBG, so that detection remains accurate in the presence of temperature interference. To achieve immunity to temperature interference without changing the initial structure of the FBG, the optical fiber cladding of the FBG and the adjacent cladding at both ends of the FBG are modified by a polymer coating. Moreover, a universal temperature-immune demodulation algorithm is derived. The experimental results demonstrate that the temperature response sensitivity of the improved FBG is held to 0.00407 nm/°C, nearly 10 times lower than that of the initial FBG (0.04 nm/°C). In addition, the gas response sensitivity of the FBG reaches 1.6 pm/ppm with a highly linear response. The detection error results show that the gas concentration error in 20 groups of data does not exceed 3.16 ppm. The final reproducibility study shows that the difference in detection sensitivity between the two sensors is 0.08 pm/ppm, and the relative error of linearity is 1.07%. In summary, the proposed method can accurately detect the concentration of CO2 gas and is effectively immune to temperature interference. The proposed sensor has the advantages of a simple production process, low cost, and satisfactory reproducibility. It also has the prospect of mass production.
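
The sensitivities quoted in this abstract can be related with a short calculation. The Python sketch below is illustrative only: it assumes a purely linear wavelength response and does not reproduce the authors' demodulation algorithm, which the abstract does not describe. The function names are hypothetical; only the three sensitivity figures come from the abstract.

```python
# Figures quoted in the abstract, expressed in picometres (1 nm = 1000 pm).
GAS_SENSITIVITY_PM_PER_PPM = 1.6       # CO2 response of the coated FBG
TEMP_SENS_COATED_PM_PER_C = 4.07       # 0.00407 nm/°C after polymer coating
TEMP_SENS_INITIAL_PM_PER_C = 40.0      # 0.04 nm/°C for the initial FBG


def ppm_from_shift(shift_pm, gas_sensitivity=GAS_SENSITIVITY_PM_PER_PPM):
    """Convert a Bragg wavelength shift to CO2 concentration.

    Assumption (not from the abstract): shift = sensitivity * concentration.
    """
    return shift_pm / gas_sensitivity


def crosstalk_ppm_per_degree(temp_sensitivity_pm_per_c,
                             gas_sensitivity=GAS_SENSITIVITY_PM_PER_PPM):
    """Apparent CO2 error caused by a 1 °C change if left uncompensated."""
    return temp_sensitivity_pm_per_c / gas_sensitivity


# Ratio of initial to coated temperature sensitivity ("nearly 10 times").
reduction_factor = TEMP_SENS_INITIAL_PM_PER_C / TEMP_SENS_COATED_PM_PER_C

if __name__ == "__main__":
    print(f"temperature sensitivity reduction: {reduction_factor:.2f}x")
    print(f"crosstalk, initial FBG: "
          f"{crosstalk_ppm_per_degree(TEMP_SENS_INITIAL_PM_PER_C):.2f} ppm/°C")
    print(f"crosstalk, coated FBG: "
          f"{crosstalk_ppm_per_degree(TEMP_SENS_COATED_PM_PER_C):.2f} ppm/°C")
```

Under this linear assumption, the coating alone would reduce the uncompensated temperature-induced error from about 25 ppm per °C to about 2.5 ppm per °C.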
