Results 1 - 13 of 13
1.
Bioinformatics ; 39(1), 2023 Jan 01.
Article in English | MEDLINE | ID: mdl-36484687

ABSTRACT

MOTIVATION: Cell-type-specific gene expression is maintained in large part by transcription factors (TFs) selectively binding to distinct sets of sites in different cell types. Recent studies have provided evidence that such cell-type-specific binding is determined by a TF's intrinsic sequence preferences, cooperative interactions with co-factors, cell-type-specific chromatin landscapes and 3D chromatin interactions. However, computational prediction and characterization of cell-type-specific and shared binding sites has rarely been studied. RESULTS: In this article, we propose two computational approaches for predicting and characterizing cell-type-specific and shared binding sites by integrating multiple types of features, one based on XGBoost and the other on a convolutional neural network (CNN). To validate the performance of our proposed approaches, ChIP-seq datasets of 10 binding factors were collected from the GM12878 (lymphoblastoid) and K562 (erythroleukemic) human hematopoietic cell lines, and the binding sites of each factor were further categorized into cell-type-specific (GM12878- and K562-specific) and shared binding sites. Then, multiple types of features for these binding sites were integrated to train the XGBoost- and CNN-based models. Experimental results show that our proposed approaches significantly outperform other competing methods on three classification tasks. Moreover, we identified independent feature contributions for cell-type-specific and shared sites through SHAP values and explored the ability of the CNN-based model to predict cell-type-specific and shared binding sites by excluding or including DNase signals. Furthermore, we investigated the generalization ability of our proposed approaches to different binding factors in the same cellular environment. AVAILABILITY AND IMPLEMENTATION: The source code is available at: https://github.com/turningpoint1988/CSSBS.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromatin, Transcription Factors, Humans, Protein Binding/genetics, Binding Sites/genetics, Transcription Factors/metabolism, Chromatin Immunoprecipitation Sequencing, Computational Biology/methods
2.
BMC Bioinformatics ; 24(1): 188, 2023 May 08.
Article in English | MEDLINE | ID: mdl-37158823

ABSTRACT

BACKGROUND: Limited knowledge of miRNA-lncRNA interactions is considered an obstacle to revealing regulatory mechanisms. Accumulating evidence on human diseases indicates that the modulation of gene expression is closely related to the interactions between miRNAs and lncRNAs. However, validating such interactions via crosslinking-immunoprecipitation and high-throughput sequencing (CLIP-seq) experiments inevitably costs considerable money and time, and yields unsatisfactory results. Therefore, more and more computational prediction tools have been developed to offer reliable candidates for better design of further bio-experiments. METHODS: In this work, we propose a novel link prediction model based on a Gaussian kernel-based method and a linear optimization algorithm for inferring miRNA-lncRNA interactions (GKLOMLI). Given an observed miRNA-lncRNA interaction network, the Gaussian kernel-based method is employed to output two similarity matrices, one for miRNAs and one for lncRNAs. Based on the integrated matrix that combines the similarity matrices with the observed interaction network, a linear optimization-based link prediction model is trained for inferring miRNA-lncRNA interactions. RESULTS: To evaluate the performance of our proposed method, k-fold cross-validation (CV) and leave-one-out CV were implemented, in which each CV experiment was carried out 100 times on a randomly generated training set. The high areas under the curve (AUCs) of 0.8623 ± 0.0027 (2-fold CV), 0.9053 ± 0.0017 (5-fold CV), 0.9151 ± 0.0013 (10-fold CV), and 0.9236 (LOO-CV) illustrate the precision and reliability of our proposed method. CONCLUSION: With its high performance, GKLOMLI is anticipated to be used to reveal underlying interactions between miRNAs and their target lncRNAs and to decipher the potential mechanisms of complex diseases.


Subject(s)
MicroRNAs, Long Noncoding RNA, Humans, Long Noncoding RNA/genetics, Reproducibility of Results, Research Design, Algorithms, MicroRNAs/genetics
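
The abstract of record 2 does not state the kernel formula, so the following is only a minimal sketch of one common choice, the Gaussian interaction profile kernel, applied to a 0/1 interaction matrix. The function name, the bandwidth normalisation and the toy matrix are assumptions for illustration, not the authors' implementation.

```python
import math


def gaussian_kernel_similarity(profiles):
    """Pairwise Gaussian similarity between the rows of a 0/1 interaction matrix."""
    n = len(profiles)
    # Assumption: bandwidth is normalised by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = 1.0 / mean_sq_norm if mean_sq_norm > 0 else 1.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


# Hypothetical toy network: rows are miRNAs, columns are lncRNAs.
interactions = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
mirna_sim = gaussian_kernel_similarity(interactions)
# Transposing the matrix gives one interaction profile per lncRNA.
lncrna_sim = gaussian_kernel_similarity([list(col) for col in zip(*interactions)])
```

Running the same function on the matrix and on its transpose yields the two similarity matrices the abstract mentions, one per node type.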
3.
Appl Opt ; 57(10): 2376-2382, 2018 Apr 01.
Article in English | MEDLINE | ID: mdl-29714224

ABSTRACT

A snapshot imaging polarimeter using spatial modulation can encode four Stokes parameters, allowing instantaneous polarization measurement from a single interferogram. However, the reconstructed polarization images can suffer from a severe aliasing signal if the high-frequency component of the intensity image is prominent and occurs in the polarization channels, and the reconstructed intensity image also suffers a reduction in spatial resolution due to low-pass filtering. In this work, a method using two anti-phase snapshots is proposed to address the two problems simultaneously. The full-resolution target image and the pure interference fringes can be obtained from the sum and the difference of the two anti-phase interferograms, respectively. The polarization information reconstructed from the pure interference fringes does not contain the aliasing signal from the high-frequency component of the object intensity image. The principles of the method are derived and its feasibility is tested by both computer simulation and a verification experiment. This work provides a novel method for spatially modulated imaging polarization technology with two snapshots to simultaneously reconstruct a full-resolution object intensity image and high-quality polarization components.
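
The sum-and-difference idea in record 3 can be illustrated with a simplified one-dimensional model. The sketch assumes that each snapshot is the object intensity plus or minus a fringe term; the factor of one half, the signal shapes and all names are assumptions, not the paper's optical model.

```python
import math


def separate_antiphase(snapshot_a, snapshot_b):
    """Split two anti-phase snapshots into an intensity part and a fringe part."""
    # Assumed model: snapshot_a = intensity + fringes, snapshot_b = intensity - fringes.
    intensity = [(a + b) / 2 for a, b in zip(snapshot_a, snapshot_b)]
    fringes = [(a - b) / 2 for a, b in zip(snapshot_a, snapshot_b)]
    return intensity, fringes


# Hypothetical toy signals: a high-frequency object intensity and a carrier fringe.
n = 64
intensity_true = [1.0 + 0.5 * math.sin(2 * math.pi * 20 * k / n) for k in range(n)]
fringe_true = [0.3 * math.cos(2 * math.pi * 8 * k / n) for k in range(n)]
snapshot_a = [i + f for i, f in zip(intensity_true, fringe_true)]
snapshot_b = [i - f for i, f in zip(intensity_true, fringe_true)]

intensity, fringes = separate_antiphase(snapshot_a, snapshot_b)
```

Under this model the high-frequency intensity content cancels in the difference, so it cannot leak into the fringe channel, and the sum needs no low-pass filtering.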

4.
ScientificWorldJournal ; 2013: 386180, 2013.
Article in English | MEDLINE | ID: mdl-23956693

ABSTRACT

Time-series streams are one of the most common data types in the data mining field. They are prevalent in fields such as the stock market, ecology, and medical care. Segmentation is a key step in accelerating time-series stream mining. Previous segmenting algorithms mainly focused on improving precision and paid little attention to efficiency. Moreover, the performance of these algorithms depends heavily on parameters, which are hard for users to set. In this paper, we propose PRESEE (parameter-free, real-time, and scalable time-series stream segmenting algorithm), which greatly improves the efficiency of time-series stream segmenting. PRESEE is based on both MDL (minimum description length) and MML (minimum message length) methods, which can segment the data automatically. To evaluate the performance of PRESEE, we conduct several experiments on time-series streams of different types and compare it with the state-of-the-art algorithm. The empirical results show that PRESEE is very efficient for real-time stream datasets, improving segmenting speed nearly ten times. The novelty of this algorithm is further demonstrated by applying PRESEE to segment real-time stream datasets from the ChinaFLUX sensor network data stream.


Subject(s)
Algorithms, Time, Data Mining
5.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2690-2699, 2023.
Article in English | MEDLINE | ID: mdl-36374878

ABSTRACT

Transcription factors (TFs) play a part in gene expression. By binding to DNA, TFs can form a complex gene expression regulation system. Therefore, identifying the binding regions has become an indispensable step for understanding the regulatory mechanism of gene expression. Due to the great achievements of applying deep learning (DL) to computer vision and language processing in recent years, many scholars have been inspired to use these methods to predict TF binding sites (TFBSs), achieving extraordinary results. However, these methods mainly focus on whether DNA sequences include TFBSs. In this paper, we propose a fully convolutional network (FCN) coupled with a refinement residual block (RRB) and a global average pooling layer (GAPL), namely FCNARRB. Our model can classify binding sequences at the nucleotide level by outputting dense labels for the input data. Experimental results on human ChIP-seq datasets show that the RRB and GAPL structures are very useful for improving model performance. Adding GAPL improves the performance by 9.32% and 7.61% in terms of IoU (intersection over union) and PRAUC (area under the precision-recall curve), and adding RRB improves the performance by 7.40% and 4.64%, respectively. In addition, we find that conservation information can help locate TFBSs.
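
The IoU metric named in record 5 can be sketched for nucleotide-level labels as follows. The function name, the toy label vectors and the convention for two empty masks are assumptions for illustration; the abstract does not specify how the authors compute it.

```python
def nucleotide_iou(predicted, target):
    """Intersection over union of two per-nucleotide 0/1 label sequences."""
    if len(predicted) != len(target):
        raise ValueError("label sequences must have the same length")
    intersection = sum(1 for p, t in zip(predicted, target) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(predicted, target) if p == 1 or t == 1)
    # Assumption: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical dense labels: 1 marks a nucleotide inside a binding site.
target = [0, 0, 1, 1, 1, 1, 0, 0]
predicted = [0, 1, 1, 1, 1, 0, 0, 0]
iou = nucleotide_iou(predicted, target)  # 3 shared positions out of 5 labelled
```

A prediction shifted by one nucleotide already drops the score to 0.6 here, which shows why dense labelling is a stricter task than sequence-level classification.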

6.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2629-2638, 2023.
Article in English | MEDLINE | ID: mdl-35925844

ABSTRACT

A growing number of studies have shown that miRNAs are inextricably linked with many human diseases, and a great deal of effort has been spent on identifying their potential associations. Compared with traditional experimental methods, computational approaches have achieved promising results. In this article, we propose a graph representation learning method to predict miRNA-disease associations. Specifically, we first integrate the verified miRNA-disease associations with the similarity information of miRNAs and diseases to construct a miRNA-disease heterogeneous graph. Then, we apply a graph attention network to aggregate the neighbor information of nodes in each layer, and feed the hidden-layer representations into the structure-aware jumping knowledge network to obtain the global features of nodes. The output features of miRNAs and diseases are then concatenated and fed into a fully connected layer to score the potential associations. Under five-fold cross-validation, the average AUC, accuracy and precision values of our model are 93.30%, 85.18% and 88.90%, respectively. In addition, in three case studies on esophageal tumor, lymphoma and prostate tumor, 46, 45 and 45 of the top 50 miRNAs predicted by our model were confirmed by relevant databases. Overall, our method could provide a reliable alternative for miRNA-disease association prediction.

7.
Comput Biol Med ; 167: 107596, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37890423

ABSTRACT

Organ segmentation in abdominal or thoracic computed tomography (CT) images plays a crucial role in medical diagnosis, as it enables doctors to locate and evaluate organ abnormalities quickly, thereby guiding surgical planning and aiding treatment decision-making. This paper proposes a novel and efficient medical image segmentation method called SUnet for multi-organ segmentation in the abdomen and thorax. SUnet is a fully attention-based neural network. Firstly, an efficient spatial reduction attention (ESRA) module is introduced not only to extract image features better, but also to reduce overall model parameters and to alleviate overfitting. Secondly, SUnet's multiple attention-based feature fusion module enables effective cross-scale feature integration. Additionally, an enhanced attention gate (EAG) module is introduced, which uses grouped convolution and residual connections to provide richer semantic features. We evaluate the performance of the proposed model on the synapse multiple organ segmentation dataset and the automated cardiac diagnostic challenge dataset. SUnet achieves an average Dice of 84.29% and 92.25% on these two datasets, respectively, outperforming other models of similar complexity and size and achieving state-of-the-art results.


Subject(s)
Heart, Computer Neural Networks, Semantics, Thorax, X-Ray Computed Tomography, Computer-Assisted Image Processing
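
Record 7 reports results as an average Dice score. A minimal sketch of a per-label Dice coefficient averaged over organ labels is shown below. The flattened label maps, the label numbering and the handling of labels absent from both maps are assumptions, and the authors' evaluation protocol may differ.

```python
def dice_score(predicted, target, label):
    """Dice coefficient for one label over two flattened label maps."""
    pred_count = sum(1 for p in predicted if p == label)
    target_count = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(predicted, target) if p == label and t == label)
    total = pred_count + target_count
    # Assumption: a label absent from both maps counts as a perfect match.
    return 2 * overlap / total if total else 1.0


def mean_dice(predicted, target, labels):
    """Average the per-label Dice scores over the given organ labels."""
    return sum(dice_score(predicted, target, lab) for lab in labels) / len(labels)


# Hypothetical flattened label maps: 0 = background, 1 and 2 = two organs.
target = [0, 1, 1, 1, 2, 2, 0, 0]
predicted = [0, 1, 1, 2, 2, 2, 0, 0]
average = mean_dice(predicted, target, labels=[1, 2])
```

Background is left out of the average here, which is a common but not universal convention.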
8.
IEEE Trans Neural Netw Learn Syst ; 33(9): 4332-4345, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33600326

ABSTRACT

Long short-term memory (LSTM) neural networks and attention mechanisms have been widely used in sentiment representation learning and sentiment detection in texts. However, most existing deep learning models for text sentiment analysis ignore emotion's modulation effect on sentiment feature extraction, and the attention mechanisms of these deep neural network architectures are based on word- or sentence-level abstractions. Ignoring higher-level abstractions may have a negative effect on learning text sentiment features and further degrade sentiment classification performance. To address this issue, in this article, a novel model named AEC-LSTM is proposed for text sentiment detection, which aims to improve the LSTM network by integrating emotional intelligence (EI) and an attention mechanism. Specifically, an emotion-enhanced LSTM, named ELSTM, is first devised by utilizing EI to improve the feature learning ability of LSTM networks; it accomplishes emotion modulation of the learning system via the proposed emotion modulator and emotion estimator. In order to better capture various structure patterns in text sequences, ELSTM is further integrated with other operations, including convolution, pooling, and concatenation. Then, a topic-level attention mechanism is proposed to adaptively adjust the weight of the text hidden representation. With the introduction of EI and the attention mechanism, sentiment representation and classification can be achieved more effectively by utilizing the sentiment semantic information hidden in text topic and context. Experiments on real-world data sets show that our approach can improve sentiment classification performance effectively and significantly outperform state-of-the-art deep learning-based methods.


Subject(s)
Computer Neural Networks, Sentiment Analysis, Emotions, Long-Term Memory, Semantics
9.
PLoS One ; 13(6): e0198922, 2018.
Article in English | MEDLINE | ID: mdl-29953448

ABSTRACT

In both DNA and protein contexts, position weight matrices (PWMs) are an important method for modelling motifs in biological sequences. With the development of genome sequencing technology, the quantity of sequence data is increasing explosively, so faster search algorithms are needed to meet the growing demand. In this paper, we propose a method for speeding up the search for candidate transcription factor binding sites (TFBS); users can specify a p threshold to get the desired trade-off between speed and sensitivity for a particular sequence analysis. Moreover, the proposed method can also be generalized to large-scale annotation and sequencing projects.


Subject(s)
Response Elements, DNA Sequence Analysis/methods, Software, Transcription Factors/genetics
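
For context on record 9, the sketch below shows the plain sliding-window PWM scan that faster search methods aim to accelerate. It is not the paper's algorithm. The log-odds values, the motif and the score threshold (standing in for a cutoff derived from the user's p threshold) are all hypothetical.

```python
def scan_pwm(sequence, pwm, threshold):
    """Return (start, score) for every window whose PWM score reaches the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the per-position log-odds score of each base in the window.
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical log-odds PWM for the 3-bp motif "TGA" (one dict per motif position).
pwm = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.5},
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -1.0},
]
hits = scan_pwm("ACTGATTGAC", pwm, threshold=4.0)
```

Lowering the threshold admits weaker matches, which is the sensitivity side of the speed and sensitivity trade-off the abstract describes.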
10.
IEEE/ACM Trans Comput Biol Bioinform ; 15(5): 1453-1460, 2018.
Article in English | MEDLINE | ID: mdl-28961121

ABSTRACT

Post-translational modification plays a significant role in biological processes. A potential post-translational modification comprises the center sites and the adjacent amino acid residues, which are fundamental protein sequence residues. These modifications help proteins perform their biological functions, and studying them contributes to understanding the molecular mechanisms that are the foundations of protein design and drug design. Existing algorithms for predicting modified sites often have shortcomings, such as lower stability and accuracy. In this paper, a combination of physical, chemical, statistical, and biological properties of a protein is utilized as the features, and a novel framework is proposed to predict a protein's post-translational modification sites. A multi-layer neural network and a support vector machine are used to predict the potential modified sites with the selected features, which include the compositions of amino acid residues, the E-H description of protein segments, and several properties from the AAIndex database. To account for possibly redundant information, feature selection is performed in the preprocessing step. The experimental results show that the proposed method can improve accuracy on this classification problem.


Subject(s)
Computational Biology/methods, Post-Translational Protein Processing/genetics, Proteins/chemistry, Proteins/genetics, Protein Sequence Analysis/methods, Algorithms, Amino Acid Sequence, Animals, Humans, Mice, Molecular Models, Saccharomyces cerevisiae/genetics, Support Vector Machine
11.
Article in English | MEDLINE | ID: mdl-30137012

ABSTRACT

Underlying a cancer phenotype is a specific gene regulatory network that represents the complex regulatory relationships between genes. However, it remains a challenge to find cancer-related gene regulatory networks because of insufficient sample sizes and complex regulatory mechanisms in which a gene is influenced not only by other genes but also by other biological factors. With the development of high-throughput technologies, the unprecedented wealth of multi-omics data gives us a new opportunity to design machine learning methods to investigate the underlying gene regulatory network. In this paper, we propose an approach that uses biweight midcorrelation to measure the correlation between factors and nonconvex penalty based sparse regression for gene regulatory network inference (BMNPGRN). BMNPGRN incorporates multi-omics data (including DNA methylation and copy number variation) and their interactions in the gene regulatory network model. The experimental results on synthetic datasets show that BMNPGRN outperforms popular and state-of-the-art methods (including DCGRN, ARACNE and CLR) under false positive control. Furthermore, we applied BMNPGRN to breast cancer (BRCA) data from The Cancer Genome Atlas database and provide the resulting gene regulatory network.
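
Biweight midcorrelation, the correlation measure named in record 11, has a standard definition based on medians and the median absolute deviation. The sketch below follows that textbook definition with the usual tuning constant of 9. The function names and toy vectors are assumptions, and nothing here reproduces the sparse regression step of BMNPGRN.

```python
import math
from statistics import median


def _biweight_normalised(values):
    """Centre on the median, apply biweight weights, and scale to unit norm."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; biweight midcorrelation is undefined here")
    weighted = []
    for v in values:
        u = (v - med) / (9 * mad)
        # Points further than 9 MADs from the median receive zero weight.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        weighted.append((v - med) * w)
    norm = math.sqrt(sum(a * a for a in weighted))
    return [a / norm for a in weighted]


def bicor(x, y):
    """Biweight midcorrelation of two equal-length numeric sequences."""
    xs = _biweight_normalised(x)
    ys = _biweight_normalised(y)
    return sum(a * b for a, b in zip(xs, ys))


# Hypothetical expression profiles; the second copy of y carries one extreme outlier.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2 * v + 1 for v in x]
y_outlier = y[:-1] + [1000]
clean = bicor(x, y)
robust = bicor(x, y_outlier)
```

The outlier gets weight zero, so the estimate stays high despite the corrupted sample, which is the robustness property that motivates using this measure on small, noisy cohorts.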

12.
Appl Plant Sci ; 5(1)2017 Jan.
Article in English | MEDLINE | ID: mdl-28090413

ABSTRACT

PREMISE OF THE STUDY: Microsatellite markers were developed for Garcinia paucinervis (Clusiaceae), an endangered and endemic tree species of karst habitats, to analyze its genetic diversity and genetic structure. METHODS AND RESULTS: Using shotgun sequencing on an Illumina MiSeq platform, a total of 22 microsatellite primer sets were characterized, of which 17 were identified as polymorphic. For these polymorphic loci, the total number of alleles per locus ranged from two to 12 across 54 individuals from three populations. The observed and expected heterozygosities ranged from 0.000 to 1.000 and from 0.000 to 0.850, respectively. No pair of loci showed significant linkage disequilibrium. Three loci in one population deviated significantly from Hardy-Weinberg equilibrium (P < 0.05). Seven loci (JSL3, JSL5, JSL22, JSL29, JSL32, JSL39, and JSL43) were successfully amplified in G. bracteata. CONCLUSIONS: These markers will be useful in studies on genetic diversity and population structure of G. paucinervis.

13.
Curr Protein Pept Sci ; 15(6): 591-7, 2014.
Article in English | MEDLINE | ID: mdl-25135674

ABSTRACT

MicroRNA (miRNA) is a small, single-stranded non-coding RNA that plays an important regulatory role in gene expression. Additionally, miRNAs perform crucial functions in a wide range of biological processes. These functions may be exploited for miRNA-mediated regulation of protein-protein interaction and thus protein function. Many computational methods have been developed to predict miRNA targets and to explore the regulatory mechanism between miRNA and protein. However, efforts to investigate important positions within miRNAs have not been comprehensive. This paper presents a framework to identify important positions using collision entropy. The information contained in the sequence and secondary structure of miRNAs is considered. Further, the single-base collision entropy and the adjacent-base-related collision entropy are integrated to measure the importance of each miRNA position. Two thresholds are employed to select the positions with more biological meaning. A dataset of Drosophila melanogaster is used in the experiments. The results demonstrate that our approach can find interesting and important positions within miRNAs and may lead to a better understanding of miRNA biogenesis and function.


Subject(s)
Drosophila melanogaster/chemistry, MicroRNAs/chemistry, Algorithms, Animals, Base Sequence, Nucleic Acid Databases, Drosophila melanogaster/genetics, Entropy, MicroRNAs/genetics, Nucleic Acid Conformation
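
Collision entropy, used in record 13, is the Rényi entropy of order 2. A minimal sketch of a single-base, per-position version over equal-length sequences follows. The toy sequences and function names are assumptions, and the paper's adjacent-base term, secondary-structure information and thresholds are not modelled.

```python
import math
from collections import Counter


def collision_entropy(symbols):
    """Renyi entropy of order 2, in bits, of the symbols' empirical distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    collision_prob = sum((c / total) ** 2 for c in counts.values())
    return math.log2(1.0 / collision_prob)


def positional_collision_entropy(sequences):
    """Collision entropy of each column across equal-length sequences."""
    return [collision_entropy(column) for column in zip(*sequences)]


# Hypothetical aligned miRNA fragments of equal length.
sequences = ["UGAG", "UGAC", "UGCA", "UAGU"]
entropies = positional_collision_entropy(sequences)
```

A fully conserved column scores 0 bits and a column with four equally frequent bases scores 2 bits, so the values rank positions by variability.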