Results 1 - 20 of 65
1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38975895

ABSTRACT

Spatial transcriptomics provides valuable insights into gene expression within the native tissue context, effectively merging molecular data with spatial information to uncover intricate cellular relationships and tissue organizations. In this context, deciphering cellular spatial domains becomes essential for revealing complex cellular dynamics and tissue structures. However, current methods encounter challenges in seamlessly integrating gene expression data with spatial information, resulting in less informative representations of spots and suboptimal accuracy in spatial domain identification. We introduce stCluster, a novel method that integrates graph contrastive learning with multi-task learning to refine informative representations for spatial transcriptomic data, consequently improving spatial domain identification. stCluster first leverages graph contrastive learning technology to obtain discriminative representations capable of recognizing spatially coherent patterns. Through jointly optimizing multiple tasks, stCluster further fine-tunes the representations to be able to capture complex relationships between gene expression and spatial organization. Benchmarked against six state-of-the-art methods, the experimental results reveal its proficiency in accurately identifying complex spatial domains across various datasets and platforms, spanning tissue, organ, and embryo levels. Moreover, stCluster can effectively denoise the spatial gene expression patterns and enhance the spatial trajectory inference. The source code of stCluster is freely available at https://github.com/hannshu/stCluster.


Subject(s)
Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods , Computational Biology/methods , Algorithms , Humans , Animals , Software , Machine Learning
2.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38290765

ABSTRACT

SUMMARY: Single-cell multi-omics technologies provide a unique platform for characterizing cell states and reconstructing developmental process by simultaneously quantifying and integrating molecular signatures across various modalities, including genome, transcriptome, epigenome, and other omics layers. However, there is still an urgent unmet need for novel computational tools in this nascent field, which are critical for both effective and efficient interrogation of functionality across different omics modalities. Scbean represents a user-friendly Python library, designed to seamlessly incorporate a diverse array of models for the examination of single-cell data, encompassing both paired and unpaired multi-omics data. The library offers uniform and straightforward interfaces for tasks, such as dimensionality reduction, batch effect elimination, cell label transfer from well-annotated scRNA-seq data to scATAC-seq data, and the identification of spatially variable genes. Moreover, Scbean's models are engineered to harness the computational power of GPU acceleration through Tensorflow, rendering them capable of effortlessly handling datasets comprising millions of cells. AVAILABILITY AND IMPLEMENTATION: Scbean is released on the Python Package Index (PyPI) (https://pypi.org/project/scbean/) and GitHub (https://github.com/jhu99/scbean) under the MIT license. The documentation and example code can be found at https://scbean.readthedocs.io/en/latest/.


Subject(s)
Multiomics , Software , Genome , Transcriptome , Single-Cell Analysis , Data Analysis
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34585247

ABSTRACT

Single-cell technologies provide new ways to profile the transcriptomic landscape, chromatin accessibility, and spatial expression patterns in heterogeneous tissues at single-cell resolution. With the enormous number of single-cell datasets being generated, a key analytic challenge is to integrate these datasets to gain biological insights into cellular compositions. Here, we developed a domain-adversarial and variational approximation, DAVAE, which can integrate multiple single-cell datasets across samples, technologies and modalities with a single strategy. DAVAE can also integrate paired ATAC and transcriptome profiles that are measured simultaneously from the same cell. With a mini-batch stochastic gradient descent strategy, it scales to large datasets and can be accelerated by GPUs. Results on seven real data integration applications demonstrated the effectiveness and scalability of DAVAE in batch-effect removal, transfer learning and cell-type prediction for multiple single-cell datasets across samples, technologies and modalities. Availability: DAVAE has been implemented in the toolkit package "scbean" in the PyPI repository, and the source code is freely accessible at https://github.com/jhu99/scbean. All data and source code for reproducing the results of this paper are accessible at https://github.com/jhu99/davae_paper.


Subject(s)
Single-Cell Analysis , Software , Algorithms , Chromatin , Transcriptome
4.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36622018

ABSTRACT

MOTIVATION: Single-cell multimodal assays allow us to simultaneously measure two different molecular features of the same cell, enabling new insights into cellular heterogeneity, cell development and diseases. However, most existing methods suffer from inaccurate dimensionality reduction for the joint-modality data, hindering their discovery of novel or rare cell subpopulations. RESULTS: Here, we present VIMCCA, a computational framework based on variational-assisted multi-view canonical correlation analysis to integrate paired multimodal single-cell data. Our statistical model uses a common latent variable to interpret the common source of variances in two different data modalities. Our approach jointly learns an inference model and two modality-specific non-linear models by leveraging variational inference and deep learning. We perform VIMCCA and compare it with 10 existing state-of-the-art algorithms on four paired multi-modal datasets sequenced by different protocols. Results demonstrate that VIMCCA facilitates integrating various types of joint-modality data, thus leading to more reliable and accurate downstream analysis. VIMCCA improves our ability to identify novel or rare cell subtypes compared to existing widely used methods. Besides, it can also facilitate inferring cell lineage based on joint-modality profiles. AVAILABILITY AND IMPLEMENTATION: The VIMCCA algorithm has been implemented in our toolkit package scbean (≥0.5.0), and its code has been archived at https://github.com/jhu99/scbean under MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Models, Statistical , Cell Differentiation , Cell Lineage
5.
Nucleic Acids Res ; 50(4): e21, 2022 02 28.
Article in English | MEDLINE | ID: mdl-34871454

ABSTRACT

Data alignment is one of the first key steps in single cell analysis for integrating multiple datasets and performing joint analysis across studies. Data alignment is challenging in extremely large datasets, however, as the majority of current single cell data alignment methods are not computationally efficient. Here, we present VIPCCA, a computational framework based on non-linear canonical correlation analysis for effective and scalable single cell data alignment. VIPCCA leverages both deep learning for effective single cell data modeling and variational inference for scalable computation, thus enabling powerful data alignment across multiple samples, multiple data platforms, and multiple data types. VIPCCA is accurate for a range of alignment tasks, including alignment between single cell RNAseq and ATACseq datasets, and can easily accommodate millions of cells, thereby providing researchers unique opportunities to tackle challenges emerging from large-scale single-cell atlases.


Subject(s)
Canonical Correlation Analysis , Single-Cell Analysis
6.
Circulation ; 145(24): 1749-1760, 2022 06 14.
Article in English | MEDLINE | ID: mdl-35450432

ABSTRACT

BACKGROUND: Short-term exposure to ambient air pollution has been linked with daily hospitalization and mortality from acute coronary syndrome (ACS); however, the associations of subdaily (hourly) levels of criteria air pollutants with the onset of ACS and its subtypes have rarely been evaluated. METHODS: We conducted a time-stratified case-crossover study among 1 292 880 patients with ACS from 2239 hospitals in 318 Chinese cities between January 1, 2015, and September 30, 2020. Hourly concentrations of fine particulate matter (PM2.5), coarse particulate matter (PM2.5-10), nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO), and ozone (O3) were collected. Hourly onset data of ACS and its subtypes, including ST-segment-elevation myocardial infarction, non-ST-segment-elevation myocardial infarction, and unstable angina, were also obtained. Conditional logistic regressions combined with polynomial distributed lag models were applied. RESULTS: Acute exposures to PM2.5, NO2, SO2, and CO were each associated with the onset of ACS and its subtypes. These associations were strongest in the concurrent hour of exposure and were attenuated thereafter, with the weakest effects observed after 15 to 29 hours. There were no apparent thresholds in the concentration-response curves. An interquartile range increase in concentrations of PM2.5 (36.0 µg/m3), NO2 (29.0 µg/m3), SO2 (9.0 µg/m3), and CO (0.6 mg/m3) over the 0 to 24 hours before onset was significantly associated with 1.32%, 3.89%, 0.67%, and 1.55% higher risks of ACS onset, respectively. For a given pollutant, the associations were comparable in magnitude across different subtypes of ACS. NO2 showed the strongest associations with all 3 subtypes, followed by PM2.5, CO, and SO2. Greater magnitude of associations was observed among patients older than 65 years and in the cold season. Null associations of exposure to either PM2.5-10 or O3 with ACS onset were observed. 
CONCLUSIONS: The results suggest that transient exposure to the air pollutants PM2.5, NO2, SO2, or CO, but not PM2.5-10 or O3, may trigger the onset of ACS, even at concentrations below the World Health Organization air quality guidelines.


Subject(s)
Acute Coronary Syndrome , Air Pollutants , Air Pollution , Environmental Exposure , Acute Coronary Syndrome/epidemiology , Air Pollutants/analysis , Air Pollutants/toxicity , Air Pollution/adverse effects , Air Pollution/analysis , Carbon Monoxide/analysis , Carbon Monoxide/toxicity , China/epidemiology , Cities/epidemiology , Cross-Over Studies , Environmental Exposure/adverse effects , Environmental Exposure/analysis , Humans , Nitrogen Dioxide/analysis , Nitrogen Dioxide/toxicity , Ozone/analysis , Ozone/toxicity , Particulate Matter/analysis , Particulate Matter/toxicity , Sulfur Dioxide/analysis , Sulfur Dioxide/toxicity , Time Factors
7.
CMAJ ; 195(17): E601-E611, 2023 05 01.
Article in English | MEDLINE | ID: mdl-37127306

ABSTRACT

BACKGROUND: Few studies have explored the relationship between air pollution and arrhythmia onset at the hourly level. We aimed to examine the association of exposure to air pollution with the onset of acute symptomatic arrhythmia at an hourly level. METHODS: We conducted a nationwide, time-stratified, case-crossover study in China between 2015 and 2021. We obtained hourly information on the onset of symptomatic arrhythmia (including atrial fibrillation, atrial flutter, atrial and ventricular premature beats and supraventricular tachycardia) from the Chinese Cardiovascular Association Database - Chest Pain Center (including 2025 certified hospitals in 322 cities). We obtained data on hourly concentrations of 6 air pollutants from the nearest monitors, including fine particles (PM2.5), coarse particles (PM2.5-10), nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO) and ozone. For each patient, we matched the case period to 3 or 4 control periods during the same hour, day of week, month and year. We used conditional logistic regression models to analyze the data. RESULTS: We included a total of 190 115 patients with acute onset of symptomatic arrhythmia. Air pollution was associated with increased risk of onset of symptomatic arrhythmia within the first few hours of exposure; this risk attenuated substantially after 24 hours. An interquartile range increase in PM2.5, NO2, SO2 and CO in the first 24 hours after exposure (i.e., lag period 0-24 h) was associated with significantly higher odds of atrial fibrillation (1.7%-3.4%), atrial flutter (8.1%-11.4%) and supraventricular tachycardia (3.4%-8.9%). Exposure to PM2.5-10 was associated with significantly higher odds of atrial flutter (8.7%) and supraventricular tachycardia (5.4%), and exposure to ozone was associated with higher odds of supraventricular tachycardia (3.4%). The exposure-response relationships were approximately linear, without discernible concentration thresholds. 
INTERPRETATION: Exposure to air pollution was associated with the onset of symptomatic arrhythmia shortly after exposure. This finding highlights the importance of further reducing air pollution and taking prompt protective measures for susceptible populations during periods of elevated levels of air pollutants.
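
The control-period matching described in the methods (same hour, day of week, month and year as the case period) can be sketched as follows. This is a minimal illustration of the time-stratified design in general, not the authors' code; the function name `control_periods` is hypothetical.

```python
from datetime import datetime, timedelta

def control_periods(case_time: datetime) -> list[datetime]:
    """Return timestamps sharing the case's hour, weekday, month and year.

    Hypothetical helper: steps in whole weeks away from the case time and
    keeps every candidate that stays inside the same calendar month.
    """
    controls = []
    for step in (-7, 7):  # walk backwards, then forwards, one week at a time
        candidate = case_time + timedelta(days=step)
        while candidate.month == case_time.month:
            controls.append(candidate)
            candidate += timedelta(days=step)
    return sorted(controls)

# Example: an onset recorded on Wednesday 15 January 2020 at 08:00.
case = datetime(2020, 1, 15, 8)
for control in control_periods(case):
    print(control.isoformat())
```

Because a calendar month contains either four or five occurrences of any weekday, this matching yields 3 or 4 control periods per case, which agrees with the numbers stated in the abstract.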


Subject(s)
Air Pollutants , Air Pollution , Atrial Fibrillation , Atrial Flutter , Ozone , Humans , Cross-Over Studies , Atrial Fibrillation/chemically induced , Cities , Atrial Flutter/chemically induced , Nitrogen Dioxide , Particulate Matter/adverse effects , Particulate Matter/analysis , Air Pollution/adverse effects , Air Pollutants/adverse effects , Air Pollutants/analysis , Ozone/analysis , China , Environmental Exposure/adverse effects
8.
BMC Bioinformatics ; 22(1): 5, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407064

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) enables many in-depth transcriptomic analyses at single-cell resolution. It is already widely used for exploring the dynamic development process of life, studying gene regulation mechanisms, and discovering new cell types. However, the low RNA capture rate, which causes highly sparse expression with dropout, makes downstream analyses difficult. RESULTS: We propose a new method, SCC, to impute the dropouts of scRNA-seq data. Experimental results show that SCC gives competitive results compared to two existing methods while showing superiority in reducing the intra-class distance of cells and improving the clustering accuracy in both simulated and real data. CONCLUSIONS: SCC is an effective tool to resolve the dropout noise in scRNA-seq data. The code is freely accessible at https://github.com/nwpuzhengyan/SCC.


Subject(s)
Gene Expression Profiling/methods , RNA, Small Cytoplasmic/genetics , Single-Cell Analysis/methods , Gene Expression Regulation/genetics , Genomics/methods , Models, Genetic
9.
BMC Cardiovasc Disord ; 21(1): 376, 2021 08 04.
Article in English | MEDLINE | ID: mdl-34348647

ABSTRACT

BACKGROUND: H type hypertension is defined as homocysteine (Hcy) ≥ 10 µmol/L in combination with primary hypertension. Studies have demonstrated that the presence of hyperhomocysteinemia (HHcy) in hypertensive patients worsens the outcome of cardiocerebral events. This study aimed to investigate the current epidemiological situation of H type hypertension and determine its risk factors in order to find intervention targets for H type hypertensive patients. METHODS: We conducted a cross-sectional study using a cluster sampling design in Shanghai, China, from July 2019 to April 2020. 23,652 patients with primary hypertension were enrolled in this study. Their medical information was recorded, and Hcy concentrations and methylenetetrahydrofolate reductase (MTHFR) C677T polymorphisms were measured. RESULTS: In total, 22,731 of 23,652 patients were recorded. The mean age was 68.9 ± 8.6 y and 43% were men. 80.0% of the enrolled patients had H type hypertension. The frequency of allele T was 40.9%, and the proportions of the CC, CT, and TT genotypes were 36.1%, 46.0%, and 17.9%, respectively. Compared with the TT genotype, the plasma Hcy concentrations were lower in patients with the CC/CT genotype (18.96 ± 13.48 µmol/L vs. 13.62 ± 5.20/14.28 ± 5.36, F = 75.04, p < 0.01). The risk for H type hypertension was higher in elderly people. Men had ~ 5.55-fold odds of H type hypertension compared with women. Patients with the CT genotype and TT genotype had ~ 1.36- and ~ 2.76-fold odds of H type hypertension compared with those with the CC genotype, respectively. Smoking and diabetes were not significantly associated with H type hypertension. CONCLUSIONS: The prevalence of H type hypertension in patients with primary hypertension was 80.0%, which was higher than the 75% found in a prior report in China. Age, gender, and MTHFR C677T polymorphisms, rather than smoking and diabetes, were independently associated with H type hypertension.
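
The reported T-allele frequency can be checked directly from the genotype proportions given in the results, since each TT carrier contributes two T alleles and each CT carrier contributes one. The short sketch below is only that arithmetic; the function name is hypothetical.

```python
def t_allele_frequency(cc: float, ct: float, tt: float) -> float:
    """Frequency of allele T from genotype proportions (or counts).

    Each individual carries two alleles: TT contributes two T alleles,
    CT contributes one, CC contributes none.
    """
    total = cc + ct + tt
    return (tt + ct / 2) / total

# Genotype proportions reported in the abstract (percent).
freq = t_allele_frequency(cc=36.1, ct=46.0, tt=17.9)
print(f"T allele frequency: {freq:.1%}")
```

With the abstract's values this gives (17.9 + 46.0 / 2) / 100 = 40.9%, matching the stated allele frequency.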


Subject(s)
Genotype , Homocysteine/blood , Hypertension/blood , Hypertension/epidemiology , Methylenetetrahydrofolate Reductase (NADPH2)/genetics , Adult , Aged , Aged, 80 and over , China/epidemiology , Cross-Sectional Studies , Female , Humans , Hyperhomocysteinemia/complications , Hypertension/genetics , Male , Middle Aged , Polymorphism, Genetic , Prevalence , Risk Factors
10.
Environ Res ; 194: 110655, 2021 03.
Article in English | MEDLINE | ID: mdl-33358871

ABSTRACT

BACKGROUND: The impacts of temperature variability on cardiac autonomic function remain unclear. OBJECTIVE: To explore the short-term associations between daily temperature variability and parameters of heart rate variability (HRV). METHODS: This is a repeated-measure study among 78 eligible participants in Shanghai, China. We defined temperature variability as diurnal temperature range (DTR), the standard deviation of temperature (SDT) and temperature variability (TV). We evaluated 3 frequency-domain HRV parameters (VLF, LF and HF) and 4 time-domain parameters (SDNN, SDANN, rMSSD and pNN50). We used linear mixed-effect models to analyze the data after controlling for environmental and individual confounders. RESULTS: Temperature variability was significantly associated with decreased HRV, especially on the concurrent day. The exposure-response relationships were almost inversely linear for most parameters. Every one interquartile range (IQR) increase of DTR was associated with a decrease of 3.92% for VLF, 6.99% for LF, 5.88% for HF, 3.94% for rMSSD and 1.30% for pNN50. Each IQR increase of SDT was associated with a decline of 6.48% for LF, 5.91% for HF, 4.26% for rMSSD and 1.87% for pNN50. Every IQR increase of TV was associated with a decrease of 4.39% for VLF, 7.67% for LF, 6.52% for HF, 3.22% for SDNN, 2.98% for SDANN, 4.05% for rMSSD, and 1.41% for pNN50. The decrements in HRV associated with temperature variability were more prominent in females. CONCLUSION: Temperature variability on the concurrent day could significantly decrease cardiac autonomic function, especially in females.
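
Two of the exposure metrics named in the methods can be illustrated with a small sketch. The abstract does not give formulas, so the definitions here are assumptions: DTR is taken as the daily maximum minus the daily minimum, and SDT as the sample standard deviation of the day's readings. The third metric is left out because its definition is not stated. Function names and the sample readings are hypothetical.

```python
import statistics

def diurnal_temperature_range(temps: list[float]) -> float:
    """Assumed definition: daily maximum minus daily minimum temperature."""
    return max(temps) - min(temps)

def temperature_sd(temps: list[float]) -> float:
    """Assumed definition: sample standard deviation of one day's readings."""
    return statistics.stdev(temps)

# Hypothetical readings for one day, in degrees Celsius.
readings = [10.0, 12.0, 15.0, 18.0, 14.0, 11.0]
print("DTR:", diurnal_temperature_range(readings))
print("SDT:", round(temperature_sd(readings), 2))
```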


Subject(s)
Autonomic Nervous System , Heart , China , Female , Heart Rate , Humans , Temperature
11.
Ecotoxicol Environ Saf ; 208: 111726, 2021 Jan 15.
Article in English | MEDLINE | ID: mdl-33396057

ABSTRACT

BACKGROUND: It remains unclear which size of particles has the strongest effects on heart rate variability (HRV). OBJECTIVE: To explore the association between HRV parameters and daily variations of size-fractionated particle number concentrations (PNCs). METHODS: We conducted a longitudinal repeated-measure study among 78 participants with a 24-h continuous ambulatory Holter electrocardiographic recorder in Shanghai, China, from January 2015 to June 2019. Linear mixed-effects models were employed to evaluate the changes of HRV parameters associated with PNCs of 7 size ranges from 0.01 to 10 µm after controlling for environmental and individual confounders. RESULTS: On the concurrent day, decreased HRV parameters were associated with increased PNCs of 0.01-0.3 µm, and smaller particles showed greater effects. For an interquartile range increase in ultrafine particles (UFP, those < 0.1 µm, 2453 particles/cm3), the declines in very-low-frequency power, low-frequency power, high-frequency power, standard deviation of normal R-R intervals, root mean square of the successive differences between R-R intervals and percentage of adjacent normal R-R intervals with a difference ≥ 50 ms were 5.06% [95% confidence interval (CI): 2.09%, 7.94%], 7.65% (95%CI: 2.73%, 12.32%), 9.49% (95%CI: 4.64%, 14.09%), 5.10% (95%CI: 2.21%, 7.91%), 8.09% (95%CI: 4.39%, 11.65%) and 24.98% (95%CI: 14.70%, 34.02%), respectively. These results were robust to the adjustment of criteria air pollutants, temperature at different lags, and the status of heart medication. CONCLUSIONS: Particles less than 0.3 µm (especially UFP) may dominate the acute effects of particulate air pollution on cardiac autonomic dysfunction.


Subject(s)
Air Pollutants/analysis , Air Pollution/statistics & numerical data , Environmental Exposure/statistics & numerical data , Particulate Matter/analysis , Air Pollution/analysis , China , Female , Heart Diseases , Heart Rate/drug effects , Humans , Male , Middle Aged , Particle Size , Temperature
12.
BMC Bioinformatics ; 21(Suppl 13): 385, 2020 Sep 17.
Article in English | MEDLINE | ID: mdl-32938373

ABSTRACT

BACKGROUND: Network alignment is an efficient computational framework for predicting protein function and phylogenetic relationships in systems biology. However, most existing alignment methods align PPI networks based on a static network model, although these networks are dynamic in real-world systems. The dynamic character of PPI networks is essential for understanding evolution and regulation mechanisms at the molecular level, and there is still much room to improve alignment quality in dynamic networks. RESULTS: In this paper, we propose a novel alignment algorithm, Twadn, to align dynamic PPI networks based on a time-warping strategy. We compare Twadn with the existing dynamic network alignment algorithms DynaMAGNA++ and DynaWAVE, using the area under the receiver operating characteristic curve and the area under the precision-recall curve as evaluation indicators. The experimental results show that Twadn is superior to DynaMAGNA++ and DynaWAVE. In addition, we use the protein interaction network of Drosophila to compare Twadn with the static network alignment algorithm NetCoffee2, and the experimental results show that Twadn is able to capture timing information, unlike NetCoffee2. CONCLUSIONS: Twadn is a versatile and efficient alignment tool that can be applied to dynamic networks. Hopefully, its application can benefit the research community in the fields of molecular function and evolution.
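
For readers unfamiliar with time warping, the sketch below shows classic dynamic time warping (DTW) between two numeric sequences. It is a generic textbook illustration and makes no claim about how Twadn itself is implemented; the idea that each sequence might hold a per-snapshot summary of a dynamic network is an assumption made only for this example.

```python
def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic dynamic time warping cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# Hypothetical per-snapshot values; the second series lingers one extra step.
print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

The warping lets one series repeat a time point so that sequences of different lengths or speeds can still be matched, which is the property that makes the approach attractive for time-varying data.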


Subject(s)
Algorithms , Computational Biology/methods , Drosophila/metabolism , Protein Interaction Maps/genetics , Proteins/metabolism , Animals , Humans
13.
BMC Med ; 18(1): 312, 2020 11 10.
Article in English | MEDLINE | ID: mdl-33167994

ABSTRACT

BACKGROUND: Recently, the association between inflammatory bowel disease (including ulcerative colitis and Crohn's disease) and BMD has attracted great interest in the research community. However, the results of the published epidemiological observational studies on the relationship between inflammatory bowel disease and BMD are still inconclusive. Here, we performed a two-sample Mendelian randomization analysis to investigate the causal link between inflammatory bowel disease and level of BMD using publicly available GWAS summary statistics. METHODS: A series of quality control steps were taken in our analysis to select eligible instrumental SNPs which were strongly associated with exposure. To make the conclusions more robust and reliable, we utilized several robust analytical methods (inverse-variance weighting, MR-PRESSO method, mode-based estimate method, weighted median, MR-Egger regression, and MR.RAPS method) that are based on different assumptions of two-sample MR analysis. The MR-Egger intercept test, Cochran's Q test, and "leave-one-out" sensitivity analysis were performed to evaluate the horizontal pleiotropy, heterogeneities, and stability of these genetic variants on BMD. Outlier variants identified by the MR-PRESSO outlier test were removed step-by-step to reduce heterogeneity and the effect of horizontal pleiotropy. RESULTS: Our two-sample Mendelian randomization analysis with two groups of exposure GWAS summary statistics and four groups of outcome GWAS summary statistics suggested a definitively causal effect of genetically predicted ulcerative colitis on TB-BMD and FA-BMD but not on FN-BMD or LS-BMD (after Bonferroni correction), and we merely determined a causal effect of Crohn's disease on FN-BMD but not on the others, which was somewhat inconsistent with many published observational studies.
The causal effect of inflammatory bowel disease on TB-BMD was significant and robust but not on FA-BMD, FN-BMD, and LS-BMD, which might result from the cumulative effect of ulcerative colitis and Crohn's disease on BMDs. CONCLUSIONS: Our Mendelian randomization analysis supported the causal effect of ulcerative colitis on TB-BMD and FA-BMD. As to Crohn's disease, only a definitively causal effect on decreased FN-BMD was observed. Updated MR analysis is warranted to confirm our findings when more advanced methods yielding less biased and more precise estimates, or GWAS summary data with more ulcerative colitis and Crohn's disease patients, become available.
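
Of the methods listed, inverse-variance weighting is the simplest to write down. The sketch below implements the standard fixed-effect IVW estimator from per-SNP summary statistics; it is a generic illustration, not the authors' pipeline, and the input numbers are invented for demonstration only.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate.

    estimate = sum(bx * by / se^2) / sum(bx^2 / se^2)
    std_err  = sqrt(1 / sum(bx^2 / se^2))
    """
    triples = list(zip(beta_exposure, beta_outcome, se_outcome))
    numerator = sum(bx * by / se**2 for bx, by, se in triples)
    denominator = sum(bx**2 / se**2 for bx, _, se in triples)
    return numerator / denominator, math.sqrt(1.0 / denominator)

# Invented summary statistics for three instrumental SNPs.
bx = [0.10, 0.20, 0.30]   # SNP-exposure effects
by = [0.05, 0.10, 0.15]   # SNP-outcome effects
se = [0.01, 0.02, 0.01]   # standard errors of the SNP-outcome effects
estimate, std_err = ivw_estimate(bx, by, se)
print(estimate, std_err)
```

Each SNP's ratio estimate (outcome effect divided by exposure effect) is weighted by its precision, so strongly associated, precisely measured variants dominate the pooled estimate.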


Subject(s)
Inflammatory Bowel Diseases/epidemiology , Inflammatory Bowel Diseases/pathology , Bone Density , Humans , Mendelian Randomization Analysis/methods , Research Design
14.
BMC Bioinformatics ; 20(Suppl 18): 569, 2019 Nov 25.
Article in English | MEDLINE | ID: mdl-31760932

ABSTRACT

BACKGROUND: There is evidence to suggest that lncRNAs are associated with distinct and diverse biological processes. The dysfunction or mutation of lncRNAs is implicated in a wide range of diseases. An accurate computational model can benefit the diagnosis of diseases and help us gain a better understanding of the molecular mechanism. Although many related algorithms have been proposed, there is still much room to improve their accuracy. RESULTS: We developed a novel algorithm, BiWalkLDA, to predict disease-related lncRNAs in three real datasets, which have 528 lncRNAs, 545 diseases and 1216 interactions in total. To compare performance with other algorithms, a leave-one-out validation test was performed for BiWalkLDA and three other existing algorithms, SIMCLDA, LDAP and LRLSLDA. Additional tests were carefully designed to analyze the effects of parameters such as α, β, l and r, which could help users select the best values of these parameters for their own applications. In a case study of prostate cancer, eight of the top ten disease-related lncRNAs reported by BiWalkLDA were previously confirmed in the literature. CONCLUSIONS: In this paper, we develop an algorithm, BiWalkLDA, to predict lncRNA-disease associations by using bi-random walks. It constructs a lncRNA-disease network by integrating interaction profile and gene ontology information, and it addresses the cold-start problem by using neighbors' interaction profile information. Bi-random walks were then applied to three real biological datasets. Results show that our method outperforms other algorithms in predicting lncRNA-disease associations in terms of both accuracy and specificity. AVAILABILITY: https://github.com/screamer/BiwalkLDA.


Subject(s)
Computational Biology/methods , Disease/genetics , RNA, Long Noncoding/genetics , Algorithms , Computer Simulation , Gene Ontology , Humans , Software
15.
BMC Bioinformatics ; 20(Suppl 7): 200, 2019 May 01.
Article in English | MEDLINE | ID: mdl-31074373

ABSTRACT

BACKGROUND: Transcription factors (TFs) play important roles in the regulation of gene expression. They can activate or block transcription of downstream genes by binding to specific genomic sequences. Therefore, motif discovery of these binding preference patterns is of central significance for understanding molecular regulation mechanisms. Many algorithms have been proposed for the identification of transcription factor binding sites. However, it remains a challenging problem. RESULTS: Here, we propose a novel motif discovery algorithm based on support vector machines (MD-SVM) to learn a discriminative model for TF binding sites. MD-SVM first obtains a position weight matrix (PWM) from a set of training datasets. Then it translates the MD problem into a computational framework of multiple instance learning (MIL). It was applied to several real biological datasets. Results show that our algorithm outperforms MI-SVM in terms of both accuracy and specificity. CONCLUSIONS: In this paper, we modeled the TF motif discovery problem as a MIL optimization problem. The SVM algorithm was adapted to discriminate positive and negative bags of instances. Compared to other SVM-based algorithms, MD-SVM shows superiority over its competitors in terms of ROC AUC. Hopefully, it could be of benefit to the research community in understanding the molecular functions of DNA functional elements and transcription factors.
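
As background for the position weight matrix mentioned in the results, the sketch below builds a log-odds PWM from a handful of aligned binding sites and scores candidate sequences against it. It is a generic textbook construction, not MD-SVM's procedure; the pseudocount, the uniform background of 0.25 per base, the function names and the example sites are all assumptions.

```python
import math
from collections import Counter

BASES = "ACGT"

def position_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length aligned DNA sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + len(BASES) * pseudocount
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm

def pwm_score(pwm, sequence):
    """Sum the per-position log-odds of a sequence of matching length."""
    return sum(column[base] for column, base in zip(pwm, sequence))

# Hypothetical aligned binding sites.
pwm = position_weight_matrix(["ACGT", "ACGA", "ACGT"])
print(pwm_score(pwm, "ACGT"), pwm_score(pwm, "TTTT"))
```

A sequence resembling the training sites receives a higher score than an unrelated one, which is what allows a PWM to serve as a starting point for a discriminative model.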


Subject(s)
Algorithms , Nucleotide Motifs , Support Vector Machine , Transcription Factors/metabolism , Binding Sites , Humans , Protein Binding
16.
BMC Genomics ; 20(Suppl 13): 932, 2019 Dec 27.
Article in English | MEDLINE | ID: mdl-31881842

ABSTRACT

Proteins play essential roles in almost all life processes. The prediction of protein function is of significance for the understanding of molecular function and evolution. Network alignment provides a fast and effective framework to automatically identify functionally conserved proteins in a systematic way. However, due to the fast growing genomic data, interactions and annotation data, there is an increasing demand for more accurate and efficient tools to deal with multiple PPI networks. Here, we present a novel global alignment algorithm NetCoffee2 based on graph feature vectors to discover functionally conserved proteins and predict function for unknown proteins. To test the algorithm performance, NetCoffee2 and three other notable algorithms were applied on eight real biological datasets. Functional analyses were performed to evaluate the biological quality of these alignments. Results show that NetCoffee2 is superior to existing algorithms IsoRankN, NetCoffee and multiMAGNA++ in terms of both coverage and consistency. The binary and source code are freely available under the GNU GPL v3 license at https://github.com/screamer/NetCoffee2.


Subject(s)
Algorithms , Proteins/metabolism , Animals , Arabidopsis/metabolism , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/metabolism , Drosophila/metabolism , Drosophila Proteins/chemistry , Drosophila Proteins/metabolism , Entropy , Humans , Mice , Protein Interaction Maps , Proteins/chemistry , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism
17.
BMC Bioinformatics ; 19(1): 422, 2018 Nov 12.
Article in English | MEDLINE | ID: mdl-30419809

ABSTRACT

BACKGROUND: The discovery of functionally conserved proteins is a difficult and important task in systems biology. Global network alignment provides a systematic framework to search for these proteins across multiple protein-protein interaction (PPI) networks. Although many web servers exist for network alignment, none allows users to perform global multiple network alignment tasks on their own test datasets. RESULTS: Here, we developed a web server, WebNetCoffee, based on the NetCoffee algorithm to search for a global network alignment across multiple networks. To build a series of online test datasets, we manually collected 218,339 proteins, 4,009,541 interactions and many other associated protein annotations from several public databases. All these datasets and alignment results are available for download, which supports users in performing algorithm comparison and downstream analyses. CONCLUSION: WebNetCoffee provides a versatile, interactive and user-friendly interface for easily running alignment tasks on both online datasets and users' test datasets, managing submitted jobs and visualizing the alignment results through a web browser. Additionally, our web server facilitates graphical visualization of induced subnetworks for a given protein and its neighborhood. To the best of our knowledge, it is the first web server that supports global alignment of multiple PPI networks. AVAILABILITY: http://www.nwpu-bioinformatics.com/WebNetCoffee.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Humans
18.
Article in English | MEDLINE | ID: mdl-28416555

ABSTRACT

Tuberculosis (TB) continues to be one of the most common bacterial infectious diseases and is the leading cause of death in many parts of the world. A major limitation of TB therapy is slow killing of the infecting organism, increasing the risk for the development of a tolerance phenotype and drug resistance. Studies indicate that Mycobacterium tuberculosis takes several days to be killed upon treatment with lethal concentrations of antibiotics both in vitro and in vivo. To investigate how metabolic remodeling can enable transient bacterial survival during exposure to bactericidal concentrations of compounds, M. tuberculosis strain H37Rv was exposed to twice the MIC of isoniazid, rifampin, moxifloxacin, mefloquine, or bedaquiline for 24 h, 48 h, 4 days, and 6 days, and the bacterial proteomic response was analyzed using quantitative shotgun mass spectrometry. Numerous sets of de novo bacterial proteins were identified over the 6-day treatment. Network analysis and comparisons between the drug treatment groups revealed several shared sets of predominant proteins and enzymes simultaneously belonging to a number of diverse pathways. Overexpression of some of these proteins in the nonpathogenic Mycobacterium smegmatis extended bacterial survival upon exposure to bactericidal concentrations of antimicrobials, and inactivation of some proteins in M. tuberculosis prevented the pathogen from escaping the fast killing both in vitro and in macrophages. Our biology-driven approach identified promising bacterial metabolic pathways and enzymes that might be targeted by novel drugs to reduce the length of tuberculosis therapy.


Subject(s)
Antitubercular Agents/pharmacology , Mycobacterium tuberculosis/drug effects , Proteomics/methods , Diarylquinolines/pharmacology , Fluoroquinolones/pharmacology , Isoniazid/pharmacology , Mefloquine/pharmacology , Moxifloxacin , Proteome/metabolism , Rifampin/pharmacology
19.
Acta Biochim Biophys Sin (Shanghai) ; 49(3): 270-276, 2017 Mar 01.
Article in English | MEDLINE | ID: mdl-28159958

ABSTRACT

The cardiac sodium channel plays a key role in the fast depolarization and maintenance of impulse conduction in cardiomyocytes. Mutations of the SCN5A gene can lead to many types of arrhythmias. A 14-year-old boy with a paternal family history of sudden unexpected nocturnal death was admitted to hospital with recurrent syncope. A cardiac channelopathy was suspected, and ion channel genes were screened for a pathogenic mutation. The proband manifested sinus node dysfunction, ventricular tachycardia, and cardiac conduction disturbance involving the atrioventricular node and His bundle. The proband and his mother underwent whole exome sequencing. A heterozygous in-frame deletion, N1380del, in exon 23 of the SCN5A gene, located at a highly conserved pore residue in domain III (S5-S6), was revealed in the proband. The mutation was assessed in other family members by Sanger sequencing. The proband's living uncle and two sisters were asymptomatic mutation carriers with different degrees of cardiac conduction disturbance. Functional analysis was conducted using whole-cell patch clamping in HEK293T cells transfected with wild-type or mutant channels. The HEK293T cells transfected with plasmid pcDNA3.1-N1380del-SCN5A had no detectable sodium current. Overall, the N1380del mutation of the SCN5A gene leads to loss of function of the sodium channel. N1380del is a pathogenic mutation that can cause cardiac conduction defect and ventricular tachycardia.


Subject(s)
Cardiac Conduction System Disease/genetics , Mutation , NAV1.5 Voltage-Gated Sodium Channel/genetics , Tachycardia, Ventricular/genetics , Adolescent , Cardiac Conduction System Disease/pathology , Cardiac Conduction System Disease/therapy , Exons , Humans , Male , Phenotype , Prognosis , Tachycardia, Ventricular/pathology , Tachycardia, Ventricular/therapy
20.
Molecules ; 22(12)2017 Dec 10.
Article in English | MEDLINE | ID: mdl-29232861

ABSTRACT

Network motifs are patterns in complex networks that occur significantly more frequently than in random networks. They have been considered fundamental building blocks of complex networks. Therefore, the detection of network motifs in transcriptional regulation networks is a crucial step in understanding the mechanism of transcriptional regulation and network evolution. The search for network motifs is similar to solving subgraph searching problems, which have been proven to be NP-complete. To quickly and effectively count subgraphs of a large biological network, we propose a novel graph canonization algorithm based on resolving sets. This method has been implemented in a command line interface (CLI) program, sgip, using the SeqAn library. Compared with Babai's algorithm, this approach has a tighter complexity bound, o(exp(n log 2 n + 4 log n)), on strongly regular graphs. Results on several simulated datasets and transcriptional regulation networks indicate that sgip outperforms nauty on many graph cases. The source code of sgip is freely available at https://github.com/seqan/seqan/tree/master/apps/sgip and the binary code at http://packages.seqan.de/sgip/.


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks , Algorithms , Internet , Software