1.
Mol Cell ; 82(20): 3840-3855.e8, 2022 10 20.
Article in English | MEDLINE | ID: mdl-36270248

ABSTRACT

The use of alternative promoters, splicing, and cleavage and polyadenylation (APA) generates mRNA isoforms that expand the diversity and complexity of the transcriptome. Here, we uncovered thousands of previously undescribed 5' uncapped and polyadenylated transcripts (5' UPTs). We show that these transcripts resist exonucleases due to a highly structured RNA and N6-methyladenosine modification at their 5' termini. 5' UPTs appear downstream of APA sites within their host genes and are induced upon APA activation. Strong enrichment in polysomal RNA fractions indicates 5' UPT translational potential. Indeed, APA promotes downstream translation initiation, non-canonical protein output, and consistent changes to peptide presentation at the cell surface. Lastly, we demonstrate the biological importance of 5' UPTs using Bcl2, a prominent anti-apoptotic gene whose entire coding sequence is a 5' UPT generated from 5' UTR-embedded APA sites. Thus, APA is not only accountable for terminating transcripts, but also for generating downstream uncapped RNAs with translation potential and biological impact.


Subject(s)
Polyadenylation , RNA Isoforms , RNA Isoforms/genetics , 5' Untranslated Regions , 3' Untranslated Regions/genetics , Proto-Oncogene Proteins c-bcl-2/genetics , Exonucleases/genetics
2.
Am J Hum Genet ; 111(5): 966-978, 2024 05 02.
Article in English | MEDLINE | ID: mdl-38701746

ABSTRACT

Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.


Subject(s)
Asthma , Genome-Wide Association Study , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Humans , Asthma/genetics , Markov Chains , Colitis, Ulcerative/genetics , Reproducibility of Results , Phenotype , Genotype
3.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38340093

ABSTRACT

Shotgun sequencing is a high-throughput method used to detect copy number variants (CNVs). Although there are numerous CNV detection tools based on shotgun sequencing, their quality varies significantly, leading to performance discrepancies. Therefore, we conducted a comprehensive analysis of next-generation sequencing-based CNV detection tools over the past decade. Our findings revealed that the majority of mainstream tools employ a similar detection rationale: they calculate the so-called read depth signal from aligned sequencing reads and then segment the signal using either circular binary segmentation (CBS) or a hidden Markov model (HMM). Hence, we compared the performance of those two core segmentation algorithms in CNV detection, considering varying sequencing depths, segment lengths and complex types of CNVs. To ensure a fair comparison, we designed a parametric model using mainstream statistical distributions, which allows for pre-excluding bias correction such as guanine-cytosine (GC) content during the preprocessing step. The results indicate the following key points: (1) Under ideal conditions, CBS demonstrates high precision, while HMM exhibits a high recall rate. (2) Under practical conditions, HMM is advantageous at lower sequencing depths, while CBS is more competitive than HMM in detecting small variant segments. (3) In cases involving complex CNVs resembling real sequencing, HMM is more robust than CBS. (4) On large-scale sequencing data, HMM takes less time than CBS, while their memory usage is approximately equal. These results can provide guidance and a reference for researchers developing new tools for CNV detection.


Subject(s)
Algorithms , DNA Copy Number Variations , High-Throughput Nucleotide Sequencing/methods
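The HMM branch of the read-depth rationale described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three copy-number states, the Gaussian emission model, the `sd` and `stay` values, and the toy normalised depth values. It decodes the most likely state per bin with the Viterbi algorithm.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution (assumed emission model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi_segment(depths, means, sd=0.15, stay=0.95):
    """Return the most likely state index for each read-depth bin."""
    n_states = len(means)
    switch = (1.0 - stay) / (n_states - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n_states)]
                 for i in range(n_states)]
    # Initialise with a uniform prior over states.
    score = [math.log(1.0 / n_states) + gaussian_logpdf(depths[0], m, sd) for m in means]
    back = []
    for x in depths[1:]:
        prev, score, pointers = score, [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + gaussian_logpdf(x, means[j], sd))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Toy normalised depths: 1.0 = neutral, about 0.5 = loss, about 1.5 = gain.
depths = [1.0, 1.1, 0.9, 1.0, 0.5, 0.6, 0.4, 0.5, 1.0, 1.1, 1.5, 1.6, 1.4, 1.0, 0.9]
states = viterbi_segment(depths, means=[0.5, 1.0, 1.5])
print(states)
```

On this toy input the decoded path labels bins 4 to 7 as the loss state and bins 10 to 12 as the gain state. Real tools additionally handle GC bias, mappability, and parameter estimation, none of which is modelled here.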
4.
Mol Biol Evol ; 41(7)2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38958167

ABSTRACT

Admixture between populations and species is common in nature. Since the influx of new genetic material might be either facilitated or hindered by selection, variation in mixture proportions along the genome is expected in organisms undergoing recombination. Various graph-based models have been developed to better understand these evolutionary dynamics of population splits and mixtures. However, current models assume a single mixture rate for the entire genome and do not explicitly account for linkage. Here, we introduce TreeSwirl, a novel method for inferring branch lengths and locus-specific mixture proportions by using genome-wide allele frequency data, assuming that the admixture graph is known or has been inferred. TreeSwirl builds upon TreeMix that uses Gaussian processes to estimate the presence of gene flow between diverged populations. However, in contrast to TreeMix, our model infers locus-specific mixture proportions employing a hidden Markov model that accounts for linkage. Through simulated data, we demonstrate that TreeSwirl can accurately estimate locus-specific mixture proportions and handle complex demographic scenarios. It also outperforms related D- and f-statistics in terms of accuracy and sensitivity to detect introgressed loci.


Subject(s)
Gene Frequency , Models, Genetic , Genetics, Population/methods , Markov Chains , Gene Flow , Genome , Computer Simulation , Genetic Linkage
5.
J Cell Sci ; 136(4)2023 02 15.
Article in English | MEDLINE | ID: mdl-36655427

ABSTRACT

The lateral diffusion of transmembrane proteins on plasma membranes is a fundamental process for various cellular functions. Diffusion properties specific for individual protein species have been extensively studied, but the common features among protein species are poorly understood. Here, we systematically studied the lateral diffusion of various transmembrane proteins in the lower eukaryote Dictyostelium discoideum cells using a hidden Markov model for single-molecule trajectories obtained experimentally. As common features, all membrane proteins that had from one to ten transmembrane regions adopted three free diffusion states with similar diffusion coefficients regardless of their structural variability. All protein species reduced their mobility similarly upon the inhibition of microtubule or actin cytoskeleton dynamics, or myosin II. The relationship between protein size and the diffusion coefficient was consistent with the Saffman-Delbrück model, meaning that membrane viscosity is a major determinant of lateral diffusion, but protein size is not. These protein species-independent properties of multistate free diffusion were explained simply and quantitatively by free diffusion on the three membrane regions with different viscosities, which is in sharp contrast to the complex diffusion behavior of transmembrane proteins in higher eukaryotes.


Subject(s)
Dictyostelium , Dictyostelium/metabolism , Membrane Proteins/metabolism , Cell Membrane/metabolism , Diffusion , Membranes/metabolism
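For reference, the Saffman-Delbrück relation invoked in this abstract is usually written as follows. The symbols are the standard textbook ones and are not taken from the paper: $\mu_m$ is the membrane viscosity, $\mu_w$ the viscosity of the surrounding aqueous phase, $h$ the membrane thickness, $a$ the radius of the membrane-spanning domain, and $\gamma \approx 0.577$ Euler's constant.

```latex
D = \frac{k_B T}{4 \pi \mu_m h}
    \left[ \ln\!\left( \frac{\mu_m h}{\mu_w a} \right) - \gamma \right]
```

The radius $a$ enters only through a logarithm, whereas $D$ scales inversely with $\mu_m$ in the prefactor, which is why membrane viscosity dominates and protein size has only a weak effect.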
6.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36961311

ABSTRACT

Intra-tumor heterogeneity (ITH) is one of the major confounding factors that result in cancer relapse, and deciphering ITH is essential for personalized therapy. Single-cell DNA sequencing (scDNA-seq) now enables profiling of single-cell copy number alterations (CNAs) and thus aids in high-resolution inference of ITH. Here, we introduce an integrated framework called rcCAE to accurately infer cell subpopulations and single-cell CNAs from scDNA-seq data. A convolutional autoencoder (CAE) is employed in rcCAE to learn latent representation of the cells as well as distill copy number information from noisy read counts data. This unsupervised representation learning via the CAE model makes it convenient to accurately cluster cells over the low-dimensional latent space, and detect single-cell CNAs from enhanced read counts data. Extensive performance evaluations on simulated datasets show that rcCAE outperforms the existing CNA calling methods, and is highly effective in inferring clonal architecture. Furthermore, evaluations of rcCAE on two real datasets demonstrate that it is able to provide a more refined clonal structure, of which some details are lost in clonal inference based on integer copy numbers.


Subject(s)
DNA Copy Number Variations , Neoplasms , Humans , Sequence Analysis, DNA , Neoplasms/genetics
7.
BMC Bioinformatics ; 25(1): 247, 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39075359

ABSTRACT

BACKGROUND: Sequence alignment lies at the heart of genome sequence annotation. While the BLAST suite of alignment tools has long held an important role in alignment-based sequence database search, greater sensitivity is achieved through the use of profile hidden Markov models (pHMMs). Here, we describe an FPGA hardware accelerator, called HAVAC, that targets a key bottleneck step (SSV) in the analysis pipeline of the popular pHMM alignment tool, HMMER. RESULTS: The HAVAC kernel calculates the SSV matrix at 1739 GCUPS on a ∼$3000 Xilinx Alveo U50 FPGA accelerator card, ∼227× faster than the optimized SSV implementation in nhmmer. Accounting for PCI-e data transfer and data processing, HAVAC is 65× faster than nhmmer's SSV with one thread and 35× faster than nhmmer with four threads, and uses ∼31% of the energy of a traditional high-end Intel CPU. CONCLUSIONS: HAVAC demonstrates the potential offered by FPGA hardware accelerators to produce dramatic speed gains in sequence annotation and related bioinformatics applications. Because these computations are performed on a co-processor, the host CPU remains free to simultaneously compute other aspects of the analysis pipeline.


Subject(s)
Markov Chains , Sequence Alignment , Sequence Alignment/methods , Computational Biology/methods , Sequence Homology , Algorithms , Software
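To make the "SSV matrix" and the cell-update throughput figure more concrete, here is a heavily simplified sketch. It assumes SSV refers to a single-segment, ungapped Viterbi-style filter in which each cell extends the diagonal predecessor or restarts at zero. The scoring profile, the function name, and the restart rule are illustrative assumptions, not HMMER or HAVAC code.

```python
def ssv_best_segment(seq, match_scores):
    """Best ungapped segment score of seq against a toy profile.

    match_scores[j][c] is the (hypothetical) score of residue c at model position j.
    Returns the best score and the number of matrix cells updated.
    """
    n_pos = len(match_scores)
    prev = [0] * (n_pos + 1)
    best = 0
    cells = 0
    for residue in seq:
        cur = [0] * (n_pos + 1)
        for j in range(1, n_pos + 1):
            # Extend along the diagonal, or restart the segment at zero.
            cur[j] = max(0, prev[j - 1] + match_scores[j - 1][residue])
            best = max(best, cur[j])
            cells += 1
        prev = cur
    return best, cells

# Toy profile for the motif "ACG": +2 for a match, -3 for a mismatch.
profile = [{c: (2 if c == p else -3) for c in "ACGT"} for p in "ACG"]
best, cells = ssv_best_segment("TTACGTT", profile)
print(best, cells)  # 6 21
```

The matrix has one cell per (sequence position, model position) pair, so the work is the product of the two lengths. A rate quoted in GCUPS counts how many billions of such cell updates are completed per second.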
8.
BMC Bioinformatics ; 25(1): 86, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38418970

ABSTRACT

BACKGROUND: Approximating the recent phylogeny of N phased haplotypes at a set of variants along the genome is a core problem in modern population genomics and central to performing genome-wide screens for association, selection, introgression, and other signals. The Li & Stephens (LS) model provides a simple yet powerful hidden Markov model for inferring the recent ancestry at a given variant, represented as an N × N distance matrix based on posterior decodings. RESULTS: We provide a high-performance engine to make these posterior decodings readily accessible with minimal pre-processing via an easy to use package kalis, in the statistical programming language R. kalis enables investigators to rapidly resolve the ancestry at loci of interest and developers to build a range of variant-specific ancestral inference pipelines on top. kalis exploits both multi-core parallelism and modern CPU vector instruction sets to enable scaling to hundreds of thousands of genomes. CONCLUSIONS: The resulting distance matrices accessible via kalis enable local ancestry, selection, and association studies in modern large scale genomic datasets.


Subject(s)
Genome , Genomics , Humans , Markov Chains , Haplotypes , Ethnicity , Genetics, Population
9.
BMC Bioinformatics ; 25(1): 151, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627634

ABSTRACT

BACKGROUND: Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed, e.g., in human base composition. In this article, we develop a class of Hidden Markov Models (HMMs) called oHMMed (ordered HMM with emission densities; the corresponding R package of the same name is available on CRAN). They identify the number of comparably homogeneous regions within autocorrelated observed sequences. These are modelled as discrete hidden states; the observed data points are realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, permitting only neighbouring states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are inferred. RESULTS: We apply our oHMMed algorithms to the proportion of G and C bases (modelled as a mixture of normal distributions) and the number of genes (modelled as a mixture of Poisson-gamma distributions) in windows along the human, mouse, and fruit fly genomes. This results in a partitioning of the genomes into regions by statistically distinguishable averages of these features, and in a characterisation of their continuous patterns of variation. In regard to the genomic G and C proportion, this latter result distinguishes oHMMed from segmentation algorithms based on isochore or compositional domain theory. We further use oHMMed to conduct a detailed analysis of variation of chromatin accessibility (ATAC-seq) and epigenetic markers H3K27ac and H3K27me3 (modelled as a mixture of Poisson-gamma distributions) along the human chromosome 1 and their correlations. CONCLUSIONS: Our algorithms provide a biologically assumption-free approach to characterising genomic landscapes shaped by continuous, autocorrelated patterns of variation. Despite this, the resulting genome segmentation enables extraction of compositionally distinct regions for further downstream analyses.


Subject(s)
Genome , Genomics , Animals , Humans , Mice , Markov Chains , Base Composition , Probability , Algorithms
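The neighbour-only constraint described in this abstract can be pictured as a transition matrix that is zero everywhere except on the diagonal and the two adjacent off-diagonals. The sketch below is an assumption-laden illustration, not the oHMMed implementation: the function name, the fixed `stay` probability, and the equal split between neighbours are all hypothetical.

```python
def neighbour_transition_matrix(n_states, stay=0.9):
    """Transition matrix for ordered states where only adjacent states communicate."""
    if n_states < 2:
        raise ValueError("need at least two ordered states")
    move = 1.0 - stay
    matrix = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        # States are indexed by the order of their emission means.
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_states]
        matrix[i][i] = stay
        for j in neighbours:
            matrix[i][j] = move / len(neighbours)
    return matrix

matrix = neighbour_transition_matrix(4)
for row in matrix:
    print(row)
```

With states sorted by their emission means, a jump from the lowest state directly to the highest has probability zero, so any decoded path has to pass through the intermediate states.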
10.
Clin Infect Dis ; 78(Supplement_2): S146-S152, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38662703

ABSTRACT

Globally, there are over 1 billion people infected with soil-transmitted helminths (STHs), mostly living in marginalized settings with inadequate sanitation in sub-Saharan Africa and Southeast Asia. The World Health Organization recommends an integrated approach to STH morbidity control through improved access to sanitation and hygiene education and the delivery of preventive chemotherapy (PC) to school-age children delivered through schools. Progress of STH control programs is currently estimated using a baseline (pre-PC) school-based prevalence survey and then monitored using periodical school-based prevalence surveys, known as Impact Assessment Surveys (IAS). We investigated whether integrating geostatistical methods with a Markov model or a mechanistic transmission model for projecting prevalence forward in time from baseline can improve IAS design strategies. To do this, we applied these 2 methods to prevalence data collected in Kenya, before evaluating and comparing their performance in accurately informing optimal survey design for a range of IAS sampling designs. We found that, although both approaches performed well, the mechanistic method more accurately projected prevalence over time and provided more accurate information for guiding survey design. Both methods performed less well in areas with persistent STH hotspots where prevalence did not decrease despite multiple rounds of PC. Our findings show that these methods can be useful tools for more efficient and accurate targeting of PC. The general framework built in this paper can also be used for projecting prevalence and informing survey design for other neglected tropical diseases.


Subject(s)
Helminthiasis , Markov Chains , Soil , Humans , Helminthiasis/epidemiology , Helminthiasis/transmission , Prevalence , Kenya/epidemiology , Soil/parasitology , Child , Helminths/isolation & purification , Animals , Models, Statistical , Adolescent , Schools
11.
Mol Biol Evol ; 40(3)2023 03 04.
Article in English | MEDLINE | ID: mdl-36661852

ABSTRACT

Novel technologies for recovering DNA information from archaeological and historical specimens have made available an ever-increasing amount of temporally spaced genetic samples from natural populations. These genetic time series permit the direct assessment of patterns of temporal changes in allele frequencies and hold the promise of improving power for the inference of selection. Increased time resolution can further facilitate testing hypotheses regarding the drivers of past selection events such as the incidence of plant and animal domestication. However, studying past selection processes through ancient DNA (aDNA) still involves considerable obstacles such as postmortem damage, high fragmentation, low coverage, and small samples. To circumvent these challenges, we introduce a novel Bayesian framework for the inference of temporally variable selection based on genotype likelihoods instead of allele frequencies, thereby enabling us to model sample uncertainties resulting from the damage and fragmentation of aDNA molecules. Also, our approach permits the reconstruction of the underlying allele frequency trajectories of the population through time, which allows for a better understanding of the drivers of selection. We evaluate its performance through extensive simulations and demonstrate its utility with an application to the ancient horse samples genotyped at the loci for coat coloration. Our results reveal that incorporating sample uncertainties can further improve the inference of selection.


Subject(s)
DNA, Ancient , DNA , Animals , Horses/genetics , Bayes Theorem , Gene Frequency , DNA/genetics , Time Factors , Models, Genetic
12.
Hum Brain Mapp ; 45(10): e26746, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-38989618

ABSTRACT

The human brain exhibits spatio-temporally complex activity even in the absence of external stimuli, cycling through recurring patterns of activity known as brain states. Thus far, brain state analysis has primarily been restricted to unimodal neuroimaging data sets, resulting in a limited definition of state and a poor understanding of the spatial and temporal relationships between states identified from different modalities. Here, we applied hidden Markov models (HMMs) to concurrent electroencephalography-functional magnetic resonance imaging (EEG-fMRI) eyes open (EO) and eyes closed (EC) resting-state data, training models on the EEG and fMRI data separately, and evaluated the models' ability to distinguish dynamics between the two rest conditions. Additionally, we employed a general linear model approach to identify the BOLD correlates of the EEG-defined states to investigate whether the fMRI data could be used to improve the spatial definition of the EEG states. Finally, we performed a sliding window-based analysis on the state time courses to identify slower changes in the temporal dynamics, and then correlated these time courses across modalities. We found that both models could identify expected changes during EC rest compared to EO rest, with the fMRI model identifying changes in the activity and functional connectivity of visual and attention resting-state networks, while the EEG model correctly identified the canonical increase in alpha upon eye closure. In addition, by using the fMRI data, it was possible to infer the spatial properties of the EEG states, resulting in BOLD correlation maps resembling canonical alpha-BOLD correlations. Finally, the sliding window analysis revealed unique fractional occupancy dynamics for states from both models, with a selection of states showing strong temporal correlations across modalities. Overall, this study highlights the efficacy of using HMMs for brain state analysis, confirms that multimodal data can be used to provide more in-depth definitions of state and demonstrates that states defined across different modalities show similar temporal dynamics.


Subject(s)
Brain , Electroencephalography , Magnetic Resonance Imaging , Rest , Humans , Rest/physiology , Adult , Male , Female , Brain/diagnostic imaging , Brain/physiology , Young Adult , Brain Mapping , Markov Chains
13.
Hum Brain Mapp ; 45(14): e70011, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39327923

ABSTRACT

The temporal dynamics of resting-state networks may represent an intrinsic functional repertoire supporting cognitive control performance across the lifespan. However, little is known about brain dynamics during the preschool period, which is a sensitive time window for cognitive control development. The fast timescale of synchronization and switching characterizing cortical network functional organization gives rise to quasi-stable patterns (i.e., brain states) that recur over time. These can be inferred at the whole-brain level using hidden Markov models (HMMs), an unsupervised machine learning technique that allows the identification of rapid oscillatory patterns at the macroscale of cortical networks. The present study used an HMM technique to investigate dynamic neural reconfigurations and their associations with behavioral (i.e., parental questionnaires) and cognitive (i.e., neuropsychological tests) measures in typically developing preschoolers (4-6 years old). We used high-density EEG to better capture the fast reconfiguration patterns of the HMM-derived metrics (i.e., switching rates, entropy rates, transition probabilities and fractional occupancies). Our results revealed that the HMM-derived metrics were reliable indices of individual neural variability and differed between boys and girls. However, only brain state transition patterns toward prefrontal and default-mode brain states predicted differences in parental-report questionnaire scores. Overall, these findings support the importance of resting-state brain dynamics as functional scaffolds for behavior and cognition. Brain state transitions may be crucial markers of individual differences in cognitive control development in preschoolers.


Subject(s)
Electroencephalography , Emotional Regulation , Humans , Male , Female , Child, Preschool , Child , Emotional Regulation/physiology , Markov Chains , Child Behavior/physiology , Child Development/physiology , Nerve Net/physiology , Nerve Net/diagnostic imaging , Parents , Brain/physiology , Brain/diagnostic imaging , Cerebral Cortex/physiology , Cerebral Cortex/diagnostic imaging
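Several of the HMM-derived metrics named in this abstract (fractional occupancies, switching rates, transition probabilities) can be computed directly from a decoded state sequence. The sketch below uses common working definitions, which may differ in detail from those used in the study. The function name and the toy sequence are hypothetical, and the sequence is assumed to contain at least two time points.

```python
from collections import Counter

def state_metrics(path, n_states):
    """Summarise a decoded state sequence (one state label per time point)."""
    n = len(path)
    counts = Counter(path)
    # Fractional occupancy: share of time points spent in each state.
    occupancy = [counts.get(s, 0) / n for s in range(n_states)]
    # Switching rate: fraction of consecutive pairs where the state changes.
    pairs = list(zip(path, path[1:]))
    switching_rate = sum(a != b for a, b in pairs) / len(pairs)
    # Empirical transition probabilities: row-normalised transition counts.
    trans = [[0] * n_states for _ in range(n_states)]
    for a, b in pairs:
        trans[a][b] += 1
    probs = [[c / sum(row) if sum(row) else 0.0 for c in row] for row in trans]
    return occupancy, switching_rate, probs

path = [0, 0, 1, 1, 1, 0, 2, 2]  # toy decoded sequence, not real EEG data
occupancy, switching_rate, probs = state_metrics(path, n_states=3)
print(occupancy)  # [0.375, 0.375, 0.25]
print(switching_rate)
print(probs)
```

In practice these summaries are computed per participant and then related to behavioural or cognitive scores. Entropy rate, also listed in the abstract, is omitted here.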
14.
Hum Brain Mapp ; 45(13): e70018, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39230193

ABSTRACT

The characterisation of resting-state networks (RSNs) using neuroimaging techniques has significantly contributed to our understanding of the organisation of brain activity. Prior work has demonstrated the electrophysiological basis of RSNs and their dynamic nature, revealing transient activations of brain networks with millisecond timescales. While previous research has confirmed the comparability of RSNs identified by electroencephalography (EEG) to those identified by magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI), most studies have utilised static analysis techniques, ignoring the dynamic nature of brain activity. Often, these studies use high-density EEG systems, which limit their applicability in clinical settings. Addressing these gaps, our research studies RSNs using medium-density EEG systems (61 sensors), comparing both static and dynamic brain network features to those obtained from a high-density MEG system (306 sensors). We assess the qualitative and quantitative comparability of EEG-derived RSNs to those from MEG, including their ability to capture age-related effects, and explore the reproducibility of dynamic RSNs within and across the modalities. Our findings suggest that both MEG and EEG offer comparable static and dynamic network descriptions, albeit with MEG offering some increased sensitivity and reproducibility. Such RSNs and their comparability across the two modalities remained consistent qualitatively but not quantitatively when the data were reconstructed without subject-specific structural MRI images.


Subject(s)
Electroencephalography , Magnetoencephalography , Nerve Net , Humans , Magnetoencephalography/methods , Electroencephalography/methods , Adult , Nerve Net/physiology , Nerve Net/diagnostic imaging , Male , Female , Young Adult , Middle Aged , Magnetic Resonance Imaging/methods , Aged , Connectome/methods , Adolescent , Brain/physiology , Brain/diagnostic imaging , Rest/physiology
15.
Hum Brain Mapp ; 45(7): e26700, 2024 May.
Article in English | MEDLINE | ID: mdl-38726799

ABSTRACT

The post-movement beta rebound has been studied extensively using magnetoencephalography (MEG) and is reliably modulated by various task parameters as well as illness. Our recent study showed that rebounds, which we generalise as "post-task responses" (PTRs), are a ubiquitous phenomenon in the brain, occurring across the cortex in theta, alpha, and beta bands. Currently, it is unknown whether PTRs following working memory are driven by transient bursts, which are moments of short-lived high amplitude activity, similar to those that drive the post-movement beta rebound. Here, we use three-state univariate hidden Markov models (HMMs), which can identify bursts without a priori knowledge of frequency content or response timings, to compare bursts that drive PTRs in working memory and visuomotor MEG datasets. Our results show that PTRs across working memory and visuomotor tasks are driven by pan-spectral transient bursts. These bursts have very similar spectral content variation over the cortex, correlating strongly between the two tasks in the alpha (R² = .89) and beta (R² = .53) bands. Bursts also have similar variation in duration over the cortex (e.g., long duration bursts occur in the motor cortex for both tasks), strongly correlating over cortical regions between tasks (R² = .56), with a mean over all regions of around 300 ms in both datasets. Finally, we demonstrate the ability of HMMs to isolate signals of interest in MEG data, such that the HMM probability timecourse correlates more strongly with reaction times than frequency filtered power envelopes from the same brain regions. Overall, we show that induced PTRs across different tasks are driven by bursts with similar characteristics, which can be identified using HMMs. Given the similarity between bursts across tasks, we suggest that PTRs across the cortex may be driven by a common underlying neural phenomenon.


Subject(s)
Magnetoencephalography , Memory, Short-Term , Humans , Memory, Short-Term/physiology , Adult , Male , Female , Young Adult , Markov Chains , Psychomotor Performance/physiology , Cerebral Cortex/physiology , Movement/physiology , Beta Rhythm/physiology
16.
BMC Biotechnol ; 24(1): 2, 2024 01 10.
Article in English | MEDLINE | ID: mdl-38200466

ABSTRACT

BACKGROUND: Lytic polysaccharide monooxygenases (LPMOs), which catalyze the oxidative cleavage of different types of polysaccharides, have potential uses in various industries. However, the AA13 family of LPMOs, which specifically act on starch substrates, has relatively fewer members than the AA9 and AA10 families, which limits its range of application. Amylase has been used in the enzymatic desizing of cotton fabric for half a century, and new auxiliary enzymes are urgently needed to improve reaction efficiency and reduce cost so as to promote its application in the textile industry. RESULTS: A total of 380 unannotated new genes that probably encode AA13 family LPMOs were discovered by Hidden Markov model scanning in this study. Ten of them were successfully heterologously overexpressed. AlLPMO13, which had the highest activity, was purified, and its optimum pH and temperature were determined to be pH 5.0 and 50 °C. It also showed different oxidative activities on different substrates (modified corn starch > amylose > amylopectin > corn starch). In the enzymatic textile desizing application, the best combination of amylase (5 g/L), AlLPMO13 (5 mg/L), and H2O2 (3 g/L) increased the desizing level by 3 grades and the capillary effect by more than 20% compared with treatment with amylase alone. CONCLUSION: The Hidden Markov model constructed from 34 AA13 family LPMOs proved to be a valid bioinformatics tool for discovering novel starch-active LPMOs. The novel enzyme AlLPMO13 has strong development potential in the enzymatic textile industry in terms of both cost and application performance.


Subject(s)
Hydrogen Peroxide , Starch , Humans , Polysaccharides , Amylases , Computational Biology , Mixed Function Oxygenases/genetics , Textiles
17.
Biostatistics ; 2023 Jul 11.
Article in English | MEDLINE | ID: mdl-37433567

ABSTRACT

Existing methods for fitting continuous time Markov models (CTMM) in the presence of covariates suffer from scalability issues due to high computational cost of matrix exponentials calculated for each observation. In this article, we propose an optimization technique for CTMM which uses a stochastic gradient descent algorithm combined with differentiation of the matrix exponential using a Padé approximation. This approach makes fitting large scale data feasible. We present two methods for computing standard errors, one novel approach using the Padé expansion and the other using power series expansion of the matrix exponential. Through simulations, we find improved performance relative to existing CTMM methods, and we demonstrate the method on the large-scale multiple sclerosis NO.MS data set.
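The matrix exponential at the centre of this abstract gives the transition probabilities of a continuous time Markov model over an interval $t$ as $P(t) = \exp(Qt)$, where $Q$ is the generator (rate) matrix. The sketch below uses a plain truncated power series rather than the Padé approximation the authors describe, and checks it against the closed form for a two-state chain. The rates `a` and `b`, the function names, and the number of series terms are illustrative assumptions.

```python
import math

def mat_mul(x, y):
    """Multiply two small dense matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def expm_series(q, t, terms=30):
    """Approximate exp(q * t) with a truncated power series (fine for small q * t)."""
    n = len(q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    qt = [[v * t for v in row] for row in q]
    for k in range(1, terms):
        # term now holds (q * t)^k / k!
        term = [[v / k for v in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

a, b, t = 0.3, 0.1, 2.0            # hypothetical transition rates and time gap
q = [[-a, a], [b, -b]]             # generator matrix: each row sums to zero
p = expm_series(q, t)
closed_form = a / (a + b) * (1.0 - math.exp(-(a + b) * t))
print(p[0][1], closed_form)
```

Because a separate exponential is needed for every distinct time gap and covariate pattern, this step dominates the cost on large data sets, which is the scalability issue the abstract addresses.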

18.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35284936

ABSTRACT

Although remarkable achievements, such as AlphaFold2, have been made in end-to-end structure prediction, fragment libraries remain essential for de novo protein structure prediction, which can help explore and understand the protein-folding mechanism. In this work, we developed a variable-length fragment library (VFlib). In VFlib, a master structure database was first constructed from the Protein Data Bank through sequence clustering. The hidden Markov model (HMM) profile of each protein in the master structure database was generated by HHsuite, and the secondary structure of each protein was calculated by DSSP. For the query sequence, the HMM-profile was first constructed. Then, variable-length fragments were retrieved from the master structure database through dynamically variable-length profile-profile comparison. A complete method for chopping the query HMM-profile during this process was proposed to obtain fragments with increased diversity. Finally, secondary structure information was used to further screen the retrieved fragments to generate the final fragment library for the specific query sequence. The experimental results obtained with a set of 120 nonredundant proteins show that the global precision and coverage of the fragment library generated by VFlib were 55.04% and 94.95%, respectively, at the RMSD cutoff of 1.5 Å. Compared with the benchmark method NNMake, the global precision of our fragment library increased by 62.89% with equivalent coverage. Furthermore, the fragments generated by VFlib and NNMake were used to predict structure models through fragment assembly. Controlled experimental results demonstrate that the average TM-score of VFlib was 16.00% higher than that of NNMake.


Subject(s)
Protein Folding , Proteins , Algorithms , Cluster Analysis , Databases, Protein , Protein Structure, Secondary , Proteins/chemistry
19.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35134113

ABSTRACT

Protein remote homology detection is one of the most fundamental research tools for protein structure and function prediction. Most search methods for protein remote homology detection are evaluated based on the Structural Classification of Proteins-extended (SCOPe) benchmark, but the diverse hierarchical structure relationships between the query protein and candidate proteins are ignored by these methods. In order to further improve the predictive performance for protein remote homology detection, a search framework based on the predicted protein hierarchical relationships (PHR-search) is proposed. In the PHR-search framework, the superfamily-level prediction information is obtained by extracting the local and global features of the Hidden Markov Model (HMM) profile through a convolutional neural network, and it is converted to fold-level and class-level prediction information according to the hierarchical relationships of SCOPe. Based on these predicted protein hierarchical relationships, a filtering strategy and a re-ranking strategy are used to construct the two-level search of PHR-search. Experimental results show that the PHR-search framework achieves state-of-the-art performance by employing five basic search methods, including HHblits, JackHMMER, PSI-BLAST, DELTA-BLAST and PSI-BLASTexB. Furthermore, the web server of PHR-search is established, which can be accessed at http://bliulab.net/PHR-search.


Subject(s)
Algorithms , Proteins , Proteins/chemistry , Sequence Analysis, Protein/methods
20.
J Transl Med ; 22(1): 763, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39143498

ABSTRACT

BACKGROUND: Temporal lobe epilepsy (TLE) is associated with abnormal dynamic functional connectivity patterns, but the dynamic changes in brain activity at each time point remain unclear, as do the potential molecular mechanisms associated with the dynamic temporal characteristics of TLE. METHODS: Resting-state functional magnetic resonance imaging (rs-fMRI) data were acquired for 84 TLE patients and 35 healthy controls (HCs). HMM analysis was then conducted on the rs-fMRI data from the TLE patients and the HC group in order to explore the intricate temporal dynamics of brain activity in TLE patients with cognitive impairment (TLE-CI). Additionally, we aimed to examine the gene expression profiles associated with the dynamic modular characteristics in TLE patients using the Allen Human Brain Atlas (AHBA) database. RESULTS: Five HMM states were identified in this study. Compared with HCs, TLE and TLE-CI patients exhibited distinct changes in dynamics, including fractional occupancy, lifetimes, mean dwell time and switch rate. Furthermore, transition probabilities across HMM states were significantly different between TLE and TLE-CI patients (p < 0.05). The temporal reconfiguration of states in TLE and TLE-CI patients was associated with several brain networks, including the high-order default mode network (DMN), subcortical network (SCN), and cerebellum network (CN). Furthermore, a total of 1580 genes were revealed to be significantly associated with dynamic brain states of TLE, mainly enriched in neuronal signaling and synaptic function. CONCLUSIONS: This study provides new insights into characterizing dynamic neural activity in TLE. The brain network dynamics defined by HMM analysis may deepen our understanding of the neurobiological underpinnings of TLE and TLE-CI, indicating a linkage between neural configuration and gene expression in TLE.
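
The dynamic measures named in the results can be illustrated on a decoded state sequence. The following is a minimal sketch under assumed definitions, which vary between toolboxes: fractional occupancy is the share of time points spent in a state, mean dwell time is the average length of an uninterrupted visit, and switch rate is the number of state changes per unit time. The function name, the `tr` parameter (repetition time in seconds), and the toy sequence are hypothetical and not taken from the study.

```python
from itertools import groupby

def state_dynamics(states, n_states, tr=1.0):
    """Summarize a decoded HMM state sequence.

    states: list of integer state labels, one per time point.
    tr: sampling interval in seconds (assumed constant).
    Returns (fractional_occupancy, mean_dwell_time, switch_rate).
    """
    if not states:
        raise ValueError("state sequence is empty")
    total = len(states)
    # Collapse the sequence into runs of consecutive identical states.
    runs = [(state, len(list(group))) for state, group in groupby(states)]
    fractional_occupancy = [states.count(k) / total for k in range(n_states)]
    mean_dwell_time = []
    for k in range(n_states):
        lengths = [n for state, n in runs if state == k]
        # Average visit length in seconds; 0.0 if the state is never visited.
        mean_dwell_time.append(tr * sum(lengths) / len(lengths) if lengths else 0.0)
    # Each boundary between two runs is one switch.
    switch_rate = (len(runs) - 1) / (total * tr)
    return fractional_occupancy, mean_dwell_time, switch_rate

# Toy sequence of 10 time points over 3 states, sampled every 2 seconds.
seq = [0, 0, 0, 1, 1, 0, 2, 2, 2, 2]
fo, dwell, rate = state_dynamics(seq, n_states=3, tr=2.0)
```

Group comparisons such as those reported in the abstract would then be run on these per-subject summaries.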


Subject(s)
Epilepsy, Temporal Lobe , Magnetic Resonance Imaging , Markov Chains , Humans , Epilepsy, Temporal Lobe/genetics , Epilepsy, Temporal Lobe/physiopathology , Epilepsy, Temporal Lobe/diagnostic imaging , Male , Female , Adult , Brain/diagnostic imaging , Brain/physiopathology , Gene Expression Regulation , Case-Control Studies , Young Adult , Middle Aged , Rest/physiology , Nerve Net/physiopathology , Nerve Net/diagnostic imaging