Results 1 - 20 of 24
1.
Front Genet ; 13: 883766, 2022.
Article in English | MEDLINE | ID: mdl-35571042

ABSTRACT

Hypertension, or elevated blood pressure, is a serious medical condition that significantly increases the risk of cardiovascular disease, heart disease, diabetes, stroke, kidney disease, and other health problems that affect people worldwide. Thus, hypertension is one of the major global causes of premature death. For the prevention and treatment of hypertension with few or no side effects, antihypertensive peptides (AHTPs) obtained from natural sources might be useful as nutraceuticals. Therefore, the search for alternative or novel AHTPs in food and other natural sources has received much attention, as AHTPs may be functional agents for human health. AHTPs have been observed in diverse organisms, although many of them remain underinvestigated. Identifying peptides with antihypertensive activity in the laboratory is time- and resource-consuming. Alternatively, computational methods based on robust machine learning can identify or screen potential AHTP candidates prior to experimental verification. In this paper, we propose Ensemble-AHTPpred, an ensemble machine learning algorithm composed of a random forest (RF), a support vector machine (SVM), and extreme gradient boosting (XGB), with the aim of integrating diverse heterogeneous algorithms to enhance the robustness of the final predictive model. The selected feature set includes various computed features, such as physicochemical properties, amino acid compositions (AACs), transitions, n-grams, and secondary structure-related information; these features capture more information for analyzing or explaining the characteristics of the predicted peptide. In addition, the tool integrates a newly proposed composite feature (generated with a logistic regression function) that combines various feature aspects to enable improved AHTP characterization. Our tool, Ensemble-AHTPpred, achieved an overall accuracy above 90% on independent test data.
Additionally, the approach was applied to novel experimentally validated AHTPs from recent studies that did not overlap with the training and test datasets, and the tool could precisely predict these AHTPs.
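The abstract lists amino acid composition (AAC) and n-gram features among the inputs to Ensemble-AHTPpred. A minimal Python sketch of two such descriptors follows. It assumes the 20 standard residues and overlapping 2-grams (dipeptides). The function names are hypothetical, and this is not the tool's own feature code.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide: str) -> dict[str, float]:
    """Fraction of each standard residue in the peptide (AAC)."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("empty peptide")
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(peptide: str) -> dict[str, float]:
    """Fraction of each of the 400 dipeptides among overlapping 2-grams."""
    seq = peptide.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = len(seq) - 1  # number of overlapping 2-grams
    for i in range(max(total, 0)):
        gram = seq[i:i + 2]
        if gram in counts:  # 2-grams containing non-standard residues are skipped
            counts[gram] += 1
    return {k: (v / total if total > 0 else 0.0) for k, v in counts.items()}


if __name__ == "__main__":
    aac = amino_acid_composition("VPPVPP")
    dpc = dipeptide_composition("VPPVPP")
    print(round(aac["P"], 3), round(dpc["VP"], 3), round(dpc["PV"], 3))
```

For the toy peptide VPPVPP, this prints `0.667 0.4 0.2`. Four of the six residues are P, and VP occurs in two of the five overlapping 2-grams.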

2.
Biology (Basel) ; 10(9)2021 Sep 08.
Article in English | MEDLINE | ID: mdl-34571762

ABSTRACT

Cost-effective microbial lipid production is a prerequisite for the oleochemical sector. In this work, genome-wide transcriptional responses to the utilization of xylose and glucose in oleaginous Aspergillus oryzae were studied in relation to growth and lipid phenotypic traits. Comparative analysis of the active growth (t1) and lipid-accumulating (t2) stages showed that the C5 cultures efficiently consumed carbon sources for biomass and lipid production at levels comparable to those of the C6 cultures. By pairwise comparison, 599 and 917 differentially expressed genes (DEGs) were identified in the t1 and t2 groups, respectively, and the consensus DEGs were categorized into polysaccharide-degrading enzymes, membrane transport, and cellular processes. Distinct transcriptional responses within the DEG set were also found in various metabolic genes, mostly those of carbohydrate, amino acid, lipid, cofactor, and vitamin metabolism. Although central carbohydrate metabolism was shared between the C5 and C6 cultures, the metabolic functions in acetyl-CoA and NADPH generation and in the biosynthesis of the terpenoid backbone, fatty acids, sterols, and amino acids were allocated to leverage biomass and lipid production, at least through transcriptional control. This study revealed robust metabolic networks underlying the oleaginicity of A. oryzae that govern glucose/xylose flux toward lipid biosynthesis, providing meaningful hints for further process development of microbial lipid production using cellulosic sugar feedstocks.

3.
Life (Basel) ; 11(4)2021 Mar 30.
Article in English | MEDLINE | ID: mdl-33808227

ABSTRACT

The accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy for predicting plant protein subcellular localization based on multiple classifiers, with the aim of improving both the accuracy and the reliability of the predictions. The prediction of plant protein subcellular localization is challenging because the underlying problem is not only multiclass but also multilabel. Generally, plant proteins can be found in 10-14 locations/compartments. The number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is generally much greater than that in other compartments (vacuole, peroxisome, Golgi, and cell wall), so the data are usually imbalanced. To address this, we propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suitable for each type of protein localization to form a total of 479 feature spaces. Feature selection methods were then used to reduce these features to smaller, informative feature subsets, which were used to train three different individual models. To combine the three classifiers, we averaged their outputs (average voting) to obtain the final probability prediction. Based on the voting probability, the method can predict subcellular localization for both single-label and multilabel proteins. Experimental results on the testing dataset indicated that the proposed ensemble method achieved an overall accuracy of 84.58% across 11 compartments.
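A minimal sketch of average voting with a multilabel decision rule follows. It assumes that each of the three classifiers returns one probability per compartment. The compartment names, the toy probabilities, and the 0.3 threshold are hypothetical choices for illustration, not values from the paper.

```python
# Hypothetical compartment labels, for illustration only.
COMPARTMENTS = ["nucleus", "cytoplasm", "mitochondrion", "chloroplast"]


def average_vote(prob_vectors: list[list[float]]) -> list[float]:
    """Average the per-compartment probabilities of several classifiers."""
    n_models = len(prob_vectors)
    return [sum(column) / n_models for column in zip(*prob_vectors)]


def assign_locations(avg_probs: list[float], labels: list[str],
                     threshold: float = 0.3) -> list[str]:
    """Multilabel decision: keep every compartment at or above the threshold.

    If none qualifies, fall back to the single most probable compartment,
    so every protein receives at least one location."""
    hits = [lab for lab, p in zip(labels, avg_probs) if p >= threshold]
    if not hits:
        best = max(range(len(avg_probs)), key=avg_probs.__getitem__)
        hits = [labels[best]]
    return hits


if __name__ == "__main__":
    # Toy outputs of three classifiers for one protein.
    model_outputs = [
        [0.5, 0.1, 0.3, 0.1],
        [0.4, 0.2, 0.3, 0.1],
        [0.3, 0.1, 0.5, 0.1],
    ]
    avg = average_vote(model_outputs)
    print(assign_locations(avg, COMPARTMENTS))  # ['nucleus', 'mitochondrion']
```

The averaged probabilities are about 0.40 for the nucleus and 0.37 for the mitochondrion. Both clear the assumed threshold, so the toy protein is reported as multilabel. Raising the threshold above every average triggers the single-label fallback.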

4.
Genes (Basel) ; 12(2)2021 01 21.
Article in English | MEDLINE | ID: mdl-33494403

ABSTRACT

Antimicrobial peptides (AMPs) are natural peptides possessing antimicrobial activities. These peptides are important components of the innate immune system. They are found in various organisms. AMP screening and identification by experimental techniques are laborious and time-consuming tasks. Alternatively, computational methods based on machine learning have been developed to screen potential AMP candidates prior to experimental verification. Although various AMP prediction programs are available, there is still a need for improvement to reduce false positives (FPs) and to increase the predictive accuracy. In this work, several well-known single and ensemble machine learning approaches have been explored and evaluated based on balanced training datasets and two large testing datasets. We have demonstrated that the developed program with various predictive models has high performance in differentiating between AMPs and non-AMPs. Thus, we describe the development of a program for the prediction and recognition of AMPs using MaxProbVote, which is an ensemble model. Moreover, to increase prediction efficiency, the ensemble model was integrated with a new hybrid feature based on logistic regression. The ensemble model integrated with the hybrid feature can effectively increase the prediction sensitivity of the developed program called Ensemble-AMPPred, resulting in overall improvements in terms of both sensitivity and specificity compared to those of currently available programs.
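The name MaxProbVote suggests a rule in which the most confident base model decides. The sketch below implements that reading for the binary AMP/non-AMP case. This is an assumption about the rule, not its published definition, and the function name is hypothetical.

```python
def max_prob_vote(model_probs: list[float]) -> tuple[int, int]:
    """Let the most confident base model decide (assumed reading of MaxProbVote).

    model_probs[k] is base model k's estimated P(AMP) for one peptide.
    Confidence is max(p, 1 - p), i.e. the probability of the class that
    model would predict. Returns (label, k) with label 1 = AMP, 0 = non-AMP.
    """
    confidences = [max(p, 1.0 - p) for p in model_probs]
    k = max(range(len(model_probs)), key=confidences.__getitem__)
    label = 1 if model_probs[k] >= 0.5 else 0
    return label, k


if __name__ == "__main__":
    print(max_prob_vote([0.55, 0.30, 0.92]))  # (1, 2): the 0.92 model decides
    print(max_prob_vote([0.55, 0.05, 0.60]))  # (0, 1): the 0.05 model decides
```

In the second call, two of the three models lean towards AMP, yet the single very confident non-AMP vote wins. This is how a maximum-probability rule differs from simple majority voting.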


Subjects
Antimicrobial Cationic Peptides/pharmacology , Databases, Genetic , Machine Learning , Software , Algorithms , Antimicrobial Cationic Peptides/chemistry , Reproducibility of Results , Sensitivity and Specificity
5.
Sci Rep ; 10(1): 10241, 2020 06 24.
Article in English | MEDLINE | ID: mdl-32581273

ABSTRACT

The safety of microbial cultures used for consumption is vital for public health and should be thoroughly assessed. Although general aspects of the safety assessment of microbial cultures have been suggested, neither methodological details nor procedural guidelines have been published. Herein, we propose a detailed protocol for microbial strain safety assessment via whole-genome sequence analysis. Lactobacillus plantarum BCC9546, a starter culture employed in the production of nham, a traditional fermented pork product, was used as an example. The strain's whole genome was sequenced using several next-generation sequencing techniques. Incomplete plasmid information from the PacBio sequencing platform and a shorter chromosome size from the hybrid Oxford Nanopore-Illumina platform were noted. The methods for 1) unambiguous species identification using the 16S rRNA gene and average nucleotide identity, 2) determination of virulence factors and undesirable genes, 3) determination of antimicrobial resistance properties and their possibility of transfer, and 4) determination of the strain's capability to produce antimicrobial drugs were provided in detail. The applicability of the search tools and the limitations of the databases were discussed. Finally, a procedural guideline for the safety assessment of microbial strains via whole-genome analysis was proposed.


Subjects
Fermented Foods/microbiology , Lactobacillus plantarum/classification , Lactobacillus plantarum/growth & development , Whole Genome Sequencing/methods , Bacteriological Techniques , Food Safety , Genome Size , Genome, Bacterial , High-Throughput Nucleotide Sequencing , Lactobacillus plantarum/genetics , Plasmids/genetics , RNA, Ribosomal, 16S/genetics
6.
Genes (Basel) ; 11(4)2020 03 28.
Article in English | MEDLINE | ID: mdl-32231066

ABSTRACT

Long non-coding RNAs (lncRNAs) play important roles in the regulation of complex cellular processes, including the transcriptional and post-transcriptional regulation of gene expression relevant to development and stress response, among others. Compared to other important crops, little is known about cassava lncRNAs and their roles in abiotic stress adaptation. In this study, we performed a genome-wide study of ncRNAs in cassava, integrating genomics- and transcriptomics-based approaches. In total, 56,840 putative ncRNAs were identified, and approximately half of them were verified using expression data or previously known ncRNAs. Among these were 2229 potential novel lncRNA transcripts with unmatched sequences, 250 of which were differentially expressed under cold or drought conditions relative to controls. We showed that lncRNAs might be involved in the post-transcriptional regulation of stress-induced transcription factors (TFs), such as the zinc-finger, WRKY, and nuclear factor Y gene families. These findings deepen our knowledge of cassava lncRNAs and shed light on their stress-responsive roles.


Subjects
Droughts , Gene Expression Regulation, Plant , Genome, Plant , Manihot/genetics , Plant Proteins/genetics , RNA, Long Noncoding/genetics , Stress, Physiological , Transcriptome , Genome-Wide Association Study , Manihot/physiology
7.
Gene ; 741: 144559, 2020 May 30.
Article in English | MEDLINE | ID: mdl-32169630

ABSTRACT

Fungi of the order Mortierellales are attractive producers of long-chain polyunsaturated fatty acids (PUFAs). Here, the genome of a novel strain, Mortierella sp. BCC40632, was sequenced and assembled, yielding 65 contigs spanning 49,964,116 bases in total, with 12,149 predicted protein-coding genes. We focused on acetyl-CoA and its derived metabolic pathways for the biosynthesis of biologically functional macromolecules, including PUFAs, eicosanoids, and carotenoids. Comparative genome analysis between Mortierellales and Mucorales also identified, in strain BCC40632, the signature genetic characteristics of arachidonic acid-producing strains, including Δ5-desaturase and a GLELO-like elongase. Remarkably, in contrast to other Mortierella species, this strain contained only the n-6 pathway of PUFA biosynthesis, owing to the absence of a Δ15-desaturase or ω3-desaturase gene. Four putative enzyme sequences in the eicosanoid biosynthetic pathways were identified in strain BCC40632 and other Mortierellales fungi but were not detected in the Mucorales. Another unique metabolic trait of the Mortierellales was their inability to synthesize carotenoids, resulting from the lack of phytoene synthase and phytoene desaturase genes. These findings provide a perspective on strain optimization for the production of tailor-made products with industrial applications.


Subjects
Acetyl Coenzyme A/biosynthesis , Arachidonic Acid/genetics , Genome, Fungal/genetics , Mortierella/metabolism , Acetyl Coenzyme A/genetics , Arachidonic Acid/biosynthesis , Biosynthetic Pathways/genetics , Fatty Acid Desaturases/genetics , Fatty Acid Elongases/genetics , Fatty Acids, Unsaturated/genetics , Fatty Acids, Unsaturated/metabolism , Mortierella/genetics , Mucorales/genetics , Mucorales/metabolism , gamma-Linolenic Acid/genetics , gamma-Linolenic Acid/metabolism
8.
Biomed Res Int ; 2019: 5617153, 2019.
Article in English | MEDLINE | ID: mdl-31886228

ABSTRACT

Several computational approaches for predicting subcellular localization have been developed and proposed. Their performance varies because they use different combinations of protein features, training datasets, training strategies, and machine learning algorithms. In some cases, these tools may yield inconsistent and conflicting prediction results. It is important to consider such conflicting or contradictory predictions from multiple prediction programs during protein annotation, especially for a multiclass classification problem such as subcellular localization. To address this issue, this work proposes using the particle swarm optimization (PSO) algorithm to combine the prediction outputs of multiple subcellular localization predictors, with the aim of integrating diverse prediction models to enhance the final predictions. Herein, we present PSO-LocBact, a consensus classifier based on PSO that combines the strengths of several preexisting protein localization predictors specifically designed for bacteria. Our experimental results indicate that the proposed method can resolve inconsistency problems in subcellular localization prediction for both Gram-negative and Gram-positive bacterial proteins. The average accuracy achieved on each test dataset is over 98%, higher than that achieved with any individual predictor.
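The sketch below illustrates one way PSO can learn a consensus. Each particle is a vector of weights, one per predictor, and its fitness is the accuracy of the resulting weighted vote on labelled proteins. The encoding, the fitness function, and the hyperparameters (inertia 0.7, c1 = c2 = 1.5) are common textbook choices assumed here. They are not PSO-LocBact's published settings, and the function names are hypothetical.

```python
import random


def weighted_vote(weights, predictions, n_classes):
    """Consensus label: each predictor adds its weight to the class it predicts."""
    scores = [0.0] * n_classes
    for w, label in zip(weights, predictions):
        scores[label] += w
    return max(range(n_classes), key=scores.__getitem__)


def accuracy(weights, rows, truth, n_classes):
    """Fraction of proteins whose weighted-vote label matches the true label."""
    hits = sum(weighted_vote(weights, row, n_classes) == y
               for row, y in zip(rows, truth))
    return hits / len(truth)


def pso_weights(rows, truth, n_classes, n_particles=20, n_iter=50,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over predictor weights in [0, 1]; fitness = accuracy."""
    rng = random.Random(seed)
    dim = len(rows[0])  # one weight per predictor
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [accuracy(p, rows, truth, n_classes) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so that weights stay inside [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = accuracy(pos[i], rows, truth, n_classes)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy data: 3 predictors, 3 classes. Predictor 0 is always correct,
    # predictor 1 always says class 0, predictor 2 always says class 1.
    truth = [0, 1, 2, 0, 1, 2]
    rows = [[y, 0, 1] for y in truth]
    weights, fit = pso_weights(rows, truth, n_classes=3)
    print([round(w, 2) for w in weights], fit)
```

On this toy data, equal weights reach only 4/6 accuracy because class-2 proteins end in a three-way tie. Any weighting in which predictor 0 carries the largest weight is fully accurate, which is the kind of conflict resolution the abstract describes.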


Subjects
Bacterial Proteins/classification , Computational Biology/methods , Intracellular Space/chemistry , Machine Learning , Sequence Analysis, Protein/methods , Algorithms , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Consensus
9.
Cancers (Basel) ; 11(7)2019 Jul 12.
Article in English | MEDLINE | ID: mdl-31336886

ABSTRACT

Colorectal adenomas are precursor lesions of colorectal adenocarcinoma. The transition from adenoma to carcinoma in patients with colorectal cancer (CRC) has been associated with an accumulation of genetic aberrations. However, criteria that can screen for adenoma progression to adenocarcinoma are still lacking. The present study is the first attempt to identify genetic aberrations, such as somatic mutations, copy number variations (CNVs), and high-frequency mutated genes, in Thai patients. In this study, we identified genomic abnormalities in two sample groups. The first group comprised five cases with matched normal tissue, colorectal adenoma, and colorectal adenocarcinoma. The second group comprised six cases with matched normal tissue and colorectal adenoma. Whole-exome sequencing was performed for both groups, and the genetic aberrations of the two groups were compared. Across both comparisons (normal tissue versus colorectal adenoma and normal tissue versus colorectal adenocarcinoma), somatic mutations in the tumor suppressor gene APC (Adenomatous polyposis coli) were observed in eight of ten patients. In the comparison of normal tissue with colorectal adenoma tissue, somatic mutations were also detected in the Catenin Beta 1 (CTNNB1), Family With Sequence Similarity 123B (FAM123B), F-Box And WD Repeat Domain Containing 7 (FBXW7), Sex-Determining Region Y-Box 9 (SOX9), Low-Density Lipoprotein Receptor-Related Protein 5 (LRP5), Frizzled Class Receptor 10 (FZD10), and AT-Rich Interaction Domain 1A (ARID1A) genes, which are involved in the Wingless-related integration site (Wnt) signaling pathway. In the comparison of normal tissue with colorectal adenocarcinoma tissue, mutations were found in the Kirsten retrovirus-associated DNA sequences (KRAS) gene of the receptor tyrosine kinase-RAS (RTK-RAS) signaling pathway and in the Tumor Protein 53 (TP53) and Ataxia-Telangiectasia Mutated (ATM) genes of the p53 signaling pathway.
These results suggest that APC and TP53 may serve as potential screening markers for colorectal adenoma and early-stage CRC. This preliminary study may help identify patients with adenoma and early-stage CRC and may aid in establishing prevention and surveillance strategies to reduce the incidence of CRC.

10.
Biomed Res Int ; 2019: 2019846, 2019.
Article in English | MEDLINE | ID: mdl-31321230

ABSTRACT

MicroRNAs are small noncoding RNAs involved in the regulation of many cellular processes in plants. Hundreds of miRNAs have been identified in cassava by various techniques, yet these identifications were constrained by a lack of miRNA templates and the narrow range of conditions in transcriptome studies. In this research, we conducted a genome-wide identification in which miRNAs in the cassava genome were thoroughly screened using a bioinformatics approach independent of predefined templates and study conditions. Our work provided a catalog of putative mature miRNAs and explored the landscape of the miRNAome in cassava. These putative miRNAs were validated using statistical analysis as well as available cassava expression data. We showed that the crowded locations of cassava miRNAs are consistent with those of other plants and animals, and we hypothesized that they have the same evolutionary origin. At least 10 conserved miRNAs were identified in cassava based on a comparative study of miRNA conservation. Finally, investigating the relationships between miRNAs and their target genes enabled us to envisage the complexity of cellular regulatory systems modulated at the posttranscriptional level.


Subjects
Computational Biology , Manihot/genetics , MicroRNAs/genetics , Stress, Physiological/genetics , Gene Expression Profiling/methods , Gene Expression Regulation, Plant/genetics , Genome, Plant/genetics , Manihot/growth & development , Transcriptome/genetics
11.
Curr Microbiol ; 75(1): 57-70, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28865010

ABSTRACT

The selected robust fungus, Aspergillus oryzae strain BCC7051, is of interest for the biotechnological production of lipid-derived products because of its capability to accumulate high amounts of intracellular lipids using various sugars and agro-industrial substrates. Here, we report the genome sequence of the oleaginous A. oryzae BCC7051. The obtained reads were de novo assembled into 25 scaffolds spanning 38,550,958 bp, with 11,456 predicted protein-coding genes. Synteny mapping revealed a large rearrangement in two scaffolds of A. oryzae BCC7051 compared with the reference strain RIB40. The genetic relationship between BCC7051 and other A. oryzae strains in terms of aflatoxin production was investigated, indicating that A. oryzae BCC7051 was categorized as a group 2 non-aflatoxin-producing strain. Moreover, a comparative analysis of structural genes involved in lipid metabolism among oleaginous yeasts and fungi revealed the presence of multiple isoforms of metabolic enzymes responsible for fatty acid synthesis in BCC7051. Alternative routes of acetyl-CoA generation, an oleaginous feature, and the malate/citrate/pyruvate shuttle were also identified in this A. oryzae strain. The genome sequence generated in this work is a dedicated resource for expanding genome-wide studies of microbial lipids at the systems level and for developing a fungal-based platform for the production of diversified lipids with commercial relevance.


Subjects
Aspergillus oryzae/genetics , Aspergillus oryzae/metabolism , Genome, Fungal , Lipids/biosynthesis , Fungal Proteins/genetics , Fungal Proteins/metabolism , Malates/metabolism , Synteny
12.
Adv Biochem Eng Biotechnol ; 160: 121-141, 2017.
Article in English | MEDLINE | ID: mdl-27783133

ABSTRACT

To understand how biological processes work, it is necessary to explore the systematic regulation governing their behavior. Systematic regulation not only drives the normal behavior of organisms but also underlies their temporal responses to surrounding environments (dynamics) and long-term phenotypic adaptation (evolution). In effect, systematic regulation arises from regulatory components that work together collaboratively as a network. In the drive to decipher this code of life, a spectrum of technologies has been continuously developed in the post-genomic era. With current advances, high-throughput sequencing technologies are tremendously powerful for facilitating genomics and systems biology studies that attempt to understand system regulation inside cells. The ability to explore relevant regulatory components involved in the transcriptional and signaling regulation that drives core cellular processes is thus enhanced. This chapter reviews high-throughput sequencing technologies, including second- and third-generation sequencing technologies, that support the investigation of genomics and transcriptomics data. The use of these high-throughput data to construct virtual networks of systems regulation, particularly transcriptional regulatory networks, is explained. Analysis of the resulting regulatory networks could lead to an understanding of cellular systems regulation at the mechanistic and dynamic levels. Finally, the contribution of the biological network approach to envisaging systems regulation is demonstrated with a broad range of examples.


Subjects
Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Genome/genetics , High-Throughput Nucleotide Sequencing/methods , Models, Genetic , Proteome/genetics , Animals , Computational Biology/methods , Computer Simulation , Humans
13.
World J Microbiol Biotechnol ; 32(7): 122, 2016 Jul.
Article in English | MEDLINE | ID: mdl-27263017

ABSTRACT

Lipid-degrading, or lipolytic, enzymes have gained enormous attention in academic and industrial sectors. Several efforts are underway to discover new lipases with particular catalytic properties from a variety of microorganisms for a wide range of applications. In addition, various tools and strategies have been implemented to unravel the functional relevance of versatile lipid-degrading enzymes for special purposes. This review highlights the study of microbial lipid-degrading enzymes through an integrative computational approach. The identification of putative lipase genes from microbial genomes and metagenomic libraries using homology-based mining is discussed, with an emphasis on sequence analysis of conserved motifs and enzyme topology. Molecular modelling of three-dimensional structure on the basis of sequence similarity is shown to be a potential approach for exploring the structural and functional relationships of candidate lipase enzymes. Perspectives on a discriminative framework of cutting-edge tools and technologies, including bioinformatics, computational biology, functional genomics, and functional proteomics, intended to facilitate rapid progress in understanding lipolysis mechanisms and discovering novel lipid-degrading enzymes of microorganisms, are discussed.


Subjects
Computational Biology/methods , Lipase/metabolism , Lipid Metabolism/genetics , Lipolysis/genetics , Animals , Bacteria/enzymology , Bacteria/genetics , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Databases, Factual , Fungal Proteins/genetics , Fungal Proteins/metabolism , Fungi/enzymology , Fungi/genetics , Genome, Microbial , Humans , Lipase/chemistry , Lipase/genetics , Metagenomics/methods , Sequence Homology, Nucleic Acid
14.
Microbiology (Reading) ; 161(8): 1613-1626, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26271808

ABSTRACT

Lipases are enzymes that play important roles in maintaining lipid homeostasis and cellular metabolism. Using available genome data, seven lipase families of oleaginous and non-oleaginous yeasts and fungi were categorized based on the similarity of their amino acid sequences and conserved structural domains. Of these, the triacylglycerol lipase (patatin-domain-containing protein) and steryl ester hydrolase (abhydro_lipase-domain-containing protein) families were ubiquitous, being found in all species studied. These two essential lipases exhibited signature characteristics of integral membrane proteins that might be targeted to lipid monolayer particles. At least one extracellular lipase family was present in each species of yeast and fungi. We found that the diversity of lipase families and the number of genes in individual families of oleaginous strains were greater than those in non-oleaginous species, which might play a role in nutrient acquisition from surrounding hydrophobic substrates and contribute to their obese phenotype. The gene/enzyme catalogue and relevant informative data on the lipases provided by this study are not only a valuable toolbox for investigating the biological roles of these lipases but also hold potential for various industrial applications.


Subjects
Fungal Proteins/genetics , Fungi/enzymology , Genome, Fungal , Lipase/genetics , Fungal Proteins/chemistry , Fungal Proteins/metabolism , Fungi/chemistry , Fungi/genetics , Industrial Microbiology , Lipase/chemistry , Lipase/metabolism , Molecular Sequence Data , Protein Structure, Tertiary , Sequence Alignment
15.
Nucleic Acids Res ; 42(11): e93, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24771344

ABSTRACT

To identify non-coding RNA (ncRNA) signals within genomic regions, a classification tool was developed based on a hybrid random forest (RF) with a logistic regression model to efficiently discriminate short ncRNA sequences as well as long, complex ncRNA sequences. This RF-based classifier was trained on a well-balanced dataset with a discriminative set of features and achieved an accuracy, sensitivity, and specificity of 92.11%, 90.7%, and 93.5%, respectively. The selected feature set includes a newly proposed feature, SCORE. This feature is generated by a logistic regression function that combines five significant features (structure, sequence, modularity, structural robustness, and coding potential) to enable improved characterization of long ncRNA (lncRNA) elements. The use of SCORE improved the performance of the RF-based classifier in the identification of Rfam lncRNA families. A genome-wide ncRNA classification framework was applied to a wide variety of organisms, with an emphasis on those of economic, social, public health, environmental, and agricultural significance, such as various bacterial genomes, the Arthrospira (Spirulina) genome, and rice and human genomic regions. Our framework was able to identify known ncRNAs with sensitivities of greater than 90% and 77.7% for prokaryotic and eukaryotic sequences, respectively. Our classifier is available at http://ncrna-pred.com/HLRF.htm.
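SCORE is described as a logistic regression function over five features. The minimal sketch below shows how such a composite is learned and then evaluated for a new sequence. It assumes plain batch gradient descent for fitting, since the abstract does not state the fitting procedure, and uses hypothetical feature values. The function names are also hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, n_feat = len(X), len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def composite_score(x, w, b):
    """SCORE-like composite: logistic function of a weighted feature sum."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


if __name__ == "__main__":
    # Hypothetical, already-normalised values of five features
    # (structure, sequence, modularity, robustness, coding potential).
    X = [[0.9, 0.8, 0.7, 0.9, 0.1],   # ncRNA-like
         [0.8, 0.9, 0.8, 0.7, 0.2],
         [0.2, 0.1, 0.3, 0.2, 0.9],   # coding-like
         [0.1, 0.3, 0.2, 0.1, 0.8]]
    y = [1, 1, 0, 0]
    w, b = fit_logistic(X, y)
    print(round(composite_score([0.85, 0.8, 0.75, 0.8, 0.15], w, b), 3))
```

The single number returned by `composite_score` can then be appended to a sequence's feature vector as an extra input to the random forest. This matches the role the abstract gives SCORE.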


Subjects
Algorithms , RNA, Long Noncoding/genetics , RNA, Small Untranslated/genetics , Classification/methods , Genome, Bacterial , Genomics , Humans , Logistic Models , RNA, Untranslated/classification , RNA, Untranslated/genetics
16.
Microbiology (Reading) ; 159(Pt 12): 2548-2557, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24065718

ABSTRACT

Malic enzyme (ME) is an important enzyme that furnishes the cofactor NAD(P)H for the biosynthesis of fatty acids and sterols. Because multiple ME isoforms exist in a range of oleaginous microbes, the molecular basis of the evolutionary relationships among these enzymes in oleaginous fungi was investigated using sequence analysis and structural modelling. Evolutionary distance and structural characteristics were used to discriminate the MEs of yeasts and fungi into several groups. Interestingly, the NADP(+)-dependent MEs of Mucoromycotina had an unusual insertion region (FLxxPG) that was not found in other fungi. However, the subcellular compartment of the Mucoromycotina enzyme could not be clearly identified by an analysis of signal peptide sequences. A structural model constructed for the ME of Mucor circinelloides suggested that the insertion region is located at the N-terminus of the enzyme (aa 159-163). In addition, it is presumably part of the dimer interface region of the enzyme, which might provide a continuously positively charged pocket for the efficient binding of negatively charged effector molecules. The discovery of this unique structure of the Mucoromycotina ME suggests that the insertion region could be involved in the particular kinetics of this enzyme, which may indicate its involvement in the lipogenesis of industrially important oleaginous microbes.
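Motif notations such as FLxxPG, where x stands for any residue, translate directly into regular expressions. The sketch below scans protein sequences for this insertion motif. The function name and the toy sequence are hypothetical.

```python
import re

# In FLxxPG, "x" means any residue, i.e. "." in a regular expression.
# The lookahead (?=...) lets overlapping occurrences be reported too.
FLXXPG = re.compile(r"(?=(FL..PG))")


def find_flxxpg(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for each FLxxPG hit."""
    return [(m.start() + 1, m.group(1)) for m in FLXXPG.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_flxxpg("MKFLAAPGRR"))  # [(3, 'FLAAPG')]
```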


Subjects
Evolution, Molecular , Fungi/enzymology , Malate Dehydrogenase (NADP+)/genetics , Fungi/genetics , Malate Dehydrogenase (NADP+)/chemistry , Malate Dehydrogenase (NADP+)/classification , Models, Molecular , Sequence Alignment , Sequence Homology, Amino Acid
17.
Int J Data Min Bioinform ; 7(2): 118-34, 2013.
Article in English | MEDLINE | ID: mdl-23777171

ABSTRACT

Non-coding RNAs (ncRNAs) have important biological functions in living cells that depend on their conserved secondary structures. Here, we focus on computational RNA secondary structure prediction by exploring primary sequences and complementary base-pair interactions using the Conditional Random Fields (CRFs) model, which treats RNA structure prediction as a sequence labelling problem. To extract suitable features from known RNA secondary structures, we developed a feature extraction method based on the loop and stem characteristics of natural RNAs. Our CRFs models can predict the secondary structures of the test RNAs with optimal F-scores between 56.61% and 98.20% for different RNA families.
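Casting structure prediction as sequence labelling means that each nucleotide receives a tag. The sketch below assumes a simplified two-tag scheme (S for stem, L for loop) derived from dot-bracket notation. It then scores a predicted structure against a reference with a base-pair F-score. The CRF itself and the paper's richer feature set are not reproduced, and the function names are hypothetical.

```python
def pairs_from_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Base pairs (i, j) from dot-bracket notation, using a stack."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def stem_loop_labels(structure: str) -> str:
    """Per-nucleotide tags: 'S' for paired (stem) bases, 'L' for unpaired."""
    return "".join("S" if ch in "()" else "L" for ch in structure)


def pair_f_score(predicted: str, reference: str) -> float:
    """Harmonic mean of base-pair precision and recall."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(stem_loop_labels("((..))"))                              # SSLLSS
    print(round(pair_f_score("(((......)))", "((((....))))"), 4))  # 0.8571
```

In the example, the prediction recovers three of the four reference pairs with no false pairs. Precision is therefore 1 and recall is 0.75, giving an F-score of 6/7.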


Subjects
Nucleic Acid Conformation , RNA/chemistry , Base Pairing , Computational Biology , RNA, Untranslated/chemistry , Sequence Alignment , Sequence Analysis, RNA
18.
Nucleic Acids Res ; 41(1): e21, 2013 Jan 07.
Article in English | MEDLINE | ID: mdl-23012261

ABSTRACT

An ensemble classifier approach for microRNA precursor (pre-miRNA) classification was proposed. It combines a set of heterogeneous algorithms, including support vector machine (SVM), k-nearest neighbors (kNN), and random forest (RF), and then aggregates their predictions through a voting system. Additionally, the classification performance of the proposed algorithm was improved using discriminative features, namely self-containment and its derivatives, which have shown unique structural robustness characteristics of pre-miRNAs. These features are applicable across different species. Applying two preprocessing methods, a correlation-based feature selection (CFS) with a genetic algorithm (GA) search method and a modified Synthetic Minority Oversampling Technique (SMOTE) bagging rebalancing method, further improved the performance of this ensemble. The overall prediction accuracy obtained via 10 runs of 5-fold cross-validation (CV) was 96.54%, with a sensitivity of 94.8% and a specificity of 98.3%, a better trade-off between sensitivity and specificity than those of other state-of-the-art methods. The ensemble model was applied to animal, plant, and virus pre-miRNAs and achieved high accuracy (>93%). Exploiting the discriminative set of selected features also suggests that pre-miRNAs possess high intrinsic structural robustness compared with other stem loops. Our heterogeneous ensemble method gave relatively more reliable predictions than single classifiers. Our program is available at http://ncrna-pred.com/premiRNA.html.
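SMOTE rebalances data by interpolating new minority-class samples between existing ones. The following is a sketch of standard SMOTE only. The modified SMOTE-bagging variant described in the abstract is not reproduced, and the function name and defaults (k = 5, seed = 0) are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Standard SMOTE: interpolate between a random minority sample and one of
    its k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by distance to x.
        ranked = sorted((math.dist(x, minority[j]), j)
                        for j in range(len(minority)) if j != i)
        _, j = rng.choice(ranked[:k])
        neighbour = minority[j]
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    new_points = smote(minority, n_synthetic=4, k=2)
    print(len(new_points), new_points[0])
```

Synthetic points always lie on segments between real minority samples. On the unit-square toy data, every generated coordinate therefore stays within [0, 1].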


Subjects
Algorithms , MicroRNAs/classification , RNA Precursors/classification , Base Pairing , Humans , MicroRNAs/chemistry , RNA Precursors/chemistry , RNA, Plant/chemistry , RNA, Plant/classification , RNA, Viral/chemistry , RNA, Viral/classification , Sensitivity and Specificity
19.
Stand Genomic Sci ; 6(1): 43-53, 2012 Mar 19.
Article in English | MEDLINE | ID: mdl-22675597

ABSTRACT

Arthrospira platensis is a cyanobacterium that is extensively cultivated outdoors on a large commercial scale for consumption as a food for humans and animals. It can be grown in monoculture under highly alkaline conditions, making it attractive for industrial production. Here we describe the complete genome sequence of A. platensis C1 strain and its annotation. The A. platensis C1 genome contains 6,089,210 bp including 6,108 protein-coding genes and 45 RNA genes, and no plasmids. The genome information has been used for further comparative analysis, particularly of metabolic pathways, photosynthetic efficiency and barriers to gene transfer.

20.
Microbiology (Reading) ; 158(Pt 1): 217-228, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22016567

ABSTRACT

For a bio-based economy, microbial lipids offer a potential solution as alternative feedstocks for the oleochemical industry. Existing genome data for promising strains, namely oleaginous yeasts and fungi, allowed us to investigate candidate orthologous sequences that participate in their oleaginicity. Comparative genome analysis of non-oleaginous strains (Saccharomyces cerevisiae, Candida albicans, and Ashbya gossypii) and oleaginous strains (Yarrowia lipolytica, Rhizopus oryzae, Aspergillus oryzae, and Mucor circinelloides) showed that 209 orthologous protein sequences of the oleaginous microbes were distributed over several cellular processes. Based on the 41 sequences categorized under metabolism, putative routes potentially involved in generating precursors for fatty acid and lipid synthesis, particularly acetyl-CoA, were then identified that were not present in the non-oleaginous strains. We found a set of orthologous oleaginous proteins responsible for the biosynthesis of this key two-carbon metabolite through citrate catabolism, fatty acid β-oxidation, leucine metabolism, and lysine degradation. Our findings suggest a relationship between carbohydrate, lipid, and amino acid metabolism in the biosynthesis of acetyl-CoA, which contributes to lipid production in oleaginous microbes.


Subjects
Acetyl Coenzyme A/biosynthesis , Fungi/genetics , Genomics , Lipid Metabolism , Yeasts/genetics , Fungal Proteins/genetics , Fungal Proteins/metabolism , Fungi/metabolism , Yeasts/metabolism