Results 1 - 20 of 8,482
1.
Microbiol Res ; 227: 126309, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31421713

ABSTRACT

Phosphorus availability in soil ranges from <0.01 to 1 ppm, which limits its utilization by plants. Phosphate-solubilizing bacteria (PSB) can fulfill the phosphorus requirement of plants in an eco-friendly manner. However, PSB encounter dynamic and challenging environmental conditions, viz. high temperature, osmotic and acid stress, and climatic changes, which often hamper their activity and proficiency. The modern trend is shifting from isolating PSB to studying their genetic potential and genome annotation, not only to improve their performance in field trials but also to understand their ability to cope with stress. To withstand environmental stress, bacteria need to restructure their metabolic networks to ensure survival. The Pi-starvation response regulator (PhoB) and the alarmone (p)ppGpp, the mediator of the stringent stress response, are known to regulate the global regulatory network of bacteria to provide balanced physiology under various stress conditions. The current review discusses the global regulation and crosstalk of genes involved in phosphorus homeostasis, phosphate solubilization, and various stress responses that fine-tune bacterial physiology. Knowledge of how this network crosstalk helps bacteria respond efficiently to challenging environmental parameters, together with their physiological plasticity, could lead to the development of proficient, long-lasting consortia for plant growth promotion.


Subjects
Bacterial Physiological Phenomena, Bacterial Proteins/genetics, Gene Expression Regulation, Bacterial, Guanosine Pentaphosphate/metabolism, Stress, Physiological, Bacteria/genetics, Cell Plasticity, Gene Regulatory Networks, Homeostasis, Metabolic Networks and Pathways, Molecular Sequence Annotation, Nitrogen, Phosphates/metabolism, Plant Development, Plants, Soil, Stress, Physiological/genetics
2.
BMC Bioinformatics ; 20(1): 414, 2019 Aug 06.
Article in English | MEDLINE | ID: mdl-31387525

ABSTRACT

BACKGROUND: R-loops are three-stranded nucleic acid structures that usually form during transcription and that may lead to gene regulation or genome instability. DRIP (DNA:RNA immunoprecipitation)-seq techniques are widely used to map R-loops genome-wide, providing insights into R-loop biology. However, annotating DRIP-seq peaks to genes can be tricky because the common basic DRIP technique provides no strand information. RESULTS: Here, we introduce DRIP-seq Optimized Peak Annotator (DROPA), a new tool for gene annotation of R-loop peaks based on gene expression information. DROPA allows full customization of annotation options, ranging from the choice of reference datasets to gene feature definitions. DROPA can assign R-loop peaks to the DNA template strand in gene bodies with a false positive rate of less than 7%. A comparison of DROPA with three widely used annotation tools shows that it produces fewer false positive annotations than the others. CONCLUSIONS: DROPA is a fully customizable peak-annotation tool optimized for co-transcriptional DRIP-seq peaks, which allows finer gene annotation based on gene expression information. Its output can easily be integrated into pipelines for downstream analyses, and it can produce informative summary plots and statistical enrichment tests.
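The idea behind expression-guided strand assignment can be illustrated with a short Python sketch. It assumes hypothetical `Gene` and `Peak` records with 0-based half-open coordinates and an arbitrary expression cutoff (`min_expression`); it is not DROPA's implementation or interface. When a peak overlaps genes on opposite strands, only genes expressed above the cutoff are kept, and because the RNA in an R-loop pairs with the template strand, the reported strand is the one opposite to the gene's coding strand.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int          # 0-based, half-open interval [start, end)
    end: int
    strand: str         # "+" or "-": the gene's coding (sense) strand
    expression: float   # expression level, e.g. TPM (unit is an assumption)


@dataclass
class Peak:
    start: int
    end: int


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def assign_template_strand(peak: Peak, genes: list[Gene],
                           min_expression: float = 1.0) -> list[tuple[str, str]]:
    """Return (gene name, template strand) for each expressed gene overlapping the peak.

    Genes below the cutoff are discarded, since a co-transcriptional R-loop
    needs a transcript; the template strand is opposite to the gene's strand.
    """
    hits = []
    for gene in genes:
        if gene.expression < min_expression:
            continue
        if overlaps(peak.start, peak.end, gene.start, gene.end):
            template = "-" if gene.strand == "+" else "+"
            hits.append((gene.name, template))
    return hits


# Hypothetical example: a peak overlapping two antisense genes, only one expressed.
genes = [
    Gene("geneA", 100, 500, "+", 12.0),
    Gene("geneB", 300, 900, "-", 0.0),
]
print(assign_template_strand(Peak(350, 450), genes))  # [('geneA', '-')]
```

Filtering on expression before checking overlap is what resolves the ambiguity between the two antisense genes; without strand information from sequencing, expression is the only tie-breaker in this sketch.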


Subjects
DNA/metabolism, Immunoprecipitation, Molecular Sequence Annotation, RNA/metabolism, Software, Base Sequence, DNA/genetics, Gene Expression Regulation, RNA/genetics
3.
Bioengineered ; 10(1): 345-352, 2019 12.
Article in English | MEDLINE | ID: mdl-31411110

ABSTRACT

This study aimed to detect serum miR-203 expression levels in acute myeloid leukemia (AML) and explore their potential clinical significance. Quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) was performed to measure serum miR-203 levels in 134 patients with AML and 70 healthy controls. Serum miR-203 expression was significantly reduced in AML patients compared with healthy controls. Receiver operating characteristic (ROC) curve analysis revealed that miR-203 could distinguish AML cases from normal controls. Low serum miR-203 levels were associated with worse clinical features, as well as poorer overall survival and relapse-free survival of AML patients. Moreover, multivariate analysis confirmed low serum miR-203 expression to be an independent unfavorable prognostic predictor for AML. Bioinformatics analysis showed that the downstream genes and pathways of miR-203 were closely associated with tumorigenesis. Downregulation of miR-203 in AML cell lines upregulated the expression levels of oncogenic promoters such as CREB1, SRC and HDAC1. Thus, these findings suggest that serum miR-203 might be a promising biomarker for the diagnosis and prognosis of AML.
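The ROC analysis described in this abstract can be summarized by the area under the curve (AUC), which equals the probability that a randomly chosen case scores higher than a randomly chosen control. The Python sketch below computes it with the Mann-Whitney pairwise formulation; the serum values are made up for illustration and are not data from the study. Because lower miR-203 indicates AML here, expression values are negated so that higher scores point to disease.

```python
def roc_auc(case_scores: list[float], control_scores: list[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    AUC = P(score of a random case > score of a random control), with ties
    counted as one half. Higher scores are assumed to indicate disease.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical serum levels; low miR-203 indicates AML, so scores are negated.
aml = [0.2, 0.5, 0.9]
healthy = [0.8, 1.1, 1.4]
auc = roc_auc([-x for x in aml], [-x for x in healthy])
print(round(auc, 3))  # 0.889
```

The pairwise loop is O(n·m), which is fine for cohorts of a few hundred samples such as 134 cases and 70 controls; a rank-based version would scale better for larger sets.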


Subjects
Biomarkers, Tumor/genetics, Carcinogenesis/genetics, Gene Expression Regulation, Leukemic, Leukemia, Myeloid, Acute/genetics, MicroRNAs/genetics, Neoplasm Proteins/genetics, Antagomirs/genetics, Antagomirs/metabolism, Biomarkers, Tumor/blood, Carcinogenesis/metabolism, Carcinogenesis/pathology, Case-Control Studies, Cell Line, Tumor, Computational Biology/methods, Cyclic AMP Response Element-Binding Protein/blood, Cyclic AMP Response Element-Binding Protein/genetics, Gene Expression Profiling, Gene Ontology, Histone Deacetylase 1/blood, Histone Deacetylase 1/genetics, Humans, Leukemia, Myeloid, Acute/blood, Leukemia, Myeloid, Acute/mortality, Leukemia, Myeloid, Acute/pathology, MicroRNAs/antagonists & inhibitors, MicroRNAs/blood, Molecular Sequence Annotation, Multivariate Analysis, Neoplasm Proteins/blood, Prognosis, ROC Curve, Recurrence, Signal Transduction, Survival Analysis, src-Family Kinases/blood, src-Family Kinases/genetics
4.
DNA Cell Biol ; 38(8): 824-839, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31295023

ABSTRACT

The tea plant is an economically important crop worldwide, and its yield and quality are affected by abiotic stress. The calcineurin B-like protein (CBL) and CBL-interacting protein kinase (CIPK) family genes play irreplaceable roles in plant development and stress resistance. More and more CBL-CIPK genes have been identified, but only a few have been cloned and characterized in tea plants. In this study, 7 CsCBLs and 18 CsCIPKs were identified based on the tea plant genome. The physicochemical properties, phylogeny, conserved motifs, gene structure, homologous gene network, and upstream promoter elements of these 25 genes were analyzed. The conserved motifs of these genes varied across nodes of the phylogenetic tree. Based on gene structure, members of the tea plant CIPK gene family can be divided into two types: intron-rich and intronless. Many stress-related elements were found in the 2,000 bp upstream promoter regions, and PlantCARE predicted that CsCBL4 contains 30 stress-related elements. According to PlantPAN2, CsCIPK6 contains 48 ABRELATERD1 elements, CsCIPK17 contains 37 GT1CONSENSUS, CsCIPK3 contains 64 MYBCOREATCYCB1, CsCBL3 contains 52 SORLIP1AT, CsCBL5 contains 65 SURECOREATSULTR11, and CsCIPK11 contains 83 WBOXATNPR1. In addition, eight genes were selected for quantitative real-time PCR (RT-qPCR) to detect their expression profiles under high-temperature, low-temperature, salt, and drought treatments. These genes were found to respond to one or more abiotic stress treatments. CsCBL4, CsCIPK2, and CsCIPK14 showed similar expression levels and are homologous to AtSOS3, AtSIP3, and AtSIP4 in Arabidopsis, which are involved in the SOS pathway. This study provides insight into the potential functions of the CsCBL and CsCIPK genes of the tea plant.


Subjects
Camellia sinensis/genetics, Gene Expression Regulation, Plant, Plant Proteins/genetics, Protein Serine-Threonine Kinases/genetics, Amino Acid Motifs, Amino Acid Sequence, Arabidopsis/genetics, Calcium-Binding Proteins/genetics, Camellia sinensis/physiology, Conserved Sequence, Droughts, Evolution, Molecular, Gene Regulatory Networks, Genome-Wide Association Study, Molecular Sequence Annotation, Phylogeny, Plant Proteins/metabolism, Stress, Physiological/genetics
5.
Gene ; 712: 143962, 2019 Sep 05.
Article in English | MEDLINE | ID: mdl-31288057

ABSTRACT

Veratrum nigrum is a protected plant of the family Melanthiaceae that can synthesize unique steroidal alkaloids important for pharmacy. Transcriptomes from leaves, stems and rhizomes of in vitro maintained V. nigrum plants were sequenced and annotated for gene and marker discovery. Sequencing of samples derived from the different organs yielded a total of 108,511 contigs with a mean length of 596 bp. Of the transcripts derived from leaf and stalk, 28% and 38%, respectively, were annotated in the Nr nucleotide database. The sequencing revealed 949 unigenes related to lipid metabolism, including 73 transcripts involved in the biosynthesis of steroids and genus-specific steroidal alkaloids. Additionally, 3,203 candidate SSR markers were identified in unigenes, with an average density of one SSR locus every 6.2 kb of sequence. Unraveling the biochemical machinery of the steroidal alkaloid pathway will open the possibility of designing and optimizing biotechnological processes. The transcriptomic data provide valuable resources for biochemical, molecular genetic, comparative transcriptomic, functional genomic, ecological and evolutionary studies of V. nigrum.
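Perfect microsatellites (SSRs) like those counted in this abstract can be located with a back-referencing regular expression. This Python sketch assumes illustrative minimum repeat counts (`MIN_REPEATS`), skips motifs that are themselves repeats of a shorter unit, and does not reproduce the tool or parameters used in the study.

```python
import re

# Assumed minimum tandem copies per motif length (illustrative thresholds,
# not necessarily those used in the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect di- to hexanucleotide repeats."""
    seq = seq.upper()
    hits = []
    for k, n in MIN_REPEATS.items():
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_compound(motif):
                hits.append((m.start(), motif, (m.end() - m.start()) // k))
    return hits


def kb_per_ssr(total_bp: int, n_ssrs: int) -> float:
    """Average spacing between SSRs, in kilobases."""
    return total_bp / n_ssrs / 1000


seq = "GG" + "AT" * 8 + "CCC" + "GAA" * 5 + "T"
print(find_ssrs(seq))  # [(2, 'AT', 8), (21, 'GAA', 5)]
```

If the reported density was computed as total searched length divided by SSR count, 3,203 SSRs at one per 6.2 kb would correspond to roughly 19.9 Mb (3,203 × 6.2 kb) of unigene sequence.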


Subjects
Alkaloids/biosynthesis, Gene Expression Regulation, Plant, Steroids/biosynthesis, Transcriptome, Veratrum/metabolism, Contig Mapping, DNA, Complementary/metabolism, Gene Library, Gene Ontology, Genetic Markers, High-Throughput Nucleotide Sequencing, Microsatellite Repeats, Molecular Sequence Annotation, Oligonucleotide Array Sequence Analysis, Plant Leaves/metabolism, Plant Proteins/metabolism, Plant Roots/metabolism, Sequence Analysis, RNA
6.
Nat Commun ; 10(1): 3100, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31308405

ABSTRACT

Of the 473 genes in the genome of the bacterium with the smallest genome generated to date, 149 have unknown function, emphasising a universal problem: less than 1% of proteins have experimentally determined annotations. Here, we combine the results from state-of-the-art in silico methods for functional annotation and assign functions to 66 of the 149 proteins. Proteins that are still not annotated lack orthologues, lack protein domains, and/or are membrane proteins. Twenty-four likely transporter proteins were identified, indicating the importance of nutrient uptake into, and waste disposal out of, the minimal bacterial cell in a nutrient-rich environment after removal of metabolic enzymes. Hence, the environment shapes the nature of a minimal genome. Our findings also show that combining multiple state-of-the-art in silico methods for annotating proteins can predict functions, even for difficult-to-characterise proteins, and can identify crucial gaps for further development.


Subjects
Adaptation, Biological/genetics, Bacteria/genetics, Genome, Bacterial/genetics, Computational Biology/methods, Genes, Essential/genetics, Molecular Sequence Annotation/methods, Software
7.
Arch Virol ; 164(10): 2599-2603, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31278422

ABSTRACT

This work describes the characterization and genome annotation of a new lytic Enterococcus faecalis siphovirus, vB_EfaS_AL3 (referred to as AL3), isolated from wastewater samples collected in Liaoning Province, China. The genome of phage AL3 is composed of linear double-stranded DNA that is 40,789 bp in length with a G + C content of 34.84% and 61 putative protein-coding genes. Phylogenetic and comparative genomic analyses indicate that phage AL3 should be considered a novel phage.
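The G + C content reported for AL3 is the share of G and C among the genome's bases. A minimal Python version follows; how ambiguous bases such as N are handled is an assumption (here they are ignored rather than counted in the denominator).

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored, which is an assumption;
    some tools include them in the denominator instead.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


print(round(gc_content("ATGCGCNN"), 2))  # 66.67
```

At 34.84% of 40,789 bp, the AL3 genome would contain roughly 14,211 G or C bases.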


Subjects
Bacteriophages/genetics, Enterococcus faecalis/virology, Genome, Viral, Phylogeny, Sequence Analysis, DNA, Wastewater/virology, Bacteriolysis, Base Composition, China, DNA/chemistry, DNA/genetics, DNA, Viral/chemistry, DNA, Viral/genetics, Microscopy, Electron, Transmission, Molecular Sequence Annotation, Viral Plaque Assay, Virion/ultrastructure
8.
Arch Virol ; 164(10): 2609-2611, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31312966

ABSTRACT

A new virus belonging to the genus Vitivirus in the family Betaflexiviridae was identified by next-generation sequencing of a blueberry plant showing green mosaic symptoms. The genome organization of the virus, which is tentatively named "blueberry green mosaic-associated virus" (BGMaV), is typical of vitiviruses, with five open reading frames (ORFs) and a polyadenylated 3' terminus. The ORFs code for the viral replicase, a 16K protein of unknown function, a movement protein, a coat protein (CP), and a nucleic acid binding protein. Phylogenetic analyses based on the deduced amino acid sequence of the CP and conserved motifs of the RNA-dependent RNA polymerase confirmed the taxonomic placement of BGMaV in the genus Vitivirus.


Subjects
Blueberry Plants/virology, Flexiviridae/classification, Flexiviridae/isolation & purification, Phylogeny, Plant Diseases/virology, Gene Order, Genome, Viral, High-Throughput Nucleotide Sequencing, Molecular Sequence Annotation, Open Reading Frames, RNA, Messenger, RNA, Viral/genetics, Sequence Analysis, DNA, Sequence Homology, Amino Acid, Viral Proteins/genetics
9.
Nat Genet ; 51(6): 1052-1059, 2019 06.
Article in English | MEDLINE | ID: mdl-31152161

ABSTRACT

Maize is one of the most important crops globally, and it shows remarkable genetic diversity. Knowledge of this diversity could help in crop improvement; however, gold-standard genomes have been elucidated only for modern temperate varieties. Here, we present a high-quality reference genome (contig N50 of 15.78 megabases) of the maize small-kernel (SK) inbred line, which is derived from a tropical landrace. Using haplotype maps derived from B73, Mo17 and SK, we identified 80,614 polymorphic structural variants across 521 diverse lines. Approximately 22% of these variants could not be detected by traditional single-nucleotide-polymorphism-based approaches, and some of them could affect gene expression and trait performance. To illustrate the utility of the diverse SK line, we used it to perform map-based cloning of a major-effect quantitative trait locus controlling kernel weight, a key trait selected during maize improvement. The underlying candidate gene ZmBARELY ANY MERISTEM1d provides a target for increasing crop yields.
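Contig N50, quoted for the SK assembly, is the length L such that contigs of length L or longer together contain at least half of all assembled bases. A short Python sketch with toy lengths (not the SK contigs):

```python
def n50(lengths: list[int]) -> int:
    """Contig N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not lengths or min(lengths) < 0:
        raise ValueError("need a non-empty list of non-negative lengths")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable for non-negative lengths")


print(n50([2, 3, 4, 5, 6, 8, 10]))  # 6
```

Here the total is 38 bp; the three longest contigs (10 + 8 + 6 = 24 bp) are the first to reach half of it, so N50 is 6.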


Subjects
Genetic Association Studies, Genome, Plant, Genomics, Phenotype, Zea mays/genetics, Computational Biology/methods, Genomics/methods, High-Throughput Nucleotide Sequencing, Inbreeding, Molecular Sequence Annotation, Plant Breeding, Plants, Genetically Modified, Polymorphism, Single Nucleotide, Quantitative Trait, Heritable
10.
Gene ; 710: 375-386, 2019 Aug 20.
Article in English | MEDLINE | ID: mdl-31200084

ABSTRACT

Cynanchum thesioides is an upright, xerophytic shrub that is widely distributed in arid and semi-arid areas of China, North Korea, Mongolia and Siberia. To date, little is known about the molecular mechanisms of drought resistance in C. thesioides. To better understand drought resistance, we used Illumina sequencing and transcriptome analysis of C. thesioides to identify drought-responsive genes. De novo assembly of 207.58 Gb of clean data yielded 55,268 unigenes, of which 36,265 were annotated with gene descriptions, conserved domains, gene ontology terms and metabolic pathways. Differentially expressed genes (DEGs) under drought stress were enriched in pathways such as carbon metabolism, starch and sucrose metabolism, amino acid biosynthesis, phenylpropanoid biosynthesis and plant hormone signal transduction. Moreover, many functional genes were up-regulated under severe drought stress to enhance tolerance. Weighted gene co-expression network analysis identified key hub genes related to drought stress. Hundreds of candidate genes were identified under severe drought stress, including transcription factors such as MYB, G2-like, ERF, C2H2, NAC, NF-X1, GRF, HD-ZIP, HB-other, HSF, C3H, GRAS, WRKY, bHLH and Trihelix. These data are a valuable resource for further investigation into the molecular mechanisms of drought stress response in C. thesioides and will facilitate exploration of drought resistance genes.


Subjects
Cynanchum/genetics, Droughts, Gene Expression Profiling/methods, Gene Regulatory Networks, Gene Expression Regulation, Plant, High-Throughput Nucleotide Sequencing/methods, Molecular Sequence Annotation, Plant Proteins/genetics, Sequence Analysis, RNA/methods, Stress, Physiological
11.
BMC Genomics ; 20(1): 455, 2019 Jun 04.
Article in English | MEDLINE | ID: mdl-31164105

ABSTRACT

BACKGROUND: Natural rubber, an indispensable commodity used in approximately 40,000 products, is fundamental to the tire industry. The rubber tree species Hevea brasiliensis (Willd. ex Adr. de Juss.) Muell-Arg., which is native to the Amazon rainforest, is the major producer of latex worldwide. Rubber tree breeding is time-consuming, expensive and requires large field areas. Thus, genetic studies could optimize field evaluations, reducing the time and area required for these experiments. In this work, transcriptome sequencing was used to identify a full set of transcripts and to evaluate the expression of genes involved in the different cold-response strategies of the RRIM600 (cold-resistant) and GT1 (cold-tolerant) genotypes. RESULTS: We built a comprehensive transcriptome using multiple database sources, which resulted in 104,738 transcripts clustered into 49,304 genes. RNA-seq data from leaf tissues sampled at four different time points for each genotype were used to perform a gene-level expression analysis. Differentially expressed genes (DEGs) were identified through pairwise comparisons between the two genotypes at each time point of the cold treatment. DEG annotation revealed that RRIM600 and GT1 exhibit different chilling tolerance strategies. To cope with cold stress, the RRIM600 clone upregulates genes promoting stomatal closure, photosynthesis inhibition and a more efficient reactive oxygen species (ROS) scavenging system. The transcriptome was also searched for putative molecular markers (single nucleotide polymorphisms (SNPs) and microsatellites) in each genotype, and a total of 27,111 microsatellites and 202,949 (GT1) and 156,395 (RRIM600) SNPs were identified. Furthermore, a search for alternative splicing (AS) events identified a total of 20,279 events. CONCLUSIONS: The elucidation of genes involved in different chilling tolerance strategies, together with the associated molecular markers and information on AS events, provides a powerful tool for further genetic and genomic analyses in rubber tree breeding.


Subjects
Cold Shock Response/genetics, Hevea/genetics, Alternative Splicing, Gene Expression Profiling, Genetic Markers, Hevea/metabolism, Molecular Sequence Annotation, Plant Proteins/chemistry, Plant Proteins/genetics, Plant Proteins/metabolism, Polymorphism, Single Nucleotide, Protein Domains, Sequence Analysis, RNA
12.
BMC Bioinformatics ; 20(1): 346, 2019 Jun 17.
Article in English | MEDLINE | ID: mdl-31208321

ABSTRACT

BACKGROUND: Lysine acetylation is a widespread, reversible post-translational modification that plays a crucial role in some biological activities. To better understand its mechanism, acetylation sites in proteins must be identified accurately. Computational methods are popular because they are more convenient and faster than experimental methods. In this study, we proposed a new computational method to predict acetylation sites in humans by combining sequence and structural features, including physicochemical properties (PCP), position-specific scoring matrices (PSSM), auto covariation (AC), residue composition (RC), secondary structure (SS) and accessible surface area (ASA), which together characterize acetylated lysine sites well. In addition, a two-step feature selection combining minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS) was applied. Finally, a cascade classifier based on support vector machines (SVM) was trained, which addressed the imbalance between positive and negative samples while covering all negative-sample information. RESULTS: On an independent dataset, the method achieved a specificity of 72.19% and a sensitivity of 76.71%, showing that the cascade SVM classifier outperforms a single SVM classifier. CONCLUSIONS: In addition to analyzing the experimental results, we also performed a systematic and comprehensive analysis of the acetylation data.
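The two metrics reported for the acetylation predictor follow directly from a confusion matrix: sensitivity is the fraction of true acetylation sites predicted as such, and specificity is the fraction of non-sites correctly rejected. The Python sketch below uses made-up labels and does not reproduce the cascade SVM itself.

```python
def sensitivity_specificity(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Labels are 1 for an acetylated lysine (positive) and 0 otherwise.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predictions on nine lysine sites.
sens, spec = sensitivity_specificity([1, 1, 1, 1, 0, 0, 0, 0, 0],
                                     [1, 1, 1, 0, 0, 0, 0, 1, 0])
print(sens, spec)  # 0.75 0.8
```

Reporting both numbers matters for imbalanced data such as acetylation sites, where negatives vastly outnumber positives and plain accuracy would be dominated by the negative class.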


Subjects
Computational Biology/methods, Support Vector Machine, Acetylation, Amino Acid Sequence, Animals, Databases, Protein, Gene Ontology, Humans, Lysine/chemistry, Mice, Molecular Sequence Annotation, Position-Specific Scoring Matrices, Protein Processing, Post-Translational, Protein Structure, Secondary, Proteins/chemistry, Proteins/metabolism, Rats
13.
BMC Bioinformatics ; 20(1): 338, 2019 Jun 17.
Article in English | MEDLINE | ID: mdl-31208327

ABSTRACT

BACKGROUND: The advent of high-throughput experimental techniques paved the way to genome-wide computational analysis and predictive annotation studies. When considering the joint annotation of a large set of related entities, such as all proteins of a certain genome, many candidate annotations could be inconsistent, or very unlikely, given the existing knowledge. A sound predictive framework capable of accounting for these types of constraints when making predictions could substantially improve the quality of machine-generated annotations at a genomic scale. RESULTS: We present OCELOT, a predictive pipeline which simultaneously addresses functional and interaction annotation of all proteins of a given genome. The system combines sequence-based predictors for functional and protein-protein interaction (PPI) prediction with a consistency layer that enforces (soft) constraints as fuzzy logic rules. The enforced rules represent the available prior knowledge about the classification task, including taxonomic constraints over each GO hierarchy (e.g. a protein labeled with a GO term should also be labeled with all ancestor terms) as well as rules combining interaction and function prediction. An extensive experimental evaluation on the yeast genome shows that integrating prior knowledge via rules substantially improves the quality of the predictions. The system largely outperforms GoFDR, the only high-ranking system at the last CAFA challenge with a readily available implementation, when GoFDR is given access to intra-genome information only (as OCELOT is), and has comparable or better results (depending on the hierarchy and performance measure) when GoFDR is allowed to use information from other genomes. Our system also compares favorably to recent methods based on deep learning.
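The GO hierarchy constraint in the OCELOT abstract (a protein labeled with a term should also carry all ancestor terms) can be checked or enforced by walking parent edges in the ontology graph. This Python sketch treats the rule as a hard constraint on a toy hierarchy with hypothetical term names; OCELOT itself encodes such rules as soft fuzzy-logic constraints, which this sketch does not model.

```python
def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All terms reachable from `term` by following parent (is_a) edges."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(labels: set[str], parents: dict[str, list[str]]) -> set[str]:
    """Close a label set upward so every term's ancestors are also present."""
    closed = set(labels)
    for term in labels:
        closed |= ancestors(term, parents)
    return closed


def violations(labels: set[str], parents: dict[str, list[str]]) -> set[tuple[str, str]]:
    """(term, missing ancestor) pairs that break the hierarchy constraint."""
    return {(t, a) for t in labels for a in ancestors(t, parents) if a not in labels}


# Toy hierarchy with hypothetical term names: c is_a b, b is_a a, d is_a a.
parents = {"c": ["b"], "b": ["a"], "d": ["a"]}
print(sorted(propagate({"c"}, parents)))  # ['a', 'b', 'c']
print(violations({"c", "a"}, parents))    # {('c', 'b')}
```

The `seen` set keeps the traversal linear in the number of reachable edges even though GO is a DAG in which a term can have several parents.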


Subjects
Genome, Fungal, Genomics/methods, Molecular Sequence Annotation, Proteins/genetics, Saccharomyces cerevisiae/genetics, Algorithms, Decision Making, Gene Ontology
14.
Nat Immunol ; 20(7): 902-914, 2019 07.
Article in English | MEDLINE | ID: mdl-31209404

ABSTRACT

Lupus nephritis is a potentially fatal autoimmune disease for which current treatment is ineffective and often toxic. To develop mechanistic hypotheses of disease, we analyzed kidney samples from patients with lupus nephritis and from healthy control subjects using single-cell RNA sequencing. Our analysis revealed 21 subsets of leukocytes active in disease, including multiple populations of myeloid cells, T cells, natural killer cells and B cells that demonstrated both pro-inflammatory responses and inflammation-resolving responses. We found evidence of local activation of B cells correlated with an age-associated B-cell signature and evidence of progressive stages of monocyte differentiation within the kidney. A clear interferon response was observed in most cells. Two chemokine receptors, CXCR4 and CX3CR1, were broadly expressed, implying a potentially central role in cell trafficking. Gene expression of immune cells in urine and kidney was highly correlated, suggesting that urine might serve as a surrogate for kidney biopsies.


Subjects
Kidney/immunology, Lupus Nephritis/immunology, Biomarkers, Biopsy, Cluster Analysis, Computational Biology/methods, Epithelial Cells/metabolism, Flow Cytometry, Gene Expression Profiling, Gene Expression Regulation, Humans, Immunophenotyping, Interferons/metabolism, Kidney/metabolism, Kidney/pathology, Leukocytes/immunology, Leukocytes/metabolism, Lupus Nephritis/genetics, Lupus Nephritis/metabolism, Lupus Nephritis/pathology, Lymphocytes/immunology, Lymphocytes/metabolism, Molecular Sequence Annotation, Myeloid Cells/immunology, Myeloid Cells/metabolism, Single-Cell Analysis, Transcriptome
15.
BMC Bioinformatics ; 20(1): 308, 2019 Jun 10.
Article in English | MEDLINE | ID: mdl-31182027

ABSTRACT

BACKGROUND: Although various machine learning-based predictors have been developed for estimating protein-protein interactions (PPIs), their performance varies with dataset and species and is affected by two primary aspects: the choice of learning algorithm and the representation of protein pairs. To improve the prediction of protein-protein interactions, we exploit the synergy of multiple learning algorithms and the expressiveness of different protein-pair features. RESULTS: We developed a stacked generalization scheme that integrates five learning algorithms. We also designed three types of protein-pair features based on the physicochemical properties of amino acids, gene ontology annotations, and interaction network topologies. When tested on 19 published datasets collected from eight species, the proposed approach achieved significantly higher or comparable overall performance compared with seven competitive predictors. CONCLUSION: We introduced an ensemble learning approach for PPI prediction that integrates multiple learning algorithms and different protein-pair representations. Extensive comparisons with other state-of-the-art prediction tools demonstrated the feasibility and superiority of the proposed method.


Subjects
Algorithms, Protein Interaction Mapping/methods, Animals, Area Under Curve, Databases, Protein, Gene Ontology, Humans, Molecular Sequence Annotation
16.
Parasit Vectors ; 12(1): 312, 2019 Jun 24.
Article in English | MEDLINE | ID: mdl-31234937

ABSTRACT

BACKGROUND: Babesiosis is an economically important disease caused by tick-borne apicomplexan protists of the genus Babesia. Most apicomplexan parasites, including Babesia, have a plastid-derived organelle termed an apicoplast, which is involved in critical metabolic pathways such as fatty acid, iron-sulphur, haem and isoprenoid biosynthesis. Apicoplast genomic data can provide significant information for understanding and exploring the biological features and the taxonomic and evolutionary relationships of apicomplexan parasites, and for identifying targets for anti-parasitic drugs. However, there are limited data on the apicoplast genomes of Babesia species infective to small ruminants. METHODS: PCR primers were designed based on the previously reported apicoplast genome sequences of Babesia motasi Lintan and Babesia sp. Xinjiang, which were obtained using Illumina technology. Overlapping apicoplast genomic fragments of six ovine Babesia isolates were amplified and sequenced using the Sanger dideoxy chain-termination method. The full-length apicoplast genome sequences were assembled and annotated using bioinformatics software. The gene content and gene order of the apicoplast genomes obtained in this study were defined and compared with those of other apicomplexan parasites. Phylogenetic trees were constructed from the concatenated amino acid sequences of 13 gene products using MEGA v.6.06. RESULTS: The six ovine Babesia apicoplast genomes consisted of circular DNA. The genome sizes were 29,916-30,846 bp with 78.7-81.0% A + T content, 29-31 open reading frames (ORFs) and 23-24 transfer RNAs. The ORFs encoded four DNA-directed RNA polymerase subunits (rpoB, rpoC1, rpoC2a and rpoC2b), 13 ribosomal proteins, one elongation factor Tu (tufA), two ATP-dependent Clp proteases (ClpC) and 7-11 hypothetical proteins. Babesia sp. has three more genes (rpl5, rps8 and rpoB) than Babesia motasi. Phylogenetic analysis showed that Babesia sp. is located in a separate clade, while Babesia motasi Lintan/Tianzhu and B. motasi Ningxian/Hebei were divided into two subclades. CONCLUSIONS: To our knowledge, this study is the first to elucidate the whole apicoplast genomic structural features of six Babesia isolates infective to small ruminants in China using Sanger sequencing. The data provide useful information for confirming the taxonomic relationships of these parasites and for identifying targets for anti-apicomplexan drugs.


Subjects
Apicoplasts/genetics, Babesia/genetics, Genome, Protozoan, Ruminants/parasitology, Animals, Babesiosis/epidemiology, China, Computational Biology, DNA Primers/genetics, Molecular Sequence Annotation, Phylogeny, Polymerase Chain Reaction, Sequence Analysis, DNA, Sheep, Sheep Diseases/epidemiology, Sheep Diseases/parasitology
17.
BMC Evol Biol ; 19(1): 124, 2019 06 18.
Article in English | MEDLINE | ID: mdl-31215393

ABSTRACT

BACKGROUND: Mycobacteria occupy various ecological niches and can be isolated from soil, tap water and ground water. Several cause diseases in humans and animals. To gain deeper insight into mycobacterial evolution, focusing on tRNA and non-coding (nc)RNA, we conducted a comparative genome analysis of Mycobacterium mucogenicum (Mmuc) and Mycobacterium neoaurum (Mneo) clade members. RESULTS: Genome sizes for Mmuc- and Mneo-clade members vary between 5.4 and 6.5 Mbp, with the complete MmucT (type strain) genome encompassing 6.1 Mbp. The number of tRNA genes ranges between 46 and 79 (including one pseudo tRNA gene), with 39 tRNA genes common among the members of these clades, while additional tRNA genes were probably acquired through horizontal gene transfer. Selected tRNAs and ncRNAs (RNase P RNA, tmRNA, 4.5S RNA, Ms1 RNA and 6C RNA) are expressed, and the levels of several of these are higher in stationary phase than in exponentially growing cells. The rare tRNA-Ile-TAT isoacceptor and two ncRNAs novel to mycobacteria, the Lactobacillales-derived GOLLD RNA and a homolog of the antisense Salmonella typhimurium phage Sar RNA, were shown to be present and expressed in certain Mmuc-clade members. CONCLUSIONS: Phages, IS elements, horizontally transferred tRNA gene clusters, and phage-derived ncRNAs appear to have influenced the evolution of the Mmuc- and Mneo-clades. While the number of predicted coding sequences correlates with genome size, the number of tRNA coding genes does not. The majority of tRNA genes in mycobacteria are transcribed mainly as single genes, and the levels of certain ncRNAs, including RNase P RNA (essential for the processing of tRNAs), are higher in stationary phase than in exponentially growing cells. We provide supporting evidence that Ms1 RNA represents a mycobacterial 6S RNA variant. The evolutionary routes of the ncRNAs RNase P RNA, tmRNA and Ms1 RNA differ from those of the core genes.


Subjects
Genome, Bacterial , Mycobacterium/growth & development , Mycobacterium/genetics , RNA, Bacterial/genetics , RNA, Transfer/genetics , RNA, Untranslated/genetics , Amino Acyl-tRNA Synthetases/genetics , Bacteriophages/genetics , Genome Size , Genomics , Molecular Sequence Annotation , Mycobacterium/classification , Phylogeny , Plasmids/genetics , RNA, Untranslated/chemistry , Ribonuclease P/genetics , Sequence Inversion
18.
Hum Genet ; 138(10): 1091-1104, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31230194

ABSTRACT

Although genome-wide association studies (GWAS) have identified hundreds of risk loci for breast and prostate cancer, only a few studies have characterized GWAS association signals across functional genomic annotations, with a particular focus on single nucleotide polymorphisms (SNPs) located in DNA regulatory elements. In this study, we investigated the enrichment pattern of GWAS signals for breast and prostate cancer in genomic functional regions identified in normal tissue and cancer cell lines. We quantified the overall enrichment of SNPs with breast and prostate cancer association p values < 1 × 10⁻⁸ across regulatory categories. We then obtained annotations for DNaseI hypersensitive sites (DHS), typical enhancers, and super enhancers across multiple tissue types to assess whether significant GWAS signals were selectively enriched in annotations found in disease-related tissue. Finally, we quantified the enrichment of breast and prostate cancer SNP heritability in regulatory regions and compared the enrichment pattern of SNP heritability with that of GWAS signals. DHS, typical enhancers, and super enhancers identified in the breast cancer cell line MCF-7 showed the highest enrichment of genome-wide significant variants for breast cancer. For prostate cancer, GWAS signals were mostly enriched in DHS and typical enhancers identified in the prostate cancer cell line LNCaP. With progressively stringent GWAS p value thresholds, enrichment increased for both diseases in DHS, typical enhancers, and super enhancers located in disease-related tissue. Results from the heritability enrichment analysis supported the selective enrichment of functional genomic regions in disease-related cell lines for both breast and prostate cancer.
Our results suggest the importance of studying functional annotations identified in disease-related tissues when characterizing GWAS results, and further demonstrate the role of germline DNA regulatory elements from disease-related tissue in breast and prostate carcinogenesis.
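The enrichment of genome-wide significant SNPs in a regulatory category can be summarized as a fold enrichment: the fraction of significant SNPs that fall in the annotation divided by the fraction of all tested SNPs that fall in it. The minimal Python sketch below assumes each SNP is supplied as a hypothetical `(p_value, in_annotation)` pair; it illustrates the generic ratio and the effect of tightening the p value threshold, and is not necessarily the exact statistic or pipeline used in the study.

```python
def fold_enrichment(snps, threshold=1e-8):
    """Fold enrichment of significant SNPs within one annotation category.

    `snps` is a list of (p_value, in_annotation) pairs; this input layout is
    a hypothetical format chosen for illustration.
    """
    significant = [in_ann for p, in_ann in snps if p < threshold]
    if not significant:
        raise ValueError("no SNPs pass the threshold")
    frac_all = sum(in_ann for _, in_ann in snps) / len(snps)
    if frac_all == 0:
        raise ValueError("no tested SNPs fall in the annotation")
    # Share of significant SNPs in the annotation vs. the background share.
    frac_sig = sum(significant) / len(significant)
    return frac_sig / frac_all


# Toy data: 3 of 8 SNPs overlap the annotation (background share 0.375).
snps = [(1e-9, True), (1e-10, True), (1e-9, False), (0.5, False),
        (0.2, False), (0.3, True), (0.01, False), (0.04, False)]

# Progressively stringent thresholds, as in the abstract.
for t in (1e-8, 5e-10):
    print(t, round(fold_enrichment(snps, t), 2))
# 1e-08 1.78
# 5e-10 2.67
```

A value above 1 means significant SNPs are over-represented in the annotation relative to the tested background.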


Subjects
Breast Neoplasms/genetics , Genetic Predisposition to Disease , Genetic Variation , Prostatic Neoplasms/genetics , Regulatory Sequences, Nucleic Acid , Biomarkers, Tumor , Cell Line, Tumor , Computational Biology/methods , Female , Genetic Association Studies , Genome-Wide Association Study , Humans , Male , Molecular Sequence Annotation , Organ Specificity
19.
BMC Bioinformatics ; 20(1): 253, 2019 May 16.
Article in English | MEDLINE | ID: mdl-31096906

ABSTRACT

BACKGROUND: The development of whole genome bisulfite sequencing has made it possible to identify methylation differences at single-base resolution throughout an entire genome. However, a persistent challenge in DNA methylome analysis is the accurate identification of differentially methylated regions (DMRs) between samples. Sensitive and specific identification of DMRs among different conditions requires accurate and efficient algorithms, and while various tools have been developed to tackle this problem, they frequently suffer from inaccurate DMR boundary identification and high false-positive rates. RESULTS: We present a novel Histogram Of MEthylation (HOME) based method that takes into account the inherent difference in the distribution of methylation levels between DMRs and non-DMRs to discriminate between the two using a Support Vector Machine. We show that the features generated by HOME are dataset-independent, such that a classifier trained on, for example, a mouse methylome training set of regions of differentially accessible chromatin can be applied to any other organism's dataset and identify accurate DMRs. We demonstrate that DMRs identified by HOME exhibit a higher association with biologically relevant genes, processes, and regulatory events than those identified by existing methods. Moreover, HOME provides additional functionalities lacking in most current DMR finders, such as DMR identification in non-CG contexts and time series analysis. HOME is freely available at https://github.com/ListerLab/HOME . CONCLUSION: HOME produces more accurate DMRs than current state-of-the-art methods on both simulated and biological datasets. The broad applicability of HOME for identifying accurate DMRs in genomic data from any organism will have a significant impact on expanding our knowledge of how DNA methylation dynamics affect cell development and differentiation.
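The idea behind HOME, as summarized above, is that methylation levels are distributed differently in DMRs and non-DMRs, so a histogram over a candidate region can serve as a fixed-length feature vector for a classifier such as a Support Vector Machine. The Python sketch below shows only that feature step, assuming equal-width bins over absolute per-cytosine methylation differences in [0, 1]; the bin layout and the `methylation_histogram` name are hypothetical, the SVM training is omitted, and HOME's actual features (see the linked repository) may differ.

```python
def methylation_histogram(diffs, n_bins=10):
    """Normalized histogram of absolute methylation differences in a region.

    `diffs` holds per-cytosine differences in methylation level between two
    samples, each assumed to lie in [-1, 1]. Equal-width binning of |diff|
    on [0, 1] is an illustrative assumption, not HOME's exact design.
    """
    if not diffs:
        raise ValueError("region contains no cytosines")
    counts = [0] * n_bins
    for d in diffs:
        # min() keeps |d| == 1.0 in the last bin instead of overflowing.
        idx = min(int(abs(d) * n_bins), n_bins - 1)
        counts[idx] += 1
    # Normalizing by the cytosine count makes regions of any size comparable.
    return [c / len(diffs) for c in counts]


features = methylation_histogram([0.05, -0.05, 0.95, 1.0])
print(features)  # half the mass in the lowest bin, half in the highest
```

Because every region maps to a vector of the same length regardless of how many cytosines it contains, one plausible reading is that this normalization is what lets a classifier trained on one dataset be applied to another.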


Subjects
Algorithms , DNA Methylation/genetics , Machine Learning , Animals , Computer Simulation , Databases, Genetic , Mice , Molecular Sequence Annotation , Time Factors
20.
Genet Sel Evol ; 51(1): 20, 2019 May 10.
Article in English | MEDLINE | ID: mdl-31077144

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) are widely used to identify regions of the genome that harbor genetic determinants of quantitative traits. However, the multiple-testing burden from scanning tens of millions of whole-genome sequence variants reduces the power to identify associated variants, especially if sample size is limited. In addition, factors such as inaccurate imputation, complex linkage disequilibrium structures, and multiple closely located causal variants may result in an identified causative mutation not being the most significant single nucleotide polymorphism in a particular genomic region. Therefore, the use of information from different sources, particularly variant annotations, has been proposed to enhance the fine-mapping of causal variants. Here, we tested whether applying significance thresholds based on variant annotation categories increases the power of GWAS compared with a flat Bonferroni multiple-testing correction. RESULTS: Whole-genome sequence variants in dairy cattle were categorized according to type and predicted impact. GWAS results for 17 quantitative traits were then analyzed for enrichment of association signals in each annotation category. Using annotation categories determined with the Variant Effect Predictor software and datasets indicating regions of open chromatin, "low impact" variants were found to be highly enriched. Moreover, when the variants annotated as "modifier" and not located in open chromatin regions were further classified into different types of potential regulatory elements, high impact variants, moderate impact variants, variants located in the 3' and 5' untranslated regions, and variants located in potential non-coding RNA regions exhibited relatively greater enrichment. In contrast, a similar study on human GWAS data reported that enrichment of association signals was highest for high impact variants.
We observed an increase in power when these variant category-based significance thresholds were applied to GWAS results for stature in Nordic Holstein cattle, as more candidate genes from a previous large GWAS meta-analysis of cattle stature were confirmed. CONCLUSIONS: Variant category-based genome-wide significance thresholds can marginally increase the power to detect candidate genes in cattle. With continued improvements in the annotation of the bovine genome, we anticipate that variant category-based significance thresholds will prove increasingly useful.
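A flat Bonferroni correction divides the family-wise error rate α by the total number of tests, whereas a category-based scheme can first split α across annotation categories and then correct within each category. The Python sketch below implements one such scheme, a weighted Bonferroni split; the weights (which could, for example, reflect each category's observed enrichment), the category names, and the function names are illustrative assumptions rather than the exact procedure used in the study.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Flat Bonferroni per-test significance threshold."""
    return alpha / n_tests


def category_thresholds(category_sizes: dict[str, int],
                        weights: dict[str, float],
                        alpha: float = 0.05) -> dict[str, float]:
    """Weighted Bonferroni: category c receives alpha * w_c / W over n_c tests.

    Summing n_c * threshold_c over all categories gives exactly alpha, so the
    family-wise error rate remains controlled at alpha.
    """
    total_w = sum(weights[c] for c in category_sizes)
    return {c: alpha * weights[c] / total_w / n
            for c, n in category_sizes.items()}


# Hypothetical example: 1,000 high-impact variants among 10^6 tests.
sizes = {"high_impact": 1_000, "other": 999_000}
weights = {"high_impact": 1.0, "other": 1.0}
print(bonferroni_threshold(sum(sizes.values())))
print(category_thresholds(sizes, weights))
```

With equal weights, the 1,000 hypothetical high-impact variants are tested at 2.5 × 10⁻⁵ instead of the flat 5 × 10⁻⁸, at the cost of a slightly stricter threshold (about 2.5 × 10⁻⁸) for the remaining variants.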


Subjects
Cattle/genetics , Genome-Wide Association Study/methods , Polymorphism, Genetic , Animals , Genome-Wide Association Study/standards , Molecular Sequence Annotation , Quantitative Trait Loci