1.
Bioinformatics ; 38(Suppl 1): i325-i332, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758801

ABSTRACT

MOTIVATION: During lead compound optimization, it is crucial to identify the pathways in which a drug-like compound is metabolized. Recently, machine learning-based methods have made encouraging progress in predicting potential metabolic pathways for drug-like compounds. However, they neglect the fact that metabolic pathways depend on each other. Moreover, they cannot adequately explain why compounds participate in specific pathways. RESULTS: To address these issues, we propose a novel Multi-Label Graph Learning framework of Metabolic Pathway prediction boosted by pathway interdependence, called MLGL-MP, which contains a compound encoder, a pathway encoder and a multi-label predictor. The compound encoder learns compound embedding representations with graph neural networks. After constructing a pathway dependence graph from re-trained word embeddings and pathway co-occurrences, the pathway encoder learns pathway embeddings with graph convolutional networks. Moreover, after adapting the compound embedding space into the pathway embedding space, the multi-label predictor measures the proximity of the two spaces to determine which pathways a compound participates in. A comparison with state-of-the-art methods on KEGG pathways demonstrates the superiority of MLGL-MP. In addition, ablation studies reveal how its three components contribute to the model: the pathway dependence, the adapter between compound embeddings and pathway embeddings, and the pre-training strategy. Furthermore, a case study illustrates the interpretability of MLGL-MP by indicating crucial substructures in a compound, which are significantly associated with the attending metabolic pathways. It is anticipated that this work can boost metabolic pathway predictions in drug discovery. AVAILABILITY AND IMPLEMENTATION: The code and data underlying this article are freely available at https://github.com/dubingxue/MLGL-MP.
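Illustrative note: the abstract does not say how the multi-label predictor measures proximity between the adapted compound embedding and the pathway embeddings. The sketch below assumes a dot product followed by a sigmoid and a 0.5 cut-off. The function name `predict_pathways`, the toy vectors and the threshold are hypothetical and are not taken from the MLGL-MP code.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_pathways(compound_vec, pathway_vecs, threshold=0.5):
    """Score one compound against every pathway embedding.

    Assumption: proximity = dot product, squashed by a sigmoid.
    Returns (scores, labels), one entry per pathway.
    """
    scores = []
    for pathway_vec in pathway_vecs:
        dot = sum(c * p for c, p in zip(compound_vec, pathway_vec))
        scores.append(sigmoid(dot))
    labels = [int(s >= threshold) for s in scores]
    return scores, labels


# Toy 2-dimensional embeddings (hypothetical values).
compound = [1.0, -0.5]
pathways = [[2.0, 0.0], [-1.0, 1.0], [0.0, 0.0]]
scores, labels = predict_pathways(compound, pathways)
```

Because each pathway gets its own independent score, a compound can be assigned to several pathways at once, which is what makes the task multi-label rather than multi-class.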


Subjects
Machine Learning , Neural Networks, Computer , Drug Discovery , Metabolic Networks and Pathways , Software
2.
BMC Genomics ; 20(Suppl 10): 914, 2019 Dec 30.
Article in English | MEDLINE | ID: mdl-31888459

ABSTRACT

BACKGROUND: Identifying antibiotic resistance genes (ARGs) in environmental samples is a critical sub-domain of gene discovery that is directly connected to human health. Antibiotic resistance has drawn extraordinary attention in recent years and is regarded as a severe threat to human health by many institutions around the world. To meet the need for efficient ARG discovery, a series of online antibiotic resistance gene databases have been published. This article presents an in-depth analysis of CARD, one of the most widely used ARG databases. RESULTS: The decision model of CARD is based on the alignment score with a single ARG type. We identify the cases in which the model is likely to make false predictions, and then propose an optimization method on top of the current CARD model. The optimization is expected to improve coherence with BLAST homology relationships and increase confidence in the identification of ARGs using the database. CONCLUSIONS: The absence of a publicly recognized benchmark makes it challenging to evaluate the performance of ARG identification. However, likely wrong predictions and methods for resolving them can be inferred by computational analysis of the identification method and the underlying reference sequences. We hope our work brings insight to the task of precise ARG type classification.


Subjects
Drug Resistance, Microbial/genetics , High-Throughput Nucleotide Sequencing , Models, Genetic , Gene Ontology , Sequence Homology, Nucleic Acid , Support Vector Machine
3.
BMC Bioinformatics ; 19(Suppl 9): 281, 2018 Aug 13.
Article in English | MEDLINE | ID: mdl-30367598

ABSTRACT

BACKGROUND: The Human Microbiome Project reveals the significant mutualistic influence between the human body and the microbes living in it. This influence leads to an interesting phenomenon: many noninfectious diseases are closely associated with diverse microbes. However, identifying microbe-noninfectious disease associations (MDAs) is still a challenging task because of both the high cost and the limitations of microbe cultivation. Thus, there is a need to develop fast approaches to screen potential MDAs. The growing number of validated MDAs makes it possible to meet this demand in a new way. Computational approaches, especially machine learning, are promising for rapidly predicting MDA candidates among a large number of microbe-disease pairs, with the advantage of not being limited by microbe cultivation. Nevertheless, few computational efforts at predicting MDAs have been made so far. RESULTS: In this paper, grouping a set of MDAs into a binary MDA matrix, we propose a novel predictive approach (BMCMDA) based on Binary Matrix Completion to predict potential MDAs. BMCMDA assumes that the incomplete observed MDA matrix is the sum of a latent parameterizing matrix and a noise matrix. It also assumes that the independently occurring subscripts of observed entries in the MDA matrix follow a binomial model. Adopting a standard mean-zero Gaussian distribution for the noise matrix, we model the relationship between the parameterizing matrix and the MDA matrix under the observed microbe-disease pairs as a probit regression. With the recovered parameterizing matrix, BMCMDA deduces how likely a microbe is to be associated with a particular disease. Under leave-one-out cross-validation, it achieves encouraging performance (AUC = 0.906, AUPR = 0.526) and improves on the pioneering approach KATZHMDA by ~7% in AUC and ~5% in AUPR. CONCLUSIONS: BMCMDA provides an effective approach for predicting MDAs and can also be extended to other similar prediction tasks involving binary relationships (e.g. protein-protein interaction, drug-target interaction).
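Illustrative note: when an observed entry is modelled as a latent parameter plus standard Gaussian noise, a probit link gives the association probability as the standard normal CDF of the latent entry, P(Y_ij = 1) = Φ(M_ij). The sketch below assumes that an association corresponds to the noisy value exceeding zero. The function names are hypothetical, and this is not the BMCMDA implementation, which also has to recover the latent matrix itself.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def association_probability(m_ij):
    """Probit link: P(m_ij + noise > 0) with noise ~ N(0, 1).

    Assumption: a microbe-disease pair counts as associated when the
    noisy latent value is positive.
    """
    return std_normal_cdf(m_ij)


# A latent entry of 0 is maximally uncertain; larger entries are more confident.
p_neutral = association_probability(0.0)
p_likely = association_probability(1.0)
p_unlikely = association_probability(-1.0)
```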


Subjects
Algorithms , Bacteria , Computational Biology/methods , Disease , Microbiota , Models, Biological , Bacterial Physiological Phenomena , Host-Pathogen Interactions , Humans , Risk Factors
4.
BMC Bioinformatics ; 19(Suppl 14): 411, 2018 Nov 20.
Article in English | MEDLINE | ID: mdl-30453924

ABSTRACT

BACKGROUND: A significant number of adverse drug reactions are caused by unexpected drug-drug interactions (DDIs). Identifying DDIs is therefore crucial before multiple drugs are co-prescribed. In clinics or in drug discovery, this task usually incurs high costs and faces numerous limitations, whereas computational approaches can predict potential DDIs effectively by utilizing diverse drug attributes (e.g. side effects). Nevertheless, they cannot predict enhancive and degressive DDIs, which respectively increase and decrease the pharmacological behavior of the interacting drugs. The pharmacological change caused by DDIs is one of the most important factors when making a multi-drug prescription. RESULTS: In this work, we design a Triple Matrix Factorization-based Unified Framework (TMFUF) to address the above issue. By leveraging a group of side effect entries of drugs, TMFUF achieves encouraging results (AUC = 0.842 and AUPR = 0.526) in conventional DDI prediction under the traditional screening task. In a comparison with two state-of-the-art approaches, TMFUF demonstrates its superiority with ~7% and ~20% improvements in AUC and AUPR respectively. More importantly, TMFUF shows its ability in comprehensive DDI prediction under different screening tasks. Finally, an application of TMFUF reveals the significant pairs of side effects that contribute to forming enhancive and degressive DDIs, for further clinical validation. CONCLUSIONS: TMFUF is the first approach capable of predicting both conventional binary DDIs and comprehensive DDIs, so that it captures the pharmacological changes caused by DDIs. Furthermore, it provides a unified solution for DDI prediction in two screening scenarios that involve newly given drugs with no prior interactions. Another advantage is its ability to indicate how significantly pairs of drug features contribute to forming DDIs.
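Illustrative note: the abstract does not give the factorization itself. One common reading of a "triple" factorization is a bilinear form, in which the score for a drug pair is f_i^T Θ f_j, with f the side-effect feature vectors and Θ a matrix of weights over pairs of side effects. The sketch below assumes that form. The function name, the toy features and the entries of `theta` are hypothetical, and reading the sign as enhancive versus degressive is only an illustration.

```python
def bilinear_score(f_i, theta, f_j):
    """Compute f_i^T * theta * f_j for two drug feature vectors.

    Assumption: theta[a][b] weights how strongly the pair of
    side effects (a, b) contributes to an interaction.
    """
    return sum(
        f_i[a] * theta[a][b] * f_j[b]
        for a in range(len(f_i))
        for b in range(len(f_j))
    )


# Binary side-effect profiles of two drugs (hypothetical).
drug_a = [1, 0, 1]
drug_b = [0, 1, 1]

# Symmetric weight matrix over side-effect pairs (hypothetical values).
theta = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, -0.2],
    [0.0, -0.2, 0.3],
]

score_ab = bilinear_score(drug_a, theta, drug_b)
score_ba = bilinear_score(drug_b, theta, drug_a)
```

Because the score depends only on feature vectors, a form like this can be evaluated for a drug that has no known interactions, and the large entries of Θ point at the side-effect pairs that drive a prediction.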


Subjects
Algorithms , Drug Interactions , Humans , Least-Squares Analysis , ROC Curve , Reproducibility of Results
5.
BMC Bioinformatics ; 18(Suppl 12): 409, 2017 Oct 16.
Article in English | MEDLINE | ID: mdl-29072137

ABSTRACT

BACKGROUND: Drug combination is one of the effective approaches for treating complex diseases. However, determining combinative drug pairs in clinical trials is still costly. Thus, computational approaches are used to identify potential drug pairs in advance. Existing computational approaches have the following shortcomings: (i) the lack of an effective integration of heterogeneous features leads to time-consuming training and can even result in an over-fitted classifier; and (ii) the narrow focus on predicting potential combinations only among known drugs that already have known combinations cannot meet the demand of realistic screenings, which pay more attention to potential combinative pairs among new drugs that have no approved combination with other drugs at all. RESULTS: In this paper, to tackle the above two problems, we propose a novel drug-driven approach for predicting potential combinative pairs on a large scale. We define four new features based on heterogeneous data and design an efficient fusion scheme to integrate these features. More importantly, we design appropriate cross-validation schemes for realistic screening scenarios of drug combinations involving both known drugs and new drugs. In addition, we perform an extra investigation to show how each kind of heterogeneous feature is related to combinative drug pairs. The investigation inspires the design of our approach. Experiments on real data demonstrate the effectiveness of our fusion scheme for integrating heterogeneous features and its predictive power in three scenarios of realistic screening. In terms of AUC and AUPR respectively, the prediction among known drugs achieves 0.954 and 0.821, that between known drugs and new drugs achieves 0.909 and 0.635, and that among new drugs achieves 0.809 and 0.592. CONCLUSIONS: Our approach provides not only an effective tool to integrate heterogeneous features but also the first tool to predict potential combinative pairs among new drugs.
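Illustrative note: the three screening scenarios differ only in how many drugs of a pair are "new", that is, held out with no known combinations. The sketch below shows one way to partition candidate pairs accordingly. The function name, the scenario labels and the drug identifiers are hypothetical, and the abstract does not describe the authors' actual cross-validation procedure in this detail.

```python
from itertools import combinations


def split_pairs_by_scenario(drugs, new_drugs):
    """Group all unordered drug pairs by how many members are new drugs."""
    new = set(new_drugs)
    names = ("known-known", "known-new", "new-new")
    scenarios = {name: [] for name in names}
    for a, b in combinations(drugs, 2):
        n_new = (a in new) + (b in new)  # 0, 1 or 2 new drugs in the pair
        scenarios[names[n_new]].append((a, b))
    return scenarios


# Four drugs, two of which are treated as newly introduced (hypothetical).
scenarios = split_pairs_by_scenario(["d1", "d2", "d3", "d4"], ["d3", "d4"])
```

Evaluating each group separately keeps the easier known-known case from masking weaker performance on pairs that involve new drugs, which matches the pattern of the reported AUC and AUPR values.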


Subjects
Computational Biology/methods , Drug Combinations , Drug Evaluation, Preclinical , Databases as Topic , Humans
6.
Plant J ; 85(4): 532-47, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26764122

ABSTRACT

The pentatricopeptide repeat (PPR) proteins form one of the largest protein families in land plants. They are characterised by tandem 30-40 amino acid motifs that form an extended binding surface capable of sequence-specific recognition of RNA strands. Almost all of them are post-translationally targeted to plastids and mitochondria, where they play important roles in post-transcriptional processes including splicing, RNA editing and the initiation of translation. A code describing how PPR proteins recognise their RNA targets promises to accelerate research on these proteins, but making use of this code requires accurate definition and annotation of all of the various nucleotide-binding motifs in each protein. We have used a structural modelling approach to define 10 different variants of the PPR motif found in plant proteins, in addition to the putative deaminase motif that is found at the C-terminus of many RNA-editing factors. We show that the super-helical RNA-binding surface of RNA-editing factors is potentially longer than previously recognised. We used the redefined motifs to develop accurate and consistent annotations of PPR sequences from 109 genomes. We report a high error rate in PPR gene models in many public plant proteomes, due to gene fusions and insertions of spurious introns. These consistently annotated datasets across a wide range of species are valuable resources for future comparative genomics studies, and an essential pre-requisite for accurate large-scale computational predictions of PPR targets. We have created a web portal (http://www.plantppr.com) that provides open access to these resources for the community.


Subjects
Embryophyta/genetics , Models, Structural , Plant Proteins/chemistry , RNA Editing/genetics , Amino Acid Motifs , Amino Acid Sequence , Embryophyta/metabolism , Mitochondria/metabolism , Models, Molecular , Molecular Sequence Annotation , Plant Proteins/genetics , Plant Proteins/metabolism , Plastids/metabolism , Protein Transport , RNA Recognition Motif Proteins/chemistry , RNA Recognition Motif Proteins/genetics , RNA Recognition Motif Proteins/metabolism , RNA, Plant/genetics , Sequence Alignment
7.
BMC Genomics ; 17 Suppl 5: 499, 2016 08 31.
Article in English | MEDLINE | ID: mdl-27586129

ABSTRACT

BACKGROUND: De novo genome assembly using NGS data remains a computation-intensive task, especially for large genomes. In practice, efficiency is often a primary concern and favors using a more efficient assembler like SOAPdenovo2. Yet SOAPdenovo2, based on the de Bruijn graph, fails to take full advantage of longer NGS reads (say, 150 bp to 250 bp from Illumina HiSeq and MiSeq). Assemblers that are based on string graphs (e.g., SGA), though less popular and also very slow, are more favorable for longer reads. METHODS: This paper presents a new de novo assembler called BASE. It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of the reads sharing the seeds. Such consensus sequences are then extended to contigs. RESULTS: Experiments on two bacterial and four human datasets show the advantage of BASE in both contig quality and speed when dealing with longer reads. In the experiment on bacteria, two datasets with read lengths of 100 bp and 250 bp were used. Especially for the 250 bp dataset, BASE gives much better quality than SOAPdenovo2 and SGA and is similar to SPAdes. Regarding speed, BASE is consistently a few times faster than SPAdes and SGA, but still slower than SOAPdenovo2. BASE and SOAPdenovo2 are further compared using human datasets with read lengths of 100 bp, 150 bp and 250 bp. BASE shows a higher N50 for all datasets, and the improvement becomes more significant when the read length reaches 250 bp. In addition, BASE is more memory-efficient than SOAPdenovo2 when the sequencing data contain errors. CONCLUSIONS: BASE is a practically efficient tool for constructing contigs, with significant improvement in quality for long NGS reads. It is relatively easy to extend BASE to include scaffolding.
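Illustrative note: N50, the contig-quality metric reported above, is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below computes it under that standard definition. The function name and the toy lengths are hypothetical and are not taken from the BASE evaluation.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are taken from longest to shortest until their combined
    length reaches half of the total; the last length taken is N50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2.0
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 is undefined for an empty assembly")


# Toy assembly: total length 54, so half is 27; 10 + 9 + 8 reaches it.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

A higher N50 means that more of the assembly sits in long contigs, although the metric says nothing about whether those contigs are correct.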


Subjects
High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , Algorithms , Humans , Software , Staphylococcus aureus/genetics , Vibrio parahaemolyticus/genetics
8.
Methods ; 83: 98-104, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-25957673

ABSTRACT

Predicting drug-target interaction using computational approaches is an important step in drug discovery and repositioning. To predict whether there will be an interaction between a drug and a target, most existing methods identify similar drugs and targets in the database. The prediction is then made based on the known interactions of these drugs and targets. This idea is promising. However, there are two shortcomings that have not yet been addressed appropriately. Firstly, most of the methods only use 2D chemical structures and protein sequences to measure the similarity of drugs and targets respectively. However, this information may not fully capture the characteristics determining whether a drug will interact with a target. Secondly, there are very few known interactions, i.e. many interactions are "missing" in the database. Existing approaches are biased towards known interactions and have no good solutions to handle possibly missing interactions which affect the accuracy of the prediction. In this paper, we enhance the similarity measures to include non-structural (and non-sequence-based) information and introduce the concept of a "super-target" to handle the problem of possibly missing interactions. Based on evaluations on real data, we show that our similarity measure is better than the existing measures and our approach is able to achieve higher accuracy than the two best existing algorithms, WNN-GIP and KBMF2K. Our approach is available at http://web.hku.hk/~liym1018/projects/drug/drug.html or http://www.bmlnwpu.org/us/tools/PredictingDTI_S2/METHODS.html.


Subjects
Cluster Analysis , Computational Biology/methods , Drug Discovery/methods , Genomics/methods , Algorithms , Artificial Intelligence , Humans , Pharmaceutical Preparations/chemistry
9.
Int J Mol Sci ; 17(12)2016 Dec 06.
Article in English | MEDLINE | ID: mdl-27929436

ABSTRACT

Small RNAs, including microRNAs (miRNAs) and phased small interfering RNAs (phasiRNAs; from PHAS loci), play key roles in plant development. Cultivated soybean, Glycine max, contributes a great deal to food production, but, compared to its wild kin, Glycine soja, it may lose some genetic information during domestication. In this work, we analyzed the sRNA profiles of different tissues in both cultivated (C08) and wild soybeans (W05) at three stages of development. A total of 443 known miRNAs and 15 novel miRNAs showed varying abundances between different samples, but the miRNA profiles were generally similar in both accessions. Based on a sliding window analysis workflow that we developed, 50 PHAS loci generating 55 21-nucleotide phasiRNAs were identified in C08, and 46 phasiRNAs from 41 PHAS loci were identified in W05. In germinated seedlings, phasiRNAs were more abundant in C08 than in W05. Disease resistant TIR-NB-LRR genes constitute a very large family of PHAS loci. PhasiRNAs were also generated from several loci that encode for NAC transcription factors, Dicer-like 2 (DCL2), Pentatricopeptide Repeat (PPR), and Auxin Signaling F-box 3 (AFB3) proteins. To investigate the possible involvement of miRNAs in initiating the PHAS-phasiRNA pathway, miRNA target predictions were performed and 17 C08 miRNAs and 15 W05 miRNAs were predicted to trigger phasiRNAs biogenesis. In summary, we provide a comprehensive description of the sRNA profiles of wild versus cultivated soybeans, and discuss the possible roles of sRNAs during soybean germination.


Subjects
Fabaceae/genetics , Glycine max/genetics , RNA, Plant/genetics , Gene Expression Regulation, Plant/genetics , MicroRNAs/genetics , RNA, Small Interfering
10.
BMC Bioinformatics ; 16(Suppl 18): I1, 2015 Dec 18.
Article in English | MEDLINE | ID: mdl-28102114

ABSTRACT

GIW/InCoB2015, the joint 26th International Conference on Genome Informatics (GIW) and 14th International Conference on Bioinformatics (InCoB), held in Tokyo on September 9-11, 2015, was attended by over 200 delegates. Fifty-one out of 89 oral presentations were based on research articles accepted for publication in four BMC journal supplements and three other journals. Sixteen articles in this supplement and six articles in the BMC Systems Biology GIW/InCoB2015 Supplement are covered by this introduction. The topics range from genome informatics, protein structure informatics and image analysis to biological networks and biomarker discovery.


Subjects
Computational Biology/methods , Asia , Biomarkers , Biomedical Research , Congresses as Topic , Genomics , Protein Conformation , Protein Interaction Mapping , Protein Processing, Post-Translational , Systems Biology
11.
BMC Bioinformatics ; 16 Suppl 5: S4, 2015.
Article in English | MEDLINE | ID: mdl-25859903

ABSTRACT

Progressive sequence alignment is one of the most commonly used methods for multiple sequence alignment. Roughly speaking, the method first builds a guide tree, and then aligns the sequences progressively according to the topology of the tree. It is believed that guide trees are very important to progressive alignment; a better guide tree will give an alignment with higher accuracy. Recently, we proposed an adaptive method for constructing guide trees. This paper studies the quality of the guide trees constructed by this method. Our study shows that the adaptive method can be used to improve the accuracy of many different progressive MSA tools. In fact, we give evidence that the guide trees constructed by the adaptive method are among the best.


Subjects
Computational Biology/methods , Sequence Alignment/methods , Sequence Analysis, DNA , Computer Simulation , Databases, Genetic , Evolution, Molecular , Humans , Phylogeny , Software
12.
BMC Bioinformatics ; 16: 386, 2015 Nov 16.
Article in English | MEDLINE | ID: mdl-26573684

ABSTRACT

BACKGROUND: Because of the short read length of high-throughput sequencing data, assembly errors are introduced in genome assembly, which may have an adverse impact on downstream data analysis. Several tools have been developed to eliminate these errors by either 1) comparing the assembled sequences with a similar reference genome, or 2) analyzing paired-end reads aligned to the assembled sequences and determining inconsistent features along mis-assembled sequences. However, the former approach cannot distinguish real structural variations between the target genome and the reference genome, while the latter approach can produce many false positive detections (correctly assembled sequences being considered mis-assembled). RESULTS: We present misFinder, a tool that aims to identify assembly errors with high accuracy in an unbiased way and to correct these errors at their mis-assembled positions, improving the assembly accuracy for downstream analysis. It combines the information from a reference (or closely related reference) genome with paired-end reads aligned to the assembled sequence. Assembly errors and correct assemblies corresponding to structural variations can be detected by comparing the genome reference and assembled sequence. Different types of assembly errors can then be distinguished from the mis-assembled sequence by analyzing the aligned paired-end reads using multiple features derived from coverage and consistency of insert distance to obtain high-confidence error calls. CONCLUSIONS: We tested the performance of misFinder on both simulated and real paired-end read data, and misFinder gave accurate error calls with very few miscalls. We further compared misFinder with QUAST and REAPR. misFinder outperformed QUAST and REAPR by 1) identifying more true positive mis-assemblies with very few false positives and false negatives, and 2) distinguishing the correct assemblies corresponding to structural variations from mis-assembled sequences. misFinder can be freely downloaded from https://github.com/hitbio/misFinder.
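Illustrative note: one of the paired-end features mentioned above is the consistency of insert distance. The sketch below shows a simple version of that idea: within a candidate region, count the read pairs whose insert size deviates from the library mean by more than k standard deviations. The function name, the choice of k = 3 and the toy values are hypothetical, and the abstract does not specify misFinder's actual features or thresholds.

```python
def discordant_fraction(insert_sizes, mean, sd, k=3.0):
    """Fraction of read pairs whose insert size is inconsistent.

    Assumption: a pair is inconsistent when its insert size lies more
    than k standard deviations from the library mean.
    """
    if not insert_sizes:
        return 0.0
    discordant = sum(1 for size in insert_sizes if abs(size - mean) > k * sd)
    return discordant / len(insert_sizes)


# Library with mean insert 300 bp and sd 20 bp (hypothetical values).
clean_region = discordant_fraction([300, 310, 290, 305], mean=300, sd=20)
suspect_region = discordant_fraction([300, 310, 290, 900], mean=300, sd=20)
```

A high discordant fraction alone cannot separate a mis-assembly from a true structural variation, which is why the abstract describes combining read-based evidence with a reference comparison.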


Subjects
Escherichia coli/genetics , High-Throughput Nucleotide Sequencing/methods , Schizosaccharomyces/genetics , Sequence Analysis, DNA/methods , Software , Computer Simulation
13.
BMC Genomics ; 16 Suppl 12: I1, 2015.
Article in English | MEDLINE | ID: mdl-26679412

ABSTRACT

Knowledge discovery in bioinformatics thrives on joint and inclusive efforts of stakeholders. Similarly, knowledge dissemination is expected to be more effective and scalable through joint efforts. Therefore, the International Conference on Bioinformatics (InCoB) and the International Conference on Genome Informatics (GIW) were organized as a joint conference for the first time in 13 years of coexistence. The Asia-Pacific Bioinformatics Network (APBioNet) and the Japanese Society for Bioinformatics (JSBi) collaborated to host GIW/InCoB2015 in Tokyo, September 9-11, 2015. The joint endeavour yielded 51 research articles published in seven journals, 78 poster and 89 oral presentations, showcasing bioinformatics research in the Asia-Pacific region. Encouraged by the results and reduced organizational overheads, APBioNet will collaborate with other bioinformatics societies in organizing co-located bioinformatics research and training meetings in the future. InCoB2016 will be hosted in Singapore, September 21-23, 2016.


Subjects
Computational Biology , Allergy and Immunology , China , Computational Biology/methods , Computational Biology/organization & administration , Epigenomics , Genomics , Humans , Medical Informatics
14.
Bioinformatics ; 30(8): 1049-1055, 2014 04 15.
Article in English | MEDLINE | ID: mdl-24376038

ABSTRACT

MOTIVATION: High-throughput sequencing has been used to probe RNA structures, by treating RNAs with reagents that preferentially cleave or mark certain nucleotides according to their local structures, followed by sequencing of the resulting fragments. The data produced contain valuable information for studying various RNA properties. RESULTS: We developed methods for statistically modeling these structure-probing data and extracting structural features from them. We show that the extracted features can be used to predict RNA 'zipcodes' in yeast, regions bound by the She complex in asymmetric localization. The prediction accuracy was better than using raw RNA probing data or sequence features. We further demonstrate the use of the extracted features in identifying binding sites of RNA binding proteins from whole-transcriptome global photoactivatable-ribonucleoside-enhanced cross-linking and immunopurification (gPAR-CLIP) data. AVAILABILITY: The source code of our implemented methods is available at http://yiplab.cse.cuhk.edu.hk/probrna/ CONTACT: kevinyip@cse.cuhk.edu.hk Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Protein Binding , RNA/chemistry , Binding Sites , High-Throughput Nucleotide Sequencing , Nucleic Acid Conformation , RNA-Binding Proteins/metabolism , Saccharomyces cerevisiae/genetics , Sequence Analysis, RNA/methods , Transcriptome
15.
Appl Microbiol Biotechnol ; 99(6): 2871-81, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25359480

ABSTRACT

In this study, we applied a 16S ribosomal RNA (rRNA) metagenomics approach to survey inanimate hospital environments (IHEs) in a respiratory care center (RCC). A total of 16 samples, including 9 from medical devices and 7 from workstations, were analyzed. In addition, clinical isolates collected in the RCC during the sampling period were retrospectively analyzed. High microbial diversity was detected, with an average of 1,836 phylotypes per sample. In addition to Acinetobacter, more than 60% of the bacterial communities present among the top 25 abundant genera were dominated by skin-associated bacteria. Differences in bacterial profiles were restricted to individual samples. Furthermore, compliance with hand hygiene guidelines may be unsatisfactory among hospital staff, according to a principal coordinate analysis that indicated clustering of bacterial communities between devices and workstations for most of the sampling sites. Compared to the high incidence of clinical isolates in the RCC, only Staphylococcus and Acinetobacter were highly abundant in the IHEs. Although Acinetobacter was the most abundant genus present in IHEs of the RCC, potential pathogens, e.g., Acinetobacter baumannii, might remain susceptible to carbapenem. This study is the first in Taiwan to demonstrate a high diversity of human-associated bacteria in the RCC via 16S rRNA metagenomics, which allows for new assessment of potential health risks in RCCs, aids in the evaluation of existing sanitation protocols, and furthers our understanding of the development of healthcare-associated infections.


Subjects
Bacteria/classification , Bacteria/drug effects , Metagenomics/methods , Acinetobacter baumannii/classification , Acinetobacter baumannii/drug effects , Alleles , Biomass , Carbapenems/pharmacology , Chryseobacterium/classification , Chryseobacterium/drug effects , DNA, Bacterial/genetics , Drug Resistance, Multiple, Bacterial , Enterococcus/classification , Enterococcus/drug effects , Equipment Contamination , Fomites/microbiology , Humans , Klebsiella pneumoniae/classification , Klebsiella pneumoniae/drug effects , Microbial Sensitivity Tests , Pseudomonas aeruginosa/classification , Pseudomonas aeruginosa/drug effects , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA , Staphylococcus/classification , Staphylococcus/drug effects , Taiwan
16.
BMC Genomics ; 15: 116, 2014 Feb 10.
Article in English | MEDLINE | ID: mdl-24507710

ABSTRACT

BACKGROUND: In higher eukaryotes, small RNAs play a role in regulating gene expression. Overexpression (OE) lines of Arabidopsis thaliana purple acid phosphatase 2 (AtPAP2) were shown to grow faster and exhibit higher ATP and sugar contents. Leaf microarray studies showed that many genes involved in microRNA (miRNA) and trans-acting siRNA (tasiRNA) biogenesis were significantly changed in the fast-growing lines. In this study, the sRNA profiles of the leaf and the root of 20-day-old plants were sequenced and the impacts of high energy status on sRNA expression were analyzed. RESULTS: Between 9 and 13 million reads from each library were mapped to the genome. miRNAs, tasiRNAs and natural antisense transcripts-generated small interfering RNAs (natsiRNAs) were identified and compared between libraries. In the leaf of OE lines, 15 known miRNAs increased in abundance and 9 miRNAs decreased in abundance, whereas in the root of OE lines, 2 known miRNAs increased in abundance and 9 miRNAs decreased in abundance. miRNAs with increased abundance in the leaf and root samples of both OE lines (miR158b and miR172a/b) were predicted to target mRNAs coding for Dof zinc finger protein and Apetala 2 (AP2) proteins, respectively. Furthermore, a significant change in the miR173-tasiRNAs-PPR/TPR network was observed in the leaves of both OE lines. CONCLUSION: In this study, the impact of high energy content on the sRNA profiles of Arabidopsis is reported. While the abundance of many stress-induced miRNAs is unaltered, the abundance of some miRNAs related to plant growth and development (miR172 and miR319) is elevated in the fast-growing lines. An induction of the miR173-tasiRNAs-PPR/TPR network was also observed in the OE lines. In contrast, only a few cis- and trans-natsiRNAs are altered in the fast-growing lines.


Subjects
Adenosine Triphosphate/pharmacology , Arabidopsis/drug effects , Arabidopsis/genetics , Carbohydrates/pharmacology , RNA, Plant/metabolism , Arabidopsis/metabolism , Arabidopsis Proteins/antagonists & inhibitors , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Energy Metabolism/drug effects , Homeodomain Proteins/antagonists & inhibitors , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , MicroRNAs/metabolism , Nuclear Proteins/antagonists & inhibitors , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Plant Leaves/genetics , Plant Leaves/metabolism , Plant Proteins/antagonists & inhibitors , Plant Proteins/genetics , Plant Proteins/metabolism , Plant Roots/genetics , Plant Roots/metabolism , RNA, Small Interfering/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
17.
Bioinformatics ; 29(23): 2971-8, 2013 Dec 01.
Article in English | MEDLINE | ID: mdl-24123671

ABSTRACT

MOTIVATION: RNA-Seq provides a powerful approach to carry out ab initio investigation of fusion transcripts representing critical translocation and post-transcriptional events that recode hereditary information. Most of the existing computational fusion detection tools are challenged by the issues of accuracy and how to handle multiple mappings. RESULTS: We present a novel tool, SOAPfusion, for fusion discovery with paired-end RNA-Seq reads. SOAPfusion is accurate and efficient for fusion discovery, with high sensitivity (≥93%) and a low false-positive rate (≤1.36%) even when coverage is as low as 10×, highlighting its ability to detect fusions efficiently at low sequencing cost. From real data of Universal Human Reference RNA (UHRR) samples, SOAPfusion detected 7 novel fusion genes, more than other existing tools, and all of them have been validated through reverse transcription-polymerase chain reaction followed by Sanger sequencing. SOAPfusion thus proves to be an effective method for finding fusion transcripts, which should help accelerate pathological and therapeutic cancer studies.


Subjects
Gene Fusion , High-Throughput Nucleotide Sequencing , Neoplasms/diagnosis , Neoplasms/genetics , Software , Algorithms , Base Sequence , Computational Biology , Humans , Molecular Sequence Data , Sequence Analysis, RNA/methods , Sequence Homology, Nucleic Acid
18.
Bioinformatics ; 29(13): i326-34, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23813001

ABSTRACT

MOTIVATION: RNA sequencing based on next-generation sequencing technology is effective for analyzing transcriptomes. Like de novo genome assembly, de novo transcriptome assembly does not rely on any reference genome or additional annotation information, but is more difficult. In particular, isoforms can have very uneven expression levels (e.g. 1:100), which makes it very difficult to identify low-expressed isoforms. One challenge is to remove erroneous vertices/edges with high multiplicity (produced by high-expressed isoforms) in the de Bruijn graph without removing correct ones with not-so-high multiplicity from low-expressed isoforms. Failing to do so will result in the loss of low-expressed isoforms or in complicated subgraphs with transcripts of different genes mixed together due to erroneous vertices/edges. Contributions: Unlike existing tools, which remove erroneous vertices/edges with multiplicities lower than a global threshold, we use a probabilistic progressive approach to iteratively remove them with local thresholds. This enables us to decompose the graph into disconnected components, each containing a few genes, if not a single gene, while retaining many correct vertices/edges of low-expressed isoforms. Combined with existing techniques, IDBA-Tran is able to assemble both high-expressed and low-expressed transcripts and outperforms existing assemblers in terms of sensitivity and specificity for both simulated and real data. AVAILABILITY: http://www.cs.hku.hk/~alse/idba_tran. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
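
The contrast drawn in this abstract between a global multiplicity threshold and local thresholds can be illustrated with a minimal Python sketch. This is not IDBA-Tran's probabilistic progressive algorithm: the function names, the single-pass pruning rule and the `ratio` parameter are all assumptions made for illustration, and k-mers stand in for graph vertices.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (a de Bruijn graph vertex) across the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def neighbours(kmer, counts):
    """Return k-mers present in counts that overlap kmer by k-1 bases."""
    found = []
    for base in "ACGT":
        for candidate in (kmer[1:] + base, base + kmer[:-1]):
            if candidate != kmer and candidate in counts:
                found.append(candidate)
    return found


def prune_global(counts, threshold):
    """Keep k-mers whose multiplicity reaches one global threshold."""
    return {kmer: c for kmer, c in counts.items() if c >= threshold}


def prune_local(counts, ratio):
    """Keep k-mers whose multiplicity is at least ratio times the highest
    multiplicity among their graph neighbours (hypothetical one-pass rule)."""
    kept = {}
    for kmer, c in counts.items():
        local_max = max((counts[n] for n in neighbours(kmer, counts)), default=c)
        if c >= ratio * local_max:
            kept[kmer] = c
    return kept


# Toy data (assumed): a high-expressed path with one erroneous branch (ACT)
# and a separate low-expressed path (GGA -> GAT -> ATT).
toy = {"AAC": 100, "ACG": 100, "CGT": 100, "ACT": 5,
       "GGA": 4, "GAT": 4, "ATT": 4}
global_kept = prune_global(toy, 10)
local_kept = prune_local(toy, 0.1)
```

On this toy input, a global threshold of 10 discards the low-expressed path along with the error, whereas the local rule removes only the erroneous branch because each vertex is judged against its own neighbourhood.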


Subjects
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Algorithms , Computer Graphics , Genome , Oryza/genetics , Oryza/metabolism , Sensitivity and Specificity , Software
19.
Article in English | MEDLINE | ID: mdl-38324429

ABSTRACT

The adversarial vulnerability of convolutional neural networks (CNNs) refers to the performance degradation of CNNs under adversarial attacks, leading to incorrect decisions. However, the causes of adversarial vulnerability in CNNs remain unknown. To address this issue, we propose a unique cross-scale analytical approach from a statistical physics perspective. It reveals that the huge amount of nonlinear effects inherent in CNNs is the fundamental cause for the formation and evolution of system vulnerability. Vulnerability is spontaneously formed on the macroscopic level after the symmetry of the system is broken through the nonlinear interaction between microscopic state order parameters. We develop a cascade failure algorithm, visualizing how micro perturbations on neurons' activation can cascade and influence macro decision paths. Our empirical results demonstrate the interplay between microlevel activation maps and macrolevel decision-making and provide a statistical physics perspective to understand the causality behind CNN vulnerability. Our work will help subsequent research to improve the adversarial robustness of CNNs.

20.
BMC Genomics ; 14: 146, 2013 Mar 05.
Article in English | MEDLINE | ID: mdl-23496985

ABSTRACT

BACKGROUND: Biofuels extracted from the seeds of Camelina sativa have recently been used successfully as environmentally friendly jet fuel to reduce greenhouse gas emissions. Camelina sativa is genetically very close to Arabidopsis thaliana, and both are members of the Brassicaceae. Although public databases are currently available for some members of the Brassicaceae, such as A. thaliana, A. lyrata, Brassica napus, B. juncea and B. rapa, there are no public Expressed Sequence Tags (EST) or genomic data for Camelina sativa. In this study, high-throughput, large-scale RNA sequencing (RNA-seq) of the Camelina sativa transcriptome was carried out to generate a database that will be useful for further functional analyses. RESULTS: Approximately 27 million clean reads (2.42 gigabase pairs), filtered from raw reads by removal of adaptors, ambiguous reads and low-quality reads, were generated by Illumina paired-end RNA-seq technology. All of these clean reads were assembled de novo into 83,493 unigenes and 103,196 transcripts using SOAPdenovo and Trinity, respectively. The average length of the transcripts generated by Trinity was 697 bp (N50 = 976 bp), which was longer than the average length of unigenes (319 bp, N50 = 346 bp). Nonetheless, the assembly generated by SOAPdenovo produced a similar number of non-redundant hits (22,435) to that of Trinity (22,433) in BLASTN searches of the Arabidopsis thaliana CDS sequence database (TAIR). Four public databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-Prot, NCBI non-redundant protein (NR), and the Cluster of Orthologous Groups (COG), were used for unigene annotation; 67,791 of 83,493 unigenes (81.2%) were finally annotated with gene descriptions or conserved protein domains that were mapped to 25,329 non-redundant protein sequences. We mapped 27,042 of 83,493 unigenes (32.4%) to 119 KEGG metabolic pathways.
CONCLUSIONS: This is the first report of a transcriptome database for Camelina sativa, an environmentally important member of the Brassicaceae. We showed that C. sativa is closely related to Arabidopsis spp. and more distantly related to Brassica spp. Although the majority of annotated genes had high sequence identity to those of A. thaliana, a substantial proportion of disease-resistance genes (NBS-encoding LRR genes) were instead more similar to the genes of other Brassicaceae; these genes included BrCN, BrCNL, BrNL, BrTN, BrTNL in B. rapa. As plant genomes are under long-term selection pressure from environmental stressors, conservation of these disease-resistance genes in C. sativa and B. rapa genomes implies that they are exposed to threats from closely related pathogens in their natural habitats.
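
The N50 values and annotation percentages quoted in this abstract follow from simple arithmetic, sketched below in Python. The `n50` helper and the toy length list are illustrative assumptions, not part of the study's pipeline; only the unigene counts are taken from the abstract.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Toy contig lengths (assumed, not from the study).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)

# Counts reported in the abstract above.
annotated_pct = percent(67791, 83493)
kegg_pct = percent(27042, 83493)
```

Because N50 weights each sequence by its length, it is less sensitive to a large number of very short contigs than the plain average, which is why the abstract reports both figures for each assembler.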


Subjects
Brassicaceae/genetics , Databases, Genetic , Transcriptome , Arabidopsis/genetics , Brassica/genetics , Genes, Plant , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , RNA, Plant/genetics , Sequence Analysis, DNA