Results 1 - 14 of 14
1.
NAR Genom Bioinform ; 4(3): lqac057, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35937545

ABSTRACT

Temperate phages (active prophages induced from bacteria) help control pathogenicity, modulate community structure, and maintain gut homeostasis. Complete phage genome sequences are indispensable for understanding phage biology. Traditional plaque techniques are inapplicable to temperate phages because of their lysogenicity, which hampers their identification and characterization. Existing bioinformatics tools for prophage prediction usually fail to detect accurate and complete temperate phage genomes. This study proposes a novel computational temperate phage detection method (TemPhD) that mines both the integrated active prophages and their spontaneously induced forms (temperate phages) from next-generation sequencing raw data. Applying the method to the available dataset yielded 192 326 complete temperate phage genomes with different host species, expanding the existing number of complete temperate phage genomes by more than 100-fold. Wet-lab experiments demonstrated that TemPhD can accurately determine the complete genome sequences of temperate phages, with exact flanking sites, outperforming other state-of-the-art prophage prediction methods. Our analysis indicates that temperate phages are likely to function in microbial evolution by (i) cross-infecting different bacterial host species; (ii) transferring antibiotic resistance and virulence genes; and (iii) interacting with hosts through restriction-modification and CRISPR/anti-CRISPR systems. This work provides a comprehensive database of complete temperate phage genomes and relevant information, which can serve as a valuable resource for phage research.

2.
BMC Genomics ; 23(Suppl 4): 359, 2022 May 11.
Article in English | MEDLINE | ID: mdl-35546390

ABSTRACT

BACKGROUND: Single-cell DNA sequencing is becoming indispensable in the study of cell-specific cancer genomics. Nevertheless, the performance of computational tools that tackle single-cell genome aberrations may be undervalued or overvalued owing to the insufficient size of benchmarking data. In silico simulation is a cost-effective approach to generate as many single-cell genomes as possible in a controlled manner for reliable and valid benchmarking. RESULTS: This study proposes a new tool, SCSilicon, which efficiently generates single-cell in silico DNA reads with minimal manual intervention. SCSilicon automatically creates a set of genomic aberrations, including SNPs, SNVs, indels, and CNVs. In addition, SCSilicon yields the ground truth of CNV segmentation breakpoints and subclone cell labels. We manually inspected a series of synthetic variations. We conducted a sanity check of the state-of-the-art single-cell CNV callers and found that SCYN was the most robust. CONCLUSIONS: SCSilicon is a user-friendly software package for developing and benchmarking single-cell CNV callers. The source code of SCSilicon is available at https://github.com/xikanfeng2/SCSilicon .


Subjects
DNA Copy Number Variations; Silicon; DNA; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Software
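
The idea of simulating copy number profiles together with their ground-truth breakpoints can be illustrated with a minimal sketch. This is not SCSilicon's implementation or API: the function name `simulate_cnv_profile`, the bin-level representation, and the copy number range 0 to 5 are all assumptions made for illustration.

```python
import random
from typing import List, Tuple


def simulate_cnv_profile(n_bins: int, n_breakpoints: int,
                         seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (per-bin copy numbers, ground-truth breakpoint indices).

    Hypothetical helper: a breakpoint b means the copy number changes
    between bin b-1 and bin b. Requires n_breakpoints <= n_bins - 1.
    """
    rng = random.Random(seed)
    # Pick distinct breakpoint positions strictly inside the profile.
    breakpoints = sorted(rng.sample(range(1, n_bins), n_breakpoints))
    edges = [0] + breakpoints + [n_bins]

    profile: List[int] = []
    previous = None
    for start, end in zip(edges, edges[1:]):
        # Force a copy number different from the previous segment so that
        # every recorded breakpoint is a real change in the profile.
        copy_number = rng.choice([c for c in range(6) if c != previous])
        profile.extend([copy_number] * (end - start))
        previous = copy_number
    return profile, breakpoints


profile, truth = simulate_cnv_profile(n_bins=50, n_breakpoints=4, seed=1)
print(profile)
print(truth)
```

Because the breakpoints are recorded at generation time, the output of a CNV caller run on such a profile can be compared directly against `truth`.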
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34671807

ABSTRACT

Recent advances in single-cell copy number variation (CNV) analysis play an essential role in addressing intratumor heterogeneity, identifying tumor subgroups and restoring tumor-evolving trajectories at single-cell scale. Informative visualization of copy number analysis results boosts productive scientific exploration, validation and sharing. Several single-cell analysis figures in published articles and software packages have proven effective for understanding single-cell genomics. However, almost all of them lack real-time interaction and are hard to reproduce. Moreover, existing tools are time-consuming and memory-intensive at large-scale single-cell throughputs. We present an online visualization platform, single-cell Somatic Variant Analysis Suite (scSVAS), for real-time interactive single-cell genomics data visualization. scSVAS is specifically designed for large-scale single-cell genomic analysis and provides an arsenal of unique functionalities. After the specified input files are uploaded, scSVAS deploys the online interactive visualization automatically. Users may conduct scientific discoveries, share interactive visualizations and download high-quality publication-ready figures. scSVAS provides versatile utilities for managing, investigating, sharing and publishing single-cell CNV profiles. We envision that this online platform will expedite the biological understanding of cancer clonal evolution at single-cell resolution. All visualizations are publicly hosted at https://sc.deepomics.org.


Subjects
DNA Copy Number Variations; Software; Data Visualization; Genome; Genomics/methods
4.
BMC Genomics ; 22(Suppl 5): 651, 2021 Nov 16.
Article in English | MEDLINE | ID: mdl-34789142

ABSTRACT

BACKGROUND: Copy number variation is crucial in deciphering the mechanism and cure of complex disorders and cancers. The recent advancement of scDNA sequencing technology sheds light on addressing intratumor heterogeneity, detecting rare subclones, and reconstructing tumor evolution lineages at single-cell resolution. Nevertheless, the current circular binary segmentation based approach fails to identify copy number shifts efficiently and effectively in some exceptional cases. RESULTS: Here, we propose SCYN, a CNV segmentation method powered by dynamic programming. SCYN resolves the precise segmentation on an in silico dataset. We then verified that SCYN infers copy numbers accurately on triple-negative breast cancer scDNA data, with array comparative genomic hybridization results of purified bulk samples as ground-truth validation. We tested SCYN on two datasets from the newly emerged 10x Genomics CNV solution. SCYN successfully recognizes gastric cancer cells in the 1% and 10% spike-in 10x datasets. Moreover, SCYN is about 150 times faster than the state-of-the-art tool when dealing with datasets of approximately 2000 cells. CONCLUSIONS: SCYN robustly and efficiently detects segmentations and infers copy number profiles on single-cell DNA sequencing data. It serves to reveal intratumor heterogeneity. The source code of SCYN can be accessed at https://github.com/xikanfeng2/SCYN .


Subjects
DNA Copy Number Variations; Software; Algorithms; Comparative Genomic Hybridization; Genomics; Sequence Analysis, DNA
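
The abstract above does not spell out SCYN's objective function, so the following is only a generic sketch of how dynamic programming can segment a one-dimensional signal. It assumes a textbook formulation (within-segment squared error plus a fixed penalty per segment); the function name `segment` and the `penalty` parameter are illustrative and are not taken from SCYN.

```python
from typing import List, Tuple


def segment(signal: List[float], penalty: float) -> List[Tuple[int, int]]:
    """Split `signal` into contiguous segments of roughly constant level.

    Minimizes (within-segment squared error) + penalty * (number of segments)
    by dynamic programming. Returns half-open (start, end) index pairs.
    """
    n = len(signal)
    # Prefix sums of x and x^2 give any segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i: int, j: int) -> float:
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: optimal cost of signal[:j]
    prev = [0] * (n + 1)               # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], prev[j] = cost, i

    # Follow the back-pointers from the end to recover the segments.
    segments = []
    j = n
    while j > 0:
        segments.append((prev[j], j))
        j = prev[j]
    return segments[::-1]


# Hypothetical per-bin copy number estimates with one amplified region.
bins = [2.0, 2.1, 1.9, 2.0, 4.0, 4.1, 3.9, 2.0, 2.0]
print(segment(bins, penalty=0.5))
```

This version is quadratic in the number of bins. A larger `penalty` yields fewer, longer segments, which is the usual trade-off in penalized segmentation.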
5.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34109382

ABSTRACT

Attention deficit hyperactivity disorder (ADHD) is a common neurodevelopmental disorder. Although genome-wide association studies (GWAS) identify ADHD-associated risk variants and genes with significant P-values, they may neglect the combined effect of multiple variants with insignificant P-values. Here, we proposed a convolutional neural network (CNN) to distinguish 1033 individuals diagnosed with ADHD from 950 healthy controls according to their genomic data. The model takes the single nucleotide polymorphism (SNP) loci with P-values $\le 1\times 10^{-3}$, i.e. 764 loci, as inputs, and achieved an accuracy of 0.9018, AUC of 0.9570, sensitivity of 0.8980 and specificity of 0.9055. By incorporating saliency analysis for the deep learning network, a total of 96 candidate genes were found, of which 14 have been reported in previous ADHD-related studies. Furthermore, joint Gene Ontology enrichment and expression quantitative trait loci analysis identified a potential risk gene for ADHD, EPHA5, with the variant rs4860671. Overall, our CNN deep learning model exhibited high accuracy for ADHD classification and demonstrated that a deep learning model can capture the combined effect of variants with insignificant P-values, which GWAS fails to do. To the best of our knowledge, our model is the first deep learning method for the classification of ADHD with SNP data.


Subjects
Attention Deficit Disorder with Hyperactivity/genetics; Biomarkers; Deep Learning; Genetic Predisposition to Disease; Receptor, EphA5/genetics; Area Under Curve; Attention Deficit Disorder with Hyperactivity/diagnosis; Computational Biology/methods; Gene Ontology; Genome-Wide Association Study; Humans; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Quantitative Trait Loci; ROC Curve
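
The accuracy, sensitivity and specificity reported above are standard functions of a binary confusion matrix. A minimal sketch of the definitions follows; the counts in the usage example are hypothetical and are not the study's confusion matrix, which the abstract does not give.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Compute standard binary classification metrics.

    tp/fn: cases (e.g. ADHD) predicted correctly / missed.
    tn/fp: controls predicted correctly / wrongly flagged as cases.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
print(classification_metrics(tp=90, fn=10, tn=95, fp=5))
```

Sensitivity depends only on the cases and specificity only on the controls, so both are insensitive to the case/control ratio, whereas accuracy is not.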
6.
J Intensive Care ; 9(1): 19, 2021 Feb 18.
Article in English | MEDLINE | ID: mdl-33602326

ABSTRACT

BACKGROUND: Immune and inflammatory dysfunction was reported to underpin critical COVID-19 (coronavirus disease 2019). We aim to develop a machine learning model that enables accurate prediction of critical COVID-19 using immune-inflammatory features at admission. METHODS: We retrospectively collected 2076 consecutive COVID-19 patients with definite outcomes (discharge or death) between January 27, 2020 and March 30, 2020 from two hospitals in China. Critical illness was defined as admission to an intensive care unit, receiving invasive ventilation, or death. Least Absolute Shrinkage and Selection Operator (LASSO) was applied for feature selection. Five machine learning algorithms, including Logistic Regression (LR), Support Vector Machine (SVM), Gradient Boosted Decision Tree (GBDT), K-Nearest Neighbor (KNN), and Neural Network (NN), were trained on a training dataset and assessed on an internal validation dataset and an external validation dataset. RESULTS: Six features (procalcitonin, [T + B + NK cell] count, interleukin 6, C reactive protein, interleukin 2 receptor, T-helper lymphocyte/T-suppressor lymphocyte) were finally used for model development. The five models displayed varying but all promising predictive performance. Notably, the ensemble model, SPMCIIP (severity prediction model for COVID-19 by immune-inflammatory parameters), derived from three contributive algorithms (SVM, GBDT, and NN), achieved the best performance in identifying patients with critical COVID-19, with an area under the curve (AUC) of 0.991 (95% confidence interval [CI] 0.979-1.000) in the internal validation cohort and 0.999 (95% CI 0.998-1.000) in the external validation cohort. SPMCIIP could accurately and expeditiously predict the occurrence of critical COVID-19 approximately 20 days in advance. CONCLUSIONS: The developed online prediction model SPMCIIP may facilitate intensive monitoring and early intervention for COVID-19 patients at high risk of critical illness.
TRIAL REGISTRATION: This study was retrospectively registered in the Chinese Clinical Trial Registry (ChiCTR2000032161).

7.
BMC Genomics ; 21(Suppl 10): 618, 2020 Nov 18.
Article in English | MEDLINE | ID: mdl-33208097

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is becoming indispensable in the study of cell-specific transcriptomes. However, in scRNA-seq techniques, only a small fraction of the genes are captured due to "dropout" events. These dropout events require intensive treatment when analyzing scRNA-seq data. For example, imputation tools have been proposed to estimate dropout events and de-noise data. The performance of these imputation tools is often evaluated, or fine-tuned, using various clustering criteria based on ground-truth cell subgroup labels. This limits their effectiveness in cases where cell subgroup knowledge is lacking. We consider an alternative strategy which requires the imputation to follow a "self-consistency" principle; that is, the imputation process refines its results until there is no internal inconsistency or dropouts in the data. RESULTS: We propose the use of "self-consistency" as a main criterion in performing imputation. To demonstrate this principle we devised I-Impute, a "self-consistent" method, to impute scRNA-seq data. I-Impute optimizes continuous similarities and dropout probabilities in iterative refinements until a self-consistent imputation is reached. On the in silico datasets, I-Impute consistently exhibited the highest Pearson correlations across different dropout rates compared with the state-of-the-art methods SAVER and scImpute. Furthermore, we collected three wet-lab datasets (a mouse bladder cell dataset, an embryonic stem cell dataset, and an aortic leukocyte cell dataset) to evaluate the tools. I-Impute exhibited feasible cell subpopulation discovery efficacy on all three datasets. It achieved the highest clustering accuracy compared with SAVER and scImpute. CONCLUSIONS: A strategy based on "self-consistency", captured through our method, I-Impute, gave better imputation results than the state-of-the-art tools. The source code of I-Impute can be accessed at https://github.com/xikanfeng2/I-Impute .


Subjects
RNA; Single-Cell Analysis; Animals; Gene Expression Profiling; Mice; Sequence Analysis, RNA; Software
9.
Front Genet ; 10: 1173, 2019.
Article in English | MEDLINE | ID: mdl-31805179

ABSTRACT

[This corrects the article DOI: 10.3389/fgene.2019.00903.].

10.
BMC Bioinformatics ; 20(Suppl 24): 596, 2019 Dec 22.
Article in English | MEDLINE | ID: mdl-31861975

ABSTRACT

BACKGROUND: Adenosine-to-inosine RNA editing can markedly diversify the transcriptome, leading to a variety of critical molecular and biological processes in mammals. Over the past several years, researchers have developed several new pipelines and software packages to identify RNA editing sites, with a focus on downstream statistical analysis and functional interpretation. RESULTS: Here, we developed a user-friendly public webserver named MIRIA that integrates statistics and visualization techniques to facilitate the comprehensive analysis of RNA editing site data identified by these pipelines and software packages. MIRIA is unique in that it provides several analytical functions, including RNA editing type statistics, genomic feature annotations, editing level statistics, genome-wide distribution of RNA editing sites, tissue-specific analysis and conservation analysis. We collected high-throughput RNA sequencing (RNA-seq) data from eight tissues across seven species as the experimental data for MIRIA and constructed an example result page. CONCLUSION: MIRIA provides both visualization and analysis of mammalian RNA editing data for experimental biologists who are interested in revealing the functions of RNA editing sites. MIRIA is freely available at https://mammal.deepomics.org.


Subjects
Mammals; RNA Editing; Sequence Analysis, RNA; Transcriptome; Animals; Genome; Genomics; High-Throughput Nucleotide Sequencing/methods; Humans; Mammals/genetics; RNA/genetics; Sequence Analysis, RNA/methods
11.
Front Genet ; 10: 903, 2019.
Article in English | MEDLINE | ID: mdl-31611909

ABSTRACT

Single-cell RNA-seq studies profile thousands of cells in developmental processes. Current databases for human single-cell expression atlases only provide search and visualization functions for a selected gene in specific cell types or subpopulations. These databases are limited to technical properties or visualization of single-cell RNA-seq data and do not consider the biological relations of their collected cell groups. Here, we developed a database to investigate single-cell gene expression profiling during different developmental pathways (SCDevDB). In this database, we collected 10 human single-cell RNA-seq datasets, split these datasets into 176 developmental cell groups, and constructed 24 different developmental pathways. SCDevDB allows users to search the expression profiles of genes of interest across different developmental pathways. It also provides lists of differentially expressed genes during each developmental pathway, t-distributed stochastic neighbor embedding maps showing the relationships between developmental stages based on these differentially expressed genes, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analysis results for these differentially expressed genes. This database is freely available at https://scdevdb.deepomics.org.

12.
Gigascience ; 8(8)2019 08 01.
Article in English | MEDLINE | ID: mdl-31367746

ABSTRACT

BACKGROUND: The imbalanced respiratory microbiota observed in pneumonia causes high morbidity and mortality in childhood. Respiratory metagenomic analysis demands a comprehensive microbial gene catalogue, which will significantly advance our understanding of host-microorganism interactions. RESULTS: We collected 334 respiratory microbial samples from 171 healthy children and 76 children with pneumonia. The respiratory microbial gene catalogue we established comprised 2.25 million non-redundant microbial genes, covering 90.52% of prevalent genes. The major oropharyngeal microbial species found in healthy children were Prevotella and Streptococcus. In children with Mycoplasma pneumoniae pneumonia (MPP), oropharyngeal microbial diversity and associated gene numbers decreased compared with those of healthy children. The concurrence network of oropharyngeal microorganisms in patients predominantly featured Staphylococcus spp. and M. pneumoniae. Functional orthologues, which are associated with the metabolism of various lipids, membrane transport, and signal transduction, accumulated in the oropharyngeal microbiome of children with pneumonia. Several antibiotic resistance genes and virulence factor genes were identified in the genomes of M. pneumoniae and 13 other microorganisms reconstructed via metagenomic data. Although the common macrolide/β-lactam resistance genes were not identified in the assembled M. pneumoniae genome, a single-nucleotide polymorphism (A2063G) related to macrolide resistance was identified in a 23S ribosomal RNA gene. CONCLUSIONS: The results of this study will facilitate exploration of unknown microbial components and host-microorganism interactions in studies of the respiratory microbiome. They will also yield further insights into the microbial aetiology of MPP.


Subjects
Metagenome; Metagenomics; Microbiota; Mycoplasma pneumoniae/classification; Mycoplasma pneumoniae/genetics; Pneumonia, Mycoplasma/microbiology; Case-Control Studies; Child; Child, Preschool; Female; Genes, Microbial; Humans; Infant; Male; Metagenomics/methods
13.
Genes (Basel) ; 10(7)2019 07 10.
Article in English | MEDLINE | ID: mdl-31295957

ABSTRACT

Two errors occurred in the References part of our paper [...].

14.
Genes (Basel) ; 10(5)2019 04 30.
Article in English | MEDLINE | ID: mdl-31052161

ABSTRACT

Recently, the prevalence and importance of RNA editing have been illuminated in mammals. However, studies on RNA editing in pigs, a widely used biomedical model animal, are rare. Here we collected RNA sequencing data across 11 tissues and identified more than 490,000 RNA editing sites. We annotated their biological features, detected flanking sequence characteristics of A-to-I editing sites and the impact of A-to-I editing on miRNA-mRNA interactions, and identified RNA editing quantitative trait loci (edQTL). Sus scrofa RNA editing sites showed high enrichment in repetitive regions, with a median editing level of 15.38%. As expected, 96.3% of the editing sites were located in non-coding regions, including introns, 3' UTRs, intergenic regions, and gene-proximal regions. There were 2233 editing sites located in coding regions, and 980 of them caused missense mutations. Our results indicated that, for an A-to-I editing site, the four adjacent nucleotides (two before it and two after it) have a high impact on editing occurrence. A commonly observed editing motif is CCAGG. We found that 4552 A-to-I RNA editing sites could disturb the original binding efficiencies of miRNAs and 4176 A-to-I RNA editing sites created new potential miRNA target sites. In addition, we performed edQTL analysis and found 1134 edQTLs that significantly affected the editing levels of 137 RNA editing sites. Finally, we constructed PRESDB, the first pig RNA editing sites database. The site provides the necessary functions for Sus scrofa RNA editing studies.


Subjects
Nucleotide Motifs/genetics; RNA Editing/genetics; Swine/genetics; Tissue Distribution/genetics; Animals; Genome/genetics; Introns/genetics; MicroRNAs/genetics; Open Reading Frames/genetics; Quantitative Trait Loci/genetics; RNA, Messenger/genetics
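
The flanking-sequence analysis described above (two nucleotides before and two after each editing site) amounts to counting 5-mers centred on the edited base. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the function name `flanking_motifs`, the in-memory `genome` dictionary, the 0-based positions, and the toy sequence are all hypothetical, and strand is ignored for simplicity.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def flanking_motifs(genome: Dict[str, str],
                    sites: Iterable[Tuple[str, int]],
                    flank: int = 2) -> Counter:
    """Count the sequence windows centred on each editing site.

    genome: chromosome name -> sequence string.
    sites:  (chromosome, 0-based position) pairs; forward strand assumed.
    Sites too close to a sequence end to have full flanks are skipped.
    """
    counts: Counter = Counter()
    for chrom, pos in sites:
        seq = genome[chrom]
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(seq):
            continue
        counts[seq[start:end].upper()] += 1
    return counts


# Toy data: a made-up sequence and three made-up site positions.
genome = {"chr1": "TTCCAGGTTGCAGCAA"}
sites = [("chr1", 4), ("chr1", 11), ("chr1", 0)]
print(flanking_motifs(genome, sites).most_common())
```

With real data, the most frequent windows returned by `most_common()` would be the candidates for an enriched motif such as the CCAGG pattern mentioned in the abstract.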