ABSTRACT
The recent nanopore sequencing system (R10.4) has enhanced base calling accuracy and is increasingly used for detecting CpG methylation states. However, the robustness and universality of the methylation calling model in the officially supplied Dorado remain poorly tested. In this study, we obtained heterogeneous datasets from human and plant sources to carry out comprehensive evaluations, which showed that Dorado performed significantly differently across datasets. We therefore developed deep neural networks and implemented several optimizations in training a new model called DeepBAM. DeepBAM achieved superior and more stable performance compared with Dorado, including higher areas under the ROC curve (98.47% on average and up to 7.36% improvement) and F1 scores (94.97% on average and up to 16.24% improvement) across the datasets. DeepBAM-based whole-genome methylation frequencies achieved >0.95 correlations with BS-seq on four of five datasets, outperforming Dorado in all instances. DeepBAM enables the unraveling of allele-specific methylation patterns, including in regions of transposable elements. The enhanced performance of DeepBAM paves the way for broader applications of nanopore sequencing in CpG methylation studies.
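The genome-wide comparison with BS-seq described above can be illustrated with a minimal Python sketch. It is not DeepBAM or Dorado code: the input layout (tuples of chromosome, position, and a boolean read-level call), the function names, and the `min_coverage` parameter are all assumptions made for illustration. It aggregates read-level calls into per-site methylation frequencies and computes a Pearson correlation against reference frequencies on the shared sites.

```python
from collections import defaultdict
from math import sqrt


def site_frequencies(read_calls, min_coverage=1):
    """Aggregate read-level calls into per-site methylation frequencies.

    read_calls: iterable of (chrom, pos, is_methylated) tuples (hypothetical layout).
    Sites covered by fewer than min_coverage reads are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated reads, total reads]
    for chrom, pos, is_methylated in read_calls:
        entry = counts[(chrom, pos)]
        entry[0] += int(is_methylated)
        entry[1] += 1
    return {site: m / n for site, (m, n) in counts.items() if n >= min_coverage}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_with_reference(nanopore_freqs, reference_freqs):
    """Correlate two frequency maps on the sites present in both."""
    shared = sorted(set(nanopore_freqs) & set(reference_freqs))
    return pearson([nanopore_freqs[s] for s in shared],
                   [reference_freqs[s] for s in shared])
```

In practice the reference map would come from bisulfite sequencing counts, and a coverage threshold on both sides is a common design choice; the thresholds used in the study are not stated in the abstract.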
Subjects
CpG Islands , DNA Methylation , Nanopore Sequencing , Nanopore Sequencing/methods , Humans , Software , Sequence Analysis, DNA/methods , Neural Networks, Computer
ABSTRACT
DNA N6-methyladenine (6mA) modification is the most prevalent DNA modification in prokaryotes, but whether it exists in human cells and whether it plays a role in human diseases remain enigmatic. Here, we showed that 6mA is extensively present in the human genome, and we cataloged 881,240 6mA sites accounting for ~0.051% of the total adenines. [G/C]AGG[C/T] was the most significantly associated motif with 6mA modification. 6mA sites were enriched in the coding regions and mark actively transcribed genes in human cells. DNA 6mA and N6-demethyladenine modification in the human genome were mediated by methyltransferase N6AMT1 and demethylase ALKBH1, respectively. The abundance of 6mA was significantly lower in cancers, accompanied by decreased N6AMT1 and increased ALKBH1 levels, and downregulation of 6mA modification levels promoted tumorigenesis. Collectively, our results demonstrate that DNA 6mA modification is extensively present in human cells and the decrease of genomic DNA 6mA promotes human tumorigenesis.
Subjects
Adenine/analogs & derivatives , AlkB Homolog 1, Histone H2a Dioxygenase/metabolism , Genome, Human , Site-Specific DNA-Methyltransferase (Adenine-Specific)/metabolism , Adenine/metabolism , AlkB Homolog 1, Histone H2a Dioxygenase/genetics , Animals , Carcinogenesis/genetics , DNA/genetics , DNA Methylation , Heterografts , Humans , Mice , Mice, Nude , Site-Specific DNA-Methyltransferase (Adenine-Specific)/genetics
ABSTRACT
MOTIVATION: Oxford Nanopore sequencing has great potential and advantages in population-scale studies. Due to the cost of sequencing, the depth of whole-genome sequencing per individual sample must be small. However, the existing single nucleotide polymorphism (SNP) callers are aimed at high-coverage Nanopore sequencing reads. Detecting SNP variants on low-coverage Nanopore sequencing data is still a challenging problem. RESULTS: We developed a novel deep learning-based SNP calling method, NanoSNP, to identify SNP sites (excluding short indels) based on low-coverage Nanopore sequencing reads. In this method, we design a multi-step, multi-scale and haplotype-aware SNP detection pipeline. First, the pileup model in NanoSNP utilizes the naive pileup feature to predict a subset of SNP sites with a bidirectional long short-term memory (Bi-LSTM) network. These SNP sites are phased and used to divide the low-coverage Nanopore reads into different haplotypes. Finally, the long-range haplotype feature and short-range pileup feature are extracted from each haplotype. The haplotype model combines the two features and predicts the genotype for the candidate site using a Bi-LSTM network. To evaluate the performance of NanoSNP, we compared NanoSNP with Clair, Clair3, Pepper-DeepVariant and NanoCaller on low-coverage (~16×) Nanopore sequencing reads. We also performed cross-genome testing on six human genomes, HG002-HG007. Comprehensive experiments demonstrate that NanoSNP outperforms Clair, Pepper-DeepVariant and NanoCaller in identifying SNPs on low-coverage Nanopore sequencing data, including the difficult-to-map regions and major histocompatibility complex regions in the human genome. NanoSNP is comparable to Clair3 when the coverage exceeds 16×. AVAILABILITY AND IMPLEMENTATION: https://github.com/huangnengCSU/NanoSNP.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Nanopore Sequencing , Nanopores , Humans , Haplotypes , Software , Polymorphism, Single Nucleotide , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
ABSTRACT
While China experienced a peak and decline in coronavirus disease 2019 (COVID-19) cases at the start of 2020, regional outbreaks continuously emerged in subsequent months. Resurgences of COVID-19 have also been observed in many other countries. In Guangzhou, China, a small outbreak, involving less than 100 residents, emerged in March and April 2020, and comprehensive and near-real-time genomic surveillance of SARS-CoV-2 was conducted. When the numbers of confirmed cases among overseas travelers increased, public health measures were enhanced by shifting from self-quarantine to central quarantine and SARS-CoV-2 testing for all overseas travelers. In an analysis of 109 imported cases, we found diverse viral variants distributed in the global viral phylogeny, which were frequently shared within households but not among passengers on the same flight. In contrast to the viral diversity of imported cases, local transmission was predominately attributed to two specific variants imported from Africa, including local cases that reported no direct or indirect contact with imported cases. The introduction events of the virus were identified or deduced before the enhanced measures were taken. These results show the interventions were effective in containing the spread of SARS-CoV-2, and they rule out the possibility of cryptic transmission of viral variants from the first wave in January and February 2020. Our study provides evidence and emphasizes the importance of controls for overseas travelers in the context of the pandemic and exemplifies how viral genomic data can facilitate COVID-19 surveillance and inform public health mitigation strategies.
Subjects
COVID-19 , SARS-CoV-2 , Africa , COVID-19 Testing , China/epidemiology , Genomics , Humans
ABSTRACT
We present a tool that combines fast mapping, error correction, and de novo assembly (MECAT; accessible at https://github.com/xiaochuanle/MECAT) for processing single-molecule sequencing (SMS) reads. MECAT's computing efficiency is superior to that of current tools, while the results MECAT produces are comparable or improved. MECAT enables reference mapping or de novo assembly of large genomes using SMS reads on a single computer.
Subjects
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Software
ABSTRACT
MOTIVATION: Oxford Nanopore sequencing enables direct detection of the methylation states of DNA bases from reads without extra laboratory techniques. Novel computational methods are required to improve the accuracy and robustness of DNA methylation state prediction using Nanopore reads. RESULTS: In this study, we develop DeepSignal, a deep learning method to detect DNA methylation states from Nanopore sequencing reads. Testing on Nanopore reads of Homo sapiens (H. sapiens), Escherichia coli (E. coli) and pUC19 shows that DeepSignal achieves higher performance at both the read level and the genome level in detecting 6mA and 5mC methylation states compared with previous hidden Markov model (HMM) based methods. DeepSignal achieves similar performance across different DNA methylation bases, different DNA methylation motifs, and both singleton and mixed DNA CpGs. Moreover, DeepSignal requires much lower coverage than HMM- and statistics-based methods. DeepSignal can achieve above 90% accuracy for detecting 5mC and 6mA using only 2× coverage of reads. Furthermore, for DNA CpG methylation state prediction, DeepSignal achieves 90% correlation with bisulfite sequencing using just 20× coverage of reads, which is much better than HMM-based methods. In particular, DeepSignal can predict the methylation states of 5% more DNA CpGs that could not previously be predicted by bisulfite sequencing. DeepSignal can be a robust and accurate method for detecting methylation states of DNA bases. AVAILABILITY AND IMPLEMENTATION: DeepSignal is publicly available at https://github.com/bioinfomaticsCSU/deepsignal. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
DNA Methylation , High-Throughput Nucleotide Sequencing , Deep Learning , Escherichia coli , Humans , Nanopore Sequencing , Sequence Analysis, DNA
ABSTRACT
BACKGROUND: DNA methylation is an important epigenetic modification. The recently developed single-molecule real-time (SMRT) sequencing technology provides an efficient way to detect DNA N6-methyladenine (6mA) modification, which plays an important role in epigenetics and positively regulates gene expression. Gene expression is also regulated by genetic variation. However, the relationship between DNA 6mA modification and variation is still unknown. RESULTS: We collected SMRT long-read DNA, Illumina short-read DNA and RNA datasets from the young leaves of Herrania umbratica, and used them to detect 35,654 6mA modification sites, 829,894 DNA variations and 60,672 RNA variations, respectively. Among these, there are 303 DNA variations and 19 RNA variations with 6mA modification, and 57,468 genetic variations transmitted from DNA to RNA. The results showed that genes with 6mA modification were significantly less likely to mutate than genes without modification (p-value < 4.9e-08). Results from the linear regression model showed that the 6mA densities of genes were associated with transmitted variations of type 0/1 to 1/1 (p-value < 0.001). CONCLUSIONS: Variations of DNA and RNA in genes with 6mA modification were significantly fewer than those in unmodified genes. Furthermore, the variations in 6mA-modified genes were easily transmitted from DNA to RNA, especially variations transmitted from a DNA heterozygote to an RNA homozygote.
Subjects
Adenosine/analogs & derivatives , DNA, Plant/genetics , DNA, Plant/metabolism , Genetic Variation/genetics , Genome, Plant/genetics , Magnoliopsida/genetics , RNA, Plant/genetics , Adenosine/metabolism , DNA, Intergenic/genetics , DNA, Intergenic/metabolism , DNA, Plant/chemistry , Heterozygote , Homozygote , Magnoliopsida/metabolism
ABSTRACT
Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.
Subjects
DNA Copy Number Variations , Genome, Human , Humans , High-Throughput Nucleotide Sequencing/methods , Software , Nanopore Sequencing/methods , Sequence Analysis, DNA/methods , Genomics/methods
ABSTRACT
Mass spectrometry has become one of the most important technologies in proteomic analysis. Tandem mass spectrometry (LC-MS/MS) is a major tool for the analysis of peptide mixtures from protein samples. The key step of MS data processing is the identification of peptides from experimental spectra by searching public sequence databases. Although a number of algorithms to identify peptides from MS/MS data have been already proposed, e.g. Sequest, OMSSA, X!Tandem, Mascot, etc., they are mainly based on statistical models considering only peak-matches between experimental and theoretical spectra, but not peak intensity information. Moreover, different algorithms gave different results from the same MS data, implying their probable incompleteness and questionable reproducibility. We developed a novel peptide identification algorithm, ProVerB, based on a binomial probability distribution model of protein tandem mass spectrometry combined with a new scoring function, making full use of peak intensity information and, thus, enhancing the ability of identification. Compared with Mascot, Sequest, and SQID, ProVerB identified significantly more peptides from LC-MS/MS data sets than the current algorithms at 1% False Discovery Rate (FDR) and provided more confident peptide identifications. ProVerB is also compatible with various platforms and experimental data sets, showing its robustness and versatility. The open-source program ProVerB is available at http://bioinformatics.jnu.edu.cn/software/proverb/ .
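As a rough illustration of the binomial idea mentioned above, the following Python sketch scores a peptide-spectrum match by the probability of seeing at least k matched peaks out of n theoretical fragment peaks by chance. It is not the ProVerB scoring function: the abstract does not give that function, and this sketch ignores peak intensity, which ProVerB explicitly uses. The names `binomial_tail` and `match_score` and the per-peak chance-match probability `p` are assumptions for illustration only.

```python
from math import comb, log10


def binomial_tail(n, k, p):
    """Probability of at least k successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def match_score(n_theoretical, n_matched, p_random):
    """Hypothetical score: -10 * log10 of the chance probability of the observed matches.

    A larger score means the observed number of matched peaks is less likely
    to arise from random matching alone.
    """
    return -10 * log10(binomial_tail(n_theoretical, n_matched, p_random))
```

With 20 theoretical peaks and a 5% chance-match rate per peak, matching 10 peaks gives a far higher score than matching 2, which is the behavior a peak-count-only model relies on.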
Subjects
Algorithms , Peptides , Proteins , Tandem Mass Spectrometry , Databases, Protein , Internet , Models, Statistical , Peptides/genetics , Peptides/isolation & purification , Probability , Proteins/genetics , Proteins/isolation & purification , Software
ABSTRACT
Canonical three-dimensional (3D) genome structures represent the ensemble average of pairwise chromatin interactions but not the single-allele topologies in populations of cells. Recently developed Pore-C can capture multiway chromatin contacts that reflect regional topologies of single chromosomes. By carrying out high-throughput Pore-C, we reveal extensive but regionally restricted clusters of single-allele topologies that aggregate into canonical 3D genome structures in two human cell types. We show that fragments in multi-contact reads generally coexist in the same TAD. In contrast, a concurrent significant proportion of multi-contact reads span multiple compartments of the same chromatin type over megabase distances. Synergistic chromatin looping between multiple sites in multi-contact reads is rare compared to pairwise interactions. Interestingly, the single-allele topology clusters are cell type-specific even inside highly conserved TADs in different types of cells. In summary, HiPore-C enables global characterization of single-allele topologies at an unprecedented depth to reveal elusive genome folding principles.
Subjects
Chromatin , Humans , Alleles , Chromatin/genetics
ABSTRACT
Long single-molecular sequencing technologies, such as PacBio circular consensus sequencing (CCS) and nanopore sequencing, are advantageous in detecting DNA 5-methylcytosine in CpGs (5mCpGs), especially in repetitive genomic regions. However, existing methods for detecting 5mCpGs using PacBio CCS are less accurate and robust. Here, we present ccsmeth, a deep-learning method to detect DNA 5mCpGs using CCS reads. We sequence polymerase-chain-reaction treated and M.SssI-methyltransferase treated DNA of one human sample using PacBio CCS for training ccsmeth. Using long (≥10 Kb) CCS reads, ccsmeth achieves 0.90 accuracy and 0.97 Area Under the Curve on 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves >0.90 correlations with bisulfite sequencing and nanopore sequencing using only 10× reads. Furthermore, we develop a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, and then sequence a Chinese family trio to validate it. ccsmeth and ccsmethphase can be robust and accurate tools for detecting DNA 5-methylcytosines.
Subjects
5-Methylcytosine , DNA , Humans , Consensus , DNA/genetics , Sequence Analysis, DNA/methods , DNA Methylation , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Although long-read single-cell RNA isoform sequencing (scISO-Seq) can reveal alternative RNA splicing in individual cells, it suffers from a low read throughput. Here, we introduce HIT-scISOseq, a method that removes most artifact cDNAs and concatenates multiple cDNAs for PacBio circular consensus sequencing (CCS) to achieve high-throughput and high-accuracy single-cell RNA isoform sequencing. HIT-scISOseq can yield >10 million high-accuracy long-reads in a single PacBio Sequel II SMRT Cell 8M. We also report the development of scISA-Tools that demultiplex HIT-scISOseq concatenated reads into single-cell cDNA reads with >99.99% accuracy and specificity. We apply HIT-scISOseq to characterize the transcriptomes of 3375 corneal limbus cells and reveal cell-type-specific isoform expression in them. HIT-scISOseq is a high-throughput, high-accuracy, technically accessible method and it can accelerate the burgeoning field of long-read single-cell transcriptomics.
Subjects
RNA Isoforms , RNA , RNA Isoforms/genetics , High-Throughput Nucleotide Sequencing/methods , Consensus , Protein Isoforms/genetics , Sequence Analysis, DNA/methods , Sequence Analysis, RNA
ABSTRACT
The aim of metalloproteomics is to identify and characterize putative metal-binding proteins and metal-binding motifs. In this study, we performed a systematic metalloproteomic analysis of Streptococcus pneumoniae through the combined use of efficient immobilized metal affinity chromatography enrichment and high-accuracy linear ion trap-Orbitrap MS to identify metal-binding proteins and metal-binding peptides. In total, 232 and 166 putative metal-binding proteins were isolated by Cu- and Zn-immobilized metal affinity chromatography columns, respectively, of which 133 proteins were present in both preparations. The putative metalloproteins are mainly involved in protein, nucleotide and carbon metabolism, oxidation and cell cycle regulation. Based on the sequences of the putative Cu- and Zn-binding peptides, putative Cu-binding motifs were identified: H(X)mH (m=0-11), C(X)2C, C(X)nH (n=2-4, 6, 9), H(X)iM (i=0-10) and M(X)tM (t=8 or 12), while putative Zn-binding motifs were identified as follows: H(X)mH (m=1-12), H(X)iM (i=0-12), M(X)tM (t=0, 3 and 4), C(X)nH (n=1, 2, 7, 10 and 11). Equilibrium dialysis and inductively coupled plasma-MS experiments confirmed that the artificially synthesized peptides harboring the different identified metal-binding motifs interacted directly with the metal ions. The metalloproteomic study presented here suggests that the comparably large size and diverse functions of the S. pneumoniae metalloproteome may play important roles in various biological processes and thus contribute to the bacterial pathologies.
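The spaced motifs listed above, such as H(X)mH with m=0-11, can be located in a peptide sequence with a short scan. The Python sketch below is an illustration only, not the procedure used in the study; the function name `find_motif` and its return format are assumptions. It reports every (start index, spacing) pair where the first residue is followed, after exactly m arbitrary residues, by the second residue.

```python
def find_motif(sequence, first, last, spacings):
    """Find occurrences of the motif first-(X)m-last for each m in spacings.

    Returns a list of (start_index, m) pairs using 0-based indexing, where
    sequence[start_index] == first and sequence[start_index + m + 1] == last.
    """
    hits = []
    for i, residue in enumerate(sequence):
        if residue != first:
            continue
        for m in spacings:
            j = i + m + 1  # position of the closing residue
            if j < len(sequence) and sequence[j] == last:
                hits.append((i, m))
    return hits


# Example: the putative Cu-binding motif H(X)mH with m = 0-11
cu_hxh_hits = find_motif("AHHGCH", "H", "H", range(0, 12))
```

An explicit scan is used rather than a single regular expression because overlapping matches with different spacings from the same start position all need to be reported.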
Subjects
Bacterial Proteins/chemistry , Copper/metabolism , Metalloproteins/chemistry , Streptococcus pneumoniae/metabolism , Zinc/metabolism , Amino Acid Motifs , Amino Acid Sequence , Bacterial Proteins/analysis , Bacterial Proteins/metabolism , Carrier Proteins/analysis , Carrier Proteins/chemistry , Carrier Proteins/metabolism , Chromatography, Affinity , Copper/chemistry , Electrophoresis, Polyacrylamide Gel , Intracellular Space/chemistry , Intracellular Space/metabolism , Mass Spectrometry , Metalloproteins/analysis , Metalloproteins/metabolism , Molecular Sequence Data , Peptide Fragments/analysis , Peptide Fragments/chemistry , Peptide Fragments/metabolism , Proteome/analysis , Proteome/chemistry , Proteome/metabolism , Proteomics , Streptococcus pneumoniae/chemistry , Zinc/chemistry
ABSTRACT
Subcellular proteomics was used to compare the protein profiles of human lung adenocarcinoma A549 cells and human bronchial epithelial (HBE) cells. In total, 106 differential proteins were identified, and the altered expression levels of a subset of the identified proteins were confirmed by Western blot analysis. Importantly, pathway analysis and biological validation revealed an epithelial-mesenchymal transition (EMT) phenotype shift in A549 cells as compared with HBE cells. The EMT phenotype of A549 cells can be increased by self-produced TGF-β1 and significantly decreased by silencing heterogeneous nuclear ribonucleoprotein K (hnRNPK) expression. As EMT is considered an important event during malignant tumor progression and metastasis, investigating EMT and deciphering the related pathways may lead to more efficient strategies to fight lung cancer progression. By integrating the subcellular proteomic data with EMT-related functional studies, we revealed new insights into the EMT process in lung carcinogenesis, providing clues for further investigations on the discovery of potential therapeutic targets.
Subjects
Adenocarcinoma/metabolism , Epithelial-Mesenchymal Transition , Lung Neoplasms/metabolism , Proteome/metabolism , Proteomics , Adenocarcinoma/pathology , Bronchi/cytology , Bronchi/metabolism , Cells, Cultured , Electrophoresis, Gel, Two-Dimensional , Humans , Immunoblotting , Lung Neoplasms/pathology , Phenotype , Proteome/analysis , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Subcellular Fractions
ABSTRACT
The silent information regulator (Sir2) family proteins are NAD+-dependent deacetylases. Although a few substrates have been identified, the functions of the bacterial Sir2-like protein (CobB) remain unclear. Here, the role of CobB in Escherichia coli chemotaxis was investigated. We used Western blotting and mass spectrometry to show that the response regulator CheY is a substrate of CobB. Surface plasmon resonance (SPR) indicated that acetylation affects the interaction between CheY and the flagellar switch protein FliM. The presence of intact flagella in the knockout strains ΔcobB, Δacs, ΔcobB Δacs, ΔcheA ΔcheZ, ΔcheA ΔcheZ ΔcobB and ΔcheA ΔcheZ Δacs was confirmed by electron microscopy. Genetic analysis of these knockout strains showed that: (i) the ΔcobB mutant exhibited reduced responses to chemotactic stimuli in chemotactic assays, whereas the Δacs mutant was indistinguishable from the parental strain, (ii) CheY from the ΔcobB mutant showed a higher level of acetylation, indicating that CobB can mediate the deacetylation of CheY in vivo, and (iii) deletion of cobB reversed the phenotype of ΔcheA ΔcheZ. Our findings suggest that CobB regulates E. coli chemotaxis by deacetylating CheY. Thus, a new function of bacterial CobB was identified, and new insights into the regulation of bacterial chemotaxis were provided.
Subjects
Bacterial Proteins/metabolism , Chemotaxis/physiology , Escherichia coli Proteins/metabolism , Escherichia coli/physiology , Membrane Proteins/metabolism , Sirtuins/metabolism , Acetylation , Amino Acid Sequence , Bacterial Proteins/genetics , Escherichia coli Proteins/genetics , Histidine Kinase , Membrane Proteins/genetics , Methyl-Accepting Chemotaxis Proteins , Molecular Sequence Data , Mutation , Phosphorylation , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Sirtuins/genetics
ABSTRACT
SPLiT-seq provides a low-cost platform to generate single-cell data by labeling the cellular origin of RNA through four rounds of combinatorial barcoding. However, an automatic and rapid method for preprocessing and classifying single-cell sequencing (SCS) data from SPLiT-seq, one that directly identifies and labels combinatorial barcoding reads and distinguishes special cell sequencing data, is currently lacking. Here, we develop a high-efficiency preprocessing tool for single-cell sequencing data from SPLiT-seq (SCSit), which can directly identify combinatorial barcodes and UMI of cell types, obtain more labeled reads, and markedly increase the amount of SCS data retained owing to the exact alignment of insertions and deletions. Compared with the original method used in SPLiT-seq, the consistency of identified reads from SCSit increases to 97%, and the number of mapped reads is twice that of the original method. Furthermore, the runtime of SCSit is less than 10% of that of the original method. It can accurately and rapidly analyze SPLiT-seq raw data and obtain labeled reads, as well as effectively improve the single-cell data from the SPLiT-seq platform. The data and source code of SCSit are available on GitHub at https://github.com/shang-qian/SCSit.
ABSTRACT
In plants, cytosine DNA methylation (5mC) can occur in three sequence contexts, CpG, CHG, and CHH (where H = A, C, or T), which play different roles in the regulation of biological processes. Although long Nanopore reads are advantageous in the detection of 5mCs compared with short-read bisulfite sequencing, existing methods can only detect 5mCs in the CpG context, which limits their application in plants. Here, we develop DeepSignal-plant, a deep learning tool to detect genome-wide 5mCs of all three contexts in plants from Nanopore reads. We sequence Arabidopsis thaliana and Oryza sativa using both Nanopore and bisulfite sequencing. We develop a denoising process for training models, which enables DeepSignal-plant to achieve high correlations with bisulfite sequencing for 5mC detection in all three contexts. Furthermore, DeepSignal-plant can profile more 5mC sites, which will help to provide a more complete understanding of the epigenetic mechanisms of different biological processes.
Subjects
Arabidopsis/genetics , Cytosine/metabolism , DNA, Plant/genetics , Epigenesis, Genetic , Genome, Plant , Oryza/genetics , Arabidopsis/metabolism , CpG Islands , DNA Methylation , DNA, Plant/metabolism , Deep Learning , High-Throughput Nucleotide Sequencing/methods , Nanopores , Oryza/metabolism , Sequence Analysis, DNA , Sulfites/chemistry
ABSTRACT
Long nanopore reads are advantageous in de novo genome assembly. However, nanopore reads usually have a broad error distribution and high-error-rate subsequences. Existing error correction tools cannot correct nanopore reads efficiently and effectively. Most methods trim high-error-rate subsequences during error correction, which reduces both the length of the reads and the contiguity of the final assembly. Here, we develop an error correction and de novo assembly tool designed to overcome complex errors in nanopore reads. We propose an adaptive read selection and two-step progressive method to quickly correct nanopore reads to high accuracy. We introduce a two-stage assembler to utilize the full length of nanopore reads. Our tool achieves superior performance in both error correction and de novo assembly of nanopore reads. It requires only 8122 hours to assemble a 35X coverage human genome and achieves a 2.47-fold improvement in NG50. Furthermore, our assembly of the human WERI cell line shows an NG50 of 22 Mbp. The high-quality assembly of nanopore reads can significantly reduce false positives in structural variation detection.
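For readers unfamiliar with the NG50 statistic quoted above, the following Python sketch shows the standard definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the estimated genome size. The function names are illustrative assumptions and are not taken from the tool described in the abstract.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    Contigs are visited from longest to shortest; the NG50 is the length of the
    contig at which the cumulative length first reaches half of genome_size.
    Returns 0 when the assembly covers less than half of the genome.
    """
    target = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return 0


def fold_improvement(new_ng50, baseline_ng50):
    """Ratio of two NG50 values, the sense in which a 'fold improvement' is usually quoted."""
    return new_ng50 / baseline_ng50
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the genome size, so assemblies of the same genome can be compared directly.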
Subjects
Nanopores , Sequence Analysis, DNA , Cell Line , Chromosomes, Human/genetics , Genome, Human , Humans , Retinoblastoma/genetics , Software
ABSTRACT
Genistein is a natural protein tyrosine kinase inhibitor that exerts an anti-cancer effect by inducing G2/M arrest and apoptosis. However, the phosphotyrosine signaling pathways mediated by genistein are largely unknown. In this study, we combined tyrosine phosphoprotein enrichment with MS-based quantitative proteomics technology to globally identify genistein-regulated tyrosine phosphoproteins, aiming to depict genistein-inhibited phosphotyrosine cascades. Our experiments resulted in the identification of 213 phosphotyrosine sites on 181 genistein-regulated proteins. Many identified phosphoproteins, including nine protein kinases, eight receptors, five protein phosphatases, seven transcriptional regulators and four signal adaptors, were novel inhibitory effectors with no previously known function in the anti-cancer mechanism of genistein. Functional analysis suggested that genistein regulates protein tyrosine phosphorylation mainly by inhibiting the activity of the tyrosine kinases EGFR, PDGFR, insulin receptor, Abl, Fgr, Itk, Fyn and Src. Core signaling molecules inhibited by genistein can be functionally categorized into the canonical Receptor-MAPK or Receptor-PI3K/AKT cascades. The method used here may be suitable for the identification of inhibitory effectors and tyrosine kinases regulated by anti-cancer drugs.
Subjects
Genistein/pharmacology , Phosphoproteins/metabolism , Protein Kinase Inhibitors/pharmacology , Proteomics/methods , Signal Transduction/drug effects , Amino Acid Sequence , Cell Line, Tumor , Down-Regulation/drug effects , G2 Phase/drug effects , Humans , Isotope Labeling , Mass Spectrometry , Mitosis/drug effects , Molecular Sequence Data , Peptides/chemistry , Phosphoproteins/chemistry , Phosphorylation/drug effects , Phosphotyrosine/metabolism , Protein-Tyrosine Kinases/analysis , Reproducibility of Results
ABSTRACT
Recent phosphoproteomic characterizations of Bacillus subtilis, Escherichia coli, Lactococcus lactis, Pseudomonas putida, and Pseudomonas aeruginosa have suggested that protein phosphorylation on serine, threonine, and tyrosine residues is a major regulatory post-translational modification in bacteria. In this study, we carried out a global and site-specific phosphoproteomic analysis of the Gram-positive pathogenic bacterium Streptococcus pneumoniae. One hundred and two unique phosphopeptides and 163 phosphorylation sites, with distributions of 47%/44%/9% for Ser/Thr/Tyr phosphorylation, from 84 S. pneumoniae proteins were identified through the combined use of TiO2 enrichment and LC-MS/MS determination. The identified phosphoproteins were found to be involved in various biological processes, including carbon/protein/nucleotide metabolism and cell cycle and division regulation. A striking characteristic of the S. pneumoniae phosphoproteome is the large number of multiple species-specific phosphorylated sites, indicating that a high level of protein phosphorylation may play important roles in regulating many metabolic pathways and bacterial virulence.