Results 1 - 20 of 18,708
1.
PLoS Comput Biol ; 16(9): e1008108, 2020 09.
Article in English | MEDLINE | ID: mdl-32898133

ABSTRACT

Existing models for assessing microbiome sequencing data, such as operational taxonomic units (OTUs), can only test predictors' effects on OTUs. There is limited work on how to estimate the correlations between multiple OTUs and incorporate such relationships into models to evaluate longitudinal OTU measures. We propose a novel approach to estimate OTU correlations based on their taxonomic structure, and apply such correlation structure in Generalized Estimating Equations (GEE) models to estimate both predictors' effects and OTU correlations. We develop a two-part Microbiome Taxonomic Longitudinal Correlation (MTLC) model for multivariate zero-inflated OTU outcomes based on the GEE framework. In addition, longitudinal and other types of repeated OTU measures are integrated in the MTLC model. Extensive simulations have been conducted to evaluate the performance of the MTLC method. Compared with the existing methods, the MTLC method shows robust and consistent estimation, and improved statistical power for testing predictors' effects. Lastly, we demonstrate our proposed method by applying it to a real human microbiome study to evaluate obesity in twins.


Subjects
Computational Biology/methods; DNA, Bacterial; Gastrointestinal Microbiome/genetics; Models, Statistical; Sequence Analysis, DNA/methods; DNA, Bacterial/classification; DNA, Bacterial/genetics; Databases, Genetic; Humans
2.
PLoS Comput Biol ; 16(9): e1008194, 2020 09.
Article in English | MEDLINE | ID: mdl-32936799

ABSTRACT

CRISPR screens are a powerful technology for the identification of genome sequences that affect cellular phenotypes such as gene expression, survival, and proliferation. By targeting non-coding sequences for perturbation, CRISPR screens have the potential to systematically discover novel functional sequences; however, a lack of purpose-built analysis tools limits the effectiveness of this approach. Here we describe RELICS, a Bayesian hierarchical model for the discovery of functional sequences from CRISPR screens. RELICS specifically addresses many of the challenges of non-coding CRISPR screens such as the unknown locations of functional sequences, overdispersion in the observed single guide RNA counts, and the need to combine information across multiple pools in an experiment. RELICS outperforms existing methods with higher precision, higher recall, and finer-resolution predictions on simulated datasets. We apply RELICS to published CRISPR interference and CRISPR activation screens to predict and experimentally validate novel regulatory sequences that are missed by other analysis methods. In summary, RELICS is a powerful new analysis method for CRISPR screens that enables the discovery of functional sequences with unprecedented resolution and accuracy.


Subjects
CRISPR-Cas Systems/genetics; Genomics/methods; Sequence Analysis, DNA/methods; Software; Bayes Theorem; Humans; Jurkat Cells; RNA, Guide/genetics
3.
PLoS Comput Biol ; 16(9): e1008229, 2020 09.
Article in English | MEDLINE | ID: mdl-32936825

ABSTRACT

Accurately predicting essential genes using computational methods can greatly reduce the time and resources needed to find them via wet-lab experiments, and further accelerate the process of drug discovery. Several computational methods have been proposed for predicting essential genes in model organisms by integrating multiple biological data sources either via centrality measures or machine learning based methods. However, methods aiming to predict human essential genes are still limited and their performance still needs improvement. In addition, most machine learning based essential gene prediction methods lack the ability to handle the imbalanced learning issue inherent in the essential gene prediction problem, which might be one factor affecting their performance. We propose a deep learning based method, DeepHE, to predict human essential genes by integrating features derived from sequence data and a protein-protein interaction (PPI) network. A deep learning based network embedding method is utilized to automatically learn features from the PPI network. In addition, 89 sequence features were derived from the DNA sequence and protein sequence of each gene. These two types of features are integrated to train a multilayer neural network. A cost-sensitive technique is used to address the imbalanced learning problem when training the deep neural network. The experimental results for predicting human essential genes show that our proposed method, DeepHE, can accurately predict human gene essentiality with an average AUC higher than 94%, an area under the precision-recall curve (AP) higher than 90%, and an accuracy higher than 90%. We also compare DeepHE with several widely used traditional machine learning models (SVM, Naïve Bayes, Random Forest, and Adaboost) using the same features and utilizing the same cost-sensitive technique to counter the imbalanced learning issue. The experimental results show that DeepHE significantly outperforms the compared machine learning models. We have demonstrated that human essential genes can be accurately predicted by designing an effective machine learning algorithm and integrating representative features captured from available biological data. The proposed deep learning framework is effective for this task.


Subjects
Computational Biology/methods; Deep Learning; Genes, Essential/genetics; Sequence Analysis, DNA/methods; DNA/genetics; Humans; Neural Networks, Computer; Protein Interaction Maps/genetics
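
The abstract for entry 3 mentions a cost-sensitive technique for imbalanced learning but does not say which one. As a minimal sketch, assuming the common "balanced" scheme in which each class is weighted by n_samples / (n_classes * class_count), the idea can be illustrated as follows. The function names and the toy labels are hypothetical and are not taken from DeepHE.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This is one common cost-sensitive scheme; it is an assumption here,
    not necessarily the scheme used by the DeepHE authors.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Mean binary cross-entropy with each sample scaled by its class weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss = -math.log(p) if y == 1 else -math.log(1.0 - p)
        total += weights[y] * loss
    return total / len(labels)


# Toy data (hypothetical): three non-essential genes (0), one essential gene (1).
labels = [0, 0, 0, 1]
weights = balanced_class_weights(labels)
loss = weighted_log_loss(labels, [0.5, 0.5, 0.5, 0.5], weights)
```

With these weights the single minority-class sample contributes as much to the loss as the three majority-class samples combined, which is the point of cost-sensitive training.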
4.
BMC Bioinformatics ; 21(Suppl 8): 260, 2020 Sep 16.
Article in English | MEDLINE | ID: mdl-32938358

ABSTRACT

BACKGROUND: In [Prezza et al., AMB 2019], a new reference-free and alignment-free framework for the detection of SNPs was suggested and tested. The framework, based on the Burrows-Wheeler Transform (BWT), significantly improves the sensitivity and precision of previous de Bruijn graph-based tools by overcoming several of their limitations, namely: (i) the need to establish a fixed value, usually small, for the order k, (ii) the loss of important information such as k-mer coverage and adjacency of k-mers within the same read, and (iii) bad performance in repeated regions longer than k bases. The preliminary tool, however, was able to identify only SNPs and it was too slow and memory consuming due to the use of additional heavy data structures (namely, the Suffix and LCP arrays), besides the BWT. RESULTS: In this paper, we introduce a new algorithm and the corresponding tool ebwt2InDel that (i) extends the framework of [Prezza et al., AMB 2019] to also detect INDELs, and (ii) implements recent algorithmic findings that allow the whole analysis to be performed using just the BWT, thus reducing the working space by one order of magnitude and allowing the analysis of full genomes. Finally, we describe a simple strategy for effectively parallelizing our tool for SNP detection only. On a 24-core machine, the parallel version of our tool is one order of magnitude faster than the sequential one. The tool ebwt2InDel is available at github.com/nicolaprezza/ebwt2InDel . CONCLUSIONS: Results on a synthetic dataset covered at 30x (Human chromosome 1) show that our tool is indeed able to find up to 83% of the SNPs and 72% of the existing INDELs. These percentages considerably improve on the 71% of SNPs and 51% of INDELs found by the state-of-the-art tool based on de Bruijn graphs. We furthermore report results on larger (real) Human whole-genome sequencing experiments. Also in these cases, our tool exhibits a much higher sensitivity than the state-of-the-art tool.


Subjects
Genomics/methods; Sequence Analysis, DNA/methods; Algorithms; Humans; Polymorphism, Single Nucleotide
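
For readers unfamiliar with the Burrows-Wheeler Transform named in entry 4, the following is a deliberately naive sketch that builds the BWT of a single string by sorting its rotations. It is only an illustration of the transform itself: ebwt2InDel works on the extended BWT of a read collection with far more efficient construction, and none of the code below is taken from that tool. The function name and the `$` sentinel convention are assumptions.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler Transform via sorted rotations.

    Quadratic in memory, so suitable only for short illustrative strings.
    The sentinel must not occur in the text and must sort before every
    other character (true for '$' versus letters in ASCII).
    """
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


print(bwt("banana"))  # annb$aa
```

Characters that precede the same context end up adjacent in the output, which is the clustering property that BWT-based variant detection relies on.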
5.
BMC Bioinformatics ; 21(1): 410, 2020 Sep 16.
Article in English | MEDLINE | ID: mdl-32938397

ABSTRACT

BACKGROUND: Motif enrichment analysis (MEA) identifies over-represented transcription factor binding (TF) motifs in the DNA sequence of regulatory regions, enabling researchers to infer which transcription factors can regulate transcriptional response to a stimulus, or identify sequence features found near a target protein in a ChIP-seq experiment. Score-based MEA determines motifs enriched in regions exhibiting extreme differences in regulatory activity, but existing methods do not control for biases in GC content or dinucleotide composition. This lack of control for sequence bias, such as those often found in CpG islands, can obscure the enrichment of biologically relevant motifs. RESULTS: We developed Motif Enrichment In Ranked Lists of Peaks (MEIRLOP), a novel MEA method that determines enrichment of TF binding motifs in a list of scored regulatory regions, while controlling for sequence bias. In this study, we compare MEIRLOP against other MEA methods in identifying binding motifs found enriched in differentially active regulatory regions after interferon-beta stimulus, finding that using logistic regression and covariates improves the ability to call enrichment of ISGF3 binding motifs from differential acetylation ChIP-seq data compared to other methods. Our method achieves similar or better performance compared to other methods when quantifying the enrichment of TF binding motifs from ENCODE TF ChIP-seq datasets. We also demonstrate how MEIRLOP is broadly applicable to the analysis of numerous types of NGS assays and experimental designs. CONCLUSIONS: Our results demonstrate the importance of controlling for sequence bias when accurately identifying enriched DNA sequence motifs using score-based MEA. MEIRLOP is available for download from https://github.com/npdeloss/meirlop under the MIT license.


Subjects
Computational Biology/methods; Nucleotide Motifs/genetics; Sequence Analysis, DNA/methods; Bias; Humans
6.
Mem Inst Oswaldo Cruz ; 115: e200220, 2020.
Article in English | MEDLINE | ID: mdl-32935751

ABSTRACT

BACKGROUND: The Nyssomyia genus and Lutzomyia subgenus include medically important species that are vectors of Latin American leishmaniases. Little is known about the phylogenetic relationships of closely-related species in each of these taxonomic groups that are morphologically indistinguishable or differentiated by very subtle details. OBJECTIVES: We inferred the phylogenetic relationships of closely-related species within both the Nyssomyia genus and the Lutzomyia subgenus using a cytochrome c oxidase subunit I (COI) fragment. METHODS: The sampling was carried out in 11 Argentinean localities. For genetic analyses, we used GenBank sequences in addition to our sequences from Argentina. Kimura 2-parameter (K2P) genetic distance and nucleotide divergence (Da) were calculated between closely-related species of the Nyssomyia genus and the Lutzomyia subgenus, and between clades of the Lutzomyia longipalpis complex. FINDINGS: The K2P and Da values within species of the Nyssomyia genus and Lutzomyia subgenus were lower than the divergence detected between clades of the Lu. longipalpis complex. The haplotype network analyses within the Lutzomyia subgenus showed shared haplotypes between species, in contrast to the Nyssomyia genus, in which no haplotypes were shared. Bayesian inference within the Nyssomyia genus presented structuring by species. MAIN CONCLUSIONS: This study provides evidence of the phylogenetic proximity among closely-related species within the Nyssomyia genus and the Lutzomyia subgenus. The COI sequences of Nyssomyia neivai derived from the present study are the first available in GenBank.


Subjects
Psychodidae/classification; Psychodidae/genetics; Animals; Argentina; Base Sequence; Bayes Theorem; Leishmaniasis; Phylogeny; Polymerase Chain Reaction/methods; Sequence Analysis, DNA/methods
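
The Kimura 2-parameter (K2P) distance used in entry 6 has a closed form: with P the proportion of transitions and Q the proportion of transversions between two aligned sequences, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)). The sketch below applies that formula to a pair of pre-aligned sequences. The function name and the choice to skip gaps and ambiguous bases (pairwise deletion) are assumptions; the abstract does not state which software or settings the authors used.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    math.log raises ValueError if the sequences are too divergent for
    the formula to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition (A -> G) in ten sites: P = 0.1, Q = 0.
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
```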
7.
BMC Bioinformatics ; 21(Suppl 13): 388, 2020 Sep 17.
Article in English | MEDLINE | ID: mdl-32938392

ABSTRACT

BACKGROUND: In Overlap-Layout-Consensus (OLC) based de novo assembly, all reads must be compared with every other read to find overlaps. This makes the process rather slow and limits the practicality of using de novo assembly methods at a large scale in the field. Darwin is a fast and accurate read overlapper that can be used for de novo assembly of state-of-the-art third generation long DNA reads. Darwin is designed to be hardware-friendly and can be accelerated on specialized computer system hardware to achieve higher performance. RESULTS: This work accelerates Darwin on GPUs. Using real PacBio data, our GPU implementation on a Tesla K40 has shown a speedup of 109x vs 8 CPU threads of an Intel Xeon machine and 24x vs 64 threads of an IBM Power8 machine. The GPU implementation supports both linear and affine gap scoring models. The results show that the GPU implementation can achieve the same high speedup for different scoring schemes. CONCLUSIONS: The GPU implementation proposed in this work shows significant improvement in performance compared to the CPU version, thereby making it accessible for utilization as a practical read overlapper in a DNA assembly pipeline. Furthermore, our GPU acceleration can also be used for performing fast Smith-Waterman alignment between long DNA reads. GPU hardware has become commonly available in the field today, making the proposed acceleration accessible to a larger public. The implementation is available at https://github.com/Tongdongq/darwin-gpu .


Subjects
Algorithms; DNA/genetics; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Humans
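
Entry 7 refers to Smith-Waterman alignment under linear and affine gap scoring. The sketch below is a plain CPU reference version of the local alignment score using the standard three-matrix (Gotoh) recurrence; it is not the Darwin or darwin-gpu implementation, and the scoring values are arbitrary assumptions. A gap of length k costs gap_open + (k - 1) * gap_extend, so setting gap_open equal to gap_extend gives the linear model.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score between a and b with affine gaps.

    H[i][j]: best score of an alignment ending at a[i-1], b[j-1].
    E[i][j]: best score ending with a gap in a (consumes b[j-1]).
    F[i][j]: best score ending with a gap in b (consumes a[i-1]).
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


affine = smith_waterman_score("ACGTACGT", "ACGTTTACGT")
linear = smith_waterman_score("ACGTACGT", "ACGTTTACGT", gap_open=-1, gap_extend=-1)
```

The quadratic table fill is what makes all-against-all overlap detection slow on long reads, and each anti-diagonal of the table can be computed in parallel, which is why the algorithm suits GPU acceleration.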
8.
PLoS One ; 15(9): e0238467, 2020.
Article in English | MEDLINE | ID: mdl-32877464

ABSTRACT

Resolving the genetic architecture of painful neuropathy will lead to better disease management strategies. We aimed to develop a reliable method to re-sequence multiple genes in a large cohort of painful neuropathy patients at low cost. In this study, we compared sensitivity, specificity, targeting efficiency, performance and cost effectiveness of Molecular Inversion Probes-Next generation sequencing (MIPs-NGS) and TruSeq® Custom Amplicon-Next generation sequencing (TSCA-NGS). Capture probes were designed to target nine sodium channel genes (SCN3A, SCN8A-SCN11A, and SCN1B-SCN4B). One hundred sixty-six patients with diabetic and idiopathic neuropathy were tested by both methods, 70 patients were validated by Sanger sequencing. Sensitivity, specificity and performance of both techniques were comparable, and in agreement with Sanger sequencing. The average targeted regions coverage for MIPs-NGS was 97.3% versus 93.9% for TSCA-NGS. MIPs-NGS has a more versatile assay design and is more flexible than TSCA-NGS. The cost of MIPs-NGS is >5 times cheaper than TSCA-NGS when 500 or more samples are tested. In conclusion, MIPs-NGS is a reliable, flexible, and relatively inexpensive method to detect genetic variations in a large cohort of patients. In our centers, MIPs-NGS is currently implemented as a routine diagnostic tool for screening of sodium channel genes in painful neuropathy patients.


Subjects
High-Throughput Nucleotide Sequencing/methods; Molecular Probes/genetics; Sequence Analysis, DNA/methods; Chromosome Inversion/genetics; DNA Probes/genetics; Genetic Testing/methods; Humans; Mutation; Neuralgia/genetics; Sensitivity and Specificity
9.
PLoS Comput Biol ; 16(8): e1008030, 2020 08.
Article in English | MEDLINE | ID: mdl-32804924

ABSTRACT

The human body generates a diverse set of high affinity antibodies, the soluble form of B cell receptors (BCRs), that bind to and neutralize invading pathogens. The natural development of BCRs must be understood in order to design vaccines for highly mutable pathogens such as influenza and HIV. BCR diversity is induced by naturally occurring combinatorial "V(D)J" rearrangement, mutation, and selection processes. Most current methods for BCR sequence analysis focus on separately modeling the above processes. Statistical phylogenetic methods are often used to model the mutational dynamics of BCR sequence data, but these techniques do not consider all the complexities associated with B cell diversification such as the V(D)J rearrangement process. In particular, standard phylogenetic approaches assume the DNA bases of the progenitor (or "naive") sequence arise independently and according to the same distribution, ignoring the complexities of V(D)J rearrangement. In this paper, we introduce a novel approach to Bayesian phylogenetic inference for BCR sequences that is based on a phylogenetic hidden Markov model (phylo-HMM). This technique not only integrates a naive rearrangement model with a phylogenetic model for BCR sequence evolution but also naturally accounts for uncertainty in all unobserved variables, including the phylogenetic tree, via posterior distribution sampling.


Subjects
Models, Genetic; Receptors, Antigen, B-Cell; Sequence Analysis, DNA/methods; Bayes Theorem; Computational Biology; Gene Rearrangement, B-Lymphocyte/genetics; Humans; Markov Chains; Phylogeny; Receptors, Antigen, B-Cell/classification; Receptors, Antigen, B-Cell/genetics; Receptors, Antigen, B-Cell/immunology; Somatic Hypermutation, Immunoglobulin/genetics; Vaccines
10.
PLoS One ; 15(8): e0237507, 2020.
Article in English | MEDLINE | ID: mdl-32813726

ABSTRACT

DNA barcoding can identify biological species and provides an important tool in diverse applications, such as conserving species and identifying pathogens, among many others. If combined with statistical tests, DNA barcoding can focus taxonomic scrutiny onto anomalous species identifications based on morphological features. Accordingly, we put nonparametric tests into a taxonomic context to answer questions about our sequence dataset of the formal fungal barcode, the nuclear ribosomal internal transcribed spacer (ITS). For example, does DNA barcoding concur with annotated species identifications significantly better if expert taxonomists produced the annotations? Does species assignment improve significantly if sequences are restricted to lengths greater than 500 bp? Both questions require a figure of merit to measure the accuracy of species identification, typically provided by the probability of correct identification (PCI). Many articles on DNA barcoding use variants of PCI to measure the accuracy of species identification, but do not provide the variants with names, and the absence of explicit names hinders the recognition that the different variants are not comparable from study to study. We give four variant PCIs names and show that for fixed data they follow systematic inequalities. Despite custom, therefore, their comparison is at a minimum problematic. Some popular PCI variants are particularly vulnerable to errors in species annotation, insensitive to improvements in a barcoding pipeline, and unable to predict identification accuracy as a database grows, making them unsuitable for many purposes. Generally, the Fractional PCI has the best properties as a figure of merit for species identification. The fungal genus Ramaria provides unusual taxonomic difficulties. As a case study, it shows that a good taxonomic background can be combined with the pertinent summary statistics of molecular results to improve the identification of doubtful samples, linking both disciplines synergistically.


Subjects
DNA Barcoding, Taxonomic/methods; DNA, Fungal/analysis; DNA, Ribosomal Spacer/analysis; Fungi/classification; Fungi/genetics; Sequence Analysis, DNA/methods; Bayes Theorem; Models, Statistical; Phylogeny; Species Specificity
11.
PLoS One ; 15(8): e0237538, 2020.
Article in English | MEDLINE | ID: mdl-32804981

ABSTRACT

The dearth of genomic resources, particularly microsatellite markers, in guava, a nutritionally and commercially important fruit crop, necessitates the development of novel genomic SSR markers through library enrichment techniques. Three types of 3'-biotinylated oligonucleotide probes [(CT)14, (GT)12, and (AAC)8] were used to develop microsatellite enriched libraries. A total of 153 transformed colonies were screened, of which 111 positive colonies were subjected to Sanger sequencing. The clones having more than five motif repeats were selected for primer design, and a total of 38 novel genomic simple sequence repeats (g-SSRs) could be identified. The g-SSRs had motif groups ranging from monomer to pentamer, of which the dimer group was the most frequent (89.47%). Out of the 38 g-SSR markers developed, 26 were found to be polymorphic, which showed substantial genetic diversity among the guava genotypes including wild species. The average number of alleles per locus, major allele frequency, gene diversity, expected heterozygosity and polymorphic information content of the 26 SSRs were 3.46, 0.56, 0.53, 0.29 and 0.46, respectively. The rate of cross-species transferability of the developed g-SSR loci varied from 38.46 to 80.77% among the studied wild Psidium species. An N-J tree based on the 26 SSRs grouped the 40 guava genotypes into six clades with two out-groups; the wild guava species showed genetic distinctness from cultivated genotypes. Furthermore, population structure analysis grouped the guava genotypes into three genetic groups, which were partly supported by PCoA and the N-J tree. Further, AMOVA and PCoA revealed high genetic diversity among the present set of guava genotypes including wild species. Thus, the developed novel g-SSRs were found to be efficient and informative for diversity and population structure analyses of the guava genotypes. These novel g-SSR loci add a new genomic resource for guava, which may be utilized in genomics-assisted guava breeding.


Subjects
Microsatellite Repeats; Psidium/classification; Sequence Analysis, DNA/methods; DNA, Plant/genetics; Evolution, Molecular; Gene Frequency; Genetic Variation; Genetics, Population; Genomic Library; Psidium/genetics; Species Specificity
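
Entry 11 reports per-locus summary statistics such as gene diversity and polymorphic information content (PIC). As a sketch under the usual textbook definitions, gene diversity is 1 - sum(p_i^2) and PIC is 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, where p_i are the allele frequencies at a locus. The abstract does not say which software or exact estimators the authors used, so the functions and the example frequencies below are illustrative assumptions only.

```python
import math


def gene_diversity(freqs):
    """Gene diversity at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphic_information_content(freqs):
    """PIC at one locus: 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with two equally frequent alleles.
example = [0.5, 0.5]
he = gene_diversity(example)                     # 0.5
pic = polymorphic_information_content(example)   # 0.375
```

PIC is never larger than gene diversity, because it additionally discounts matings that are uninformative for linkage.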
12.
PLoS One ; 15(8): e0236483, 2020.
Article in English | MEDLINE | ID: mdl-32853203

ABSTRACT

Takifugu rubripes is more expensive than other species of the genus because of its high protein content and special flavor. However, it is easily confused with imported T. chinensis and T. pseudommus because they have similar morphological characteristics. We identified single nucleotide polymorphism (SNP) markers of T. rubripes by genotyping-by-sequencing (GBS) and evaluated their ability to distinguish among T. rubripes, T. chinensis, and T. pseudommus. In all, 18 polymorphic SNPs were subjected to phylogenetic analyses of the three Takifugu species. Additionally, we subjected a second set of samples to Sanger sequencing to verify that the polymorphic SNPs could be used to evaluate the genetic variation among the three Takifugu species. A phylogenetic tree that included the analyzed sequence of set A, which is referred to as the reference sequence, and a validation sequence of set B with 18 SNPs were produced. Based on this phylogenetic tree and STRUCTURE analyses, T. rubripes, T. chinensis and T. pseudommus have low genetic variation and should be considered the same gene pool. Our findings suggest that further studies are needed to estimate the genetic association of the three Takifugu species.


Subjects
Genotyping Techniques/methods; Phylogeny; Sequence Analysis, DNA/methods; Takifugu/genetics; Animals; Genotype; Polymorphism, Single Nucleotide/genetics; Takifugu/classification; Transcriptome/genetics
13.
Mol Cell ; 79(5): 797-811.e8, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32750314

ABSTRACT

Pausing by RNA polymerase (RNAP) during transcription elongation, in which a translocating RNAP uses a "stepping" mechanism, has been studied extensively, but pausing by RNAP during initial transcription, in which a promoter-anchored RNAP uses a "scrunching" mechanism, has not. We report a method that directly defines the RNAP-active-center position relative to DNA with single-nucleotide resolution (XACT-seq; "crosslink-between-active-center-and-template sequencing"). We apply this method to detect and quantify pausing in initial transcription at 4^11 (∼4,000,000) promoter sequences in vivo in Escherichia coli. The results show initial-transcription pausing can occur in each nucleotide addition during initial transcription, particularly the first 4 to 5 nucleotide additions. The results further show initial-transcription pausing occurs at sequences that resemble the consensus sequence element for transcription-elongation pausing. Our findings define the positional and sequence determinants for initial-transcription pausing and establish initial-transcription pausing is hard coded by sequence elements similar to those for transcription-elongation pausing.


Subjects
DNA, Bacterial/metabolism; DNA-Directed RNA Polymerases/metabolism; Promoter Regions, Genetic; Sequence Analysis, DNA/methods; Catalytic Domain; Escherichia coli/genetics; Escherichia coli Proteins/metabolism; Gene Expression Regulation, Bacterial; Transcription, Genetic
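
The promoter library size quoted in entry 13 (about 4,000,000 sequences) is what one gets from randomizing 11 positions over the four DNA bases, since 4 to the power 11 is 4,194,304. The short check below makes that arithmetic explicit; the helper name is hypothetical, and the reading of the figure as 11 randomized positions is an inference from the number itself.

```python
from itertools import product


def count_sequences(length, alphabet="ACGT"):
    """Number of distinct sequences of a given length over an alphabet."""
    return len(alphabet) ** length


library_size = count_sequences(11)  # 4194304, i.e. roughly 4,000,000

# Brute-force enumeration agrees with the closed form for a small length.
enumerated = sum(1 for _ in product("ACGT", repeat=3))
```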
14.
BMC Bioinformatics ; 21(1): 341, 2020 Aug 04.
Article in English | MEDLINE | ID: mdl-32753028

ABSTRACT

BACKGROUND: Single Molecule Sequencing (SMS) technology can produce longer reads, but with a higher sequencing error rate. Mapping these reads to a reference genome is often the most fundamental and computing-intensive step for downstream analysis. Most existing mapping tools generally adopt the traditional seed-and-extend strategy, and the candidate aligned regions for each query read are selected either by counting the number of matched seeds or chaining a group of seeds. However, for all the existing mapping tools, the coverage ratio of the alignment region to the query read is lower, and the read alignment quality and efficiency need to be improved. Here, we introduce smsMap, a novel mapping tool that is specifically designed to map the long reads of SMS to a reference genome. RESULTS: smsMap was evaluated against seven other existing SMS mapping tools (e.g., BLASR, minimap2, and BWA-MEM) on both simulated and real-life SMS datasets. The experimental results show that smsMap can efficiently achieve a higher aligned read coverage ratio and has higher sensitivity, aligning more sequences and bases to the reference genome. Additionally, smsMap is more robust to sequencing errors. CONCLUSIONS: smsMap is computationally efficient at aligning SMS reads, especially for larger reference genomes (e.g., the H. sapiens genome with over 3 billion base pairs). The source code of smsMap can be freely downloaded from https://github.com/NWPU-903PR/smsMap .


Subjects
Sequence Alignment; Sequence Analysis, DNA/methods; Software; Algorithms; Computer Simulation; Databases, Genetic; Escherichia coli/genetics; Humans; Time Factors
15.
Nat Commun ; 11(1): 4025, 2020 08 12.
Article in English | MEDLINE | ID: mdl-32788667

ABSTRACT

Droplet-based high throughput single cell sequencing techniques have tremendously advanced our insight into cell-to-cell heterogeneity. However, those approaches only allow analysis of one extremity of the transcript after short read sequencing. As a consequence, information on splicing and sequence heterogeneity is lost. To overcome this limitation, several approaches that use long-read sequencing were introduced recently. Yet, those techniques are limited by low sequencing depth and/or lacking or inaccurate assignment of unique molecular identifiers (UMIs), which are critical for elimination of PCR bias and artifacts. We introduce ScNaUmi-seq, an approach that combines the high throughput of Oxford Nanopore sequencing with an accurate cell barcode and UMI assignment strategy. UMI guided error correction allows the generation of high accuracy full length sequence information with the 10x Genomics single cell isolation system at high sequencing depths. We analyzed transcript isoform diversity in embryonic mouse brain and show that ScNaUmi-seq allows defining splicing and SNVs (RNA editing) at a single cell level.


Subjects
High-Throughput Nucleotide Sequencing/methods; Nanopore Sequencing; Nanopores; Transcriptome; Animals; Brain; Gene Expression; Gene Expression Profiling; Genomics; Mice; Mice, Inbred C57BL; Protein Isoforms; Receptors, AMPA/genetics; Sequence Analysis, DNA/methods; Sequence Analysis, RNA/methods
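
The UMI-guided error correction described in entry 15 rests on a simple idea: reads that share a cell barcode and UMI derive from the same original molecule, so disagreements among them are likely sequencing or PCR errors. The toy sketch below groups reads by (cell barcode, UMI) and takes a per-position majority vote. It assumes the reads in a group are already aligned and of comparable length with no indels, which is a strong simplification for nanopore data; the function name and input layout are hypothetical and this is not the ScNaUmi-seq pipeline.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Majority-vote consensus per (cell_barcode, umi) group.

    reads: iterable of (cell_barcode, umi, sequence) tuples.
    Sequences within a group are truncated to the shortest member.
    Ties are broken in favour of the base seen first.
    """
    groups = defaultdict(list)
    for cell_barcode, umi, sequence in reads:
        groups[(cell_barcode, umi)].append(sequence)

    consensus = {}
    for key, sequences in groups.items():
        length = min(len(s) for s in sequences)
        consensus[key] = "".join(
            Counter(s[i] for s in sequences).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


# Hypothetical reads: one molecule seen three times with one error, one seen once.
reads = [
    ("cell1", "AAAC", "ACGT"),
    ("cell1", "AAAC", "ACGA"),
    ("cell1", "AAAC", "ACGT"),
    ("cell2", "GGGT", "TTTT"),
]
result = umi_consensus(reads)
```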
16.
Nat Commun ; 11(1): 3868, 2020 08 03.
Article in English | MEDLINE | ID: mdl-32747648

ABSTRACT

Archaeological research documents major technological shifts among people who have lived in the southern tip of South America (South Patagonia) during the last thirteen millennia, including the development of marine-based economies and changes in tools and raw materials. It has been proposed that movements of people spreading culture and technology propelled some of these shifts, but these hypotheses have not been tested with ancient DNA. Here we report genome-wide data from 20 ancient individuals, and co-analyze it with previously reported data. We reveal that immigration does not explain the appearance of marine adaptations in South Patagonia. We describe partial genetic continuity since ~6600 BP and two later gene flows correlated with technological changes: one between 4700-2000 BP that affected primarily marine-based groups, and a later one impacting all <2000 BP groups. From ~2200-1200 BP, mixture among neighbors resulted in a cline correlated to geographic ordering along the coast.


Subjects
DNA, Ancient/analysis; Fossils; Gene Flow; Genome, Human/genetics; Human Migration; Archaeology/methods; Argentina; Bone and Bones/metabolism; Chile; DNA, Mitochondrial/classification; DNA, Mitochondrial/genetics; Genetic Variation; Geography; Humans; Phylogeny; Radiometric Dating/methods; Sequence Analysis, DNA/methods; Tooth/metabolism
17.
Medicine (Baltimore) ; 99(30): e21331, 2020 Jul 24.
Article in English | MEDLINE | ID: mdl-32791730

ABSTRACT

The aim of this study was to elucidate the possible association between migration inhibitory factor (MIF)-173G/C gene polymorphisms and transcript and plasma levels of MIF in spinal tuberculosis (TB) patients. Clinical data were collected from 254 spinal TB patients and 262 healthy controls participating in the study. The MIF-173G/C gene was amplified by polymerase chain reaction and genotyped by DNA sequencing technology. The level of mRNA expression was determined by real-time polymerase chain reaction and MIF plasma levels were measured by a solid-phase enzyme-linked immunosorbent assay. The C allele and the GC+CC genotype in MIF-173G/C were over-represented in spinal TB patients. The mean MIF mRNA levels in spinal TB patients and in patients with the GG and GC+CC genotypes were significantly lower than in controls; however, our study also indicated that the MIF concentrations in spinal TB patients and in patients with the GG and GC+CC genotypes were significantly higher than in controls. Spinal TB patients with the GG genotype had higher MIF plasma levels than patients with the GC+CC genotype. The C-reactive protein level and erythrocyte sedimentation rate were correlated with the MIF plasma level. In summary, the MIF-173G/C genetic polymorphism is associated with reduced transcript and increased plasma levels of MIF in spinal TB patients, and MIF may play an important role in the occurrence, development, and damage of spinal TB in the northern Province population of China.


Subjects
Macrophage Migration-Inhibitory Factors/blood; Polymorphism, Genetic/genetics; RNA, Messenger/genetics; Tuberculosis, Spinal/genetics; Adult; Alleles; Blood Sedimentation; C-Reactive Protein/metabolism; Case-Control Studies; China/epidemiology; Female; Genotype; Humans; Male; Middle Aged; Polymerase Chain Reaction; Sequence Analysis, DNA/methods
18.
PLoS One ; 15(7): e0235406, 2020.
Article in English | MEDLINE | ID: mdl-32609774

ABSTRACT

Pathogens pose a major risk to wild host populations, especially in the face of ongoing biodiversity declines. Beak and feather disease virus (BFDV) can affect most if not all members of one of the largest and most threatened bird orders world-wide, the Psittaciformes. Signs of disease can be severe and mortality rates high. Its broad host range makes it a risk to threatened species in particular, because infection can occur via spill-over from abundant hosts. Despite these risks, surveillance of BFDV in locally abundant wild host species has been lacking. We used qPCR and haemagglutination assays to investigate BFDV prevalence, load and shedding in seven abundant host species in the wild in south-east Australia: Crimson Rosellas (Platycercus elegans), Eastern Rosellas (Platycercus eximius), Galahs (Eolophus roseicapillus), Sulphur-crested Cockatoos (Cacatua galerita), Blue-winged Parrots (Neophema chrysostoma), Rainbow Lorikeets (Trichoglossus moluccanus) and Red-rumped Parrots (Psephotus haematonotus). We found BFDV infection in clinically normal birds in six of the seven species sampled. We focused our analysis on the four most commonly caught species, namely Crimson Rosellas (BFDV prevalence in blood samples: 41.8%), Sulphur-crested Cockatoos (20.0%), Blue-winged Parrots (11.8%) and Galahs (8.8%). Species, but not sex, was a significant predictor for BFDV prevalence and load. 56.1% of BFDV positive individuals were excreting BFDV antigen into their feathers, indicative of active viral replication with shedding. Being BFDV positive in blood samples predicted shedding in Crimson Rosellas. Our study confirms that BFDV is endemic in our study region, and can inform targeted disease management by providing comparative data on interspecies variation in virus prevalence, load and shedding.


Subjects
Bird Diseases , Circoviridae Infections , Circovirus/isolation & purification , Psittaciformes/virology , Animals , Australia , Bird Diseases/epidemiology , Bird Diseases/virology , Circoviridae Infections/epidemiology , Circoviridae Infections/virology , Circovirus/physiology , DNA, Viral/genetics , Endangered Species , Prevalence , Sequence Analysis, DNA/methods , Viral Load , Virus Replication , Virus Shedding
19.
Nat Commun ; 11(1): 3551, 2020 07 15.
Article in English | MEDLINE | ID: mdl-32669542

ABSTRACT

Predicting effects of gene regulatory elements (GREs) is a longstanding challenge in biology. Machine learning may address this, but requires large datasets linking GREs to their quantitative function. However, experimental methods to generate such datasets are either application-specific or technically complex and error-prone. Here, we introduce DNA-based phenotypic recording as a widely applicable, practicable approach to generate large-scale sequence-function datasets. We use a site-specific recombinase to directly record a GRE's effect in DNA, enabling readout of both sequence and quantitative function for extremely large GRE-sets via next-generation sequencing. We record translation kinetics of over 300,000 bacterial ribosome binding sites (RBSs) in >2.7 million sequence-function pairs in a single experiment. Further, we introduce a deep learning approach employing ensembling and uncertainty modelling that predicts RBS function with high accuracy, outperforming state-of-the-art methods. DNA-based phenotypic recording combined with deep learning represents a major advance in our ability to predict function from genetic sequence.


Subjects
Computational Biology/methods , Deep Learning , Molecular Sequence Annotation/methods , Phenotype , Sequence Analysis, DNA/methods , Binding Sites/genetics , Datasets as Topic , Escherichia coli/genetics , Gene Knockout Techniques , Genome, Bacterial/genetics , High-Throughput Nucleotide Sequencing , Regulatory Sequences, Nucleic Acid/genetics , Ribosomes/metabolism
20.
BMC Bioinformatics ; 21(1): 287, 2020 Jul 06.
Article in English | MEDLINE | ID: mdl-32631226

ABSTRACT

BACKGROUND: Software tools for analyzing DNA methylation do not provide easily interpretable graphical results; instead, they produce huge text files containing the alignment of the samples and their methylation status at base-pair resolution. Different tools and methods have been proposed for finding Differentially Methylated Regions (DMRs) among different samples, but their execution time is long, and the visualization of their results is far from interactive. Additionally, these methods give more accurate results when identifying simulated DM regions that are long and have small within-group variation, but they show low concordance on real datasets, probably because of the different approaches they use for DMR identification. Thus, a tool that automatically detects DMRs among different samples and interactively visualizes them at different scales (from a few to tens of millions of DNA locations) could be key to shortening the DNA methylation analysis process in many studies. RESULTS: In this paper, we propose a software tool based on the wavelet transform. This mathematical tool allows fast, automatic DMR detection by simple comparison of different signals at different resolution levels. It also allows interactive visualization of the DMRs found at different resolution levels. The tool is publicly available at https://grev-uv.github.io/ , and it is part of a complete suite of tools that covers the whole process of DNA alignment and methylation analysis, creation of methylation maps of the whole genome, and detection and visualization of DMRs between different samples. CONCLUSIONS: The validation of the developed software tool shows concordance similar to that of other well-known and widely used tools when applied to real and synthetic data. The batch mode of the tool can automatically detect the existing DMRs for half (twelve) of the human chromosomes between two sets of six samples (whose .csv files after the alignment and mapping procedures have an aggregated size of 108 Gigabytes) in around three and a half hours. HPG-DHunter requires only around 15% of the execution time required by other well-known tools for detecting the DMRs.
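
The idea of comparing signals at different resolution levels can be illustrated with a minimal sketch. This is not HPG-DHunter's algorithm or API: it assumes per-position methylation levels in [0, 1], uses Haar approximation coefficients (rescaled to block means) as the coarse signal, and flags blocks whose group means differ by more than a threshold. The function names, the toy data, and the threshold are all hypothetical.

```python
import numpy as np

def haar_approximation(signal, level):
    """Haar approximation at `level`, rescaled so each value is a block mean."""
    x = np.asarray(signal, dtype=float)
    block = 2 ** level
    usable = (len(x) // block) * block  # drop a ragged tail, if any
    return x[:usable].reshape(-1, block).mean(axis=1)

def candidate_dmrs(group_a, group_b, level, threshold):
    """Return (start, end) ranges where the two group means differ by more than threshold.

    Assumes every sample covers the same positions (equal-length signals).
    """
    mean_a = np.mean([haar_approximation(s, level) for s in group_a], axis=0)
    mean_b = np.mean([haar_approximation(s, level) for s in group_b], axis=0)
    block = 2 ** level
    hits = np.flatnonzero(np.abs(mean_a - mean_b) > threshold)
    return [(int(i) * block, (int(i) + 1) * block) for i in hits]

# Toy data (hypothetical): two groups of three samples over 64 positions,
# with group B carrying extra methylation at positions 16..31.
rng = np.random.default_rng(0)
base = rng.uniform(0.0, 0.2, size=64)
group_a = [base.copy() for _ in range(3)]
group_b = [base.copy() for _ in range(3)]
for sample in group_b:
    sample[16:32] += 0.7

regions = candidate_dmrs(group_a, group_b, level=3, threshold=0.5)
print(regions)
```

Raising `level` coarsens the comparison (larger blocks, fewer comparisons), while lowering it localizes the candidate regions more finely, which is the trade-off a multi-resolution view exposes.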


Subjects
DNA Methylation/genetics , Sequence Analysis, DNA/methods , Software/standards , Humans