Results 1 - 20 of 10,177
1.
Nat Commun ; 11(1): 4918, 2020 10 01.
Article in English | MEDLINE | ID: mdl-33004800

ABSTRACT

In order to control and eradicate epidemic cholera, we need to understand how epidemics begin, how they spread, and how they decline and eventually end. This requires extensive sampling of epidemic disease over time, alongside the background of endemic disease that may exist concurrently with the epidemic. The unique circumstances surrounding the Argentinian cholera epidemic of 1992-1998 presented an opportunity to do this. Here, we use 490 Argentinian V. cholerae genome sequences to characterise the variation within, and between, epidemic and endemic V. cholerae. We show that, during the 1992-1998 cholera epidemic, the invariant epidemic clone co-existed alongside highly diverse members of the Vibrio cholerae species in Argentina, and we contrast the clonality of epidemic V. cholerae with the background diversity of local endemic bacteria. Our findings refine and add nuance to our genomic definitions of epidemic and endemic cholera, and are of direct relevance to controlling current and future cholera epidemics.


Subjects
Cholera/microbiology; Endemic Diseases/prevention & control; Genome, Bacterial/genetics; Pandemics/prevention & control; Vibrio cholerae/genetics; Argentina/epidemiology; Cholera/epidemiology; Cholera/prevention & control; DNA, Bacterial/genetics; DNA, Bacterial/isolation & purification; History, 19th Century; History, 20th Century; Humans; Molecular Sequence Annotation; Pandemics/history; Phylogeny; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Vibrio cholerae/isolation & purification; Vibrio cholerae/pathogenicity
2.
F1000Res ; 9, 2020.
Article in English | MEDLINE | ID: mdl-32489650

ABSTRACT

GTF (Gene Transfer Format) and GFF (General Feature Format) are popular file formats used by bioinformatics programs to represent and exchange information about various genomic features, such as gene and transcript locations and structure. GffRead and GffCompare are open source programs that provide extensive and efficient solutions to manipulate files in a GTF or GFF format. While GffRead can convert, sort, filter, transform, or cluster genomic features, GffCompare can be used to compare and merge different gene annotations. Availability and implementation: GFF utilities are implemented in C++ for Linux and OS X and released as open source under an MIT license (https://github.com/gpertea/gffread, https://github.com/gpertea/gffcompare).


Subjects
Computational Biology; Genomics; Software; Genome; Molecular Sequence Annotation
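
As a rough illustration of the kind of record the GFF utilities in entry 2 operate on, the sketch below parses a single GFF3 line into its nine tab-separated columns. It does not use or reproduce GffRead or GffCompare; the `Feature` type, the `parse_gff3_line` helper, and the example record are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """One GFF3 record (hypothetical container, not part of GffRead)."""
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str
    attributes: dict


def parse_gff3_line(line: str) -> Feature:
    """Split one GFF3 line into its nine tab-separated columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, _score, strand, _phase, attrs = cols
    # Column 9 holds semicolon-separated key=value pairs.
    attributes = dict(
        pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
    )
    return Feature(seqid, source, ftype, int(start), int(end), strand, attributes)


# Made-up example record, not taken from any real annotation.
example = "chr1\texample\tgene\t1000\t5000\t.\t+\t.\tID=gene0001;Name=demo"
feature = parse_gff3_line(example)
print(feature.type, feature.attributes["ID"], feature.end - feature.start + 1)
```

Because GFF3 coordinates are 1-based and inclusive, the feature length is `end - start + 1`. A production parser would also need to handle comment lines, directives, and URL-escaped attribute values, which this sketch ignores.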
3.
Nat Commun ; 11(1): 4703, 2020 09 17.
Article in English | MEDLINE | ID: mdl-32943643

ABSTRACT

Deep learning models have shown great promise in predicting regulatory effects from DNA sequence, but their informativeness for human complex diseases is not fully understood. Here, we evaluate genome-wide SNP annotations from two previous deep learning models, DeepSEA and Basenji, by applying stratified LD score regression to 41 diseases and traits (average N = 320K), conditioning on a broad set of coding, conserved and regulatory annotations. We aggregated annotations across all (respectively blood or brain) tissues/cell-types in meta-analyses across all (respectively 11 blood or 8 brain) traits. The annotations were highly enriched for disease heritability, but produced only limited conditionally significant results: non-tissue-specific and brain-specific Basenji-H3K4me3 for all traits and brain traits respectively. We conclude that deep learning models have yet to achieve their full potential to provide considerable unique information for complex disease, and that their conditional informativeness for disease cannot be inferred from their accuracy in predicting regulatory annotations.


Subjects
Deep Learning; Disease/genetics; Molecular Sequence Annotation; Alleles; Genetic Predisposition to Disease; Genome, Human; Genome-Wide Association Study; Histones/genetics; Humans; Linkage Disequilibrium; Models, Genetic; Phenotype; Polymorphism, Single Nucleotide
4.
PLoS One ; 15(9): e0237493, 2020.
Article in English | MEDLINE | ID: mdl-32946440

ABSTRACT

The phyllosphere epiphytic microbiome is composed of microorganisms that colonize the external aerial portions of plants. Relationships of plant responses to specific microorganisms, both pathogenic and beneficial, have been examined, but the functional and metabolic profile responses of the phyllosphere microbiome are not well described. Changing crop growth conditions, such as increased drought, can have profound impacts on crop productivity. Also, epiphytic microbial communities provide a new target for crop yield optimization. We compared Zea mays leaf microbiomes collected under drought and well-watered conditions by examining functional gene annotation patterns across three physically disparate locations, each with and without drought treatment, through the application of short-read metagenomic sequencing. Drought samples exhibited different functional sequence compositions at each of the three field sites. Maize phyllosphere functional profiles revealed a wide variety of metabolic and regulatory processes that differed between drought and normal water conditions and provide key baseline information for future selective breeding.


Subjects
Plant Leaves/genetics; Plant Leaves/microbiology; Zea mays/genetics; Zea mays/microbiology; Droughts; Gene Regulatory Networks; Genes, Plant; Metagenomics; Microbiota; Molecular Sequence Annotation; Plant Leaves/physiology; Stress, Physiological; Water/metabolism; Zea mays/physiology
5.
Nat Commun ; 11(1): 4488, 2020 09 08.
Article in English | MEDLINE | ID: mdl-32901040

ABSTRACT

Sustainable food production in the context of climate change necessitates diversification of agriculture and a more efficient utilization of plant genetic resources. Fonio millet (Digitaria exilis) is an orphan African cereal crop with great potential for dryland agriculture. Here, we establish high-quality genomic resources to facilitate fonio improvement through molecular breeding. These include a chromosome-scale reference assembly and deep re-sequencing of 183 cultivated and wild Digitaria accessions, enabling insights into genetic diversity, population structure, and domestication. Fonio diversity is shaped by climatic, geographic, and ethnolinguistic factors. Two genes associated with seed size and shattering showed signatures of selection. Most known domestication genes from other cereal models, however, have not experienced strong selection in fonio, providing direct targets to rapidly improve this crop for agriculture in hot and dry environments.


Subjects
Digitaria/genetics; Edible Grain/genetics; Africa; Agriculture/methods; Climate Change; Digitaria/classification; Domestication; Edible Grain/classification; Evolution, Molecular; Genetic Variation; Genome, Plant; Molecular Sequence Annotation; Selection, Genetic; Species Specificity
6.
Medicine (Baltimore) ; 99(34): e21863, 2020 Aug 21.
Article in English | MEDLINE | ID: mdl-32846838

ABSTRACT

Dermatomyositis is a common connective tissue disease. The occurrence and development of dermatomyositis result from multiple factors, but its exact pathogenesis has not been fully elucidated. Here, we used bioinformatics methods to explore and predict the major disease-related genes of dermatomyositis and to identify the underlying pathogenic molecular mechanism. The gene expression data of GDS1956, GDS2153, GDS2855, and GDS3417, comprising 94 specimens (66 dermatomyositis specimens and 28 normal specimens), were obtained from the Gene Expression Omnibus database. The 4 microarray datasets were combined to identify differentially expressed genes (DEGs). Gene ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the DEGs were performed with the Database for Annotation, Visualization and Integrated Discovery and the KEGG Orthology Based Annotation System, respectively. The protein-protein interaction networks of the DEGs were built using the STRING website. A total of 4097 DEGs were extracted from the 4 Gene Expression Omnibus datasets, of which 2213 genes were upregulated and 1884 genes were downregulated. Gene ontology analysis indicated that the biological functions of the DEGs focused primarily on response to virus, the type I interferon signaling pathway, and negative regulation of viral genome replication. The main cellular components included extracellular space, cytoplasm, and blood microparticle. The molecular functions included protein binding, double-stranded RNA binding, and MHC class I protein binding. KEGG pathway analysis showed that these DEGs were mainly involved in the toll-like receptor signaling pathway, cytosolic DNA-sensing pathway, RIG-I-like receptor signaling pathway, complement and coagulation cascades, arginine and proline metabolism, and the phagosome signaling pathway. The following 13 closely related genes, XAF1, NT5E, UGCG, GBP2, TLR3, DDX58, STAT1, GBP1, PLSCR1, OAS3, SP100, IGK, and RSAD2, were key nodes in the protein-protein interaction network. This research suggests that exploring DEGs and pathways in dermatomyositis using integrated bioinformatics methods could help us understand the molecular mechanism underlying the development of dermatomyositis, inform its early detection and prevention, and provide reliable therapeutic targets.


Subjects
Computational Biology/instrumentation; Dermatomyositis/genetics; Gene Ontology/trends; Interferon Type I/genetics; Protein Interaction Maps/genetics; Dermatomyositis/epidemiology; Double-Stranded RNA Binding Motif/genetics; Down-Regulation; Humans; Incidence; Microarray Analysis/methods; Molecular Sequence Annotation/methods; Protein Binding; Signal Transduction; Up-Regulation
7.
PLoS Comput Biol ; 16(7): e1008104, 2020 07.
Article in English | MEDLINE | ID: mdl-32735589

ABSTRACT

High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet are common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with varying parameters and different assembly algorithms, we are able to generate assemblies whose protein annotations are statistically enriched for specific gene ontology categories. While total assembly length was not significantly affected by the assembly methodologies tested, the assemblies generated varied widely in fragmentation level, and we show local assembly collapse or expansion underlying the enrichment or depletion of specific protein functional groups. We show that these statistically significant deviations in gene ontology groups can occur in seemingly high-quality assemblies, and result from difficult-to-detect local sequence expansions or contractions. Given the unpredictable interplay between assembly algorithm, parameters, and biological sequence data heterozygosity, we highlight the need for better measures of assembly quality than the N50 value, including methods for assessing local expansion and collapse.


Subjects
Contig Mapping; Genome, Helminth; Heterozygote; Molecular Sequence Annotation/methods; Nematoda/genetics; ATP Binding Cassette Transporter, Subfamily B, Member 1/metabolism; Algorithms; Animals; High-Throughput Nucleotide Sequencing/methods; Likelihood Functions; Proteome; Sequence Analysis, DNA
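
For readers unfamiliar with the N50 value that entry 7 argues is an insufficient quality measure, the sketch below shows the usual definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly length. The `n50` function and both contig-length lists are hypothetical illustrations, not data or code from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the total
        # assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


# Two made-up assemblies with the same total length (300) but
# different fragmentation.
contiguous = [100, 80, 60, 40, 20]
fragmented = [50, 50, 50, 50, 50, 50]

print(sum(contiguous), n50(contiguous))
print(sum(fragmented), n50(fragmented))
```

The two toy assemblies have identical total length but different N50 values, which matches the abstract's observation that fragmentation can vary widely while total length does not. N50 is a summary of contiguity only, so it cannot by itself reveal the local collapse or expansion the authors describe.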
8.
Plant Mol Biol ; 104(1-2): 173-185, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32734417

ABSTRACT

KEY MESSAGE: A novel and major QTL for the effective tiller number was identified on chromosomal arm 1BL and validated in two genetic backgrounds. The effective tiller number (ETN) substantially influences plant architecture and wheat yield improvement. In this study, we constructed a genetic map of the 2SY (20828/SY95-71) recombinant inbred line population based on the Wheat 55K array as well as simple sequence repeat (SSR) and Kompetitive Allele Specific PCR (KASP) markers. A comparison between the genetic and physical maps indicated that the marker positions were consistent in the two maps. Additionally, we identified seven tillering-related quantitative trait loci (QTLs), including Qetn-sau-1B.1, which is a major QTL localized to a 6.17-cM interval flanked by markers AX-89635557 and AX-111544678 on chromosome 1BL. The Qetn-sau-1B.1 QTL was detected in eight environments and explained 12.12-55.71% of the phenotypic variance. Three genes associated with the ETN were detected in the physical interval of Qetn-sau-1B.1. We used a tightly linked KASP marker, KASP-AX-110129912, to further validate this QTL in two other populations with different genetic backgrounds. The results indicated that Qetn-sau-1B.1 significantly increased the ETN by up to 23.5%. The results of this study will be useful for the precise mapping and cloning of Qetn-sau-1B.1.


Subjects
Chromosomes, Plant; Quantitative Trait Loci/genetics; Triticum/genetics; Bangladesh; Chromosome Mapping; Genetic Markers/genetics; Genotype; Microsatellite Repeats; Molecular Sequence Annotation; Phenotype
9.
Arch Virol ; 165(10): 2397-2400, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32748177

ABSTRACT

Enterobacter aerogenes is a member of the ESKAPE group of bacteria, and multi-drug-resistant strains are increasingly being found. In this study, a novel bacteriophage, ATCEA85, which infects E. aerogenes, has been isolated and characterized. ATCEA85 is seen to have a circularly permuted linear double-stranded DNA genome of 47,484 base pairs in length. The closest related phage found in the databases is the Klebsiella phage Kp3, which exhibits 77% identity over a 34% query coverage. The G+C content of ATCEA85 is 56.2%, and 15 putative open reading frames are functionally annotated.


Subjects
DNA, Viral/genetics; Enterobacter aerogenes/virology; Genome, Viral; Open Reading Frames; Phylogeny; Siphoviridae/genetics; Base Composition; DNA/genetics; Gene Ontology; Molecular Sequence Annotation; Siphoviridae/classification; Siphoviridae/isolation & purification; Whole Genome Sequencing
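
The G+C content quoted for ATCEA85 in entry 9 is a plain base-composition percentage. As a minimal sketch of how such a figure is typically computed, the hypothetical `gc_content` helper below counts G and C among the unambiguous bases of a sequence. The input strings are toy sequences, not the phage genome.

```python
def gc_content(sequence: str) -> float:
    """Return G+C as a percentage of the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    # Ambiguity codes such as N are excluded from the denominator.
    return 100.0 * gc / acgt


toy = "ATGCGC"  # made-up sequence for illustration only
print(round(gc_content(toy), 1))
```

Excluding ambiguous bases from the denominator is a design choice made for this sketch. Some tools divide by the full sequence length instead, which gives slightly lower values for sequences containing N.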
10.
Nature ; 584(7821): 403-409, 2020 08.
Article in English | MEDLINE | ID: mdl-32760000

ABSTRACT

The tuatara (Sphenodon punctatus), the only living member of the reptilian order Rhynchocephalia (Sphenodontia), once widespread across Gondwana [1,2], is an iconic species that is endemic to New Zealand [2,3]. A key link to the now-extinct stem reptiles (from which dinosaurs, modern reptiles, birds and mammals evolved), the tuatara provides key insights into the ancestral amniotes [2,4]. Here we analyse the genome of the tuatara, which, at approximately 5 Gb, is among the largest of the vertebrate genomes yet assembled. Our analyses of this genome, along with comparisons with other vertebrate genomes, reinforce the uniqueness of the tuatara. Phylogenetic analyses indicate that the tuatara lineage diverged from that of snakes and lizards around 250 million years ago. This lineage also shows moderate rates of molecular evolution, with instances of punctuated evolution. Our genome sequence analysis identifies expansions of proteins, non-protein-coding RNA families and repeat elements, the latter of which show an amalgam of reptilian and mammalian features. The sequencing of the tuatara genome provides a valuable resource for deep comparative analyses of tetrapods, as well as for tuatara biology and conservation. Our study also provides important insights into both the technical challenges and the cultural obligations that are associated with genome sequencing.


Subjects
Evolution, Molecular; Genome/genetics; Phylogeny; Reptiles/genetics; Animals; Conservation of Natural Resources/trends; Female; Genetics, Population; Lizards/genetics; Male; Molecular Sequence Annotation; New Zealand; Sex Characteristics; Snakes/genetics; Synteny
11.
Nucleic Acids Res ; 48(15): 8320-8331, 2020 09 04.
Article in English | MEDLINE | ID: mdl-32749457

ABSTRACT

The rat is an important model organism in biomedical research for studying human disease mechanisms and treatments, but its annotated transcriptome is far from complete. We constructed a Rat Transcriptome Re-annotation named RTR using RNA-seq data from 320 samples in 11 different organs generated by the SEQC consortium. In total, RTR contains 52 807 genes and 114 152 transcripts. Transcribed regions and exons in RTR account for ∼42% and ∼6.5% of the genome, respectively. Of all 73 074 newly annotated transcripts in RTR, 34 213 were annotated as high-confidence coding transcripts and 24 728 as high-confidence long noncoding transcripts. Tissue, rather than stage, has a significant influence on the expression patterns of transcripts. We also found that 11 715 genes and 15 852 transcripts were expressed in all 11 tissues and that 849 house-keeping genes expressed different isoforms among tissues. This comprehensive transcriptome is freely available at http://www.unimd.org/rtr/. Our new rat transcriptome provides an essential reference for genetics and gene expression studies in rat disease and toxicity models.


Subjects
Genome/genetics; Molecular Sequence Annotation; RNA-Seq/methods; Transcriptome/genetics; Alternative Splicing/genetics; Animals; High-Throughput Nucleotide Sequencing; Humans; Rats; Whole Exome Sequencing
12.
PLoS One ; 15(8): e0237657, 2020.
Article in English | MEDLINE | ID: mdl-32817676

ABSTRACT

The majority of genome-wide association studies (GWAS) loci are not annotated to known genes in the human genome, which renders biological interpretations difficult. Transcriptome-wide association studies (TWAS) associate complex traits with genotype-based prediction of gene expression deriving from expression quantitative trait loci (eQTL) studies, thus improving the interpretability of GWAS findings. However, these results can sometimes suffer from a high false positive rate, because predicted expression of different genes may be highly correlated due to linkage disequilibrium between eQTL. We propose a novel statistical method, Gene Score Regression (GSR), to detect causal gene sets for complex traits while accounting for gene-to-gene correlations. We assume that non-causal genes that are highly correlated with the causal genes will also exhibit a high marginal association with the complex trait. Consequently, by regressing on the marginal associations of complex traits with the sum of the gene-to-gene correlations in each gene set, we can assess the amount of variance of the complex traits explained by the predicted expression of the genes in each gene set and identify plausible causal gene sets. GSR can operate either on GWAS summary statistics or observed gene expression. Therefore, it may be widely applied to annotate GWAS results and identify the underlying biological pathways. We demonstrate the high accuracy and computational efficiency of GSR compared to state-of-the-art methods through simulations and real data applications. GSR is openly available at https://github.com/li-lab-mcgill/GSR.


Subjects
Genome-Wide Association Study/statistics & numerical data; Molecular Sequence Annotation; Multifactorial Inheritance/genetics; Transcriptome/genetics; Gene Expression Regulation/genetics; Genetic Predisposition to Disease; Genome, Human/genetics; Genotype; Humans; Linkage Disequilibrium; Models, Genetic; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci
13.
PLoS One ; 15(8): e0237818, 2020.
Article in English | MEDLINE | ID: mdl-32853245

ABSTRACT

Functional and enduring mammary structure is pivotal for producer profitability, and animal health and welfare in beef production. Genetic evaluations for teat and udder score in Canadian Angus cattle have previously been developed. The aim of this study was to identify genomic regions associated with teat and udder structure in Canadian Angus cows thereby enhancing knowledge of the biological architecture of these traits. Thus, we performed a weighted single-step genome wide association study (WssGWAS) to identify candidate genes for teat and udder score in 1,582 Canadian Angus cows typed with the GeneSeek® Genomic Profiler Bovine 130K SNP array. Genomically enhanced estimated breeding values (GEBVs) were converted to SNP marker effects using unequal variances for markers to calculate weights for each SNP over three iterations. At the genome wide level, we detected windows of 20 consecutive SNPs that explained more than 0.5% of the variance observed in these traits. A total of 35 and 28 windows were identified for teat and udder score, respectively, with two SNP windows in common for both traits. Using Ensembl, the SNP windows were used to search for candidate genes and quantitative trait loci (QTL). A total of 94 and 71 characterized genes were identified in the regions for teat and udder score, respectively. Of these, 7 genes were common for both traits. Gene network and enrichment analysis, using Ingenuity Pathway Analysis (IPA), signified key pathways unique to each trait. Genes of interest were associated with immune response and wound healing, adipose tissue development and morphology, and epithelial and vascular development and morphology. Genetic architecture from this GWAS confirms that teat and udder score are distinct, polygenic traits involving varying and complex biological pathways, and that genetic selection for improved teat and udder score is possible.


Subjects
Cattle/anatomy & histology; Cattle/genetics; Genome-Wide Association Study; Mammary Glands, Animal/anatomy & histology; Animals; Female; Gene Regulatory Networks; Molecular Sequence Annotation; Phenotype; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics; Software; Statistics as Topic
14.
PLoS One ; 15(8): e0237744, 2020.
Article in English | MEDLINE | ID: mdl-32841246

ABSTRACT

Both the Mediterranean (MED) species of the Bemisia tabaci whitefly complex and the greenhouse whitefly (Trialeurodes vaporariorum, TV) are important agricultural pests. The two species of whiteflies differ in many aspects such as morphology, geographical distribution, host plant range, plant virus transmission, and resistance to insecticides. However, the molecular basis underlying their differences remains largely unknown. In this study, we analyzed the genetic divergences between the transcriptomes of MED and TV. In total, 2,944 pairs of orthologous genes were identified. The average identity of amino acid sequences between the two species is 93.6%. The average nonsynonymous (Ka) and synonymous (Ks) substitution rates and the ratio of Ka/Ks of the orthologous genes are 0.0389, 2.23 and 0.0204, respectively. The low average Ka/Ks ratio indicates that orthologous genes tend to be under strong purifying selection. The most divergent gene classes are related to the metabolism of xenobiotics, cofactors, vitamins and amino acids, and this divergence may underlie the different biological characteristics of the two species of whiteflies. Genes differentially expressed between the two species are enriched in carbohydrate metabolism and regulation of autophagy. These findings provide molecular clues to uncover the biological and molecular differences between the two species of whiteflies.


Subjects
Crop Production; Genes, Insect/genetics; Genetic Speciation; Hemiptera/genetics; Insect Proteins/genetics; Amino Acid Sequence/genetics; Amino Acid Substitution/genetics; Amino Acids/metabolism; Animals; Hemiptera/metabolism; Insect Proteins/metabolism; Insecticide Resistance/genetics; Insecticides/pharmacology; Mediterranean Region; Molecular Sequence Annotation; RNA-Seq; Sequence Homology, Amino Acid; Species Specificity; Vitamins/metabolism; Xenobiotics/metabolism
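
Entry 14 reports an average Ka, an average Ks, and an average Ka/Ks ratio, and the quoted ratio (0.0204) is not simply the quotient of the two quoted averages. One plausible reading, offered here as an assumption rather than a statement of the authors' method, is that the ratio was averaged gene by gene. The sketch below uses three made-up (Ka, Ks) pairs to show that the mean of per-gene ratios and the ratio of the means generally differ.

```python
# Hypothetical (Ka, Ks) values for three ortholog pairs; these are NOT
# data from the study and are chosen only to make the arithmetic clear.
ortholog_rates = [(0.02, 2.0), (0.05, 2.5), (0.04, 1.0)]

n = len(ortholog_rates)
mean_ka = sum(ka for ka, _ in ortholog_rates) / n
mean_ks = sum(ks for _, ks in ortholog_rates) / n

# Quotient of the two averages.
ratio_of_means = mean_ka / mean_ks

# Average of the per-gene Ka/Ks ratios.
mean_of_ratios = sum(ka / ks for ka, ks in ortholog_rates) / n

print(f"ratio of means: {ratio_of_means:.4f}")
print(f"mean of ratios: {mean_of_ratios:.4f}")
```

In either form, a Ka/Ks value far below 1 is conventionally read as evidence of purifying selection, which is the interpretation the abstract gives.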
15.
Gene ; 762: 145041, 2020 Dec 15.
Article in English | MEDLINE | ID: mdl-32777523

ABSTRACT

Mitochondrial genome sequencing has become widely used in numerous fields, including systematics, phylogeny, and evolutionary genomics. To elucidate phylogenetic relationships among members of the family Characidae, we sequenced the mitogenomes of four species within this family, namely, Aphyocharax rathbuni, Hyphessobrycon herbertaxelrodi, Hyphessobrycon megalopterus, and Prionobrama filigera. The mitogenomes were found to be 16,678-16,841 bp and encode 37 typical mitochondrial genes (13 protein-coding, 2 ribosomal RNA, and 22 transfer RNA genes). Gene arrangements in the studied species are consistent with those in the inferred ancestral fish. Most protein-coding genes in these mitogenomes have typical ATN start codons and TAR or an incomplete stop codon T-. Phylogenetic relationships based on Bayesian inference and maximum-likelihood methods indicated that A. rathbuni, H. herbertaxelrodi, H. megalopterus, and P. filigera belong to the Characidae family. Of the 15 Characidae species studied, three pairs were of the same genus, but the results for only one pair were well supported. This phylogenetic classification is inconsistent with those described in previous morphological and taxonomic studies on this family. Thus, systematic classification of the Characidae requires further examination. Our findings yield new mitogenomic data that will provide a basis for future phylogenetic and taxonomic studies.


Subjects
Characiformes/genetics; Genome, Mitochondrial; Phylogeny; Animals; Characiformes/classification; Codon/genetics; Molecular Sequence Annotation; Open Reading Frames; RNA, Ribosomal/genetics; RNA, Transfer/genetics
16.
Gene ; 762: 145026, 2020 Dec 15.
Article in English | MEDLINE | ID: mdl-32781193

ABSTRACT

Cannabis has been cultivated for millennia for medicinal, industrial and recreational uses. Our long-term goal is to compare the transcriptomes of cultivars with different cannabinoid profiles for therapeutic purposes. Here we describe the de novo assembly, annotation and initial analysis of two cultivars of Cannabis, a high THC variety and a CBD plus THC variety. Cultivars were grown under different lighting conditions; flower buds were sampled over 71 days. Cannabinoid profiles were determined by ESI-LC/MS. RNA samples were sequenced using the HiSeq4000 platform. Transcriptomes were assembled using the DRAP pipeline and annotated using the BLAST2GO pipeline and other tools. Each transcriptome contained over twenty thousand protein encoding transcripts with ORFs and flanking sequence. Identification of transcripts for cannabinoid pathway and related enzymes showed full-length ORFs that align with the draft genomes of the Purple Kush and Finola cultivars. Two transcripts were found for olivetolic acid cyclase (OAC) that mapped to distinct locations on the Purple Kush genome suggesting multiple genes for OAC are expressed in some cultivars. The ability to make high quality annotated reference transcriptomes in Cannabis or other plants can promote rapid comparative analysis between cultivars and growth conditions in Cannabis and other organisms without annotated genome assemblies.


Subjects
Cannabinoids/biosynthesis; Cannabis/genetics; Transcriptome; Cannabis/classification; Cannabis/metabolism; Intramolecular Transferases/genetics; Intramolecular Transferases/metabolism; Molecular Sequence Annotation; Plant Proteins/genetics; Plant Proteins/metabolism
17.
PLoS One ; 15(8): e0237087, 2020.
Article in English | MEDLINE | ID: mdl-32813723

ABSTRACT

Water buffalo (Bubalus bubalis) is an important source of meat and milk in countries with relatively warm weather. Compared to the cattle genome, little has been done to reveal its genome structure and genomic traits. This is due to the complications stemming from the large genome size, the complexity of the genome, and the high repetitive content. In this paper, we introduce a high-quality draft assembly of the Egyptian water buffalo genome. The Egyptian breed is used as a dual-purpose animal (milk/meat). It is distinguished by its adaptability to the local environment and to changes in feed quality, as well as its high resistance to diseases. The genome assembly of the Egyptian water buffalo has been achieved using a reference-based assembly workflow. Our workflow significantly reduced the computational complexity of the assembly process, and improved the assembly quality by integrating different public resources. We also compared our assembly to the currently available draft assemblies of water buffalo breeds. A total of 21,128 genes were identified in the produced assembly. Milk virgin-related, pregnancy-related, lactation-related, involution-related, and mastitis-related genes were identified in the assembly. Our results will significantly contribute to a better understanding of the genetics of the Egyptian water buffalo, which will eventually support the ongoing breeding efforts and facilitate the future discovery of genes responsible for complex processes of dairy, meat production and disease resistance, among other significant traits.


Subjects
Buffaloes/genetics; Genome; Animals; Molecular Sequence Annotation; Whole Genome Sequencing
18.
Mol Cell ; 79(3): 504-520.e9, 2020 08 06.
Article in English | MEDLINE | ID: mdl-32707033

ABSTRACT

Protein kinases are essential for signal transduction and control of most cellular processes, including metabolism, membrane transport, motility, and cell cycle. Despite the critical role of kinases in cells and their strong association with diseases, good coverage of their interactions is available for only a fraction of the 535 human kinases. Here, we present a comprehensive mass-spectrometry-based analysis of a human kinase interaction network covering more than 300 kinases. The interaction dataset is a high-quality resource with more than 5,000 previously unreported interactions. We extensively characterized the obtained network and were able to identify previously described, as well as predict new, kinase functional associations, including those of the less well-studied kinases PIM3 and protein O-mannose kinase (POMK). Importantly, the presented interaction map is a valuable resource for assisting biomedical studies. We uncover dozens of kinase-disease associations spanning from genetic disorders to complex diseases, including cancer.


Subjects
Gene Regulatory Networks; Genetic Diseases, Inborn/genetics; Neoplasms/genetics; Protein Kinases/genetics; Protein Serine-Threonine Kinases/genetics; Proto-Oncogene Proteins/genetics; Computational Biology/methods; Datasets as Topic; Gene Expression Regulation; Gene Ontology; Genetic Diseases, Inborn/enzymology; Genetic Diseases, Inborn/pathology; Humans; Metabolic Networks and Pathways/genetics; Molecular Sequence Annotation; Muscular Dystrophies/enzymology; Muscular Dystrophies/genetics; Muscular Dystrophies/pathology; Neoplasms/enzymology; Neoplasms/pathology; Neurodegenerative Diseases/enzymology; Neurodegenerative Diseases/genetics; Neurodegenerative Diseases/pathology; Protein Interaction Mapping/methods; Protein Kinases/chemistry; Protein Kinases/classification; Protein Kinases/metabolism; Protein Serine-Threonine Kinases/chemistry; Protein Serine-Threonine Kinases/metabolism; Proto-Oncogene Proteins/chemistry; Proto-Oncogene Proteins/metabolism; Signal Transduction
19.
BMC Bioinformatics ; 21(Suppl 12): 303, 2020 Jul 24.
Article in English | MEDLINE | ID: mdl-32703166

ABSTRACT

BACKGROUND: Illumina paired-end reads are often used for 16S analysis in metagenomic studies. Since the DNA fragment size is usually smaller than the sum of the lengths of the paired reads, reads can be merged for downstream analysis. Despite the development of several tools for merging paired-end reads, poor quality at the 3' ends within the overlapping region prevents accurate merging of a significant portion of read pairs. Recently, CD-HIT-OTU-Miseq was presented as a new approach for 16S analysis using paired-end reads; it avoids the read-merging step entirely by clustering the paired reads separately. CD-HIT-OTU-Miseq is a set of tools that are meant to be launched successively by auxiliary shell scripts. This launch mode is not suitable for processing the large amounts of data generated in modern omics experiments. To solve this issue, we created CDSnake, a Snakemake pipeline that uses CD-HIT tools to simplify the consecutive launch of CD-HIT-OTU-Miseq tools for complete processing of paired-end reads in metagenomic studies. The pipeline makes 16S analysis easier through a one-command launch and helps to yield reproducible results. RESULTS: We benchmarked our pipeline against two commonly used pipelines for OTU retrieval that are incorporated into QIIME2, a popular workflow for microbiome analysis: DADA2 and deblur. Three mock datasets with highly overlapping paired-end 2 × 250 bp reads were used for benchmarking: Balanced, HMP, and Extreme. CDSnake produced fewer OTUs than DADA2 and deblur. However, on the Balanced and HMP datasets, the number of OTUs produced by CDSnake was closer to the real number of strains used to generate the mock community than the numbers produced by DADA2 and deblur. Though generally slower than the other pipelines, CDSnake produced higher total counts, preserving more information from the raw data. Inheriting these properties from the original CD-HIT-OTU-MiSeq utilities, CDSnake makes them handier to use thanks to simple scalability, easier automated runs, and other Snakemake benefits. CONCLUSIONS: We developed a Snakemake pipeline for the CD-HIT-OTU-MiSeq utilities that simplifies and automates data analysis. Benchmarking showed that this approach can outperform popular tools under certain conditions.


Subjects
High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; Software; Databases, Genetic; Humans; Microbiota/genetics; RNA, Ribosomal, 16S/genetics
20.
Nat Commun ; 11(1): 3697, 2020 07 29.
Article in English | MEDLINE | ID: mdl-32728101

ABSTRACT

As the number of genomics datasets grows rapidly, sample mislabeling has become a high stakes issue. We present CrosscheckFingerprints (Crosscheck), a tool for quantifying sample-relatedness and detecting incorrectly paired sequencing datasets from different donors. Crosscheck outperforms similar methods and is effective even when data are sparse or from different assays. Application of Crosscheck to 8851 ENCODE ChIP-, RNA-, and DNase-seq datasets enabled us to identify and correct dozens of mislabeled samples and ambiguous metadata annotations, representing ~1% of ENCODE datasets.


Subjects
High-Throughput Nucleotide Sequencing; Linkage Disequilibrium/genetics; Databases, Nucleic Acid; Genotype; HEK293 Cells; Human Umbilical Vein Endothelial Cells/metabolism; Humans; K562 Cells; Lod Score; Molecular Sequence Annotation