1.
BMC Bioinformatics ; 23(1): 240, 2022 Jun 18.
Article in English | MEDLINE | ID: mdl-35717172

ABSTRACT

BACKGROUND: G-quadruplexes (G4s), formed within guanine-rich nucleic acids, are secondary structures involved in important biological processes. Although every G4 motif has the potential to form a stable G4 structure, not every G4 motif does, and accurate energy-based methods are needed to assess their structural stability. Here, we present a decision tree-based prediction tool, G4Boost, to identify G4 motifs and predict their secondary structure folding probability and thermodynamic stability based on their sequences, nucleotide compositions, and estimated structural topologies. RESULTS: G4Boost predicted the quadruplex folding state with an accuracy greater than 93% and an F1-score of 0.96, and the folding energy with an RMSE of 4.28 and R² of 0.95, using only sequence-intrinsic features. G4Boost was successfully applied and validated to predict the stability of experimentally determined G4 structures, including those from plants and humans. CONCLUSION: G4Boost outperformed three machine learning-based prediction tools, DeepG4, Quadron, and G4RNA Screener, in terms of both accuracy and F1-score, and can be highly useful for G4 prediction to understand gene regulation across species, including plants and humans.
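
As a rough illustration of what "identifying G4 motifs" from sequence can look like, the sketch below scans a sequence with the widely used canonical pattern of four guanine runs (three or more Gs) separated by loops of 1 to 7 nucleotides. The pattern, the function name `find_g4_motifs`, and the forward-strand-only scan are assumptions made for this example; the abstract does not state which motif definition G4Boost uses.

```python
import re

# Canonical putative G4 motif: G{3+} loop{1-7} G{3+} loop{1-7} G{3+} loop{1-7} G{3+}.
# This is an illustrative assumption, not necessarily G4Boost's definition.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3}G{3,}")


def find_g4_motifs(sequence):
    """Return (start, end, motif) for non-overlapping matches on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    for hit in find_g4_motifs("ttGGGAGGGTGGGAGGGtt"):
        print(hit)
```

A predictor of the kind described would then compute features for each hit and pass them to a trained model; that step is not sketched here because the abstract gives no feature definitions.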


Subjects
G-Quadruplexes , Gene Expression Regulation , Guanine/chemistry , Humans , Machine Learning , Thermodynamics
2.
BMC Plant Biol ; 22(1): 595, 2022 Dec 19.
Article in English | MEDLINE | ID: mdl-36529716

ABSTRACT

BACKGROUND: With advances in high-throughput next-generation sequencing technologies, genome-wide association studies (GWAS) have identified a large set of variants associated with complex phenotypic traits at a very fine scale. Despite the progress in GWAS, identification of genotype-phenotype relationships remains challenging in maize because dozens of variants can control the same trait. Because causal variations result in changes in expression, gene expression analyses play a pivotal role in unraveling the transcriptional regulatory mechanisms behind the phenotypes. RESULTS: To address these challenges, we incorporated gene expression and GWAS-driven traits to extend the knowledge of genotype-phenotype relationships and transcriptional regulatory mechanisms behind the phenotypes. We constructed a large collection of gene co-expression networks and identified more than 2 million co-expressing gene pairs in the GWAS-driven pan-network, which contains all the gene pairs in individual genomes of the nested association mapping (NAM) population. We defined four sub-categories for the pan-network: (1) the core-network contains the highest represented ~1% of the gene pairs, (2) the near-core network contains the next highest represented 1-5% of the gene pairs, (3) the private-network contains ~50% of the gene pairs that are unique to individual genomes, and (4) the dispensable-network contains the remaining 50-95% of the gene pairs in the maize pan-genome. Strikingly, the private-network contained almost all the genes in the pan-network but lacked half of the interactions. We performed gene ontology (GO) enrichment analysis for the pan-, core-, and private-networks and compared the contributions of variants overlapping with genes and promoters to the GWAS-driven pan-network. CONCLUSIONS: Gene co-expression networks revealed meaningful information about groups of co-regulated genes that play a central role in regulatory processes. The pan-network approach enabled us to visualize a global view of the gene regulatory network for the studied system that could not be well inferred from the core-network alone.
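
The idea of splitting a pan-network by how widely each gene pair is represented can be sketched as follows. This is a deliberately simplified stand-in: it counts the number of genomes in which each undirected gene pair occurs and separates pairs found in every genome, in exactly one genome, and in between. The abstract instead defines its categories by percentage bands of representation, whose exact computation is not given, so the thresholds, function names, and toy genome labels below are all hypothetical.

```python
from collections import Counter


def build_pan_network(networks):
    """Count, for each undirected gene pair, how many genomes contain it.

    `networks` maps a genome label to an iterable of (gene_a, gene_b) pairs.
    """
    counts = Counter()
    for edges in networks.values():
        # Normalize pair order and de-duplicate within one genome.
        counts.update({tuple(sorted(edge)) for edge in edges})
    return counts


def split_by_representation(counts, n_genomes):
    """Simplified split: present in all genomes, in exactly one, or in between."""
    in_all = {pair for pair, c in counts.items() if c == n_genomes}
    in_one = {pair for pair, c in counts.items() if c == 1}
    in_some = set(counts) - in_all - in_one
    return in_all, in_some, in_one


if __name__ == "__main__":
    toy = {  # hypothetical genomes and genes
        "genome_A": [("g1", "g2"), ("g2", "g3")],
        "genome_B": [("g2", "g1"), ("g3", "g4")],
        "genome_C": [("g1", "g2"), ("g3", "g4")],
    }
    counts = build_pan_network(toy)
    print(split_by_representation(counts, len(toy)))
```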


Subjects
Genome-Wide Association Study , Zea mays , Zea mays/genetics , Genome-Wide Association Study/methods , Multifactorial Inheritance , Phenotype , Gene Regulatory Networks , Polymorphism, Single Nucleotide/genetics
3.
BMC Bioinformatics ; 22(1): 205, 2021 Apr 20.
Article in English | MEDLINE | ID: mdl-33879057

ABSTRACT

BACKGROUND: Gene annotation in eukaryotes is a non-trivial task that requires meticulous analysis of accumulated transcript data. Challenges include transcriptionally active regions of the genome that contain overlapping genes, genes that produce numerous transcripts, transposable elements and numerous diverse sequence repeats. Currently available gene annotation software applications depend on pre-constructed full-length gene sequence assemblies which are not guaranteed to be error-free. The origins of these sequences are often uncertain, making it difficult to identify and rectify errors in them. This hinders the creation of an accurate and holistic representation of the transcriptomic landscape across multiple tissue types and experimental conditions. Therefore, to gauge the extent of diversity in gene structures, a comprehensive analysis of genome-wide expression data is imperative. RESULTS: We present FINDER, a fully automated computational tool that optimizes the entire process of annotating genes and transcript structures. Unlike current state-of-the-art pipelines, FINDER automates the RNA-Seq pre-processing step by working directly with raw sequence reads and optimizes gene prediction from BRAKER2 by supplementing these reads with associated proteins. The FINDER pipeline (1) reports transcripts and recognizes genes that are expressed under specific conditions, (2) generates all possible alternatively spliced transcripts from expressed RNA-Seq data, (3) analyzes read coverage patterns to modify existing transcript models and create new ones, and (4) scores genes as high- or low-confidence based on the available evidence across multiple datasets. We demonstrate the ability of FINDER to automatically annotate a diverse pool of genomes from eight species. CONCLUSIONS: FINDER takes a completely automated approach to annotate genes directly from raw expression data. 
It is capable of processing eukaryotic genomes of all sizes and requires no manual supervision, making it ideal for bench researchers with limited experience in handling computational tools.


Subjects
Eukaryota , Software , Eukaryota/genetics , Genome , Molecular Sequence Annotation , RNA-Seq , Sequence Analysis, RNA
4.
Funct Integr Genomics ; 21(2): 195-204, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33635499

ABSTRACT

Following the elucidation of the critical roles they play in numerous important biological processes, long noncoding RNAs (lncRNAs) have gained vast attention in recent years. Manual annotation of lncRNAs is restricted by known gene annotations and is prone to false prediction due to the incompleteness of available data. However, with the advent of high-throughput sequencing technologies, a large volume of high-quality data has become available for annotation, especially for plant species such as wheat. Here, we compared the prediction accuracies of several machine learning algorithms using 10-fold cross-validation. This study includes a comprehensive feature selection step to remove irrelevant and redundant features. We present a crop-specific, alignment-free coding potential prediction tool, LncMachine, that achieves higher prediction accuracies than the currently available popular tools (CPC2, CPAT, and CNIT) when used with the Random Forest algorithm. Further, LncMachine with Random Forest performed well on human and mouse data, with an average accuracy of 92.67%. As input, LncMachine requires only a FASTA file or a tab-separated CSV file containing features. LncMachine can deploy several user-provided algorithms in real time and can therefore be readily applied to a wide range of studies.


Subjects
Computational Biology , Molecular Sequence Annotation , Plants/genetics , RNA, Long Noncoding/genetics , Algorithms , High-Throughput Nucleotide Sequencing , Machine Learning , RNA, Long Noncoding/classification
5.
PLoS Comput Biol ; 16(8): e1007261, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32810130

ABSTRACT

We describe JBrowse Connect, an optional expansion to the JBrowse genome browser, targeted at developers. JBrowse Connect allows live messaging, notifications for new annotation tracks, heavy-duty analyses initiated by the user from within the browser, and other dynamic features. We present example applications of JBrowse Connect that allow users 1) to specify and execute BLAST searches by either running on the same host as the webserver, with a self-contained BLAST module leveraging NCBI Blast+ commands, or via a managed Galaxy instance that can optionally run on a different host, and 2) to run the primer design service Primer3. JBrowse Connect allows users to track job progress and view results in the context of the browser. The software is available under a choice of open source licenses including LGPL and the Artistic License.


Subjects
Databases, Genetic , Genomics/methods , Software , Internet
6.
Nucleic Acids Res ; 47(D1): D1146-D1154, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30407532

ABSTRACT

Since its 2015 update, MaizeGDB, the Maize Genetics and Genomics database, has expanded to support the sequenced genomes of many maize inbred lines in addition to the B73 reference genome assembly. Curation and development efforts have targeted high-quality datasets and tools to support maize trait analysis, germplasm analysis, genetic studies, and breeding. MaizeGDB hosts a wide range of data, with recent support for new data types including genome metadata, RNA-seq, proteomics, synteny, and large-scale diversity. To improve access to and visualization of these data types, several new tools have been implemented to access large-scale maize diversity data (SNPversity), download and compare gene expression data (qTeller), visualize pedigree data (Pedigree Viewer), link genes with phenotype images (MaizeDIG), and enable flexible user-specified queries to the MaizeGDB database (MaizeMine). MaizeGDB also continues to be the community hub for maize research, coordinating activities and providing technical support to the maize research community. Here we report the changes MaizeGDB has made within the last three years to keep pace with recent software and research advances, as well as the pan-genomic landscape that cheaper and better sequencing technologies have made possible. MaizeGDB is accessible online at https://www.maizegdb.org.


Subjects
Computational Biology/methods , Databases, Genetic , Genome, Plant/genetics , Genomics/methods , Zea mays/genetics , Gene Expression Regulation, Plant , Genetic Variation , Information Storage and Retrieval/methods , Internet , Polymorphism, Single Nucleotide , Proteomics/methods , User-Computer Interface , Zea mays/metabolism
7.
Int J Mol Sci ; 22(19), 2021 Sep 27.
Article in English | MEDLINE | ID: mdl-34638743

ABSTRACT

The highly challenging hexaploid wheat (Triticum aestivum) genome is becoming ever more accessible due to the continued development of multiple reference genomes, which aids the effort to better understand variation in important traits. Although the process of variant calling is relatively straightforward, selecting the best combination of computational tools for the read alignment and variant calling stages of the analysis and efficiently filtering false variant calls are not always easy tasks. Previous studies have analyzed the impact of methods on the quality metrics in diploid organisms. Given that variant identification in wheat largely relies on accurate mining of exome data, there is a critical need to better understand how different methods affect the analysis of whole exome sequencing (WES) data in polyploid species. This study aims to address this need by performing whole exome sequencing of 48 wheat cultivars and assessing the performance of various variant calling pipelines at their suggested settings. The results show that all the pipelines require filtering to eliminate false-positive calls. The high consensus among the reference SNPs called by the best-performing pipelines suggests that filtering provides accurate and reproducible results. This study also provides detailed comparisons for high sensitivity and precision at individual and population levels for the raw and filtered SNP calls.


Subjects
Exome Sequencing , Genome, Plant , Polymorphism, Single Nucleotide , Polyploidy , Triticum/genetics
8.
BMC Plant Biol ; 20(1): 4, 2020 Jan 03.
Article in English | MEDLINE | ID: mdl-31900107

ABSTRACT

BACKGROUND: Maize experienced a whole-genome duplication event approximately 5 to 12 million years ago. Because this event occurred after speciation from sorghum, the pre-duplication subgenomes can be partially reconstructed by mapping syntenic regions to the sorghum chromosomes. During evolution, maize has had uneven gene loss between the ancient subgenomes. Fractionation and divergence between these genomes continue today, constantly changing genetic make-up and phenotypes and influencing agronomic traits. RESULTS: Here we regenerate the subgenome reconstructions for the most recent maize reference genome assembly. Based on both expression and abundance data for homeologous gene pairs across multiple tissues, we observed functional divergence of genes across subgenomes. Although the genes in the larger maize subgenome are often expressed more highly than their homeologs in the smaller subgenome, we observed cases where homeolog expression dominance switches in different tissues. We demonstrate for the first time that protein abundances are higher in the larger subgenome, but they also show tissue-specific dominance, a pattern similar to RNA expression dominance. We also find that pollen expression is uniquely decoupled from protein abundance. CONCLUSION: Our study shows that the larger subgenome has a greater range of functional assignments and that there is less overlap between the subgenomes in terms of gene functions than would be suggested by similar patterns of gene expression and protein abundance. Our study also revealed that some reactions are catalyzed uniquely by the larger and smaller subgenomes. The tissue-specific, nonequivalent expression-level dominance pattern observed here implies a change in regulatory control which favors differentiated selective pressure on the retained duplicates, leading to eventual change in gene functions.


Subjects
Gene Expression Regulation, Plant/genetics , Gene Expression/genetics , Zea mays/genetics , Chromosome Mapping/methods , Evolution, Molecular , Gene Duplication , Gene Ontology , Genes, Plant , Genome, Plant , Phylogeny , Plant Proteins/biosynthesis , Plant Proteins/genetics , Pollen/genetics , Polyploidy
9.
Bioinformatics ; 35(20): 4184-4186, 2019 Oct 15.
Article in English | MEDLINE | ID: mdl-30903182

ABSTRACT

MOTIVATION: Plant breeding aims to improve current germplasm so that it can tolerate a wide range of biotic and abiotic stresses. To accomplish this goal, breeders rely on developing a deeper understanding of genetic makeup and relationships between plant varieties to make informed plant selections. Although rapid advances in genotyping technology have generated a large amount of data for breeders, tools that facilitate pedigree analysis and visualization are scant, leaving breeders to use classical, but inherently limited, hierarchical pedigree diagrams for a handful of plant varieties. To answer this need, we developed PedigreeNet, a simple web-based tool that can be easily implemented at biological databases, to create and visualize customizable pedigree relationships in a network context, displaying pre- and user-uploaded data. RESULTS: As a proof-of-concept, we implemented PedigreeNet at the maize model organism database, MaizeGDB. The PedigreeNet viewer at MaizeGDB has a dynamically generated pedigree network of 4706 maize lines and 5487 relationships that is currently available both as a stand-alone web-based tool and integrated directly on the MaizeGDB Stock Pages. The tool allows the user to apply a number of filters, select or upload their own breeding relationships, center a pedigree network on a plant variety, identify the common ancestor between two varieties, and display the shortest path(s) between two varieties on the pedigree network. The PedigreeNet code layer is written as a JavaScript wrapper around Cytoscape Web. PedigreeNet fills a great need for breeders to have access to an online tool to represent and visually customize pedigree relationships. AVAILABILITY AND IMPLEMENTATION: PedigreeNet is accessible at https://www.maizegdb.org/breeders_toolbox. The open source code is publicly and freely available at GitHub: https://github.com/Maize-Genetics-and-Genomics-Database/PedigreeNet. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
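
Two of the operations the abstract lists, finding common ancestors and the shortest path between two varieties, are standard graph queries. The sketch below shows one generic way to compute them with a depth-first ancestor walk and a breadth-first search. It is written in Python for illustration only and is not PedigreeNet's code (the abstract describes that as a JavaScript wrapper around Cytoscape Web); the `parents` mapping and the line names are hypothetical.

```python
from collections import deque


def ancestors(parents, line):
    """Return all ancestors of `line`; `parents` maps a line to its parent lines."""
    seen = set()
    stack = [line]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def common_ancestors(parents, a, b):
    """Ancestors shared by lines `a` and `b`."""
    return ancestors(parents, a) & ancestors(parents, b)


def shortest_path(parents, start, goal):
    """Breadth-first search over the pedigree treated as an undirected network."""
    adjacency = {}
    for child, parent_list in parents.items():
        for parent in parent_list:
            adjacency.setdefault(child, set()).add(parent)
            adjacency.setdefault(parent, set()).add(child)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(adjacency.get(path[-1], ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no connection between the two lines


if __name__ == "__main__":
    toy = {"C": ["A", "B"], "D": ["A"], "E": ["C", "D"]}  # hypothetical pedigree
    print(common_ancestors(toy, "C", "D"))
    print(shortest_path(toy, "C", "D"))
```

This version returns a single shortest path; reporting all equally short paths, as the tool's description suggests, would require tracking every predecessor at each search depth.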


Subjects
Software , Zea mays , Databases, Factual , Databases, Genetic , Internet , Pedigree
10.
Nucleic Acids Res ; 44(D1): D1195-201, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26432828

ABSTRACT

MaizeGDB is a highly curated, community-oriented database and informatics service to researchers focused on the crop plant and model organism Zea mays ssp. mays. Although some form of the maize community database has existed over the last 25 years, there have only been two major releases. In 1991, the original maize genetics database MaizeDB was created. In 2003, the combined contents of MaizeDB and the sequence data from ZmDB were made accessible as a single resource named MaizeGDB. Over the next decade, MaizeGDB became more sequence driven while still maintaining traditional maize genetics datasets. This enabled the project to meet the continued growing and evolving needs of the maize research community, yet the interface and underlying infrastructure remained unchanged. In 2015, the MaizeGDB team completed a multi-year effort to update the MaizeGDB resource by reorganizing existing data, upgrading hardware and infrastructure, creating new tools, incorporating new data types (including diversity data, expression data, gene models, and metabolic pathways), and developing and deploying a modern interface. In addition to coordinating a data resource, the MaizeGDB team coordinates activities and provides technical support to the maize research community. MaizeGDB is accessible online at http://www.maizegdb.org.


Subjects
Databases, Genetic , Zea mays/genetics , Gene Expression , Genes, Plant , Genetic Variation , Genome, Plant , Metabolic Networks and Pathways , Models, Genetic , Software , User-Computer Interface , Zea mays/metabolism
11.
RNA ; 20(6): 815-24, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24757168

ABSTRACT

Aptamers can be highly specific for their targets, which implies precise molecular recognition between aptamer and target. However, as small polymers, their structures are more subject to environmental conditions than the more constrained longer RNAs such as those that constitute the ribosome. To understand the balance between structural and environmental factors in establishing ligand specificity of aptamers, we examined the RNA aptamer (NEO1A) previously reported as specific for neomycin-B. We show that NEO1A can recognize other aminoglycosides with similar affinities as for neomycin-B and its aminoglycoside specificity is strongly influenced by ionic strength and buffer composition. NMR and 2-aminopurine (2AP) fluorescence studies of the aptamer identified a flexible pentaloop and a stable binding pocket. Consistent with a well-structured binding pocket, docking analysis results correlated with experimental measures of the binding energy for most ligands. Steady state fluorescence studies of 2AP-substituted aptamers confirmed that A16 moves to a more solvent accessible position upon ligand binding while A14 moves to a less solvent accessible position, which is most likely a base stack. Analysis of binding affinities of NEO1A sequence variants showed that the base in position 16 interacts differently with each ligand and the interaction is a function of the buffer constituents. Our results show that the pentaloop provides NEO1A with the ability to adapt to external influences on its structure, with the critical base at position 16 adjusting to incorporate each ligand into a stable pocket by hydrophobic interactions and/or hydrogen bonds depending on the ligand and the ionic environment.


Subjects
Aptamers, Nucleotide/chemistry , Framycetin/chemistry , RNA/chemistry , 2-Aminopurine/chemistry , Aminoglycosides/chemistry , Binding Sites , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Ligands , Nucleic Acid Conformation , Osmolar Concentration , Substrate Specificity
13.
G3 (Bethesda) ; 14(5), 2024 May 07.
Article in English | MEDLINE | ID: mdl-38492232

ABSTRACT

The recent assembly and annotation of the 26 maize nested association mapping population founder inbreds have enabled large-scale pan-genomic comparative studies. These studies have expanded our understanding of agronomically important traits by integrating pan-transcriptomic data with trait-specific gene candidates from previous association mapping results. In contrast to the availability of pan-transcriptomic data, obtaining reliable protein-protein interaction (PPI) data has remained a challenge due to its high cost and complexity. We generated predicted PPI networks for each of the 26 genomes using the established STRING database. The individual genome-interactomes were then integrated to generate core- and pan-interactomes. We deployed the PPI clustering algorithm ClusterONE to identify numerous PPI clusters that were functionally annotated using gene ontology (GO) functional enrichment, demonstrating a diverse range of enriched GO terms across different clusters. Additional cluster annotations were generated by integrating gene coexpression data and gene description annotations, providing additional useful information. We show that the functionally annotated PPI clusters establish a useful framework for protein function prediction and prioritization of candidate genes of interest. Our study not only provides a comprehensive resource of predicted PPI networks for 26 maize genomes but also offers annotated interactome clusters for predicting protein functions and prioritizing gene candidates. The source code for the Python implementation of the analysis workflow and a standalone web application for accessing the analysis results are available at https://github.com/eporetsky/PanPPI.


Subjects
Zea mays , Zea mays/genetics , Protein Interaction Maps/genetics , Molecular Sequence Annotation , Gene Ontology , Genome, Plant , Quantitative Trait Loci , Computational Biology/methods , Algorithms , Genes, Plant , Quantitative Trait, Heritable , Phenotype , Databases, Genetic , Genomics/methods
14.
Sci Data ; 11(1): 420, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38653999

ABSTRACT

Wheat (Triticum aestivum) is one of the most important food crops with an urgent need for increase in its production to feed the growing world. Triticum timopheevii (2n = 4x = 28) is an allotetraploid wheat wild relative species containing the At and G genomes that has been exploited in many pre-breeding programmes for wheat improvement. In this study, we report the generation of a chromosome-scale reference genome assembly of T. timopheevii accession PI 94760 based on PacBio HiFi reads and chromosome conformation capture (Hi-C). The assembly comprised a total size of 9.35 Gb, featuring a contig N50 of 42.4 Mb and included the mitochondrial and plastid genome sequences. Genome annotation predicted 166,325 gene models including 70,365 genes with high confidence. DNA methylation analysis showed that the G genome had on average more methylated bases than the At genome. In summary, the T. timopheevii genome assembly provides a valuable resource for genome-informed discovery of agronomically important genes for food security.
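
For readers unfamiliar with the contig N50 statistic quoted above, the sketch below computes it under the standard definition: the length of the contig at which the running total, taken over contigs sorted from longest to shortest, first reaches at least half of the total assembly size. The function name and the toy contig lengths are illustrative and unrelated to the T. timopheevii assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the cumulative sum to half the total.
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```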


Subjects
Chromosomes, Plant , Genome, Plant , Triticum , Triticum/genetics , Chromosomes, Plant/genetics , DNA Methylation
15.
Plant Direct ; 7(12): e554, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38124705

ABSTRACT

Protein phosphorylation is a dynamic and reversible post-translational modification that regulates a variety of essential biological processes. The regulatory role of phosphorylation in cellular signaling pathways, protein-protein interactions, and enzymatic activities has motivated extensive research efforts to understand its functional implications. Experimental protein phosphorylation data in plants remains limited to a few species, necessitating a scalable and accurate prediction method. Here, we present PhosBoost, a machine-learning approach that leverages protein language models and gradient-boosting trees to predict protein phosphorylation from experimentally derived data. Trained on data obtained from a comprehensive plant phosphorylation database, qPTMplants, we compared the performance of PhosBoost to existing protein phosphorylation prediction methods, PhosphoLingo and DeepPhos. For serine and threonine prediction, PhosBoost achieved higher recall than PhosphoLingo and DeepPhos (.78, .56, and .14, respectively) while maintaining a competitive area under the precision-recall curve (.54, .56, and .42, respectively). PhosphoLingo and DeepPhos failed to predict any tyrosine phosphorylation sites, while PhosBoost achieved a recall score of .6. Despite the precision-recall tradeoff, PhosBoost offers improved performance when recall is prioritized while consistently providing more confident probability scores. A sequence-based pairwise alignment step improved prediction results for all classifiers by effectively increasing the number of inferred positive phosphosites. We provide evidence to show that PhosBoost models are transferable across species and scalable for genome-wide protein phosphorylation predictions. PhosBoost is freely and publicly available on GitHub.
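
The precision-recall tradeoff mentioned above follows from how the two metrics are defined over true positives, false positives, and false negatives. The sketch below computes precision, recall, and F1 from binary labels; the function name and the toy labels are made up for illustration and do not come from the PhosBoost evaluation.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: three true phosphosites, two of them recovered, one false call.
    print(precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

A classifier that calls more sites positive tends to raise recall at the cost of precision, which is why the abstract reports recall alongside the area under the precision-recall curve.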

16.
Database (Oxford) ; 2023, 2023 Nov 15.
Article in English | MEDLINE | ID: mdl-37971715

ABSTRACT

Over the last couple of decades, there has been rapid growth in the number and scope of agricultural genetics, genomics and breeding databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is true for all metadata and data types across all databases, and that ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use by identifying sustainability solutions and by identifying, promoting, or developing data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use and data sharing via programmatic means. Database URL: https://www.agbiodata.org/databases.


Subjects
Data Management , Plant Breeding , Animals , Genomics/methods , Databases, Factual , Information Dissemination
17.
Langmuir ; 28(34): 12619-28, 2012 Aug 28.
Article in English | MEDLINE | ID: mdl-22856639

ABSTRACT

Poly(ethylene glycol)-based polyurethanes have been widely used in biomedical applications; however, they are prone to swelling. A natural polyol, castor oil, can be incorporated into these polyurethanes to control the degree of swelling, which alters the mechanical properties and protein adsorption characteristics of the polymers. In this work, we modeled poly(ethylene glycol) and castor oil copolymers of hexamethylene diisocyanate-based polyurethanes (PEG-HDI and CO-HDI, respectively) and compared their mechanisms for fibronectin adsorption using molecular mechanics and molecular dynamics simulations. Results showed that the interplay between the hydrophobic residues concentrated at the N-terminal end of the protein, the surface roughness, and the hydrophilicity of the polymer surface determines the overall protein adsorption affinity. Incorporating explicit water molecules in the simulations results in higher affinity for fibronectin adsorption to the more hydrophobic CO-HDI surfaces, emphasizing the role that water molecules play during adsorption. We also observed that the strain energies, which are indicative of flexibility and consequently entropy, are significantly affected by changes in the patterns of β-sheet formation/breaking. Our study lends support to the view that while castor oil controls the degree of swelling, it increases the adsorption of fibronectin to a limited extent due to the interplay between its hydrophobicity and its surface roughness, which needs to be taken into account during the design of polyurethane-based biomaterials.


Subjects
Fibronectins/chemistry , Molecular Dynamics Simulation , Polyurethanes/chemistry , Adsorption , Amino Acid Sequence , Castor Oil/chemistry , Cyanates/chemistry , Isocyanates , Molecular Sequence Data , Polyethylene Glycols/chemistry , Protein Structure, Secondary , Protein Structure, Tertiary , Static Electricity , Stress, Mechanical , Surface Properties , Thermodynamics , Water/chemistry
18.
Front Artif Intell ; 5: 830170, 2022.
Article in English | MEDLINE | ID: mdl-35719692

ABSTRACT

Machine learning and modeling approaches have been used to classify protein sequences for a broad set of tasks including predicting protein function, structure, expression, and localization. Some recent studies have successfully predicted whether a given gene is expressed as mRNA or even translated to proteins potentially, but given that not all genes are expressed in every condition and tissue, the challenge remains to predict condition-specific expression. To address this gap, we developed a machine learning approach to predict tissue-specific gene expression across 23 different tissues in maize, solely based on DNA promoter and protein sequences. For class labels, we defined high and low expression levels for mRNA and protein abundance and optimized classifiers by systematically exploring various methods and combinations of k-mer sequences in a two-phase approach. In the first phase, we developed Markov model classifiers for each tissue and built a feature vector based on the predictions. In the second phase, the feature vector was used as an input to a Bayesian network for final classification. Our results show that these methods can achieve high classification accuracy of up to 95% for predicting gene expression for individual tissues. By relying on sequence alone, our method works in settings where costly experimental data are unavailable and reveals useful insights into the functional, evolutionary, and regulatory characteristics of genes.
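
The first phase described above, Markov model classifiers over sequence k-mers, can be illustrated with a generic log-odds classifier: one fixed-order Markov chain is trained on sequences labeled "high" and another on sequences labeled "low", and a new sequence is assigned to whichever model gives it the higher likelihood. The model order, the pseudocount, the DNA alphabet, and the toy training sequences are assumptions for this sketch; the abstract does not give the authors' settings, and the second-phase Bayesian network is not shown.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed DNA alphabet for promoter sequences


def train_markov(sequences, order=2, pseudocount=1.0):
    """Train a fixed-order Markov chain; returns a log-probability function."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0

    def log_prob(context, symbol):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return math.log((ctx.get(symbol, 0.0) + pseudocount) / total)

    return log_prob


def log_likelihood(model, seq, order=2):
    """Sum of transition log-probabilities along the sequence."""
    return sum(model(seq[i - order:i], seq[i]) for i in range(order, len(seq)))


def classify(seq, high_model, low_model, order=2):
    """Label by the sign of the log-odds between the two class models."""
    score = (log_likelihood(high_model, seq, order)
             - log_likelihood(low_model, seq, order))
    return ("high" if score > 0 else "low"), score


if __name__ == "__main__":
    high = train_markov(["GCGCGCGCGC"] * 3)  # toy "high expression" sequences
    low = train_markov(["ATATATATAT"] * 3)   # toy "low expression" sequences
    print(classify("GCGCGCGC", high, low))
```

In a two-phase design like the one described, per-tissue scores of this kind would form the feature vector passed to the second-stage classifier.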

19.
Foods ; 11(7), 2022 Mar 23.
Article in English | MEDLINE | ID: mdl-35407001

ABSTRACT

GrainGenes is the USDA-ARS database and Web resource for wheat, barley, oat, rye, and their relatives. As a community Web hub and database for small grains, GrainGenes strives to provide resources for researchers, students, and plant breeders to improve traits such as quality, yield, and disease resistance. Quantitative trait loci (QTL), genes, and genetic maps for quality attributes in GrainGenes represent the historical approach to mapping genes for groat percentage, test weight, protein, fat, and β-glucan content in oat (Avena spp.). Genetic maps are viewable in CMap, the comparative mapping tool that enables researchers to take advantage of highly populated consensus maps to increase the marker density around their genes-of-interest. GrainGenes hosts over 50 genome browsers and is launching an effort for community curation, including the manually curated tracks with β-glucan QTL and significant markers found via GWAS and cloned cellulose synthase-like AsClF6 alleles.

20.
Database (Oxford) ; 2022, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35616118

ABSTRACT

As one of the US Department of Agriculture-Agricultural Research Service flagship databases, GrainGenes (https://wheat.pw.usda.gov) serves the data and community needs of globally distributed small grains researchers for the genetic improvement of the Triticeae family and Avena species that include wheat, barley, rye and oat. GrainGenes accomplishes its mission by continually enriching its cross-linked data content following the findable, accessible, interoperable and reusable principles, enhancing and maintaining an intuitive web interface, creating tools to enable easy data access and establishing data connections within and between GrainGenes and other biological databases to facilitate knowledge discovery. GrainGenes operates within the biological database community, collaborates with curators and genome sequencing groups and contributes to the AgBioData Consortium and the International Wheat Initiative through the Wheat Information System (WheatIS). Interactive and linked content is paramount for successful biological databases and GrainGenes now has 2917 manually curated gene records, including 289 genes and 254 alleles from the Wheat Gene Catalogue (WGC). There are >4.8 million gene models in 51 genome browser assemblies, 6273 quantitative trait loci and >1.4 million genetic loci on 4756 genetic and physical maps contained within 443 mapping sets, complete with standardized metadata. Most notably, 50 new genome browsers that include outputs from the Wheat and Barley PanGenome projects have been created. We provide an example of an expression quantitative trait loci track on the International Wheat Genome Sequencing Consortium Chinese Spring wheat browser to demonstrate how genome browser tracks can be adapted for different data types. To help users benefit more from its data, GrainGenes created four tutorials available on YouTube. 
GrainGenes is executing its vision of service by continuously responding to the needs of the global small grains community by creating a centralized, long-term, interconnected data repository. Database URL: https://wheat.pw.usda.gov.


Subjects
Genome, Plant , Hordeum , Avena/genetics , Chromosome Mapping , Databases, Genetic , Genome, Plant/genetics , Genomics , Hordeum/genetics , Quantitative Trait Loci , Triticum/genetics