Results 1 - 12 of 12
1.
Nat Methods ; 20(7): 1037-1047, 2023 07.
Article in English | MEDLINE | ID: mdl-37336949

ABSTRACT

Technology for measuring 3D genome topology is increasingly important for studying gene regulation, for genome assembly and for mapping of genome rearrangements. Hi-C and other ligation-based methods have become routine but have specific biases. Here, we develop multiplex-GAM, a faster and more affordable version of genome architecture mapping (GAM), a ligation-free technique that maps chromatin contacts genome-wide. We perform a detailed comparison of multiplex-GAM and Hi-C using mouse embryonic stem cells. When examining the strongest contacts detected by either method, we find that only one-third of these are shared. The strongest contacts specifically found in GAM often involve 'active' regions, including many transcribed genes and super-enhancers, whereas in Hi-C they more often contain 'inactive' regions. Our work shows that active genomic regions are involved in extensive complex contacts that are currently underestimated in ligation-based approaches, and highlights the need for orthogonal advances in genome-wide contact mapping technologies.


Subjects
Chromatin; Genome; Animals; Mice; Chromatin/genetics; Chromosome Mapping/methods; Chromosomes; Genomics/methods
2.
Plants (Basel) ; 9(12)2020 Dec 10.
Article in English | MEDLINE | ID: mdl-33322028

ABSTRACT

Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall structural proteins that function in various aspects of plant growth and development, including pollen tube growth. We have previously characterized protein sequence signatures for three family members in the HRGP superfamily: the hyperglycosylated arabinogalactan-proteins (AGPs), the moderately glycosylated extensins (EXTs), and the lightly glycosylated proline-rich proteins (PRPs). However, the mechanism of pollen-specific HRGP gene expression remains unexplored. To this end, we developed an integrative analysis pipeline combining RNA-seq gene expression and promoter sequences to identify cis-regulatory motifs responsible for pollen-specific expression of HRGP genes in Arabidopsis thaliana. Specifically, we mined the public RNA-seq datasets and identified 13 pollen-specific HRGP genes. Ensemble motif discovery identified 15 conserved promoter elements between A. thaliana and A. lyrata. Motif scanning revealed two pollen-related transcription factors: GATA12 and brassinosteroid (BR) signaling pathway regulator BZR1. Finally, we performed a regression analysis and demonstrated that the 15 motifs provided a good model of HRGP gene expression in pollen (R = 0.61). In conclusion, we performed the first integrative analysis of cis-regulatory motifs in pollen-specific HRGP genes, revealing important insights into transcriptional regulation in pollen tissue.

3.
Methods Mol Biol ; 2149: 463-481, 2020.
Article in English | MEDLINE | ID: mdl-32617951

ABSTRACT

Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall proteins that function in diverse aspects of plant growth and development. This superfamily consists of three members: arabinogalactan-proteins (AGPs), extensins (EXTs), and proline-rich proteins (PRPs). Hybrid and chimeric HRGPs also exist. A bioinformatic software program, BIO OHIO 2.0, was developed to expedite the genome-wide identification and classification of AGPs, EXTs, and PRPs based on characteristic HRGP motifs and biased amino acid compositions. This chapter explains the principles of identifying HRGPs and provides a stepwise tutorial for using the BIO OHIO 2.0 program with genomic/proteomic data. Here, as an example, the genome/proteome of the common bean (Phaseolus vulgaris) is analyzed using the BIO OHIO 2.0 program to identify and characterize its set of HRGPs.


Subjects
Computational Biology/methods; Glycoproteins/chemistry; Glycoproteins/classification; Plant Proteins/classification; Software; Genome, Plant; Glycoproteins/genetics; Mucoproteins/chemistry; Mucoproteins/classification; Mucoproteins/genetics; Phaseolus/chemistry; Phaseolus/genetics; Plant Proteins/chemistry; Plant Proteins/genetics; Proline-Rich Protein Domains; Proteome/analysis; Sequence Analysis, Protein/methods
4.
PLoS One ; 14(11): e0224288, 2019.
Article in English | MEDLINE | ID: mdl-31738797

ABSTRACT

Bioinformatics, a discipline that combines aspects of biology, statistics, mathematics, and computer science, is becoming increasingly important for biological research. However, bioinformatics instruction is not yet generally integrated into undergraduate life sciences curricula. To understand why, we studied how bioinformatics is being included in biology education in the US by conducting a nationwide survey of faculty at two- and four-year institutions. The survey asked several open-ended questions that probed barriers to integration, the answers to which were analyzed using a mixed-methods approach. The barrier most frequently reported by the 1,260 respondents was lack of faculty expertise/training, but other deterrents (lack of student interest, overly full curricula, and lack of student preparation) were also common. Interestingly, the barriers faculty face depended strongly on whether they are members of an underrepresented group and on the Carnegie Classification of their home institution. We were surprised to discover that the cohort of faculty who were awarded their terminal degree most recently reported the most preparation in bioinformatics but teach it at the lowest rate.


Subjects
Biology/education; Computational Biology/education; Curriculum; Faculty/statistics & numerical data; Female; Humans; Male; Motivation; Students/psychology; Surveys and Questionnaires/statistics & numerical data; United States
5.
PLoS One ; 13(6): e0196878, 2018.
Article in English | MEDLINE | ID: mdl-29870542

ABSTRACT

Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent's degree of training, time since degree earned, and/or the Carnegie Classification of the respondent's institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula.


Subjects
Computational Biology/education; Mental Competency; Problem-Based Learning; Adolescent; Adult; Female; Humans; Male; United States
6.
BMC Plant Biol ; 16(1): 229, 2016 10 21.
Article in English | MEDLINE | ID: mdl-27769192

ABSTRACT

BACKGROUND: Hydroxyproline-rich glycoproteins (HRGPs) constitute a plant cell wall protein superfamily that functions in diverse aspects of growth and development. This superfamily contains three members: the highly glycosylated arabinogalactan-proteins (AGPs), the moderately glycosylated extensins (EXTs), and the lightly glycosylated proline-rich proteins (PRPs). Chimeric and hybrid HRGPs, however, also exist. A bioinformatics approach is employed here to identify and classify AGPs, EXTs, PRPs, chimeric HRGPs, and hybrid HRGPs from the proteins predicted by the completed genome sequence of poplar (Populus trichocarpa). This bioinformatics approach is based on searching for biased amino acid compositions and for particular protein motifs associated with known HRGPs with a newly revised and improved BIO OHIO 2.0 program. Proteins detected by the program are subsequently analyzed to identify the following: 1) repeating amino acid sequences, 2) signal peptide sequences, 3) glycosylphosphatidylinositol lipid anchor addition sequences, and 4) similar HRGPs using the Basic Local Alignment Search Tool (BLAST). RESULTS: The program was used to identify and classify 271 HRGPs from poplar including 162 AGPs, 60 EXTs, and 49 PRPs, which are each divided into various classes. This is in contrast to a previous analysis of the Arabidopsis proteome which identified 162 HRGPs consisting of 85 AGPs, 59 EXTs, and 18 PRPs. Poplar was observed to have fewer classical EXTs, to have more fasciclin-like AGPs, plastocyanin AGPs and AG peptides, and to contain a novel class of PRPs referred to as the proline-rich peptides. CONCLUSIONS: The newly revised and improved BIO OHIO 2.0 bioinformatics program was used to identify and classify the inventory of HRGPs in poplar in order to facilitate and guide basic and applied research on plant cell walls. The newly identified poplar HRGPs can now be examined to determine their respective structural and functional roles, including their possible applications in the areas of plant biofuel and natural products for medicinal or industrial uses. Additionally, other plants whose genomes are sequenced can now be examined in a similar way using this bioinformatics program, which will provide insight into the evolution of the HRGP family in the plant kingdom.


Subjects
Glycoproteins/genetics; Plant Proteins/genetics; Populus/genetics; Amino Acid Sequence; Computational Biology; Glycoproteins/analysis; Glycoproteins/chemistry; Glycoproteins/metabolism; Hydroxyproline/metabolism; Plant Proteins/analysis; Plant Proteins/chemistry; Plant Proteins/metabolism; Populus/metabolism
7.
PLoS One ; 11(2): e0150177, 2016.
Article in English | MEDLINE | ID: mdl-26918442

ABSTRACT

Extensins (EXTs) are a family of plant cell wall hydroxyproline-rich glycoproteins (HRGPs) that are implicated in plant growth, development, and defense. Structurally, EXTs are characterized by the repeated occurrence of serine (Ser) followed by three to five proline (Pro) residues, which are hydroxylated as hydroxyproline (Hyp) and glycosylated. Some EXTs have Tyrosine (Tyr)-X-Tyr (where X can be any amino acid) motifs that are responsible for intramolecular or intermolecular cross-linking. EXTs can be divided into several classes: classical EXTs, short EXTs, leucine-rich repeat extensins (LRXs), proline-rich extensin-like receptor kinases (PERKs), formin-homolog EXTs (FH EXTs), chimeric EXTs, and long chimeric EXTs. To guide future research on the EXTs and understand the evolutionary history of EXTs in the plant kingdom, a bioinformatics study was conducted to identify and classify EXTs from 16 fully sequenced plant genomes, including Ostreococcus lucimarinus, Chlamydomonas reinhardtii, Volvox carteri, Klebsormidium flaccidum, Physcomitrella patens, Selaginella moellendorffii, Pinus taeda, Picea abies, Brachypodium distachyon, Zea mays, Oryza sativa, Glycine max, Medicago truncatula, Brassica rapa, Solanum lycopersicum, and Solanum tuberosum, to supplement data previously obtained from Arabidopsis thaliana and Populus trichocarpa. A total of 758 EXTs were newly identified, including 87 classical EXTs, 97 short EXTs, 61 LRXs, 75 PERKs, 54 FH EXTs, 38 long chimeric EXTs, and 346 other chimeric EXTs. Several notable findings were made: (1) classical EXTs were likely derived after the terrestrialization of plants; (2) LRXs, PERKs, and FH EXTs were derived earlier than classical EXTs; (3) monocots have few classical EXTs; (4) eudicots have the greatest number of classical EXTs, and Tyr-X-Tyr cross-linking motifs are predominantly in classical EXTs; (5) green algae have no classical EXTs but have a number of long chimeric EXTs that are absent in embryophytes. Furthermore, phylogenetic analysis was conducted of LRXs, PERKs and FH EXTs, which shed light on the evolution of these three EXT classes.
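
The two sequence features named in this abstract, Ser followed by three to five Pro residues and Tyr-X-Tyr, can be searched for with ordinary regular expressions. The sketch below is only an illustration under stated assumptions: the function name `ext_motif_summary`, the toy sequence, and the choice to count overlapping Tyr-X-Tyr hits are hypothetical and are not taken from the study or from any published program.

```python
import re

# Ser followed by three to five Pro residues (greedy, so the longest run wins).
SP_REPEAT = re.compile(r"SP{3,5}")
# Tyr-X-Tyr, where X is any residue; the lookahead lets overlapping hits count.
YXY_MOTIF = re.compile(r"(?=Y[A-Z]Y)")


def ext_motif_summary(protein: str) -> dict:
    """Count SP3/SP4/SP5 repeats and Tyr-X-Tyr motifs in a one-letter sequence."""
    seq = protein.upper()
    sp_hits = SP_REPEAT.findall(seq)
    return {
        "sp_repeats": len(sp_hits),
        "sp3": sum(1 for hit in sp_hits if len(hit) == 4),  # S + 3 P
        "sp4": sum(1 for hit in sp_hits if len(hit) == 5),  # S + 4 P
        "sp5": sum(1 for hit in sp_hits if len(hit) == 6),  # S + 5 P
        "yxy": len(YXY_MOTIF.findall(seq)),
    }


# Toy sequence invented for this example, not a real extensin.
toy = "MASPPPPKYVYSPPPPPTYHYKSPPPA"
summary = ext_motif_summary(toy)
```

Predicted protein sequences list proline as `P` whether or not it is later hydroxylated, so a scan like this can only flag candidate repeats.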


Subjects
Glycoproteins/classification; Plant Proteins/classification; Algal Proteins/classification; Algal Proteins/genetics; Amino Acid Motifs; Amino Acid Sequence; Arabidopsis Proteins/classification; Arabidopsis Proteins/genetics; Biological Evolution; Computational Biology; Genome, Plant; Glycoproteins/genetics; Molecular Sequence Data; Phylogeny; Plant Proteins/genetics; Plants/classification; Sequence Alignment; Sequence Homology, Amino Acid; Species Specificity
9.
Plant Physiol ; 153(2): 485-513, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20395450

ABSTRACT

Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall proteins that function in diverse aspects of plant growth and development. This superfamily consists of three members: hyperglycosylated arabinogalactan proteins (AGPs), moderately glycosylated extensins (EXTs), and lightly glycosylated proline-rich proteins (PRPs). Hybrid and chimeric versions of HRGP molecules also exist. In order to "mine" genomic databases for HRGPs and to facilitate and guide research in the field, the BIO OHIO software program was developed that identifies and classifies AGPs, EXTs, PRPs, hybrid HRGPs, and chimeric HRGPs from proteins predicted from DNA sequence data. This bioinformatics program is based on searching for biased amino acid compositions and for particular protein motifs associated with known HRGPs. HRGPs identified by the program are subsequently analyzed to elucidate the following: (1) repeating amino acid sequences, (2) signal peptide and glycosylphosphatidylinositol lipid anchor addition sequences, (3) similar HRGPs via Basic Local Alignment Search Tool, (4) expression patterns of their genes, (5) other HRGPs, glycosyl transferase, prolyl 4-hydroxylase, and peroxidase genes coexpressed with their genes, and (6) gene structure and whether genetic mutants exist in their genes. The program was used to identify and classify 166 HRGPs from Arabidopsis (Arabidopsis thaliana) as follows: 85 AGPs (including classical AGPs, lysine-rich AGPs, arabinogalactan peptides, fasciclin-like AGPs, plastocyanin AGPs, and other chimeric AGPs), 59 EXTs (including SP(5) EXTs, SP(5)/SP(4) EXTs, SP(4) EXTs, SP(4)/SP(3) EXTs, a SP(3) EXT, "short" EXTs, leucine-rich repeat-EXTs, proline-rich extensin-like receptor kinases, and other chimeric EXTs), 18 PRPs (including PRPs and chimeric PRPs), and AGP/EXT hybrid HRGPs.
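
A search for "biased amino acid compositions" amounts to measuring how much of a protein is made up of a chosen residue set. The sketch below shows only that idea. The residue set `PAST` and the 0.5 cutoff are illustrative defaults assumed for this example and are not stated in the abstract, and the function names are hypothetical rather than part of BIO OHIO.

```python
def composition_fraction(protein: str, residues: str) -> float:
    """Fraction of the sequence made up of the given one-letter residues."""
    seq = protein.upper()
    if not seq:
        return 0.0
    wanted = set(residues.upper())
    return sum(1 for aa in seq if aa in wanted) / len(seq)


def is_biased(protein: str, residues: str = "PAST", threshold: float = 0.5) -> bool:
    """Flag a sequence whose composition meets the (assumed) threshold."""
    return composition_fraction(protein, residues) >= threshold


# Toy sequences invented for this example.
rich = "PPAASSTTKK"   # 8 of 10 residues are P, A, S or T
poor = "KKKKPA"       # 2 of 6 residues are P or A
```

A real screen would combine a composition filter like this with motif searches and the follow-up analyses listed in the abstract (signal peptides, GPI anchor sequences, BLAST).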


Subjects
Computational Biology/methods; Glycoproteins/chemistry; Glycoproteins/classification; Plant Proteins/chemistry; Plant Proteins/classification; Amino Acid Motifs; Amino Acid Sequence; Arabidopsis/metabolism; Data Mining; Databases, Protein; Genes, Plant; Molecular Sequence Data; Sequence Analysis, Protein; Software
10.
BMC Bioinformatics ; 11 Suppl 12: S6, 2010 Dec 21.
Article in English | MEDLINE | ID: mdl-21210985

ABSTRACT

BACKGROUND: An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets. METHODS: This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model. RESULTS: A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center. CONCLUSION: WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data.
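
The two worker steps described in METHODS, enumerating a subset of the word space and scoring words against a Markov chain model, can be illustrated on a single machine. The sketch below is not WordSeeker's implementation: it assumes a first-order Markov chain estimated from the input sequence itself, uses a word prefix as a stand-in for each worker's share of the word space, and the function name `score_words` is hypothetical.

```python
from collections import Counter


def score_words(seq: str, k: int, prefix: str = "") -> dict:
    """Return {word: (observed, expected, ratio)} for k-letter words in seq.

    Expected counts come from a first-order Markov chain (an assumption of
    this sketch). `prefix` limits scoring to one slice of the word space,
    mimicking the share a single worker might handle.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)                                   # single-letter counts
    di = Counter(seq[i:i + 2] for i in range(n - 1))      # dinucleotide counts
    starts = Counter()                                    # counts of each first letter
    for pair, count in di.items():
        starts[pair[0]] += count
    observed = Counter(seq[i:i + k] for i in range(n - k + 1))

    scores = {}
    for word, obs in observed.items():
        if not word.startswith(prefix):
            continue
        prob = mono[word[0]] / n                          # P(first letter)
        for a, b in zip(word, word[1:]):
            prob *= di[a + b] / starts[a]                 # P(b | a)
        expected = prob * (n - k + 1)
        scores[word] = (obs, expected, obs / expected)
    return scores


# Scoring each prefix separately and merging reproduces the full result.
toy = "ACGTACGTAC"
full = score_words(toy, 3)
merged = {}
for p in "ACGT":
    merged.update(score_words(toy, 3, prefix=p))
```

Because every slice is scored independently against the same model, the slices could in principle be handed to separate processes or nodes and merged afterwards, which is the property the abstract's controller and worker design relies on.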


Subjects
DNA/chemistry; Genomics/methods; Regulatory Sequences, Nucleic Acid; Software; Algorithms; Arabidopsis/genetics; Genome, Plant; Markov Chains; Sequence Analysis, DNA
11.
BMC Genomics ; 10: 463, 2009 Oct 08.
Article in English | MEDLINE | ID: mdl-19814816

ABSTRACT

BACKGROUND: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression. RESULTS: Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3'UTRs and 5'UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others. CONCLUSION: Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome.
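
The figure of 65,536 words follows from the four-letter DNA alphabet: 4^8 = 65,536. The sketch below enumerates that word space and compares word frequencies in a segment with genome-wide frequencies. It is a minimal illustration only: the log2 ratio with a small pseudocount is an assumed stand-in for whatever statistic the authors used, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def all_words(k: int) -> list[str]:
    """Enumerate every DNA word of length k (4**k words in total)."""
    return ["".join(letters) for letters in product(ALPHABET, repeat=k)]


def word_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of each k-letter word over overlapping windows."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(ALPHABET)  # skip windows with N or other codes
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def log2_enrichment(segment: str, genome: str, k: int, pseudo: float = 1e-9) -> dict[str, float]:
    """log2(segment frequency / genome-wide frequency) for words seen in the segment."""
    seg = word_frequencies(segment, k)
    gen = word_frequencies(genome, k)
    return {w: math.log2((f + pseudo) / (gen.get(w, 0.0) + pseudo)) for w, f in seg.items()}


n_words = len(all_words(8))                      # size of the 8-letter word space
toy_freqs = word_frequencies("ACGTACGT", 2)      # toy input, not real genome data
toy_enrich = log2_enrichment("AAAA", "AAAACCCC", 2)
```

Positive values mark words that are more frequent in the segment than genome-wide. Words absent from a segment would need to be read off `all_words(k)` rather than from the observed counts.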


Subjects
Arabidopsis/genetics; Computational Biology/methods; Genome, Plant; Models, Statistical; 3' Untranslated Regions; 5' Untranslated Regions; DNA, Plant/genetics; Gene Expression Regulation, Plant; Introns; Markov Chains; Promoter Regions, Genetic; Sequence Analysis, DNA
12.
BMC Genomics ; 10 Suppl 1: S18, 2009 Jul 07.
Article in English | MEDLINE | ID: mdl-19594877

ABSTRACT

BACKGROUND: DNA repair genes provide an important contribution towards the surveillance and repair of DNA damage. These genes produce a large network of interacting proteins whose mRNA expression is likely to be regulated by similar regulatory factors. Full characterization of promoters of DNA repair genes and the similarities among them will more fully elucidate the regulatory networks that activate or inhibit their expression. To address this goal, the authors introduce a technique to find regulatory genomic signatures, which represents a specific application of the genomic signature methodology to classify DNA sequences as putative functional elements within a single organism. RESULTS: The effectiveness of the regulatory genomic signatures is demonstrated via analysis of promoter sequences for genes in DNA repair pathways of humans. The promoters are divided into two classes, the bidirectional promoters and the unidirectional promoters, and distinct genomic signatures are calculated for each class. The genomic signatures include statistically overrepresented words, word clusters, and co-occurring words. The robustness of this method is confirmed by the ability to identify sequences that exist as motifs in TRANSFAC and JASPAR databases, and in overlap with verified binding sites in this set of promoter regions. CONCLUSION: The word-based signatures are shown to be effective by finding occurrences of known regulatory sites. Moreover, the signatures of the bidirectional and unidirectional promoters of human DNA repair pathways are clearly distinct, exhibiting virtually no overlap. In addition to providing an effective characterization method for related DNA sequences, the signatures elucidate putative regulatory aspects of DNA repair pathways, which are notably under-characterized.


Subjects
Computational Biology/methods; DNA Repair; Promoter Regions, Genetic; Base Composition; Cluster Analysis; Databases, Genetic; Humans; Models, Statistical