ABSTRACT
The Illuminating the Druggable Genome (IDG) project aims to improve our understanding of understudied proteins and our ability to study them in the context of disease biology by perturbing them with small molecules, biologics, or other therapeutic modalities. Two main products from the IDG effort are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/), which curates and aggregates information, and Pharos (https://pharos.nih.gov/), a web interface for users to extract and visualize data from TCRD. Since the 2021 release, TCRD/Pharos has focused on developing visualization and analysis tools that help reveal higher-level patterns in the underlying data. The current iterations of TCRD and Pharos enable users to perform enrichment calculations based on subsets of targets, diseases, or ligands and to create interactive heat maps and UpSet charts of many types of annotations. Using several examples, we show how to address disease biology and drug discovery questions through enrichment calculations and UpSet charts.
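The abstract does not say which statistic Pharos uses for its enrichment calculations. As an illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for asking whether an annotation is over-represented in a subset of targets. The function name and the example counts are hypothetical and are not taken from TCRD or Pharos.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, subset, hits):
    """One-sided hypergeometric p-value (an assumed statistic, not Pharos's documented one).

    population: total number of targets considered
    annotated:  targets in the population that carry the annotation
    subset:     size of the user-selected subset
    hits:       targets in the subset that carry the annotation

    Returns P(X >= hits) when `subset` targets are drawn without replacement.
    """
    upper = min(annotated, subset)
    total = comb(population, subset)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, subset - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 40 of 200 selected targets carry an annotation
    # that is present on 1,000 of 20,000 targets overall.
    print(hypergeom_enrichment_p(20000, 1000, 200, 40))
```

A small p-value suggests the annotation is more frequent in the subset than chance would predict. In practice such values are usually corrected for multiple testing across all annotations examined.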
Subjects
Databases, Factual , Molecular Targeted Therapy , Proteome , Humans , Biological Products , Drug Discovery , Internet , Proteome/drug effects
ABSTRACT
The endangered whale shark (Rhincodon typus) is the largest fish on Earth and a long-lived member of the ancient Elasmobranchii clade. To characterize the relationship between genome features and biological traits, we sequenced and assembled the genome of the whale shark and compared its genomic and physiological features to those of 83 animals and yeast. We examined the scaling relationships between body size, temperature, metabolic rates, and genomic features and found both general correlations across the animal kingdom and features specific to the whale shark genome. Among animals, increased lifespan is positively correlated to body size and metabolic rate. Several genomic traits also significantly correlated with body size, including intron and gene length. Our large-scale comparative genomic analysis uncovered general features of metazoan genome architecture: Guanine and cytosine (GC) content and codon adaptation index are negatively correlated, and neural connectivity genes are longer than average genes in most genomes. Focusing on the whale shark genome, we identified multiple features that significantly correlate with lifespan. Among these were very long gene length, due to introns being highly enriched in repetitive elements such as CR1-like long interspersed nuclear elements, and considerably longer neural genes of several types, including connectivity, activity, and neurodegeneration genes. The whale shark genome also has the second slowest evolutionary rate observed in vertebrates to date. Our comparative genomics approach uncovered multiple genetic features associated with body size, metabolic rate, and lifespan and showed that the whale shark is a promising model for studies of neural architecture and lifespan.
Subjects
Adaptation, Physiological/genetics , Body Size/physiology , Sharks/genetics , Animals , Base Sequence/genetics , Body Size/genetics , Genome/genetics , Genomics/methods , Longevity/genetics , Sharks/metabolism , Temperature
ABSTRACT
SARS-CoV-2 has infected over 128 million people worldwide, and until a vaccine is developed and widely disseminated, vigilant testing and contact tracing are the most effective ways to slow the spread of COVID-19. Typical clinical testing only confirms the presence or absence of the virus. A simple and rapid testing procedure that sequences the entire genome would be more informative, allowing the spread of the virus and its variants to be traced and the appearance of new variants to be detected. However, traditional short-read sequencing methods are time-consuming and expensive. Herein, we describe a tiled genome array that we developed for rapid and inexpensive full viral genome resequencing. We applied this SARS-CoV-2-specific genome tiling array to rapidly and accurately resequence the viral genome from eight clinical samples acquired from patients in Wyoming who tested positive for SARS-CoV-2. We were ultimately able to sequence over 95% of the genome of each sample with greater than 99.9% average accuracy.
Subjects
COVID-19 , SARS-CoV-2 , Genome, Viral , Humans , Oligonucleotide Array Sequence Analysis
ABSTRACT
BACKGROUND: Unique among cnidarians, jellyfish have remarkable morphological and biochemical innovations that allow them to actively hunt in the water column and were some of the first animals to become free-swimming. The class Scyphozoa, or true jellyfish, are characterized by a predominant medusa life-stage consisting of a bell and venomous tentacles used for hunting and defense, as well as using pulsed jet propulsion for mobility. Here, we present the genome of the giant Nomura's jellyfish (Nemopilema nomurai) to understand the genetic basis of these key innovations. RESULTS: We sequenced the genome and transcriptomes of the bell and tentacles of the giant Nomura's jellyfish as well as transcriptomes across tissues and developmental stages of the Sanderia malayensis jellyfish. Analyses of the Nemopilema and other cnidarian genomes revealed adaptations associated with swimming, marked by codon bias in muscle contraction and expansion of neurotransmitter genes, along with expanded Myosin type II family and venom domains, possibly contributing to jellyfish mobility and active predation. We also identified gene family expansions of Wnt and posterior Hox genes and discovered the important role of retinoic acid signaling in this ancient lineage of metazoans, which together may be related to the unique jellyfish body plan (medusa formation). CONCLUSIONS: Taken together, the Nemopilema jellyfish genome and transcriptomes genetically confirm their unique morphological and physiological traits, which may have contributed to the success of jellyfish as early multi-cellular predators.
Subjects
Evolution, Molecular , Genome/physiology , Predatory Behavior , Scyphozoa/physiology , Animals , Biological Evolution , Phylogeny , Scyphozoa/genetics
ABSTRACT
BACKGROUND: Next-generation sequencing (NGS) has revolutionized almost all fields of biology, agriculture and medicine, and is widely utilized to analyse genetic variation. Over the past decade, the NGS pipeline has been steadily improved, and the entire process is currently relatively straightforward. However, NGS instrumentation still requires upfront library preparation, which can be a laborious process, requiring significant hands-on time. Herein, we present a simple but robust approach to streamline library preparation by utilizing surface bound transposases to construct DNA libraries directly on a flowcell surface. RESULTS: The surface bound transposases directly fragment genomic DNA while simultaneously attaching the library molecules to the flowcell. We sequenced and analysed a Drosophila genome library generated by this surface tagmentation approach, and we showed that our surface bound library quality was comparable to the quality of the library from a commercial kit. In addition to the time and cost savings, our approach does not require PCR amplification of the library, which eliminates potential problems associated with PCR duplicates. CONCLUSIONS: We described the first study to construct libraries directly on a flowcell. We believe our technique could be incorporated into the existing Illumina sequencing pipeline to simplify the workflow, reduce costs, and improve data quality.
Subjects
Gene Library , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Genomics , Surface Properties , Transposases/metabolism
ABSTRACT
Here we present a summary of the 2014 International Conference on Intelligent Biology and Medicine (ICIBM 2014) and the editorial report of the supplement to BMC Genomics and BMC Systems Biology that includes 20 research articles selected from ICIBM 2014. The conference was held on December 4-6, 2014 in San Antonio, Texas, USA, and included six scientific sessions, four tutorials, four keynote presentations, nine highlight talks, and a poster session that covered cutting-edge research in bioinformatics, systems biology, and computational medicine.
Subjects
Biomedical Research/education , Biomedical Research/methods , Computational Biology , Cooperative Behavior , Humans , Precision Medicine
ABSTRACT
BACKGROUND: The horse (Equus ferus caballus) is one of the earliest domesticated species and has played an important role in the development of human societies over the past 5,000 years. In this study, we characterized the genome of the Marwari horse, a rare breed with unique phenotypic characteristics, including inwardly turned ear tips. It is thought to have originated from the crossbreeding of local Indian ponies with Arabian horses beginning in the 12th century. RESULTS: We generated 101 Gb (~30 × coverage) of whole genome sequences from a Marwari horse using the Illumina HiSeq2000 sequencer. The sequences were mapped to the horse reference genome at a mapping rate of ~98% and with ~95% of the genome having at least 10 × coverage. A total of 5.9 million single nucleotide variations, 0.6 million small insertions or deletions, and 2,569 copy number variation blocks were identified. We confirmed a strong Arabian and Mongolian component in the Marwari genome. Novel variants from the Marwari sequences were annotated, and were found to be enriched in olfactory functions. Additionally, we suggest a potential functional genetic variant in the TSHZ1 gene (p.Ala344>Val) associated with the inward-turning ear tip shape of the Marwari horses. CONCLUSIONS: Here, we present an analysis of the Marwari horse genome. This is the first genomic data for an Asian breed, and is an invaluable resource for future studies of genetic variation associated with phenotypes and diseases in horses.
Subjects
Genome/genetics , Genomics , Horses/genetics , Sequence Analysis, DNA , Amino Acid Sequence , Animals , Evolution, Molecular , Genetic Variation , Genotype , Humans , Hybridization, Genetic , Male , Molecular Sequence Data , Phenotype , Selection, Genetic , Species Specificity
ABSTRACT
Advanced sequencing technologies enable rapid detection of sequence variants, aiming to uncover the molecular foundations of human genetic disorders. The challenge lies in interpreting the influence of new exome variants that lead to diverse phenotypes. Our study introduces a detailed, multi-tiered method for assessing the impact of novel variants, particularly focusing on the zinc finger protein 1 (ZPR1) gene. Herein, we employed a combination of variant effect predictors, protein stability analyses, and the American College of Medical Genetics and Genomics and Association for Molecular Pathology (ACMG/AMP) guidelines. Our structural analysis pinpoints specific amino acid residues in the ZPR1 zinc finger domains that are sensitive to changes, distinguishing between benign and disease-causing coding variants using rigorous in silico tools. We examined 223 germline ZPR1 exome variants, uncovering significant ethnic disparities in the frequency of heterozygous harmful ZPR1 variants, ranging from 0.04% in the Ashkenazi Jewish population to 0.34% in African/African Americans. Additionally, the discovery of three homozygous carriers in European and South Asian groups suggests a higher occurrence of ZPR1 variants in these demographics, meriting further exploration. This research provides insights into the prevalence and implications of amino acid substitutions in the ZPR1 protein.
ABSTRACT
The Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) project aims to aggregate, update, and articulate protein-centric data knowledge for the entire human proteome, with emphasis on the understudied proteins from the three IDG protein families. KMC collates and analyzes data from over 70 resources to compile the Target Central Resource Database (TCRD), which is served through the web-based informatics platform Pharos. These data include experimental, computational, and text-mined information on protein structures, compound interactions, and disease and phenotype associations. Based on this knowledge, proteins are classified into different Target Development Levels (TDLs) for identification of understudied targets. Additional work by the KMC focuses on enriching target knowledge and producing DrugCentral and other data visualization tools for expanding investigation of understudied targets.
Subjects
Genome , Knowledge Management , Humans , Proteome , Databases, Factual , Informatics
ABSTRACT
TIN-X (Target Importance and Novelty eXplorer) is an interactive visualization tool for illuminating associations between diseases and potential drug targets and is publicly available at newdrugtargets.org. TIN-X uses natural language processing to identify disease and protein mentions within PubMed content using previously published tools for named entity recognition (NER) of gene/protein and disease names. Target data is obtained from the Target Central Resource Database (TCRD). Two important metrics, novelty and importance, are computed from this data and, when plotted as log(importance) vs. log(novelty), aid the user in visually exploring the novelty of drug targets and their associated importance to diseases. TIN-X Version 3.0 has been significantly improved with an expanded dataset, modernized architecture including a REST API, and an improved user interface (UI). The dataset has been expanded to include not only PubMed publication titles and abstracts, but also full-text articles when available. This results in approximately 9-fold more target/disease associations compared to previous versions of TIN-X. Additionally, the TIN-X database containing this expanded dataset is now hosted in the cloud via Amazon RDS. Recent enhancements to the UI focus on making it more intuitive for users to find diseases or drug targets of interest while providing a new, sortable table-view mode to accompany the existing plot-view mode. UI improvements also help the user browse the associated PubMed publications to explore and understand the basis of TIN-X's predicted association between a specific disease and a target of interest. While implementing these upgrades, computational resources are balanced between the webserver and the user's web browser to achieve adequate performance while accommodating the expanded dataset.
Together, these advances aim to extend the duration that users can benefit from TIN-X while providing both an expanded dataset and new features that researchers can use to better illuminate understudied proteins.
Subjects
User-Computer Interface , Humans , Natural Language Processing , PubMed , Software
ABSTRACT
ErbB1 overexpression is strongly linked to carcinogenesis, motivating better understanding of erbB1 dimerization and activation. Recent single-particle-tracking data have provided improved measures of dimer lifetimes and strong evidence that transient receptor coconfinement promotes repeated interactions between erbB1 monomers. Here, spatial stochastic simulations explore the potential impact of these parameters on erbB1 phosphorylation kinetics. This rule-based mathematical model incorporates structural evidence for conformational flux of the erbB1 extracellular domains, as well as asymmetrical orientation of erbB1 cytoplasmic kinase domains during dimerization. The asymmetric dimer model considers the theoretical consequences of restricted transactivation of erbB1 receptors within a dimer, where the N-lobe of one monomer docks with the C-lobe of the second monomer and triggers its catalytic activity. The dynamic nature of the erbB1 phosphorylation state is shown by monitoring activation states of individual monomers as they diffuse, bind, and rebind after ligand addition. The model reveals the complex interplay between interacting liganded and nonliganded species and the influence of their distribution and abundance within features of the membrane landscape.
Subjects
ErbB Receptors/metabolism , Models, Biological , Cell Membrane/metabolism , ErbB Receptors/chemistry , Ligands , Phosphorylation , Protein Structure, Tertiary , Spatial Analysis , Stochastic Processes
ABSTRACT
Survival and proliferation of immature B lymphocytes require expression and tonic signaling of the pre-B cell receptor (pre-BCR). This low-level, ligand-independent signaling is likely achieved through frequent, but short-lived, homo interactions. Tonic signaling is also central in the pathology of precursor B acute lymphoblastic leukemia (B-ALL). In order to understand how repeated, transient events can lead to sustained signaling and to assess the impact of receptor accumulation induced by the membrane landscape, we developed a spatial stochastic model of receptor aggregation and downstream signaling events. Our rule- and agent-based model builds on previous mature BCR signaling models and incorporates novel parameters derived from single particle tracking of pre-BCR on surfaces of two different B-ALL cell lines, 697 and Nalm6. Live cell tracking of receptors on the two cell lines revealed characteristic differences in their dimer dissociation rates and diffusion coefficients. We report here that these differences affect pre-BCR aggregation and consequent signal initiation events. Receptors on Nalm6 cells, which have a lower off-rate and lower diffusion coefficient, more frequently form higher order oligomers than pre-BCR on 697 cells, resulting in higher levels of downstream phosphorylation in the Nalm6 cell line.
Subjects
Pre-B Cell Receptors , Receptors, Antigen, B-Cell , Pre-B Cell Receptors/metabolism , Receptors, Antigen, B-Cell/metabolism , Signal Transduction , Cell Line , Phosphorylation
ABSTRACT
With over 5.5 million deaths worldwide attributed to the respiratory disease COVID-19 caused by the novel coronavirus SARS-CoV-2, it is essential that continued efforts be made to track the evolution and spread of the virus globally. The authors previously presented a rapid and cost-effective method to sequence the entire SARS-CoV-2 genome with 95% coverage and 99.9% accuracy. This method is advantageous for identifying and tracking variants in the SARS-CoV-2 genome compared with traditional short-read sequencing methods which can be time-consuming and costly. Herein, the addition of genotyping probes to a DNA chip that targets known SARS-CoV-2 variants is presented. The incorporation of genotyping probe sets along with the advent of a moving average filter improved the sequencing coverage and accuracy of the SARS-CoV-2 genome.
Throughout the COVID-19 pandemic the virus known as SARS-CoV-2 has continued to mutate and evolve. It is imperative to continue to track these mutations and where the virus has traveled to best inform healthcare practices and global strategies to combat the virus. The authors previously developed a method to investigate 95% of this viral genome with 99.9% accuracy that was more cost-effective and less time-consuming than previous methods. In this work, specific markers were added to the technology to allow tracking of mutations in the virus that have already been documented. In doing so, both the accuracy and the fraction of the viral genome that can be sequenced were improved.
Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Genotype , Genome, Viral/genetics
ABSTRACT
A population of Saccharomyces cerevisiae was cultured for approximately 450 generations in the presence of high glucose to select for genetic variants that grew optimally under these conditions. Using the parental strain BY4741 as the starting population, an evolved culture was obtained after aerobic growth in a high glucose medium for approximately 450 generations. After the evolution period, three single colony isolates were selected for analysis. Next-generation Ion Torrent sequencing was used to evaluate genetic changes. More than 100 deletion/insertion changes were found, with approximately half of these affecting genes. Additionally, over 180 SNPs were identified, with more than one-quarter of these resulting in a nonsynonymous mutation. Affymetrix DNA microarrays and RNA-seq analysis were used to determine differences in gene expression in the evolved strains compared to the parental strain. It was established that approximately 900 genes demonstrated significantly altered expression in the evolved strains relative to the parental strain. Many of these genes showed similar alterations in their expression in all three evolved strains. Interestingly, genes with altered expression in the three evolved strains included genes with a role in oxidative metabolism. Overall, these results are consistent with the physiological observations of optimal growth with glucose as the carbon source. Namely, the decreased ethanol production suggests that the underlying metabolism switched from fermentation to respiration during the selection for optimal growth on glucose.
Subjects
Genome, Fungal , Glucose/metabolism , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/genetics , Ethanol/metabolism , Evolution, Molecular , Gene Expression Profiling , Genomics/methods , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , RNA, Fungal/genetics , RNA, Messenger/analysis , Saccharomyces cerevisiae/metabolism , Sequence Analysis, DNA , Systems Biology/methods
ABSTRACT
The molecular scaffold in the yeast pheromone pathway, Ste5, shuttles continuously between the nucleus and the cytoplasm. Ste5 undergoes an oligomerization reaction in the nucleus. Upon pheromone stimulation, the Ste5 dimer is rapidly exported out of the nucleus and recruited to the plasma membrane for pathway activation. This clever device on the part of the yeast cell is thought to prevent pathway misactivation at high enough levels of Ste5 in the absence of pheromone. We have built a spatiotemporal model of signaling in this pathway to describe its regulation. Our present work underscores the importance of spatial modeling of cell signaling networks to understand their control and functioning.
Subjects
Models, Biological , Pheromones/metabolism , Saccharomyces cerevisiae/metabolism , Active Transport, Cell Nucleus , Adaptor Proteins, Signal Transducing/chemistry , Adaptor Proteins, Signal Transducing/metabolism , MAP Kinase Signaling System , Mathematical Concepts , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
The spatio-temporal landscape of the plasma membrane regulates activation and signal transduction of membrane bound receptors by restricting their two-dimensional mobility and by inducing receptor clustering. This regulation also extends to complex formation between receptors and adaptor proteins, which are the intermediate signaling molecules involved in cellular signaling that relay the received cues from cell surface to cytoplasm and eventually to the nucleus. Although their investigation poses challenging technical difficulties, there is a crucial need to understand the impact of the receptor diffusivity, clustering, and spatial heterogeneity, and of receptor-adaptor protein complex formation on the cellular signal transduction patterns. Building upon our earlier studies, we have developed an adaptive coarse-grained Monte Carlo method that can be used to investigate the role of diffusion, clustering and membrane corralling on receptor association and receptor-adaptor protein complex formation dynamics in three dimensions. The new Monte Carlo lattice based approach allowed us to introduce spatial resolution on the 2-D plasma membrane and to model the cytoplasm in three-dimensions. Being a multi-resolution approach, our new method makes it possible to represent various parts of the cellular system at different levels of detail and enabled us to utilize the locally homogeneous assumption when justified (e.g., cytoplasmic region away from the cell membrane) and avoid its use when high spatial resolution is needed (e.g., cell membrane and cytoplasmic region near the membrane) while keeping the required computational complexity manageable. Our results have shown that diffusion has a significant impact on receptor-receptor dimerization and receptor-adaptor protein complex formation kinetics. We have observed an "adaptor protein hopping" mechanism where the receptor binding proteins may hop between receptors to form short-lived transient complexes. 
This increased residence time of the adaptor proteins near the cell membrane and their ability to frequently change signaling partners may explain the increase in signaling efficiency when receptors are clustered. We also hypothesize that the adaptor protein hopping mechanism can cause concurrent or sequential activation of multiple signaling pathways, thus leading to crosstalk between diverse biological functions.
ABSTRACT
We previously developed a method of defining receptor clusters in the membrane based on mutual distance and applied it to a set of transmission microscopy images of vascular endothelial growth factor receptors. An optimal length parameter was identified, resulting in cluster identification and a procedure that assigned a geometric shape to each cluster. We showed that the observed particle distribution results were consistent with the random placement of receptors within the clusters and, to a lesser extent, the random placement of the clusters on the cell membrane. Here, we develop and validate a stochastic model of clustering, based on a hypothesis of preexisting domains that have a high affinity for receptors. The proximate objective is to clarify the mechanism behind cluster formation and to estimate the effect on signaling. Receptor-enriched domains may significantly impact signaling pathways that rely on ligand-induced dimerization of receptors. We define a simple statistical model, based on the preexisting domain hypothesis, to predict the probability distribution of cluster sizes. The process yielded sets of parameter values that can readily be used in dynamical calculations as the estimates of the quantitative characteristics of the clustering domains.
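The abstract describes clusters defined by mutual distance with a single length parameter but does not spell out the rule. One plausible reading, assumed here purely for illustration, is that two particles belong to the same cluster when a chain of neighbors, each pair no farther apart than the length parameter, links them (single-linkage grouping). The function name, coordinates, and threshold below are hypothetical.

```python
from math import dist


def cluster_by_distance(points, max_dist):
    """Group points whose chain of pairwise distances never exceeds max_dist.

    This is an assumed single-linkage rule, not the authors' documented
    procedure. Returns a sorted list of clusters, each a list of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair that lies within the length parameter.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    particles = [(0, 0), (1, 0), (2, 0), (10, 10), (10, 11), (50, 50)]
    print(cluster_by_distance(particles, max_dist=1.5))
```

The pairwise loop is quadratic in the number of particles. That is adequate for a sketch, but a spatial index would be preferable for large images.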
ABSTRACT
We present LT1, the first high-quality human reference genome from the Baltic States. LT1 is a female de novo human reference genome assembly, constructed using 57× nanopore long reads and polished using 47× short paired-end reads. We utilized 72 GB of Hi-C chromosomal mapping data for scaffolding, to maximize assembly contiguity and accuracy. The contig assembly of LT1 was 2.73 Gbp in length, comprising 4490 contigs with an NG50 value of 12.0 Mbp. After scaffolding with Hi-C data and manual curation, the final assembly has an NG50 value of 137 Mbp and 4699 scaffolds. Assessment of gene prediction quality using Benchmarking Universal Single-Copy Orthologs (BUSCO) identified 89.3% of the single-copy orthologous genes included in the benchmark. Detailed characterization of LT1 suggests it has 73,744 predicted transcripts, 4.2 million autosomal SNPs, 974,616 short indels, and 12,079 large structural variants. These data may be used as a benchmark for further in-depth genomic analyses of Baltic populations.
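For readers unfamiliar with the NG50 values quoted above, the metric can be sketched as follows. NG50 is the length of the contig (or scaffold) at which the cumulative length, counted from the longest sequence downward, first reaches half of the estimated genome size. N50 is computed the same way but against the total assembly length. The function name and the toy lengths are illustrative and are not LT1 data.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of an assembly, or None if it covers under half the genome.

    lengths:     contig or scaffold lengths in base pairs
    genome_size: estimated genome size in base pairs (use sum(lengths) for N50)
    """
    half = genome_size / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


if __name__ == "__main__":
    toy_contigs = [50, 40, 30, 20, 10]
    print(ng50(toy_contigs, genome_size=200))          # NG50 against a 200 bp genome
    print(ng50(toy_contigs, genome_size=sum(toy_contigs)))  # N50 of the same contigs
```

Because NG50 is measured against a fixed genome size rather than the assembly's own length, it allows assemblies of different total sizes to be compared on the same footing.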
ABSTRACT
BACKGROUND: Sequencing-by-ligation (SBL) is one of several next-generation sequencing methods that have been developed for massive sequencing of DNA immobilized on arrayed beads (or other clonal amplicons). SBL has the advantage of being easy to implement and accessible to all because it can be performed with off-the-shelf reagents. However, SBL has the limitation of very short read lengths. RESULTS: To overcome the read length limitation, research groups have developed complex library preparation processes, which can be time-consuming, difficult, and result in low complexity libraries. Herein we describe a variation on traditional SBL protocols that extends the number of sequential bases that can be sequenced by using Endonuclease V to nick a query primer, thus leaving a ligatable end extended into the unknown sequence for further SBL cycles. To demonstrate the protocol, we constructed a known DNA sequence and utilized our SBL variation, cyclic SBL (cSBL), to resequence this region. Using our method, we were able to read thirteen contiguous bases in the 3' - 5' direction. CONCLUSIONS: Combining this read length with sequencing in the 5' - 3' direction would allow a read length of over twenty bases on a single tag. Implementing mate-paired tags and this SBL variation could enable > 95% coverage of the genome.
Subjects
Deoxyribonuclease (Pyrimidine Dimer)/metabolism , Inosine/analogs & derivatives , Oligonucleotides/metabolism , Inosine/metabolism , Proteolysis
ABSTRACT
With over three million deaths worldwide attributed to the respiratory disease COVID-19 caused by the novel coronavirus SARS-CoV-2, it is essential that continued efforts be made to track the evolution and spread of the virus globally. We previously presented a rapid and cost-effective method to sequence the entire SARS-CoV-2 genome with 95% coverage and 99.9% accuracy. This method is advantageous for identifying and tracking variants in the SARS-CoV-2 genome when compared to traditional short read sequencing methods which can be time consuming and costly. Herein we present the addition of genotyping probes to our DNA chip which target known SARS-CoV-2 variants. The incorporation of the genotyping probe sets along with the advent of a moving average filter have improved our sequencing coverage and accuracy of the SARS-CoV-2 genome.
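The abstract mentions a moving average filter without saying which signal it is applied to or what window is used. The sketch below shows only the general technique, a centered moving average over a one-dimensional series, under the assumption that the input is some per-position numeric signal. The function name, window size, and sample values are hypothetical.

```python
def moving_average(values, window):
    """Centered moving average with windows truncated at the edges.

    values: sequence of numbers (assumed here to be a per-position signal)
    window: odd, positive number of positions to average over
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    print(moving_average([1, 2, 3, 4, 5], window=3))
```

Truncating the window at the ends keeps the output the same length as the input. Other edge treatments, such as padding, are equally valid and may differ from what the authors used.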