Search | VHL CLAP/WR-PAHO/WHO

1.

Getting 'ϕψχal' with proteins: minimum message length inference of joint distributions of backbone and sidechain dihedral angles.

Amarasinghe, Piyumi R; Allison, Lloyd; Stuckey, Peter J; Garcia de la Banda, Maria; Lesk, Arthur M; Konagurthu, Arun S.

Bioinformatics ; 39(39 Suppl 1): i357-i367, 2023 06 30.

Article in English | MEDLINE | ID: mdl-37387189

ABSTRACT

The tendency of an amino acid to adopt certain configurations in folded proteins is treated here as a statistical estimation problem. We model the joint distribution of the observed mainchain and sidechain dihedral angles (⟨ϕ,ψ,χ1,χ2,…⟩) of any amino acid by a mixture of a product of von Mises probability distributions. This mixture model maps any vector of dihedral angles to a point on a multi-dimensional torus. The continuous space it uses to specify the dihedral angles provides an alternative to the commonly used rotamer libraries. These rotamer libraries discretize the space of dihedral angles into coarse angular bins, and cluster combinations of sidechain dihedral angles (⟨χ1,χ2,…⟩) as a function of backbone ⟨ϕ,ψ⟩ conformations. A 'good' model is one that is both concise and explains (compresses) observed data. Competing models can be compared directly and in particular our model is shown to outperform the Dunbrack rotamer library in terms of model complexity (by three orders of magnitude) and its fidelity (on average 20% more compression) when losslessly explaining the observed dihedral angle data across experimental resolutions of structures. Our method is unsupervised (with parameters estimated automatically) and uses information theory to determine the optimal complexity of the statistical model, thus avoiding under/over-fitting, a common pitfall in model selection problems. Our models are computationally inexpensive to sample from and are geared to support a number of downstream studies, ranging from experimental structure refinement to de novo protein design and protein structure prediction. We call our collection of mixture models PhiSiCal (ϕψχal). AVAILABILITY AND IMPLEMENTATION: PhiSiCal mixture models and programs to sample from them are available for download at http://lcb.infotech.monash.edu.au/phisical.

Subject(s)

Data Compression , Libraries , Amino Acids , Gene Library , Information Theory
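
To make the model in the abstract of entry 1 concrete, the following is a minimal sketch of a mixture of products of von Mises densities evaluated at a vector of dihedral angles (in radians). It is not the PhiSiCal implementation: the function names, the component layout `(weight, [(mu, kappa), ...])`, and the power-series evaluation of the Bessel function I0 are all assumptions made for illustration, and the series is only suitable for moderate concentration values.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    half_sq = (x / 2.0) ** 2
    for k in range(terms):
        total += term
        term *= half_sq / ((k + 1) ** 2)
    return total


def von_mises_pdf(theta, mu, kappa):
    """Density of a von Mises distribution with mean mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def mixture_pdf(angles, components):
    """Density of a mixture whose components are products of von Mises terms.

    components is a list of (weight, [(mu, kappa), ...]) with one (mu, kappa)
    pair per dihedral angle; this layout is a hypothetical convention.
    """
    density = 0.0
    for weight, params in components:
        prod = weight
        for theta, (mu, kappa) in zip(angles, params):
            prod *= von_mises_pdf(theta, mu, kappa)
        density += prod
    return density


if __name__ == "__main__":
    # Two made-up components over (phi, psi); the numbers are not fitted values.
    toy = [
        (0.6, [(-1.1, 8.0), (-0.7, 6.0)]),
        (0.4, [(-2.1, 4.0), (2.4, 5.0)]),
    ]
    print(mixture_pdf([-1.0, -0.8], toy))
```

Because each factor is periodic in its angle, the density is defined on a torus rather than on a flat angular grid, which is the contrast with binned rotamer libraries that the abstract draws.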

2.

On the reliability and the limits of inference of amino acid sequence alignments.

Rajapaksa, Sandun; Sumanaweera, Dinithi; Lesk, Arthur M; Allison, Lloyd; Stuckey, Peter J; Garcia de la Banda, Maria; Abramson, David; Konagurthu, Arun S.

Bioinformatics ; 38(Suppl 1): i255-i263, 2022 06 24.

Article in English | MEDLINE | ID: mdl-35758808

ABSTRACT

MOTIVATION: Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn? Using techniques not previously applied to these questions, by weighting every possible sequence alignment by its posterior probability we derive a formal mathematical expectation, and develop an efficient algorithm for computation of the distance between alternative alignments allowing quantitative comparisons of sequence-based alignments with corresponding reference structure alignments. RESULTS: By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments, as a function of (Markov time of) sequence divergence. Our results clearly demarcate the 'daylight', 'twilight' and 'midnight' zones for interpreting residue-residue correspondences from sequence information alone. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Amino Acids , Proteins , Algorithms , Amino Acid Sequence , Proteins/chemistry , Reproducibility of Results , Sequence Alignment , Sequence Homology, Amino Acid

3.

Invisible leashes: The tethering VAPs from infectious diseases to neurodegeneration.

Dudás, Erika F; Huynen, Martijn A; Lesk, Arthur M; Pastore, Annalisa.

J Biol Chem ; 296: 100421, 2021.

Article in English | MEDLINE | ID: mdl-33609524

ABSTRACT

Intracellular organelles do not, as thought for a long time, act in isolation but are dynamically tethered together by entire machines responsible for interorganelle trafficking and positioning. Among the proteins responsible for tethering is the family of VAMP-associated proteins (VAPs) that appear in all eukaryotes and are localized primarily in the endoplasmic reticulum. The major functional role of VAPs is to tether the endoplasmic reticulum with different organelles and regulate lipid metabolism and transport. VAPs have gained increasing attention because of their role in human pathology where they contribute to infections by viruses and bacteria and participate in neurodegeneration. In this review, we discuss the structure, evolution, and functions of VAPs, focusing more specifically on VAP-B for its relationship with amyotrophic lateral sclerosis and other neurodegenerative diseases.

Subject(s)

Communicable Diseases/metabolism , Neurodegenerative Diseases/metabolism , Vesicular Transport Proteins/metabolism , Animals , Humans , Lipid Metabolism , Mutation , Vesicular Transport Proteins/genetics

4.

Protein structure prediction improves the quality of amino-acid sequence alignment.

Lesk, Arthur M; Konagurthu, Arun S.

Proteins ; 90(12): 2144-2147, 2022 12.

Article in English | MEDLINE | ID: mdl-35754316

ABSTRACT

The basic operation in analysis of protein evolution is alignment: the specification of residue-residue correspondences. A structural alignment is a specification of residue-residue correspondences based on the atomic positions in the structures of two or more proteins. It is well-known that structural alignments are more accurate, over a much wider range of divergence, than pairwise alignments based solely on sequences, for instance those computed with the Needleman-Wunsch algorithm with affine gap penalties. Given the amino-acid sequences of two proteins, alignments based solely on the sequences fall into "daylight", "twilight", and "midnight" zones, in which the fidelity of the correspondences diminishes in accuracy, and in strength of ability to distinguish true homology from noise. The success of AlphaFold2 in template-free modeling of three-dimensional structures from one-dimensional amino-acid sequence information implies that: given the amino-acid sequences of two or more proteins, in the absence of experimentally determined structures, reliable alignments (even for very highly diverged proteins) could in many cases be achieved by applying AlphaFold2 to the sequences, and performing structural alignments of the models.

Subject(s)

Algorithms , Proteins , Sequence Alignment , Amino Acid Sequence , Proteins/chemistry

5.

Neighbourhoods in the yeast regulatory network in different physiological states.

Lesk, Arthur M; Konagurthu, Arun S.

Bioinformatics ; 37(4): 551-558, 2021 05 01.

Article in English | MEDLINE | ID: mdl-32976569

ABSTRACT

MOTIVATION: The gene expression regulatory network in yeast controls the selective implementation of the information contained in the genome sequence. We seek to understand how, in different physiological states, the network reconfigures itself to produce a different proteome. RESULTS: This article analyses this reconfiguration, focussing on changes in the local structure of the network. In particular, we define, extract and compare the 1-neighbourhoods of each transcription factor, where a 1-neighbourhood of a node in a network is the minimal subgraph of the network containing all nodes connected to the central node by an edge. We report the similarities and differences in the topologies and connectivities of these neighbourhoods in five physiological states for which data are available: cell cycle, DNA damage, stress response, diauxic shift and sporulation. Based on our analysis, it seems apt to regard the components of the regulatory network as 'software', and the responses to changes in state, 'reprogramming'.

Subject(s)

Gene Regulatory Networks , Saccharomyces cerevisiae , Cell Cycle , Saccharomyces cerevisiae/genetics , Software , Transcription Factors/genetics
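
The 1-neighbourhood defined in the abstract of entry 5 can be sketched in a few lines. This is an illustrative reading, not the authors' code: the function name and the edge-list representation are assumptions, and "minimal subgraph" is interpreted here as keeping only the edges that touch the central node (edges between two neighbours are left out).

```python
def one_neighbourhood(edges, centre):
    """Return (nodes, edges) of the 1-neighbourhood of `centre`.

    `edges` is an iterable of directed (regulator, target) pairs. Only edges
    incident to `centre` are kept, which is one reading of "minimal subgraph".
    """
    nodes = {centre}
    kept = set()
    for u, v in edges:
        if u == centre or v == centre:
            nodes.update((u, v))
            kept.add((u, v))
    return nodes, kept


if __name__ == "__main__":
    # Hypothetical toy network; the node names are placeholders.
    toy_edges = [("TF1", "G1"), ("TF2", "TF1"), ("G1", "G2"), ("TF2", "G2")]
    print(one_neighbourhood(toy_edges, "TF1"))
```

Running the function once per physiological state and comparing the returned node and edge sets is one simple way to see how a transcription factor's local wiring changes between states.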

6.

Computer modeling of a potential agent against SARS-CoV-2 (COVID-19) protease.

Lesk, Arthur M; Konagurthu, Arun S; Allison, Lloyd; Garcia de la Banda, Maria; Stuckey, Peter J; Abramson, David.

Proteins ; 88(12): 1557-1558, 2020 12.

Article in English | MEDLINE | ID: mdl-32662915

ABSTRACT

We have modeled modifications of a known ligand to the SARS-CoV-2 (COVID-19) protease, that can form a covalent adduct, plus additional ligand-protein hydrogen bonds.

Subject(s)

Antiviral Agents , Aphids , Coronavirus Infections , Insecticides , Pandemics , Pneumonia, Viral , Acetylcholinesterase , Animals , Betacoronavirus , COVID-19 , Cysteine Endopeptidases , Humans , Molecular Docking Simulation , Protease Inhibitors , SARS-CoV-2 , Viral Nonstructural Proteins

7.

Statistical inference of protein structural alignments using information and compression.

Collier, James H; Allison, Lloyd; Lesk, Arthur M; Stuckey, Peter J; Garcia de la Banda, Maria; Konagurthu, Arun S.

Bioinformatics ; 33(7): 1005-1013, 2017 04 01.

Article in English | MEDLINE | ID: mdl-28065899

ABSTRACT

Motivation: Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. Results: We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Availability and Implementation: Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner. Contact: arun.konagurthu@monash.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Subject(s)

Data Compression , Models, Statistical , Proteins/chemistry , Sequence Alignment , Algorithms , Bayes Theorem , Reproducibility of Results , Software

8.

A new statistical framework to assess structural alignment quality using information compression.

Collier, James H; Allison, Lloyd; Lesk, Arthur M; Garcia de la Banda, Maria; Konagurthu, Arun S.

Bioinformatics ; 30(17): i512-8, 2014 Sep 01.

Article in English | MEDLINE | ID: mdl-25161241

ABSTRACT

MOTIVATION: Progress in protein biology depends on the reliability of results from a handful of computational techniques, structural alignments being one. Recent reviews have highlighted substantial inconsistencies and differences between alignment results generated by the ever-growing stock of structural alignment programs. The lack of consensus on how the quality of structural alignments must be assessed has been identified as the main cause for the observed differences. Current methods assess structural alignment quality by constructing a scoring function that attempts to balance conflicting criteria, mainly alignment coverage and fidelity of structures under superposition. This traditional approach to measuring alignment quality, the subject of considerable literature, has failed to solve the problem. Further development along the same lines is unlikely to rectify the current deficiencies in the field. RESULTS: This paper proposes a new statistical framework to assess structural alignment quality and significance based on lossless information compression. This is a radical departure from the traditional approach of formulating scoring functions. It links the structural alignment problem to the general class of statistical inductive inference problems, solved using the information-theoretic criterion of minimum message length. Based on this, we developed an efficient and reliable measure of structural alignment quality, I-value. The performance of I-value is demonstrated in comparison with a number of popular scoring functions, on a large collection of competing alignments. Our analysis shows that I-value provides a rigorous and reliable quantification of structural alignment quality, addressing a major gap in the field. AVAILABILITY: http://lcb.infotech.monash.edu.au/I-value. SUPPLEMENTARY INFORMATION: Online supplementary data are available at http://lcb.infotech.monash.edu.au/I-value/suppl.html.

Subject(s)

Structural Homology, Protein , Algorithms , Data Compression , Data Interpretation, Statistical

9.

Sizes of interface residues account for cross-class binding affinity patterns in Eph receptor-ephrin families.

Guo, Fei-Yi; Lesk, Arthur M.

Proteins ; 82(3): 349-53, 2014 Mar.

Article in English | MEDLINE | ID: mdl-24105818

ABSTRACT

Eph receptors comprise the largest known family of receptor tyrosine kinases in mammals. They bind members of a second family, the ephrins. As both Eph receptors and ephrins are membrane bound, interactions permit unusual bidirectional cell-cell signaling. Eph receptors and ephrins each form two classes, A and B, based on sequences, structures, and patterns of affinity: Class A Eph receptors bind class A ephrins, and class B Eph receptors bind class B ephrins. The only known exceptions are the receptor EphA4, which can bind ephrinB2 and ephrinB3 in addition to the ephrin-As (Bowden et al., Structure 2009;17:1386-1397); and EphB2, which can bind ephrin-A5 in addition to the ephrin-Bs (Himanen et al., Nat Neurosci 2004;7:501-509). A crystal structure is available of the interacting domains of the EphA4-ephrin B2 complex (wwPDB entry 2WO2) (Bowden et al., Structure 2009;17:1386-1397). In this complex, the ligand-binding domain of EphA4 adopts an EphB-like conformation. To understand why other cross-class EphA receptor-ephrinB complexes do not form, we modeled hypothetical complexes between (1) EphA4-ephrinB1, (2) EphA4-ephrinB3, and (3) EphA2-ephrinB2. We identify particular residues in the interface region, the size variations of which cause steric clashes that prevent formation of the unobserved complexes. The sizes of the sidechains of residues at these positions correlate with the pattern of binding affinity.

Subject(s)

Ephrins/chemistry , Ephrins/metabolism , Receptor, EphA4/chemistry , Receptor, EphA4/metabolism , Amino Acid Sequence , Humans , Models, Molecular , Molecular Sequence Data , Sequence Alignment , Surface Properties

10.

How precise are reported protein coordinate data?

Konagurthu, Arun S; Allison, Lloyd; Abramson, David; Stuckey, Peter J; Lesk, Arthur M.

Acta Crystallogr D Biol Crystallogr ; 70(Pt 3): 904-6, 2014 Mar.

Article in English | MEDLINE | ID: mdl-24598758

ABSTRACT

Atomic coordinates in the Worldwide Protein Data Bank (wwPDB) are generally reported to greater precision than the experimental structure determinations have actually achieved. By using information theory and data compression to study the compressibility of protein atomic coordinates, it is possible to quantify the amount of randomness in the coordinate data and thereby to determine the realistic precision of the reported coordinates. On average, the value of each Cα coordinate in a set of selected protein structures solved at a variety of resolutions is good to about 0.1 Å.

Subject(s)

Databases, Protein/standards , User-Computer Interface , Crystallography, X-Ray/standards , Dictionaries, Chemical as Topic , Magnetic Resonance Spectroscopy/standards , Microscopy, Electron/standards , Predictive Value of Tests , Random Allocation

11.

Sequencing the nuclear genome of the extinct woolly mammoth.

Miller, Webb; Drautz, Daniela I; Ratan, Aakrosh; Pusey, Barbara; Qi, Ji; Lesk, Arthur M; Tomsho, Lynn P; Packard, Michael D; Zhao, Fangqing; Sher, Andrei; Tikhonov, Alexei; Raney, Brian; Patterson, Nick; Lindblad-Toh, Kerstin; Lander, Eric S; Knight, James R; Irzyk, Gerard P; Fredrikson, Karin M; Harkins, Timothy T; Sheridan, Sharon; Pringle, Tom; Schuster, Stephan C.

Nature ; 456(7220): 387-90, 2008 Nov 20.

Article in English | MEDLINE | ID: mdl-19020620

ABSTRACT

In 1994, two independent groups extracted DNA from several Pleistocene epoch mammoths and noted differences among individual specimens. Subsequently, DNA sequences have been published for a number of extinct species. However, such ancient DNA is often fragmented and damaged, and studies to date have typically focused on short mitochondrial sequences, never yielding more than a fraction of a per cent of any nuclear genome. Here we describe 4.17 billion bases (Gb) of sequence from several mammoth specimens, 3.3 billion (80%) of which are from the woolly mammoth (Mammuthus primigenius) genome and thus comprise an extensive set of genome-wide sequence from an extinct species. Our data support earlier reports that elephantid genomes exceed 4 Gb. The estimated divergence rate between mammoth and African elephant is half of that between human and chimpanzee. The observed number of nucleotide differences between two particular mammoths was approximately one-eighth of that between one of them and the African elephant, corresponding to a separation between the mammoths of 1.5-2.0 Myr. The estimated probability that orthologous elephant and mammoth amino acids differ is 0.002, corresponding to about one residue per protein. Differences were discovered between mammoth and African elephant in amino-acid positions that are otherwise invariant over several billion years of combined mammalian evolution. This study shows that nuclear genome sequencing of extinct species can reveal population differences not evident from the fossil record, and perhaps even discover genetic factors that affect extinction.

Subject(s)

Cell Nucleus/genetics , Elephants/genetics , Evolution, Molecular , Extinction, Biological , Fossils , Genome/genetics , Genomics , Sequence Analysis, DNA/methods , Africa , Animals , Conserved Sequence/genetics , Elephants/anatomy & histology , Female , Hair/metabolism , Humans , India , Male , Phylogeny

12.

Super: a web server to rapidly screen superposable oligopeptide fragments from the protein data bank.

Collier, James H; Lesk, Arthur M; Garcia de la Banda, Maria; Konagurthu, Arun S.

Nucleic Acids Res ; 40(Web Server issue): W334-9, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22638586

ABSTRACT

Searching for well-fitting 3D oligopeptide fragments within a large collection of protein structures is an important task central to many analyses involving protein structures. This article reports a new web server, Super, dedicated to the task of rapidly screening the protein data bank (PDB) to identify all fragments that superpose with a query under a prespecified threshold of root-mean-square deviation (RMSD). Super relies on efficiently computing a mathematical bound on the commonly used structural similarity measure, RMSD of superposition. This allows the server to filter out a large proportion of fragments that are unrelated to the query; >99% of the total number of fragments in some cases. For a typical query, Super scans the current PDB containing over 80,500 structures (with ~40 million potential oligopeptide fragments to match) in under a minute. Super web server is freely accessible from: http://lcb.infotech.monash.edu.au/super.

Subject(s)

Oligopeptides/chemistry , Software , Algorithms , Databases, Protein , Internet , Peptide Fragments/chemistry , User-Computer Interface
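
The abstract of entry 12 does not say which bound Super computes, so the sketch below swaps in a different, well-known one purely to illustrate the filtering idea: for two equal-length point sets, the RMSD after optimal superposition can never be smaller than the absolute difference of their radii of gyration (this follows from the Cauchy-Schwarz inequality applied to the centred coordinates). All function names are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    sq = sum(sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in coords)
    return math.sqrt(sq / n)


def rmsd_lower_bound(query, fragment):
    """Cheap lower bound on the optimal-superposition RMSD (not Super's own bound)."""
    return abs(radius_of_gyration(query) - radius_of_gyration(fragment))


def prefilter(query, fragments, threshold):
    """Discard fragments whose lower bound already exceeds the RMSD threshold.

    Survivors still need a full superposition to confirm their true RMSD.
    """
    return [
        f for f in fragments
        if len(f) == len(query) and rmsd_lower_bound(query, f) <= threshold
    ]


if __name__ == "__main__":
    # Made-up coordinates: a short straight fragment, a rotated copy, a stretched one.
    query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    rotated = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]
    stretched = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
    print(len(prefilter(query, [rotated, stretched], threshold=1.0)))
```

The design point is that the bound costs one pass over each fragment and needs no rotation search, so the expensive superposition is only run on the fragments that survive the filter.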

13.

Genetic diversity and population structure of the endangered marsupial Sarcophilus harrisii (Tasmanian devil).

Miller, Webb; Hayes, Vanessa M; Ratan, Aakrosh; Petersen, Desiree C; Wittekindt, Nicola E; Miller, Jason; Walenz, Brian; Knight, James; Qi, Ji; Zhao, Fangqing; Wang, Qingyu; Bedoya-Reina, Oscar C; Katiyar, Neerja; Tomsho, Lynn P; Kasson, Lindsay McClellan; Hardie, Rae-Anne; Woodbridge, Paula; Tindall, Elizabeth A; Bertelsen, Mads Frost; Dixon, Dale; Pyecroft, Stephen; Helgen, Kristofer M; Lesk, Arthur M; Pringle, Thomas H; Patterson, Nick; Zhang, Yu; Kreiss, Alexandre; Woods, Gregory M; Jones, Menna E; Schuster, Stephan C.

Proc Natl Acad Sci U S A ; 108(30): 12348-53, 2011 Jul 26.

Article in English | MEDLINE | ID: mdl-21709235

ABSTRACT

The Tasmanian devil (Sarcophilus harrisii) is threatened with extinction because of a contagious cancer known as Devil Facial Tumor Disease. The inability to mount an immune response and to reject these tumors might be caused by a lack of genetic diversity within a dwindling population. Here we report a whole-genome analysis of two animals originating from extreme northwest and southeast Tasmania, the maximal geographic spread, together with the genome from a tumor taken from one of them. A 3.3-Gb de novo assembly of the sequence data from two complementary next-generation sequencing platforms was used to identify 1 million polymorphic genomic positions, roughly one-quarter of the number observed between two genetically distant human genomes. Analysis of 14 complete mitochondrial genomes from current and museum specimens, as well as mitochondrial and nuclear SNP markers in 175 animals, suggests that the observed low genetic diversity in today's population preceded the Devil Facial Tumor Disease outbreak by at least 100 y. Using a genetically characterized breeding stock based on the genome sequence will enable preservation of the extant genetic diversity in future Tasmanian devil populations.

Subject(s)

Genetic Variation , Marsupialia/genetics , Animals , Breeding , DNA, Mitochondrial/genetics , DNA, Neoplasm/genetics , Extinction, Biological , Facial Neoplasms/genetics , Facial Neoplasms/veterinary , Genetics, Population , Genome, Mitochondrial , Humans , Models, Molecular , Molecular Sequence Data , Neoplasm Proteins/chemistry , Neoplasm Proteins/genetics , Neoplasms/genetics , Neoplasms/veterinary , Phylogeny , Polymorphism, Single Nucleotide , Tasmania , Time Factors

14.

Minimum message length inference of secondary structure from protein coordinate data.

Konagurthu, Arun S; Lesk, Arthur M; Allison, Lloyd.

Bioinformatics ; 28(12): i97-105, 2012 Jun 15.

Article in English | MEDLINE | ID: mdl-22689785

ABSTRACT

MOTIVATION: Secondary structure underpins the folding pattern and architecture of most proteins. Accurate assignment of the secondary structure elements is therefore an important problem. Although many approximate solutions of the secondary structure assignment problem exist, the statement of the problem has resisted a consistent and mathematically rigorous definition. A variety of comparative studies have highlighted major disagreements in the way the available methods define and assign secondary structure to coordinate data. RESULTS: We report a new method to infer secondary structure based on the Bayesian method of minimum message length inference. It treats assignments of secondary structure as hypotheses that explain the given coordinate data. The method seeks to maximize the joint probability of a hypothesis and the data. There is a natural null hypothesis and any assignment that cannot better it is unacceptable. We developed a program SST based on this approach and compared it with popular programs, such as DSSP and STRIDE among others. Our evaluation suggests that SST gives reliable assignments even on low-resolution structures. AVAILABILITY: http://www.csse.monash.edu.au/~karun/sst.

Subject(s)

Computational Biology/methods , Protein Structure, Secondary , Proteins/analysis , Algorithms , Bayes Theorem , Models, Theoretical
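
The selection rule described in the abstract of entry 14 (maximise the joint probability of hypothesis and data, and reject anything that cannot beat the null hypothesis) can be written as a comparison of two-part message lengths. The sketch below is a generic illustration and not the SST program: the function names and the toy probabilities are assumptions, and a real application to continuous coordinates would also have to account for the precision to which the data are stated.

```python
import math


def message_length_bits(prior_prob, likelihood):
    """Two-part message length: bits to state the hypothesis, then the data given it.

    Minimising this quantity is equivalent to maximising prior_prob * likelihood.
    """
    return -math.log2(prior_prob) - math.log2(likelihood)


def select_hypothesis(candidates, null_length_bits):
    """Pick the candidate with the shortest message, if it beats the null.

    `candidates` maps a name to a (prior_prob, likelihood) pair. Returns
    (name, length) of the winner, or (None, null_length_bits) if none wins.
    """
    best_name, best_len = None, null_length_bits
    for name, (prior, lik) in candidates.items():
        length = message_length_bits(prior, lik)
        if length < best_len:
            best_name, best_len = name, length
    return best_name, best_len


if __name__ == "__main__":
    # Made-up probabilities for two competing assignments of one segment.
    toy = {"helix": (0.25, 0.5), "strand": (0.5, 0.125)}
    print(select_hypothesis(toy, null_length_bits=3.5))
```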

15.

Sequence and structure alignments in post-AlphaFold era.

Rajapaksa, Sandun; Konagurthu, Arun S; Lesk, Arthur M.

Curr Opin Struct Biol ; 79: 102539, 2023 04.

Article in English | MEDLINE | ID: mdl-36753924

ABSTRACT

Sequence alignment is fundamental for analyzing protein structure and function. For all but closely-related proteins, alignments based on structures are more accurate than alignments based purely on amino-acid sequences. However, the disparity between the large amount of sequence data and the relative paucity of experimentally-determined structures has precluded the general applicability of structure alignment. Based on the success of AlphaFold (and its likes) in producing high-quality structure predictions, we suggest that when aligning homologous proteins, lacking experimental structures, better results can be obtained by a structural alignment of predicted structures than by an alignment based only on amino-acid sequences. We present a quantitative evaluation, based on pairwise alignments of sequences and structures (both predicted and experimental) to support this hypothesis.

Subject(s)

Algorithms , Proteins , Proteins/chemistry , Amino Acid Sequence , Sequence Alignment

16.

Piecewise linear approximation of protein structures using the principle of minimum message length.

Konagurthu, Arun S; Allison, Lloyd; Stuckey, Peter J; Lesk, Arthur M.

Bioinformatics ; 27(13): i43-51, 2011 Jul 01.

Article in English | MEDLINE | ID: mdl-21685100

ABSTRACT

Simple and concise representations of protein-folding patterns provide powerful abstractions for visualizations, comparisons, classifications, searching and aligning structural data. Structures are often abstracted by replacing standard secondary structural features (that is, helices and strands of sheet) by vectors or linear segments. Relying solely on standard secondary structure may result in a significant loss of structural information. Further, traditional methods of simplification crucially depend on the consistency and accuracy of external methods to assign secondary structures to protein coordinate data. Although many methods exist automatically to identify secondary structure, the impreciseness of definitions, along with errors and inconsistencies in experimental structure data, drastically limit their applicability to generate reliable simplified representations, especially for structural comparison. This article introduces a mathematically rigorous algorithm to delineate protein structure using the elegant statistical and inductive inference framework of minimum message length (MML). Our method generates consistent and statistically robust piecewise linear explanations of protein coordinate data, resulting in a powerful and concise representation of the structure. The delineation is completely independent of the approaches of using hydrogen-bonding patterns or inspecting local substructural geometry that the current methods use. Indeed, as is common with applications of the MML criterion, this method is free of parameters and thresholds, in striking contrast to the existing programs which are often beset by them. The analysis of results over a large number of proteins suggests that the method produces consistent delineation of structures that encompasses, among others, the segments corresponding to standard secondary structure. AVAILABILITY: http://www.csse.monash.edu.au/~karun/pmml.

Subject(s)

Algorithms , Proteins/chemistry , Clostridium/chemistry , Hydrogen Bonding , Models, Molecular , Protein Folding , Protein Structure, Secondary , Proteins/metabolism

17.

Three-dimensional Structure Databases of Biological Macromolecules.

Waman, Vaishali P; Orengo, Christine; Kleywegt, Gerard J; Lesk, Arthur M.

Methods Mol Biol ; 2449: 43-91, 2022.

Article in English | MEDLINE | ID: mdl-35507259

ABSTRACT

Databases of three-dimensional structures of proteins (and their associated molecules) provide: (a) Curated repositories of coordinates of experimentally determined structures, including extensive metadata; for instance information about provenance, details about data collection and interpretation, and validation of results. (b) Information-retrieval tools to allow searching to identify entries of interest and provide access to them. (c) Links among databases, especially to databases of amino-acid and genetic sequences, and of protein function; and links to software for analysis of amino-acid sequence and protein structure, and for structure prediction. (d) Collections of predicted three-dimensional structures of proteins. These will become more and more important after the breakthrough in structure prediction achieved by AlphaFold2. The single global archive of experimentally determined biomacromolecular structures is the Protein Data Bank (PDB). It is managed by wwPDB, a consortium of five partner institutions: the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB), the Protein Data Bank Japan (PDBj), the BioMagResBank (BMRB), and the Electron Microscopy Data Bank (EMDB). In addition to jointly managing the PDB repository, the individual wwPDB partners offer many tools for analysis of protein and nucleic acid structures and their complexes, including providing computer-graphic representations. Their collective and individual websites serve as hubs of the community of structural biologists, offering newsletters, reports from Task Forces, training courses, and "helpdesks," as well as links to external software. Many specialized projects are based on the information contained in the PDB. Especially important are SCOP, CATH, and ECOD, which present classifications of protein domains.

Subject(s)

Proteins , Software , Computational Biology , Databases, Protein , Protein Conformation , Proteins/chemistry

18.

Intraspecific phylogenetic analysis of Siberian woolly mammoths using complete mitochondrial genomes.

Gilbert, M Thomas P; Drautz, Daniela I; Lesk, Arthur M; Ho, Simon Y W; Qi, Ji; Ratan, Aakrosh; Hsu, Chih-Hao; Sher, Andrei; Dalén, Love; Götherström, Anders; Tomsho, Lynn P; Rendulic, Snjezana; Packard, Michael; Campos, Paula F; Kuznetsova, Tatyana V; Shidlovskiy, Fyodor; Tikhonov, Alexei; Willerslev, Eske; Iacumin, Paola; Buigues, Bernard; Ericson, Per G P; Germonpré, Mietje; Kosintsev, Pavel; Nikolaev, Vladimir; Nowak-Kemp, Malgosia; Knight, James R; Irzyk, Gerard P; Perbost, Clotilde S; Fredrikson, Karin M; Harkins, Timothy T; Sheridan, Sharon; Miller, Webb; Schuster, Stephan C.

Proc Natl Acad Sci U S A ; 105(24): 8327-32, 2008 Jun 17.

Article in English | MEDLINE | ID: mdl-18541911

ABSTRACT

We report five new complete mitochondrial DNA (mtDNA) genomes of Siberian woolly mammoth (Mammuthus primigenius), sequenced with up to 73-fold coverage from DNA extracted from hair shaft material. Three of the sequences present the first complete mtDNA genomes of mammoth clade II. Analysis of these and 13 recently published mtDNA genomes demonstrates the existence of two apparently sympatric mtDNA clades that exhibit high interclade divergence. The analytical power afforded by the analysis of the complete mtDNA genomes reveals a surprisingly ancient coalescence age of the two clades, approximately 1-2 million years, depending on the calibration technique. Furthermore, statistical analysis of the temporal distribution of the ¹⁴C ages of these and previously identified members of the two mammoth clades suggests that clade II went extinct before clade I. Modeling of protein structures failed to indicate any important functional difference between genomes belonging to the two clades, suggesting that the loss of clade II more likely is due to genetic drift than a selective sweep.

Subject(s)

Elephants/classification , Elephants/genetics , Genome, Mitochondrial , Paleontology , Phylogeny , Animals , Base Sequence , DNA, Mitochondrial/analysis , DNA, Mitochondrial/genetics , Genetic Variation , Hair/chemistry , Molecular Sequence Data , Sequence Analysis, DNA

19.

Paths Through the Yeast Regulatory Network in Different Physiological States.

Lesk, Arthur M; Konagurthu, Arun S.

J Mol Biol ; 433(21): 167181, 2021 10 15.

Article in English | MEDLINE | ID: mdl-34339724

ABSTRACT

We analyse paths through the regulatory networks that control gene-expression patterns in Yeast, in five different physiological states: cell cycle, DNA damage, stress response, diauxic shift, and sporulation. The network in each state is specified as a directed graph, containing different sets of edges connecting pairs selected from a combined set of 1475 nodes. Each network contains some nodes that have no parents, and others that have no children. We call these, respectively, 'source' and 'sink' nodes. For each network we enumerate paths between source and sink nodes. In a previous paper (Lesk and Konagurthu, 2020), we defined, extracted and compared the neighbourhoods of each transcription factor in different physiological states, and how the system reconfigures itself. Here we compare the usage of nodes and edges by different networks, and how they are assembled into paths. The picture that emerges is that the networks are not disjoint but show substantial sharing of nodes and edges; however, they assemble these materials into different sets of paths. Four of the networks, other than the cell-cycle network, contain paths between only a small fraction (<13%) of possible source-sink pairs. Although the cell-cycle network is not an outlier in terms of total number of nodes and edges, and number of sink nodes, it is very much an outlier in having a greater proportion of source-to-sink paths than the other networks.
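The source/sink terminology in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy edge list are hypothetical, and it only checks whether at least one path exists for each source-sink pair (reachability) rather than enumerating every path as the paper does.

```python
from collections import defaultdict


def sources_and_sinks(edges):
    """Return (sources, sinks): nodes with no parents and nodes with no children."""
    nodes, has_parent, has_child = set(), set(), set()
    for parent, child in edges:
        nodes.update((parent, child))
        has_child.add(parent)
        has_parent.add(child)
    return nodes - has_parent, nodes - has_child


def reachable(adjacency, start):
    """Return every node reachable from start (iterative depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def connected_pair_fraction(edges):
    """Fraction of possible source-sink pairs joined by at least one directed path."""
    adjacency = defaultdict(list)
    for parent, child in edges:
        adjacency[parent].append(child)
    sources, sinks = sources_and_sinks(edges)
    if not sources or not sinks:
        return 0.0
    connected = sum(len(reachable(adjacency, s) & sinks) for s in sources)
    return connected / (len(sources) * len(sinks))


# Hypothetical toy network, not data from the paper.
toy_edges = [("A", "C"), ("B", "C"), ("C", "D"), ("B", "E"), ("F", "G")]
toy_sources, toy_sinks = sources_and_sinks(toy_edges)
toy_fraction = connected_pair_fraction(toy_edges)
```

In the toy network, four of the nine possible source-sink pairs are connected, which is the kind of proportion the abstract compares across the five physiological states.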

Subject(s)

Cell Cycle/genetics , Gene Regulatory Networks , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Stress, Physiological/genetics , Transcription Factors/genetics , Computational Biology/methods , DNA Damage , Gene Expression Profiling , Gene Expression Regulation, Fungal , Gene Ontology , Molecular Sequence Annotation , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/classification , Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction , Spores, Fungal/genetics , Spores, Fungal/growth & development , Spores, Fungal/metabolism , Transcription Factors/classification , Transcription Factors/metabolism

20.

Cataloging topologies of protein folding patterns.

Konagurthu, Arun S; Lesk, Arthur M.

J Mol Recognit ; 23(2): 253-7, 2010.

Article in English | MEDLINE | ID: mdl-20151416

ABSTRACT

Comparing and classifying protein folding patterns allows organizing the known structures, structure search and retrieval, and investigation of general principles of protein architecture. We have been developing a concise tableau representation of protein folding patterns, based on the order and contact patterns of elements of secondary structure: helices and strands of sheet (Lesk, 1995; Kamat and Lesk, 2007; Konagurthu et al., 2008). The tableaux provide a database, derived from the world-wide protein data bank, mineable in studies of protein architecture, including: (i) determination of statistical properties of secondary structure contacts in an unbiased set of protein domains, (ii) investigations of the range of, and relationships among, protein topologies, (iii) investigation of the relationship between local structure of proteins and the complete folding topology, (iv) potential for fold identification from amino acid sequence, and (v) the basis for a complete enumeration of possible protein folding patterns, which can be compared with the corpus of known structures.

Subject(s)

Protein Folding , Protein Structure, Secondary , Proteins/chemistry , Databases, Protein , Models, Molecular , Software , Thermodynamics
