ABSTRACT
The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and used Illumina/10×, optical map, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long-read RNA-seq data to facilitate annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with those of other cetaceans and artiodactyls, including the vaquita (Phocoena sinus), the world's smallest cetacean, to investigate the blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst of segmental duplications, although the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene that are correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results show how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research.
Subjects
Balaenoptera , Neoplasms , Animals , Balaenoptera/genetics , Segmental Duplications, Genomic , Genome , Demography , Neoplasms/genetics
ABSTRACT
In this study, the primary objective is to design a prediction model for the free vibration analysis of thin circular cylindrical steel silos with various aspect ratios, in empty and variously filled conditions, and with different types of closures. The finite element method (FEM) is used to carry out the free vibration analysis of the steel silos. The aspect ratio (slender, intermediate-slender, or squat) is found to have a very significant effect on the dynamic response of a silo. The silos are considered with open, flat, and cone closures at the top end. A clamped-free (cantilever) boundary condition is adopted because actual silos are fixed at a flat base. The structural mass of the thin cylindrical shell steel silo is held constant for a given height and diameter. The eigenvalues of the thin cylindrical shell steel silos are extracted using the block Lanczos method, and their free vibrations are studied for different aspect ratios and radius-to-thickness ratios. The results show that, for an empty silo, the fundamental frequency decreases as the aspect ratio increases, so it is highest for a squat silo. For all aspect ratios, the fundamental frequency is lowest with a flat closure, while the frequencies are higher with a cone closure. The modal frequency increases with mode number, and it also increases as the filling level increases. Finally, a regression approach is adopted to predict the modal frequencies of empty and filled silos over a wide range of aspect ratios.
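The modal analysis described above amounts to solving the generalized eigenvalue problem K φ = ω² M φ for a stiffness matrix K and a mass matrix M. The Python sketch below does not reproduce the shell finite-element model of the study: it assumes a hypothetical lumped-mass cantilever (a chain of masses and springs clamped at the base and free at the top) and uses NumPy's dense symmetric solver `numpy.linalg.eigvalsh` in place of the block Lanczos method, only to show how modal frequencies are extracted and ordered by mode number.

```python
import numpy as np


def cantilever_frequencies(masses, k: float) -> np.ndarray:
    """Natural frequencies (Hz) of a hypothetical clamped-free spring-mass chain.

    Mass 0 is tied to a fixed base by a spring and the last mass is free,
    mimicking a cantilever. All springs share the stiffness k.
    Frequencies are returned in ascending order (mode 1 first).
    """
    masses = np.asarray(masses, dtype=float)
    n = masses.size
    # Stiffness matrix: every mass except the free tip couples to two springs
    # (2k on the diagonal, -k to its neighbours); the free tip couples to one (k).
    K = np.zeros((n, n))
    for i in range(n):
        K[i, i] = 2.0 * k
        if i + 1 < n:
            K[i, i + 1] = K[i + 1, i] = -k
    K[-1, -1] = k
    # With a lumped (diagonal) mass matrix M, K phi = w^2 M phi becomes the
    # standard symmetric problem (M^-1/2 K M^-1/2) psi = w^2 psi.
    A = K / np.sqrt(np.outer(masses, masses))
    omega_sq = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
    return np.sqrt(np.clip(omega_sq, 0.0, None)) / (2.0 * np.pi)
```

For a real silo, K and M come from a large, sparse shell discretization, which is why iterative Lanczos-type eigensolvers that return only the lowest few modes are preferred over a dense solver.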
ABSTRACT
BACKGROUND: The Nile rat (Arvicanthis niloticus) is an important animal model because of its robust diurnal rhythm, its cone-rich retina, and its propensity to develop diet-induced diabetes without chemical or genetic modifications. A closer similarity to humans in these respects, compared with the widely used Mus musculus and Rattus norvegicus models, holds the promise of better translation of research findings to the clinic. RESULTS: We report a 2.5 Gb, chromosome-level reference genome assembly with fully resolved parental haplotypes, generated with the Vertebrate Genomes Project (VGP). The assembly is highly contiguous, with a contig N50 of 11.1 Mb, a scaffold N50 of 83 Mb, and 95.2% of the sequence assigned to chromosomes. We used a novel workflow to identify 3613 segmental duplications and quantify duplicated genes. Comparative analyses revealed unique genomic features of the Nile rat, including some that affect genes associated with type 2 diabetes and metabolic dysfunctions. We discuss 14 genes that are heterozygous in the Nile rat or highly diverged from the house mouse. CONCLUSIONS: Our findings reflect the exceptional level of genomic resolution present in this assembly, which will greatly expand the potential of the Nile rat as a model organism.
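Contig and scaffold N50 are computed the same way, on contig lengths or on scaffold lengths respectively. A minimal Python sketch of the standard definition (the function name `n50` is our own, not part of any assembly toolkit):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence (contig or scaffold) lengths.

    N50 is the largest length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe form of running >= total / 2
            return length
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8: the three longest sequences (10 + 9 + 8 = 27) already hold half of the 54 bases.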
Subjects
Diabetes Mellitus, Type 2 , Humans , Animals , Haplotypes , Diabetes Mellitus, Type 2/genetics , Murinae , Genome , Genomics
ABSTRACT
Computational function prediction is one of the most important problems in bioinformatics, as elucidating the function of genes is a central task in molecular biology and genomics. Most existing function prediction methods use protein sequences as the primary source of input information, because the sequence is the most readily available information for query proteins. Some methods have attempted to incorporate other attributes of query proteins. Among these attributes, the three-dimensional (3D) structure of proteins is known to be very useful for identifying evolutionary relationships between proteins, from which functional similarity can be inferred. Here, we report a novel protein function prediction method, ContactPFP, which uses predicted residue-residue contact maps as input structural features of query proteins. Although 3D structure information is known to be useful, it has not been routinely used in function prediction because the 3D structure has not been experimentally determined for many proteins. In ContactPFP, we overcome this limitation by using residue-residue contact prediction, which has become increasingly accurate owing to rapid developments in the protein structure prediction field. ContactPFP takes a query protein sequence as input and uses predicted residue-residue contacts as a proxy for the 3D protein structure. To characterize how predicted contacts contribute to function prediction accuracy, we compared the performance of ContactPFP with that of several well-established sequence-based function prediction methods. The comparison revealed the advantages and weaknesses of ContactPFP relative to contemporary sequence-based methods; in many cases, ContactPFP achieved higher prediction accuracy. We examined factors that affected the accuracy of ContactPFP using several illustrative cases that highlight the strengths of our method.
ABSTRACT
Introduction: Type 2 Diabetes Mellitus (T2DM) is increasing in epidemic proportions. In addition to its morbidity and mortality, the physical, psychological, and financial tolls are often greater for those treated with insulin. Our real-world study evaluated a Low Carbohydrate Diet (LCD) in patients with T2DM on insulin with respect to glycemic control, insulin reduction, and weight loss. Materials and Methods: A prospective cohort study was conducted via an Electronic Medical Record search for patients with T2DM who attended the Virginia Commonwealth University Medical Weight Loss Program from 2014 to 2020 and who initially presented on insulin. Data were extracted for 1 year after enrollment. The weight loss program focuses on an LCD. Results: Of 185 participants, the mean (± SD) age was 56.1 (9.9) years; 70% were female and 63% were Black. Eighty-five participants (45.9%) completed 12 months; they reduced their median (25-75% interquartile range, IQR) insulin dose from 69 to 0 units (0-18, p < 0.0001), HbA1c from 8 to 6.9% (6.2-7.8, p < 0.0001), and weight from 116 to 99 kg (85-120, p < 001). Of those who completed 12 months, 86% were able to reduce or discontinue insulin, and 70.6% discontinued it completely. Among all participants who completed 3, 6, or 12 months, 97.6% were able to reduce or eliminate insulin use. Conclusion: In patients with T2DM on an LCD, it is possible to reduce and even discontinue insulin use while facilitating weight loss and achieving glycemic control. A Low Carbohydrate Diet should be offered to all patients with diabetes, especially those using insulin.
ABSTRACT
Transcatheter aortic valve implantation has emerged as a therapeutic option for patients with symptomatic severe aortic stenosis who are inoperable or at very high risk for open-heart surgery. Recently, we encountered a patient with aortic stenosis and Larsen syndrome who had short stature, obesity, kyphoscoliosis, multiple musculoskeletal deformities, and severe restrictive lung disease. Open-heart surgery in such a patient carries substantial perioperative risk. A transcaval aortic valve implantation was successfully performed under general anesthesia.
ABSTRACT
Protein 3D structure prediction has advanced significantly in recent years owing to improved contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances from multiple sequence alignments (MSAs). In this work, we present AttentiveDist, a novel approach that uses MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA's features at the inter-residue level, we added an attention layer to the deep neural network. We show that combining four MSAs with different E-value cutoffs improved the model's prediction performance compared with single E-value MSA features. A further improvement was observed when an attention layer was used, and more still when bond-angle prediction was added as an additional prediction task. The improved distance predictions translated into better protein tertiary structure models.
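The idea of weighting each MSA's features per residue pair can be illustrated with a small NumPy sketch. This is not the AttentiveDist network: it assumes one feature map of shape (L, L, C) per E-value cutoff, scores each map with a hypothetical linear vector `w` (which a real model would learn), and normalizes the scores with a softmax over the MSA axis before taking the weighted sum.

```python
import numpy as np


def attend_over_msas(features: np.ndarray, w: np.ndarray):
    """Combine per-MSA pair features with per-pair softmax attention.

    features: shape (n_msa, L, L, C), one pair-feature map per MSA
    w: shape (C,), a hypothetical scoring vector (learned in practice)
    Returns (combined, weights) with shapes (L, L, C) and (n_msa, L, L).
    """
    scores = features @ w                                   # (n_msa, L, L)
    scores = scores - scores.max(axis=0, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)           # softmax over MSAs
    # Weighted sum of the MSA feature maps, separately for every pair (i, j)
    combined = np.einsum("mijc,mij->ijc", features, weights)
    return combined, weights
```

Because the softmax is taken independently for every residue pair (i, j), different regions of the distance map can rely on different alignments.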
Subjects
Deep Learning , Proteins/chemistry , Sequence Alignment/methods , Caspases/chemistry , Caspases/genetics , Models, Molecular , Neural Networks, Computer , Protein Interaction Domains and Motifs , Protein Structure, Tertiary , Sequence Alignment/statistics & numerical data , Sequence Analysis, Protein
ABSTRACT
MOTIVATION: Protein structure prediction remains one of the most important problems in computational biology and biophysics. In the past few years, protein residue-residue contact prediction has improved substantially, which has made it a critical driving force for successful protein structure prediction. Boosting the accuracy of contact predictions has therefore become the forefront of protein structure prediction. RESULTS: We present a novel contact map refinement method, ContactGAN, which uses Generative Adversarial Networks (GAN). ContactGAN significantly improved on predictions made by recent contact prediction methods when tested on three datasets, including protein structure modeling targets in CASP13 and CASP14. The improved precision of contact prediction translated into more accurate protein tertiary structure models. On the other hand, the improvement over trRosetta was relatively small; the reasons for this are discussed. ContactGAN will be a valuable addition to the structure prediction pipeline, providing an extra gain in contact prediction accuracy. AVAILABILITY AND IMPLEMENTATION: https://github.com/kiharalab/ContactGAN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
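Contact prediction accuracy is commonly reported as the precision of the top L/5 (or L/2, or L) long-range predicted contacts, where L is the sequence length. The exact evaluation settings used for ContactGAN are not given here; the sketch below assumes the widely used CASP convention that long-range contacts have a sequence separation of at least 24 residues, and takes the native contact map (for example, Cβ-Cβ distance below 8 Å) as an input. The function name is hypothetical.

```python
import numpy as np


def top_l_precision(pred: np.ndarray, native: np.ndarray,
                    frac: float = 0.2, min_sep: int = 24) -> float:
    """Precision of the top (frac * L) predicted contacts with j - i >= min_sep.

    pred: (L, L) array of predicted contact probabilities
    native: (L, L) boolean array of true contacts
    """
    L = pred.shape[0]
    # Upper-triangle pairs (i, j) with sequence separation j - i >= min_sep
    i, j = np.triu_indices(L, k=min_sep)
    if i.size == 0:
        raise ValueError("sequence too short for the requested separation")
    order = np.argsort(-pred[i, j], kind="stable")  # most confident first
    k = max(1, int(frac * L))                       # e.g. L/5 when frac = 0.2
    top = order[:k]
    return float(native[i[top], j[top]].mean())
```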
ABSTRACT
With advancements in synthetic biology, the cost and time needed to design and synthesize customized gene products have been steadily decreasing. Many research laboratories in academia and industry routinely create genetically engineered proteins as part of their research activities. However, manipulation of protein sequences could result in the unintentional production of toxic proteins. Therefore, being able to identify the toxicity of a protein before synthesis would reduce the risk of potential hazards. Existing methods are too specific, which limits their application. Here, we extended general function prediction methods to predict the toxicity of proteins. Protein function prediction methods have been actively studied in the bioinformatics community and have shown significant improvement over the last decade. We have previously developed successful function prediction methods, which were among the top-performing methods in the community-wide functional annotation experiment, CAFA. Based on our function prediction method, we developed a neural network model, named NNTox, which uses the GO terms predicted for a target protein to further predict the likelihood that the protein is toxic. We have also developed a multi-label model that can predict the specific toxicity type of the query sequence. Together, this work analyzes the relationship between GO terms and protein toxicity and builds predictive models of protein toxicity.
Subjects
Neural Networks, Computer , Proteins/chemistry , Sequence Analysis, Protein/methods , Toxins, Biological/chemistry , Animals , Gene Ontology , Humans , Proteins/genetics , Proteins/toxicity , Software , Toxins, Biological/genetics , Toxins, Biological/toxicity
ABSTRACT
Intrinsically disordered proteins (IDPs) or regions (IDRs) perform diverse cellular functions, but are also prone to forming promiscuous and potentially deleterious interactions. We investigate the extent to which the properties of, and content in, IDRs have adapted to enable functional diversity while limiting interference from promiscuous interactions in the crowded cellular environment. Information on protein sequences, their predicted intrinsic disorder, and their 3D structure content is related to data on protein cellular concentrations, gene co-expression, and protein-protein interactions in the well-studied yeast Saccharomyces cerevisiae. Results reveal that both the protein IDR content and the frequency of "sticky" amino acids in IDRs (those more frequently involved in protein interfaces) decrease with increasing protein cellular concentration. This implies that the IDR content and the amino acid composition of IDRs experience negative selection as the protein concentration increases. In the S. cerevisiae protein-protein interaction network, the higher a protein's IDR content, the more frequently it interacts with IDR-containing partners, and the more functionally diverse the partners are. Employing a clustering analysis of Gene Ontology terms, we identify ~600 new putative multifunctional proteins in S. cerevisiae. Strikingly, these proteins are enriched in IDRs and contribute significantly to all the observed trends. In particular, IDRs of multifunctional proteins feature more sticky amino acids than IDRs of their non-multifunctional counterparts or the surfaces of structured yeast proteins. This property likely affords sufficient binding affinity for the functional interactions, commonly mediated by short IDR segments, thereby counterbalancing the loss in overall IDR conformational entropy upon binding.
Subjects
Intrinsically Disordered Proteins/metabolism , Protein Interaction Maps , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae Proteins/chemistry
ABSTRACT
MOTIVATION: Function annotation of proteins is fundamental in contemporary biology across fields including genomics, molecular biology, biochemistry, systems biology and bioinformatics. Function prediction is indispensable in providing clues for interpreting omics-scale data as well as in assisting biologists to build hypotheses for designing experiments. As sequencing genomes is now routine due to the rapid advancement of sequencing technologies, computational protein function prediction methods have become increasingly important. A conventional method of annotating a protein sequence is to transfer functions from the top hits of a homology search; however, this approach has substantial shortcomings, including low coverage in genome annotation. RESULTS: Here we have developed Phylo-PFP, a new sequence-based protein function prediction method, which mines functional information from a broad range of similar sequences, including those with low sequence similarity identified by a PSI-BLAST search. To evaluate functional similarity between identified sequences and the query protein more accurately, Phylo-PFP reranks retrieved sequences by considering their phylogenetic distance. Compared with its predecessor, PFP, which was among the top-ranked methods in the second round of the Critical Assessment of Functional Annotation (CAFA2), Phylo-PFP demonstrated substantial improvement in prediction accuracy. Phylo-PFP was further shown to outperform the prediction programs that ranked top in CAFA2. AVAILABILITY AND IMPLEMENTATION: The Phylo-PFP web server is available at http://kiharalab.org/phylo_pfp.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Computational Biology , Phylogeny , Databases, Protein , Proteins , Sequence Analysis, Protein
ABSTRACT
MOTIVATION: Biological experiments, including proteomics and transcriptomics approaches, often reveal sets of proteins that are most likely to be involved in a disease or disorder. To understand the functional nature of a set of proteins, it is important to capture the function of the proteins as a group, even when the functions of individual proteins are not known. In this work, we propose a model that takes groups of proteins found to work together in a certain biological context, integrates them into functional relevance networks, and then applies iterative inference on graphical models to identify group functions of the proteins, which are then extended to predict the functions of individual proteins. RESULTS: The proposed algorithm, iterative group function prediction (iGFP), represents proteins as a graph that encodes their functional relevance, considering their known functional, proteomic and transcriptional features. Proteins in the graph are clustered into groups by their mutual functional relevance, which is iteratively updated using a probabilistic graphical model, the conditional random field. iGFP showed robust accuracy even when a substantial fraction of GO annotations was missing. The perspective of 'group' function annotation opens up novel approaches for understanding the functional nature of proteins in biological systems. AVAILABILITY AND IMPLEMENTATION: http://kiharalab.org/iGFP/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Computational Biology , Algorithms , Proteins , Proteomics
ABSTRACT
Moonlighting proteins, which have two or more independent and distinct functions, are an emerging concept in the study of protein function. An increasing number of moonlighting proteins have been reported in recent years; however, systematic study of the topic has been hindered because the secondary functions of proteins are usually found serendipitously in experiments. Toward the systematic identification and study of moonlighting proteins, computational methods have been developed that identify them from several information sources, including database entries, the literature, and large-scale omics data. This study first gives an overview of approaches for finding moonlighting proteins. The literature-mining method DextMP is then applied to find new moonlighting proteins in three genomes: Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster. Potential moonlighting proteins identified by DextMP are further examined by a two-step manual literature-checking procedure, which ultimately yielded 13 new moonlighting proteins. The identified moonlighting proteins are categorized into two classes according to how clearly their two functions are distinct. A few of the identified moonlighting proteins are described in detail. Future directions for improving the DextMP algorithm are also discussed.
Subjects
Data Mining/methods , Genomics/methods , Animals , Arabidopsis/genetics , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics
ABSTRACT
The prevalence of epilepsy worldwide is around 0.5%-2% of the population. Antiepileptic medications are the first line of treatment in most cases, but approximately 25%-30% of patients with epilepsy are refractory to single or combination therapy. The surgical option for temporal lobe epilepsy is temporal lobectomy, which carries an inherent risk of postoperative neurological deficits. Patients who are refractory to combination therapy or who do not want temporal lobectomy are candidates for electrical stimulation therapy. Refractory cases require an implantable device such as a vagus nerve stimulator (VNS). We report the perioperative management of a patient with an implanted VNS who was scheduled for pericardiectomy. It is important for the anesthesiologist to be familiar with the mechanism of the VNS to provide proper perioperative care.
Subjects
Cardiac Surgical Procedures/methods , Epilepsy/therapy , Perioperative Care , Vagus Nerve Stimulation , Adult , Anesthesia/methods , Humans , Male , Vagus Nerve Stimulation/instrumentation
ABSTRACT
Genome mapping involves the confinement of long DNA molecules, in excess of 150 kilobase pairs, in nanochannels whose size is near the circa 50 nm persistence length of DNA. The fidelity of the map relies on the assumption that the DNA is linearized by channel confinement, which in turn assumes the absence of knots. We have computed the probability of forming different knot types, and the size of these knots, for long chains (approximately 164 kilobase pairs) via pruned-enriched Rosenbluth method simulations of a discrete wormlike chain model of DNA in channels ranging in size from 35 nm to 60 nm. Compared with prior simulations of short DNA in similar confinement, these long molecules exhibit both complex knots, with up to seven crossings, and multiple knots per chain. The knotting probability is a very strong function of channel size, ranging from 0.3% to 60%, and this dependence is rationalized in the context of Odijk's theory for confined semiflexible chains. Overall, the knotting probability and knot size obtained from these equilibrium calculations are not consistent with experimental measurements of the properties of anomalously bright regions along the DNA backbone during genome mapping experiments. This result suggests that these events in experiments are either knots formed during the processing of the DNA prior to injection into the nanochannel or regions of locally high DNA concentration without a topological constraint. If so, knots are not an intrinsic problem for genome mapping technology.
ABSTRACT
We have developed a multi-scale model describing the dynamics of internal segments of DNA in nanochannels used for genome mapping. In addition to the channel geometry, the model takes as its inputs the DNA properties in free solution (persistence length, effective width, molecular weight, and segmental hydrodynamic radius) and buffer properties (temperature and viscosity). Using pruned-enriched Rosenbluth simulations of a discrete wormlike chain model with circa 10 base pair resolution and a numerical solution for the hydrodynamic interactions in confinement, we convert these experimentally available inputs into the necessary parameters for a one-dimensional, Rouse-like model of the confined chain. The resulting coarse-grained model resolves the DNA at a length scale of approximately 6 kilobase pairs in the absence of any global hairpin folds, and is readily studied using normal-mode analysis or Brownian dynamics simulations. The Rouse-like model successfully reproduces both the trends and the order of magnitude of the experimentally measured relaxation time of the distance between labeled segments of DNA. The model also provides insights that are not readily accessible from experiments, such as how the molecular weight of the DNA and the location of the labeled segments affect the statistical models used to construct genome maps from data acquired in nanochannels. The multi-scale approach used here, while focused on a technologically relevant scenario, is readily adapted to other channel sizes and polymers.
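The normal-mode analysis mentioned above can be illustrated for the simplest case. The sketch below assumes a uniform, free-draining Rouse chain of N beads with a single friction coefficient ζ and spring constant k (placeholder values, not the confinement-dependent parameters derived in the study) and obtains the relaxation time of each internal mode from the eigenvalues of the chain connectivity matrix. The function name is hypothetical.

```python
import numpy as np


def rouse_relaxation_times(n_beads: int, zeta: float, k_spring: float) -> np.ndarray:
    """Relaxation times of the internal modes of a uniform free-draining Rouse chain.

    The equation of motion zeta * dR/dt = -k_spring * A @ R + noise is
    diagonalized by the eigenvectors of the connectivity matrix A; mode p
    decays with time constant zeta / (k_spring * lambda_p). The zero mode
    (centre-of-mass translation) is dropped. Longest time comes first.
    """
    # Connectivity (graph Laplacian) of a linear chain with free ends
    A = np.zeros((n_beads, n_beads))
    for i in range(n_beads - 1):
        A[i, i] += 1.0
        A[i + 1, i + 1] += 1.0
        A[i, i + 1] -= 1.0
        A[i + 1, i] -= 1.0
    lam = np.linalg.eigvalsh(A)[1:]  # ascending; lam[0] ~ 0 is translation
    return zeta / (k_spring * lam)
```

For this uniform chain the eigenvalues are known analytically, λ_p = 4 sin²(pπ/2N), so the longest relaxation time grows roughly as N²; the confined, hydrodynamically coupled model in the study replaces ζ and k with parameters computed from simulation.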
ABSTRACT
We use Brownian dynamics with hydrodynamic interactions to calculate both the Kirkwood (short-time) diffusivity and the long-time diffusivity of DNA chains from free solution down to channel confinement in the de Gennes regime. The Kirkwood diffusivity in confinement is always higher than the diffusivity obtained from the mean-squared displacement of the center-of-mass, as is the case in free solution. Moreover, the divergence of the local diffusion tensor, which is non-zero in confinement, makes a negligible contribution to the latter diffusivity in confinement. The maximum error in the Kirkwood approximation in our simulations is about 2% for experimentally relevant simulation times. The error decreases with increasing confinement, consistent with arguments from blob theory and the molecular-weight dependence of the error in free solution. In light of the typical experimental errors in measuring the properties of channel-confined DNA, our results suggest that the Kirkwood approximation is sufficiently accurate to model experimental data.
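For reference, the Kirkwood diffusivity of a bead-spring chain in free solution follows from pre-averaging the Oseen tensor, which gives D_K = k_BT/(Nζ) + [k_BT/(6πηN²)] Σ_{i≠j} ⟨1/r_ij⟩ with bead friction ζ = 6πηa. The NumPy sketch below evaluates this expression over a set of sampled configurations. It is a free-solution illustration only, with a hypothetical function name: in a slit or channel, a wall-corrected mobility tensor would replace the Oseen term, and the long-time diffusivity would instead come from the mean-squared displacement of the center of mass.

```python
import numpy as np


def kirkwood_diffusivity(configs: np.ndarray, kT: float, eta: float, a: float) -> float:
    """Kirkwood short-time diffusivity of a bead-spring chain in free solution.

    Uses the pre-averaged Oseen tensor:
        D_K = kT / (N * zeta) + kT / (6 pi eta N^2) * sum_{i != j} <1 / r_ij>
    with zeta = 6 pi eta a.
    configs: array of shape (n_samples, N, 3) of bead positions.
    """
    _, N, _ = configs.shape
    zeta = 6.0 * np.pi * eta * a
    diff = configs[:, :, None, :] - configs[:, None, :, :]  # (S, N, N, 3)
    r = np.linalg.norm(diff, axis=-1)                        # (S, N, N)
    off_diag = ~np.eye(N, dtype=bool)
    # Average 1/r_ij over samples for each pair i != j, then sum over pairs
    sum_mean_inv_r = (1.0 / r[:, off_diag]).mean(axis=0).sum()
    return kT / (N * zeta) + kT / (6.0 * np.pi * eta * N**2) * sum_mean_inv_r
```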
ABSTRACT
Modeling the dynamics of a confined, semiflexible polymer is a challenging problem, owing to the complicated interplay between the configurations of the chain, which are strongly affected by the length scale of the confinement relative to the persistence length of the chain, and the polymer-wall hydrodynamic interactions. At the same time, understanding these dynamics is crucial to the advancement of emerging genomic technologies that use confinement to stretch out DNA and "read" a genomic signature. In this mini-review, we begin by considering what is known experimentally and theoretically about the friction of a wormlike chain such as DNA confined in a slit or a channel. We then discuss how to estimate the friction coefficient of such a chain, either with dynamic simulations or via Monte Carlo sampling and the Kirkwood pre-averaging approximation. Next, we review our recent work on computing the diffusivity of DNA in nanoslits and nanochannels, and conclude with some promising avenues for future work and caveats about our approach.
ABSTRACT
The crossover region in the phase diagram of polymer solutions, in the regime above the overlap concentration, is explored by Brownian dynamics simulations to map out the universal crossover scaling functions for the gyration radius and the single-chain diffusion constant. Scaling considerations, our simulation results, and recently reported data on the polymer contribution to the viscosity obtained from rheological measurements on DNA systems support the assumption that there are simple relations between these functions, such that they can be inferred from one another.
ABSTRACT
Simulating the static and dynamic properties of semidilute polymer solutions with Brownian dynamics (BD) requires the computation of a large system of polymer chains coupled to one another through excluded-volume and hydrodynamic interactions. In the presence of periodic boundary conditions, long-ranged hydrodynamic interactions are frequently summed with the Ewald summation technique. By performing detailed simulations that shed light on the influence of several tuning parameters involved both in the Ewald summation method and in the efficient treatment of Brownian forces, we develop a BD algorithm in which the computational cost scales as O(N^1.8), where N is the number of monomers in the simulation box. We show that Beenakker's original implementation of the Ewald sum, which is only valid for systems without bead overlap, can be modified so that θ solutions can be simulated by switching off excluded-volume interactions. A comparison of the predictions of the radius of gyration, the end-to-end vector, and the self-diffusion coefficient by BD, at a range of concentrations, with the hybrid lattice Boltzmann-molecular dynamics (LB-MD) method shows excellent agreement between the two methods. In contrast to the situation for dilute solutions, the LB-MD method is shown to be significantly more computationally efficient than the current implementation of BD for simulating semidilute solutions. We argue, however, that further optimizations should be possible.
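A cost that "scales as O(N^1.8)" is usually an empirical exponent obtained from timing runs at several system sizes. One simple way to estimate such an exponent is a least-squares fit of log(cost) against log(N), sketched below with a hypothetical function name; it is not necessarily how the exponent in the study was obtained.

```python
import numpy as np


def scaling_exponent(n_monomers, cpu_times) -> float:
    """Estimate alpha in cost ~ c * N**alpha by a least-squares fit in log-log space."""
    # np.polyfit returns coefficients highest degree first: [slope, intercept]
    slope, _intercept = np.polyfit(np.log(n_monomers), np.log(cpu_times), 1)
    return float(slope)
```

For example, with synthetic timings t = 3e-5 * N**1.8 for N = 100, 200, 400, 800, 1600 (an assumed law, not measurements from the study), the fitted exponent is 1.8; with real, noisy timings the fit should use system sizes large enough that the asymptotic scaling dominates.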