ABSTRACT
Reaction networks of hydrocarbons are explored using first-principles calculations, data science, and experiments. Transforming hydrocarbon data into networks reveals the prevalence of the formation and reaction of various molecules. Graph theory is applied to extract knowledge from the reaction network. In particular, centrality analysis reveals that H+, C=CC, CH3+, C=C, and [CH2+]C have high degrees and are thus very likely to form or react with other molecules. Additionally, H+, CH3+, C2H5+, C8H15+, C8H17+, and C6H11+ are found to have high control throughout the network and lead to a series of additional reactions. The constructed network is also validated by experiment, and shortest-path analysis is used for further comparison between experiment and the network. Thus, combining network analysis with first-principles calculations uncovers key points in the development of various hydrocarbons that can be used to improve catalyst design and targeted synthesis of desired hydrocarbons.
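As an illustration of the degree and shortest-path analyses mentioned in this abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the species labels A to E and the reaction edges are hypothetical placeholders, the helper names are invented for this example, and the "control" measure from the abstract is not covered.

```python
from collections import deque

# Hypothetical toy reactions (reactant -> product); NOT data from the study.
TOY_REACTIONS = [
    ("A", "B"),
    ("A", "C"),
    ("B", "C"),
    ("C", "D"),
    ("D", "E"),
    ("A", "E"),
]


def build_graph(edges):
    """Return a directed adjacency map: node -> set of successor nodes."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    return graph


def degree(graph):
    """Return the total degree (incoming plus outgoing edges) of each node."""
    deg = {node: len(successors) for node, successors in graph.items()}
    for successors in graph.values():
        for node in successors:
            deg[node] += 1
    return deg


def shortest_path(graph, start, goal):
    """Breadth-first search; return one shortest path as a node list, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        # Sorted iteration keeps the result deterministic when paths tie.
        for nxt in sorted(graph[node]):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


graph = build_graph(TOY_REACTIONS)
deg = degree(graph)
```

In this toy network, nodes with a high degree take part in many formation or consumption steps, and `shortest_path(graph, "B", "E")` returns the fewest-step route between two species, which is the kind of quantity that could be compared against an experimentally observed sequence.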
Subject(s)
Data Science, Hydrocarbons, Hydrocarbons/chemistry
ABSTRACT
In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at single base-pair resolution, and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, consisting of a variety of primary cells, tissues, cell lines, and time-series samples taken during cell activation and development, were subjected to a uniform pipeline of CAGE data production. The pipeline started with measuring RNA extracts to assess their quality, and continued with CAGE library production using a robotic or a manual workflow, single-molecule sequencing, and computational processing to generate frequencies of transcription initiation. The resulting data represent the consequence of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles, approximately 200,000 for the human genome and 150,000 for the mouse genome, were identified and annotated to provide the precise locations of known promoters as well as novel ones, and to quantify their activities.