ABSTRACT
Detection of de novo variants (DNVs) is critical for studies of disease-related variation and mutation rates. To accelerate DNV calling, we developed a graphics processing unit (GPU)-based workflow. We applied our workflow to whole-genome sequencing data from three parent-child sequencing cohorts: the Simons Simplex Collection (SSC), Simons Foundation Powering Autism Research (SPARK), and the 1000 Genomes Project (1000G), which were sequenced using DNA from blood, saliva, and lymphoblastoid cell lines (LCLs), respectively. The SSC and SPARK DNV callsets were within expectations for the number of DNVs, the percentage at CpG sites, phasing to the paternal chromosome of origin, and average allele balance. However, the 1000G DNV callset was not within expectations and contained an excess of DNVs that are likely cell line artifacts. Mutation signature analysis revealed that 30% of 1000G DNV signatures matched B-cell lymphoma. Furthermore, we found variants in DNA repair genes and at ClinVar pathogenic or likely pathogenic sites, as well as a significant excess of protein-coding DNVs in IGLL5, a gene known to be involved in B-cell lymphomas. Our study provides a new rapid DNV caller for the field and elucidates important implications of using sequencing data from LCLs for reference building and disease-related projects.
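The callset-level checks named above (number of DNVs, percentage at CpG sites, phasing to the paternal chromosome, average allele balance) can be illustrated with a small summary function. This is a minimal sketch assuming a hypothetical per-variant record (`DNV`) holding the reference trinucleotide, child read counts, and a phased parent of origin; it is not the authors' GPU workflow, does not call variants, and does not encode what the expected values are.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class DNV:
    """One called de novo variant (hypothetical record layout)."""
    ref_context: str                 # reference trinucleotide centred on the variant, e.g. "ACG"
    alt_reads: int                   # child reads supporting the alternate allele
    total_reads: int                 # child reads covering the site
    parent_of_origin: Optional[str]  # "paternal", "maternal", or None if unphased


def is_cpg(context: str) -> bool:
    """True if the variant base (middle position) is the C or the G of a CpG."""
    c = context.upper()
    return c[1:3] == "CG" or c[0:2] == "CG"


def callset_summary(dnvs: list[DNV]) -> dict[str, float]:
    """Summary statistics that a DNV callset can be compared against expectations on."""
    if not dnvs:
        raise ValueError("empty callset")
    phased = [d for d in dnvs if d.parent_of_origin is not None]
    return {
        "n_dnvs": len(dnvs),
        "pct_cpg": 100 * sum(is_cpg(d.ref_context) for d in dnvs) / len(dnvs),
        # allele balance = fraction of the child's reads carrying the alternate allele
        "mean_allele_balance": mean(d.alt_reads / d.total_reads for d in dnvs),
        # percentage of phased DNVs assigned to the paternal chromosome
        "pct_paternal": (
            100 * sum(d.parent_of_origin == "paternal" for d in phased) / len(phased)
            if phased else float("nan")
        ),
    }


# Hypothetical example calls
calls = [
    DNV("ACG", 15, 30, "paternal"),
    DNV("TCA", 12, 25, None),
    DNV("CGT", 10, 20, "maternal"),
]
summary = callset_summary(calls)
```

Comparing these four numbers across cohorts is one way a callset with excess artifactual variants, such as one derived from cell lines, can stand out.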
Subject(s)
Neoplasms, Humans, Alleles, Mutation, Neoplasms/genetics, Whole Genome Sequencing
ABSTRACT
BACKGROUND: Higher protein (HP) intake and physical activity (PA) have been associated with improved lean soft tissue (LST) and reduced fat mass (FM). Puerto Ricans have one of the highest age-adjusted prevalences of obesity (42.5%), which may be associated with inadequate protein consumption and PA. We examined the relationships of protein intake and PA with body composition and biomarkers of cardiometabolic health in Puerto Rican adults. METHODS: Participants included 959 Puerto Rican adults (71.4% women, 28.6% men) aged 46-79 y from the Boston Puerto Rican Health Study (BPRHS) (women: age 60.4 ± 7.6 y, BMI 32.9 ± 6.8 kg/m²; men: age 59.8 ± 7.9 y, BMI 30.1 ± 5.2 kg/m²). Protein intake was assessed using a food frequency questionnaire, expressed as g/kg body weight/day, and grouped into energy intake-adjusted, equal cut-point tertile categories (lower, LP: < 0.91 g/kg/d; moderate, MP: 0.91 ≤ intake ≤ 1.11 g/kg/d; higher, HP: > 1.11 g/kg/d). PA was assessed by questionnaire and grouped into tertile categories (low, PA1: < 0.8 km/d; moderate, PA2: 0.8 ≤ distance ≤ 3.2 km/d; high, PA3: > 3.2 km/d). RESULTS: Participants with energy-adjusted HP had lower appendicular LST (ALST: 16.2 ± 3.8 kg), LST (39.7 ± 8.0 kg), and FM (25.6 ± 8.1 kg) than those with LP (ALST: 20.1 ± 4.5 kg; LST: 49.5 ± 10.0 kg; FM: 40.8 ± 12.3 kg; P < 0.001) and MP (ALST: 18.2 ± 4.3 kg; LST: 44.1 ± 8.8 kg; FM: 32.2 ± 9.8 kg; P < 0.001). However, when adjusted for total body weight (kg), relative LST was significantly greater in HP (58 ± 9%) than in LP (53 ± 9%; P < 0.001) and MP (56 ± 9%; P < 0.001). Participants in PA3 had greater ALST (19.5 ± 5.4 kg) and relative LST (58 ± 10%) than those in PA1 (ALST: 17.2 ± 4.3 kg; LST: 53 ± 9%; P < 0.001) or PA2 (ALST: 17.7 ± 4.7 kg; LST: 56 ± 9%; P < 0.05). Those in HP + PA3 or MP + PA2 had lower C-reactive protein (CRP; HP + PA3: 5.1 ± 6.8 mg/L; MP + PA2: 6.4 ± 10.0 mg/L) than those in LP + PA1 (8.7 ± 8.8 mg/L; P < 0.05). Insulin concentration was lower in those in both HP and PA3 (HP + PA3: 11.4 ± 7.9 µIU/mL) than in those in both LP and PA1 (LP + PA1: 20.7 ± 16.3 µIU/mL; P < 0.001). CONCLUSIONS: The highest tertiles of energy-adjusted protein intake (> 1.11 g/kg/d) and PA (> 3.2 km/d) were associated with more desirable indicators of overall body composition and cardiometabolic health, when adjusted for body weight, than the lower tertiles of protein intake and PA in Puerto Rican adults.
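The tertile definitions in METHODS are simple threshold rules and can be written as plain functions. This is a minimal sketch: the function names are hypothetical, the protein input is assumed to already be the energy-adjusted intake in g/kg/d (the abstract does not state the adjustment method, so it is not reproduced), and "relative LST" is assumed to mean LST divided by total body weight, expressed as a percentage.

```python
def protein_category(g_per_kg_day: float) -> str:
    """Energy-adjusted protein intake tertile, using the cut points stated in METHODS."""
    if g_per_kg_day < 0.91:
        return "LP"
    if g_per_kg_day <= 1.11:   # 0.91 <= intake <= 1.11
        return "MP"
    return "HP"


def pa_category(km_per_day: float) -> str:
    """Physical-activity tertile, using the cut points stated in METHODS."""
    if km_per_day < 0.8:
        return "PA1"
    if km_per_day <= 3.2:      # 0.8 <= distance <= 3.2
        return "PA2"
    return "PA3"


def relative_lst_pct(lst_kg: float, body_weight_kg: float) -> float:
    """Lean soft tissue as a percentage of total body weight (assumed definition)."""
    return 100 * lst_kg / body_weight_kg


# Combined group label in the style used in RESULTS, e.g. "HP + PA3"
group = f"{protein_category(1.2)} + {pa_category(4.0)}"
```

Writing the boundaries explicitly makes it clear which category a participant exactly at 1.11 g/kg/d or 3.2 km/d falls into (MP and PA2, respectively).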
Subject(s)
Body Composition/physiology, Dietary Proteins/administration & dosage, Energy Intake/physiology, Energy Metabolism/physiology, Exercise/physiology, Hispanic or Latino, Absorptiometry, Photon, Aged, Female, Humans, Longitudinal Studies, Male, Middle Aged, Nutrition Surveys, Nutritional Physiological Phenomena, Social Class
ABSTRACT
In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.
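One common way to couple sample and data tracking with repeatable pipelines is to key every analysis result on everything that determines it (pipeline, version, parameters, and input checksums), so that an identical request maps to the same stored result. The sketch below illustrates that generic technique only; it is not the GMS's actual data model or API, and `build_id`, `ResultRegistry`, the pipeline name, the parameter name, and the checksum strings are all hypothetical.

```python
import hashlib
import json
from typing import Callable


def build_id(pipeline: str, version: str, params: dict, inputs: dict[str, str]) -> str:
    """Deterministic ID for one analysis run (hypothetical scheme)."""
    record = {"pipeline": pipeline, "version": version, "params": params, "inputs": inputs}
    # sort_keys makes the serialized text independent of dict insertion order
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResultRegistry:
    """Maps run IDs to stored results so identical requests are not recomputed."""

    def __init__(self) -> None:
        self._results: dict[str, object] = {}

    def get_or_run(self, pipeline: str, version: str, params: dict,
                   inputs: dict[str, str], run: Callable[[], object]) -> tuple[str, object]:
        key = build_id(pipeline, version, params, inputs)
        if key not in self._results:
            self._results[key] = run()
        return key, self._results[key]


runs = []


def fake_alignment() -> str:
    runs.append("align")  # record that the pipeline body actually executed
    return "sample1.bam"


registry = ResultRegistry()
k1, r1 = registry.get_or_run("align", "1.0", {"min_mapq": 20}, {"reads": "sha256:aaa"}, fake_alignment)
k2, r2 = registry.get_or_run("align", "1.0", {"min_mapq": 20}, {"reads": "sha256:aaa"}, fake_alignment)
k3, r3 = registry.get_or_run("align", "1.0", {"min_mapq": 30}, {"reads": "sha256:aaa"}, fake_alignment)
```

The second request is served from the registry, while changing a single parameter produces a new ID and a new run. Inputs are keyed by role (for example tumor versus normal) rather than as an unordered list, so swapping roles also changes the ID.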
Subject(s)
Chromosome Mapping/methods, Genome, Human/genetics, Knowledge Bases, Models, Genetic, Sequence Analysis, DNA/methods, User-Computer Interface, Algorithms, Computer Simulation, Database Management Systems, Databases, Genetic, Humans, Sequence Alignment/methods
ABSTRACT
Salmonella enterica serovars often have a broad host range, and some cause both gastrointestinal and systemic disease. But the serovars Paratyphi A and Typhi are restricted to humans and cause only systemic disease. It has been estimated that Typhi arose in the last few thousand years. The sequence and microarray analysis of the Paratyphi A genome indicates that it is similar to the Typhi genome but suggests that it has a more recent evolutionary origin. Both genomes have independently accumulated many pseudogenes among their approximately 4,400 protein-coding sequences: 173 in Paratyphi A and approximately 210 in Typhi. The recent convergence of these two similar genomes on a similar phenotype is subtly reflected in their genotypes: only 30 genes are degraded in both serovars. Nevertheless, these 30 genes include three known to be important in gastroenteritis, which does not occur in these serovars, and four encoding Salmonella-translocated effectors, which are normally secreted into host cells to subvert host functions. Loss of function also occurs through mutations in different genes of the same pathway (e.g., in chemotaxis and in the production of fimbriae).
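The two levels of convergence described here, the same gene degraded in both serovars versus different genes degraded in the same pathway, correspond to two set comparisons. A minimal sketch, assuming each genome's pseudogenes are available as a set of gene identifiers and that genes can be assigned to pathways; the identifiers and pathway assignments below are hypothetical placeholders, not the published annotation.

```python
def shared_degraded(pseudo_a: set[str], pseudo_b: set[str]) -> set[str]:
    """Genes that are pseudogenes in both genomes."""
    return pseudo_a & pseudo_b


def pathways_hit_by_different_genes(pseudo_a: set[str], pseudo_b: set[str],
                                    pathway_of: dict[str, str]) -> set[str]:
    """Pathways with degraded genes in both genomes but no degraded gene in common."""
    def by_pathway(genes: set[str]) -> dict[str, set[str]]:
        groups: dict[str, set[str]] = {}
        for g in genes:
            if g in pathway_of:
                groups.setdefault(pathway_of[g], set()).add(g)
        return groups

    a, b = by_pathway(pseudo_a), by_pathway(pseudo_b)
    return {p for p in a.keys() & b.keys() if a[p].isdisjoint(b[p])}


# Hypothetical gene identifiers and pathway assignments, for illustration only
paratyphi_a = {"g1", "g2", "g3", "g5"}
typhi = {"g2", "g4", "g6"}
pathway_of = {"g1": "chemotaxis", "g4": "chemotaxis",
              "g3": "fimbriae", "g6": "fimbriae",
              "g2": "effector", "g5": "effector"}

both = shared_degraded(paratyphi_a, typhi)
convergent = pathways_hit_by_different_genes(paratyphi_a, typhi, pathway_of)
```

In this toy example only `g2` is degraded in both genomes, while the chemotaxis and fimbriae pathways are each disrupted in both through different genes, which is the pattern the final sentence describes.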
Subject(s)
Evolution, Molecular, Genetic Variation, Genome, Bacterial, Mutation/genetics, Salmonella paratyphi A/genetics, Salmonella typhi/genetics, Base Sequence, Gene Library, Genome Components/genetics, Humans, Microarray Analysis, Molecular Sequence Data, Pseudogenes/genetics, Sequence Analysis, DNA, Species Specificity
ABSTRACT
Human chromosome 7 has historically received prominent attention in the human genetics community, primarily related to the search for the cystic fibrosis gene and the frequent cytogenetic changes associated with various forms of cancer. Here we present more than 153 million base pairs representing 99.4% of the euchromatic sequence of chromosome 7, the first metacentric chromosome completed so far. The sequence has excellent concordance with previously established physical and genetic maps, and it exhibits an unusual amount of segmentally duplicated sequence (8.2%), with marked differences between the two arms. Our initial analyses have identified 1,150 protein-coding genes, 605 of which have been confirmed by complementary DNA sequences, and an additional 941 pseudogenes. Of genes confirmed by transcript sequences, some are polymorphic for mutations that disrupt the reading frame.
Subject(s)
Chromosomes, Human, Pair 7, Animals, Base Sequence, Gene Duplication, Humans, Mice, Molecular Sequence Data, Physical Chromosome Mapping, Proteins/genetics, Pseudogenes, RNA, Untranslated, Sequence Analysis, DNA, Species Specificity, Williams Syndrome/genetics
ABSTRACT
The male-specific region of the Y chromosome, the MSY, differentiates the sexes and comprises 95% of the chromosome's length. Here, we report that the MSY is a mosaic of heterochromatic sequences and three classes of euchromatic sequences: X-transposed, X-degenerate and ampliconic. These classes contain all 156 known transcription units, which include 78 protein-coding genes that collectively encode 27 distinct proteins. The X-transposed sequences exhibit 99% identity to the X chromosome. The X-degenerate sequences are remnants of ancient autosomes from which the modern X and Y chromosomes evolved. The ampliconic class includes large regions (about 30% of the MSY euchromatin) where sequence pairs show greater than 99.9% identity, which is maintained by frequent gene conversion (non-reciprocal transfer). The most prominent features here are eight massive palindromes, at least six of which contain testis genes.
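Two ideas in this paragraph, percent identity between sequence pairs and the structure of a palindrome (an inverted repeat whose two arms are reverse complements of each other), can be made concrete in a few lines. This is a simplified sketch assuming equal-length, ungapped, already-aligned arms; identity estimates like those quoted above come from alignments that handle gaps, and the sequences here are made up.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length, pre-aligned, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100 * matches / len(a)


def palindrome_arm_identity(left_arm: str, right_arm: str) -> float:
    """Identity between palindrome arms: the right arm is compared as its reverse complement."""
    return percent_identity(left_arm, reverse_complement(right_arm))


# Hypothetical 10-bp arms; real MSY palindrome arms are vastly longer.
left = "ACGTTGCAAC"
perfect_right = reverse_complement(left)
mutated_right = "GTTGCAACGA"  # one substitution relative to perfect_right
```

A single mismatch in 10 bp already drops identity to 90%; at the > 99.9% level reported for ampliconic pairs, fewer than one position in a thousand differs.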