Search | BVS Search Portal

1.

Accurate sequencing of DNA motifs able to form alternative (non-B) structures.

Weissensteiner, Matthias H; Cremona, Marzia A; Guiblet, Wilfried M; Stoler, Nicholas; Harris, Robert S; Cechova, Monika; Eckert, Kristin A; Chiaromonte, Francesca; Huang, Yi-Fei; Makova, Kateryna D.

Genome Res ; 33(6): 907-922, 2023 06.

Article in English | MEDLINE | ID: mdl-37433640

ABSTRACT

Approximately 13% of the human genome at certain motifs have the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might possess increased errors at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in all three technologies. Deletion errors were increased for all non-B types but Z-DNA in Illumina and HiFi, as well as only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.

Subjects

DNA, Z-Form , Nanopores , Humans , Nucleotide Motifs , Sequence Analysis, DNA , DNA/genetics , Base Composition , High-Throughput Nucleotide Sequencing
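The abstract above describes a probabilistic approach for estimating false positives at non-B motifs as a function of sample size and variant frequency, but does not give the model. A minimal sketch, assuming a simple binomial model (an assumption, not necessarily the authors' method): each of `n_samples` independently shows an error-induced non-reference call at a site with probability `error_rate`, so errors alone mimic a variant at frequency ≥ `freq` whenever at least ⌈freq · n_samples⌉ samples carry the error. Both function names are hypothetical.

```python
import math


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def expected_false_positives(n_sites: int, n_samples: int, freq: float, error_rate: float) -> float:
    """Hypothetical model: expected number of sites where sequencing errors alone
    produce an apparent variant carried by at least ceil(freq * n_samples) samples.

    Assumes errors are independent across samples and sites, with one per-sample
    error rate shared by all sites (e.g. a rate estimated for one non-B motif type).
    """
    min_carriers = max(1, math.ceil(freq * n_samples))
    return n_sites * binomial_tail(n_samples, min_carriers, error_rate)
```

Because a rare variant needs only one or a few carriers, the expected count rises sharply as `freq` falls. This is consistent with the abstract's advice to take care when scoring rare variants at motifs with elevated error rates.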

2.

Advanced age increases frequencies of de novo mitochondrial mutations in macaque oocytes and somatic tissues.

Arbeithuber, Barbara; Cremona, Marzia A; Hester, James; Barrett, Alison; Higgins, Bonnie; Anthony, Kate; Chiaromonte, Francesca; Diaz, Francisco J; Makova, Kateryna D.

Proc Natl Acad Sci U S A ; 119(15): e2118740119, 2022 04 12.

Article in English | MEDLINE | ID: mdl-35394879

ABSTRACT

Mutations in mitochondrial DNA (mtDNA) contribute to multiple diseases. However, how new mtDNA mutations arise and accumulate with age remains understudied because of the high error rates of current sequencing technologies. Duplex sequencing reduces error rates by several orders of magnitude via independently tagging and analyzing each of the two template DNA strands. Here, using duplex sequencing, we obtained high-quality mtDNA sequences for somatic tissues (liver and skeletal muscle) and single oocytes of 30 unrelated rhesus macaques, from 1 to 23 y of age. Sequencing single oocytes minimized effects of natural selection on germline mutations. In total, we identified 17,637 tissue-specific de novo mutations. Their frequency increased ~3.5-fold in liver and ~2.8-fold in muscle over the ~20 y assessed. Mutation frequency in oocytes increased ~2.5-fold until the age of 9 y, but did not increase after that, suggesting that oocytes of older animals maintain the quality of their mtDNA. We found the light-strand origin of replication (OriL) to be a hotspot for mutation accumulation with aging in liver. Indeed, the 33-nucleotide-long OriL harbored 12 variant hotspots, 10 of which likely disrupt its hairpin structure and affect replication efficiency. Moreover, in somatic tissues, protein-coding variants were subject to positive selection (potentially mitigating toxic effects of mitochondrial activity), the strength of which increased with the number of macaques harboring variants. Our work illuminates the origins and accumulation of somatic and germline mtDNA mutations with aging in primates and has implications for delayed reproduction in modern human societies.

Subjects

Aging , Mitochondria , Mutation , Oocytes , Animals , DNA, Mitochondrial/genetics , DNA, Mitochondrial/metabolism , Humans , Macaca mulatta/genetics , Mitochondria/genetics , Oocytes/metabolism
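Duplex sequencing, as summarized in the abstract above, tags and analyzes the two template strands independently. A simplified sketch of the consensus step, assuming each strand's reads have already been collapsed into a single-strand consensus and aligned in the same reference orientation: a base is kept only where both strands agree. The helper names and the "N" mask are illustrative choices, not any specific pipeline's API. The second function is a rough, idealized error estimate that assumes independent strand errors spread evenly over the three alternative bases.

```python
def duplex_consensus(top: str, bottom: str, mask: str = "N") -> str:
    """Combine two single-strand consensus sequences (same reference orientation)
    into a duplex consensus, masking every position where the strands disagree.

    A true mutation is expected on both complementary strands, whereas most
    polymerase or DNA-damage errors affect only one of them.
    """
    if len(top) != len(bottom):
        raise ValueError("strand consensus sequences must have equal length")
    return "".join(a if a == b else mask for a, b in zip(top, bottom))


def approx_duplex_error_rate(single_strand_error: float) -> float:
    """Idealized estimate: both strands must carry the same wrong base.
    Assumes independent errors distributed uniformly over the 3 alternative bases."""
    return single_strand_error**2 / 3
```

For example, `duplex_consensus("ACGTAC", "ACGAAC")` returns `"ACGNAC"`, and under the stated assumptions a single-strand error rate of 1e-3 drops to roughly 3.3e-7, which illustrates how requiring agreement between strands can lower error rates by orders of magnitude.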

3.

Selection and thermostability suggest G-quadruplexes are novel functional elements of the human genome.

Guiblet, Wilfried M; DeGiorgio, Michael; Cheng, Xiaoheng; Chiaromonte, Francesca; Eckert, Kristin A; Huang, Yi-Fei; Makova, Kateryna D.

Genome Res ; 31(7): 1136-1149, 2021 Jul.

Article in English | MEDLINE | ID: mdl-34187812

ABSTRACT

Approximately 1% of the human genome has the ability to fold into G-quadruplexes (G4s), noncanonical strand-specific DNA structures forming at G-rich motifs. G4s regulate several key cellular processes (e.g., transcription) and have been hypothesized to participate in others (e.g., firing of replication origins). Moreover, G4s differ in their thermostability, and this may affect their function. Yet, G4s may also hinder replication, transcription, and translation and may increase genome instability and mutation rates. Therefore, depending on their genomic location, thermostability, and functionality, G4 loci might evolve under different selective pressures, which has never been investigated. Here we conducted the first genome-wide analysis of G4 distribution, thermostability, and selection. We found an overrepresentation, high thermostability, and purifying selection for G4s within genic components in which they are expected to be functional: promoters, CpG islands, and 5' and 3' UTRs. A similar pattern was observed for G4s within replication origins, enhancers, eQTLs, and TAD boundary regions, strongly suggesting their functionality. In contrast, G4s on the nontranscribed strand of exons were underrepresented, were unstable, and evolved neutrally. In general, G4s on the nontranscribed strand of genic components had lower density and were less stable than those on the transcribed strand, suggesting that the former are avoided at the RNA level. Across the genome, purifying selection was stronger at stable G4s. Our results suggest that purifying selection preserves the sequences of functional G4s, whereas nonfunctional G4s are too costly to be tolerated in the genome. Thus, G4s are emerging as fundamental, functional genomic elements.
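G4 motifs are usually located with a sequence pattern before their thermostability or selection is assessed. As an illustration (the study's own motif definition and stability scores are not described in this abstract), the sketch uses the widely used quadparser-style rule: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The function name is hypothetical.

```python
import re

# Quadparser-style G4 motif (illustrative; not necessarily this study's definition):
# four runs of >= 3 G separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"G{3,}[ACGTN]{1,7}?G{3,}[ACGTN]{1,7}?G{3,}[ACGTN]{1,7}?G{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of non-overlapping G4 motifs
    on the given strand only. Motifs on the opposite strand show up as C-rich runs
    here and can be found by scanning the reverse complement."""
    return [m.span() for m in G4_PATTERN.finditer(seq.upper())]
```

`find_g4_motifs("TTGGGAGGGAGGGAGGGTT")` returns `[(2, 17)]`. Because G4s are strand-specific, a genome-wide scan would run on both strands and record which strand each motif lies on.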

4.

Age-related accumulation of de novo mitochondrial mutations in mammalian oocytes and somatic tissues.

Arbeithuber, Barbara; Hester, James; Cremona, Marzia A; Stoler, Nicholas; Zaidi, Arslan; Higgins, Bonnie; Anthony, Kate; Chiaromonte, Francesca; Diaz, Francisco J; Makova, Kateryna D.

PLoS Biol ; 18(7): e3000745, 2020 07.

Article in English | MEDLINE | ID: mdl-32667908

ABSTRACT

Mutations create genetic variation for other evolutionary forces to operate on and cause numerous genetic diseases. Nevertheless, how de novo mutations arise remains poorly understood. Progress in the area is hindered by the fact that error rates of conventional sequencing technologies (1 in 100 or 1,000 base pairs) are several orders of magnitude higher than de novo mutation rates (1 in 10,000,000 or 100,000,000 base pairs per generation). Moreover, previous analyses of germline de novo mutations examined pedigrees (and not germ cells) and thus were likely affected by selection. Here, we applied highly accurate duplex sequencing to detect low-frequency, de novo mutations in mitochondrial DNA (mtDNA) directly from oocytes and from somatic tissues (brain and muscle) of 36 mice from two independent pedigrees. We found mtDNA mutation frequencies 2- to 3-fold higher in 10-month-old than in 1-month-old mice, demonstrating mutation accumulation during the period of only 9 mo. Mutation frequencies and patterns differed between germline and somatic tissues and among mtDNA regions, suggestive of distinct mutagenesis mechanisms. Additionally, we discovered a more pronounced genetic drift of mitochondrial genetic variants in the germline of older versus younger mice, arguing for mtDNA turnover during oocyte meiotic arrest. Our study deciphered for the first time the intricacies of germline de novo mutagenesis using duplex sequencing directly in oocytes, which provided unprecedented resolution and minimized selection effects present in pedigree studies. Moreover, our work provides important information about the origins and accumulation of mutations with aging/maturation and has implications for delayed reproduction in modern human societies. Furthermore, the duplex sequencing method we optimized for single cells opens avenues for investigating low-frequency mutations in other studies.

Subjects

Aging/genetics , Mammals/genetics , Mitochondria/genetics , Mutation/genetics , Oocytes/metabolism , Organ Specificity/genetics , Animals , DNA Mutational Analysis , DNA, Mitochondrial/genetics , Female , Gene Frequency/genetics , Genetic Drift , Germ Cells/metabolism , Inheritance Patterns/genetics , Logistic Models , Male , Mice , Models, Genetic , Mutation Rate , Nucleotides/genetics , Pedigree

5.

Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome.

Guiblet, Wilfried M; Cremona, Marzia A; Harris, Robert S; Chen, Di; Eckert, Kristin A; Chiaromonte, Francesca; Huang, Yi-Fei; Makova, Kateryna D.

Nucleic Acids Res ; 49(3): 1497-1516, 2021 02 22.

Article in English | MEDLINE | ID: mdl-33450015

ABSTRACT

Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis with implications to evolution and genetic diseases.

Subjects

DNA/chemistry , Genetic Variation , Genome, Human , Animals , Genetic Loci , Humans , Mutation Rate , Polymorphism, Single Nucleotide , Pongo pygmaeus

6.

Learning the properties of adaptive regions with functional data analysis.

Mughal, Mehreen R; Koch, Hillary; Huang, Jinguo; Chiaromonte, Francesca; DeGiorgio, Michael.

PLoS Genet ; 16(8): e1008896, 2020 08.

Article in English | MEDLINE | ID: mdl-32853200

ABSTRACT

Identifying regions of positive selection in genomic data remains a challenge in population genetics. Most current approaches rely on comparing values of summary statistics calculated in windows. We present an approach termed SURFDAWave, which translates measures of genetic diversity calculated in genomic windows to functional data. By transforming our discrete data points to be outputs of continuous functions defined over genomic space, we are able to learn the features of these functions that signify selection. This enables us to confidently identify complex modes of natural selection, including adaptive introgression. We are also able to predict important selection parameters that are responsible for shaping the inferred selection events. By applying our model to human population-genomic data, we recapitulate previously identified regions of selective sweeps, such as OCA2 in Europeans, and predict that its beneficial mutation reached a frequency of 0.02 before it swept 1,802 generations ago, a time when humans were relatively new to Europe. In addition, we identify BNC2 in Europeans as a target of adaptive introgression, and predict that it harbors a beneficial mutation that arose in an archaic human population that split from modern humans within the hypothesized modern human-Neanderthal divergence range.

Subjects

Models, Genetic , Mutation Rate , White People/genetics , Animals , DNA-Binding Proteins/genetics , Genetic Variation , Humans , Membrane Transport Proteins , Neanderthals/genetics , Selection, Genetic , Software
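SURFDAWave, per the abstract above, turns per-window diversity statistics into functions defined over genomic space. The abstract does not specify the basis; the method's name suggests wavelets, so as an assumption the sketch applies an unnormalized Haar transform to a hypothetical vector of a statistic measured in 2^J consecutive windows. The coefficients summarize a region's shape at several scales and could serve as inputs to a learned model. This is an illustration, not SURFDAWave's implementation, and the function name is hypothetical.

```python
def haar_decompose(values: list[float]) -> list[float]:
    """Unnormalized Haar wavelet transform of a signal whose length is a power of two.

    Returns [overall average, coarsest detail, ..., finest details]. Each level
    replaces adjacent pairs (a, b) by their average (a + b) / 2 and detail (a - b) / 2.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    approx = list(values)
    details: list[list[float]] = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])  # finer scales first
        approx = [(a + b) / 2 for a, b in pairs]
    coeffs = approx[:]
    for level in reversed(details):  # coarse-to-fine ordering
        coeffs.extend(level)
    return coeffs
```

`haar_decompose([4, 2, 6, 6])` returns `[4.5, -1.5, 1.0, 0.0]`: the overall mean, one coarse contrast between the two halves, and two fine-scale contrasts within them.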

7.

Simultaneous feature selection and outlier detection with optimality guarantees.

Insolia, Luca; Kenney, Ana; Chiaromonte, Francesca; Felici, Giovanni.

Biometrics ; 78(4): 1592-1603, 2022 12.

Article in English | MEDLINE | ID: mdl-34437713

ABSTRACT

Biomedical research is increasingly data rich, with studies comprising ever growing numbers of features. The larger a study, the higher the likelihood that a substantial portion of the features may be redundant and/or contain contamination (outlying values). This poses serious challenges, which are exacerbated in cases where the sample sizes are relatively small. Effective and efficient approaches to perform sparse estimation in the presence of outliers are critical for these studies, and have received considerable attention in the last decade. We contribute to this area considering high-dimensional regressions contaminated by multiple mean-shift outliers affecting both the response and the design matrix. We develop a general framework and use mixed-integer programming to simultaneously perform feature selection and outlier detection with provably optimal guarantees. We prove theoretical properties for our approach, that is, a necessary and sufficient condition for the robustly strong oracle property, where the number of features can increase exponentially with the sample size; the optimal estimation of parameters; and the breakdown point of the resulting estimates. Moreover, we provide computationally efficient procedures to tune integer constraints and warm-start the algorithm. We show the superior performance of our proposal compared to existing heuristic methods through simulations and use it to study the relationships between childhood obesity and the human microbiome.

Subjects

Pediatric Obesity , Child , Humans , Algorithms , Sample Size , Probability

8.

Evidence for sharp increase in the economic damages of extreme natural disasters.

Coronese, Matteo; Lamperti, Francesco; Keller, Klaus; Chiaromonte, Francesca; Roventini, Andrea.

Proc Natl Acad Sci U S A ; 116(43): 21450-21455, 2019 10 22.

Article in English | MEDLINE | ID: mdl-31591192

ABSTRACT

Climate change has increased the frequency and intensity of natural disasters. Does this translate into increased economic damages? To date, empirical assessments of damage trends have been inconclusive. Our study demonstrates a temporal increase in extreme damages, after controlling for a number of factors. We analyze event-level data using quantile regressions to capture patterns in the damage distribution (not just its mean) and find strong evidence of progressive rightward skewing and tail-fattening over time. While the effect of time on averages is hard to detect, effects on extreme damages are large, statistically significant, and growing with increasing percentiles. Our results are consistent with an upwardly curved, convex damage function, which is commonly assumed in climate-economics models. They are also robust to different specifications of control variables and time range considered and indicate that the risk of extreme damages has increased more in temperate areas than in tropical ones. We use simulations to show that underreporting bias in the data does not weaken our inferences; in fact, it may make them overly conservative.

Subjects

Natural Disasters/economics , Climate Change/economics , Humans , Models, Economic
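The quantile regressions in the abstract above model a chosen percentile of the damage distribution rather than its mean. They do so by minimizing the check (pinball) loss instead of squared error. The sketch shows that loss and, for the intercept-only case, that its minimizer over the observed values is an empirical quantile. A real analysis would add covariates such as time and solve for all coefficients jointly (for example with statsmodels' `QuantReg`); the helper names here are hypothetical.

```python
def pinball_loss(residual: float, tau: float) -> float:
    """Check (pinball) loss at quantile level tau: under-predictions are weighted
    by tau, over-predictions by (1 - tau)."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def constant_quantile_fit(y: list[float], tau: float) -> float:
    """Best constant predictor under the check loss, searched over observed values.
    Its minimizer is an empirical tau-quantile of y."""
    return min(y, key=lambda c: sum(pinball_loss(v - c, tau) for v in y))
```

`constant_quantile_fit(list(range(1, 12)), 0.5)` returns 6 (the median) and level 0.9 returns 10. Fitting a time slope separately at increasing levels of tau is how an effect that grows with the percentile can be detected even when the effect on the mean is hard to see.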

9.

Human L1 Transposition Dynamics Unraveled with Functional Data Analysis.

Chen, Di; Cremona, Marzia A; Qi, Zongtai; Mitra, Robi D; Chiaromonte, Francesca; Makova, Kateryna D.

Mol Biol Evol ; 37(12): 3576-3600, 2020 12 16.

Article in English | MEDLINE | ID: mdl-32722770

ABSTRACT

Long INterspersed Elements-1 (L1s) constitute >17% of the human genome and still actively transpose in it. Characterizing L1 transposition across the genome is critical for understanding genome evolution and somatic mutations. However, to date, L1 insertion and fixation patterns have not been studied comprehensively. To fill this gap, we investigated three genome-wide data sets of L1s that integrated at different evolutionary times: 17,037 de novo L1s (from an L1 insertion cell-line experiment conducted in-house), and 1,212 polymorphic and 1,205 human-specific L1s (from public databases). We characterized 49 genomic features (proxying chromatin accessibility, transcriptional activity, replication, recombination, etc.) in the ±50 kb flanks of these elements. These features were contrasted between the three L1 data sets and L1-free regions using state-of-the-art Functional Data Analysis statistical methods, which treat high-resolution data as mathematical functions. Our results indicate that de novo, polymorphic, and human-specific L1s are surrounded by different genomic features acting at specific locations and scales. This led to an integrative model of L1 transposition, according to which L1s preferentially integrate into open-chromatin regions enriched in non-B DNA motifs, whereas they are fixed in regions largely free of purifying selection (depleted of genes and noncoding most conserved elements). Intriguingly, our results suggest that L1 insertions modify local genomic landscape by extending CpG methylation and increasing mononucleotide microsatellite density. Altogether, our findings substantially facilitate understanding of L1 integration and fixation preferences, pave the way for uncovering their role in aging and cancer, and inform their use as mutagenesis tools in genetic studies.

Subjects

Biological Evolution , DNA Transposable Elements , Genome, Human , Long Interspersed Nucleotide Elements , Models, Genetic , Humans , Mutagenesis, Insertional

10.

Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate.

Guiblet, Wilfried M; Cremona, Marzia A; Cechova, Monika; Harris, Robert S; Kejnovská, Iva; Kejnovsky, Eduard; Eckert, Kristin; Chiaromonte, Francesca; Makova, Kateryna D.

Genome Res ; 28(12): 1767-1778, 2018 12.

Article in English | MEDLINE | ID: mdl-30401733

ABSTRACT

DNA conformation may deviate from the classical B-form in ~13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: It decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict and validate experimentally non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates), and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations.

Subjects

DNA/chemistry , Genomics , High-Throughput Nucleotide Sequencing , Nucleic Acid Conformation , Sequence Analysis, DNA , DNA Replication , G-Quadruplexes , Genomics/methods , Genomics/standards , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Kinetics , Mutation , Nucleotide Motifs , Reproducibility of Results , Sequence Analysis, DNA/methods

11.

Give more data, awareness and control to individual citizens, and they will help COVID-19 containment.

Nanni, Mirco; Andrienko, Gennady; Barabási, Albert-László; Boldrini, Chiara; Bonchi, Francesco; Cattuto, Ciro; Chiaromonte, Francesca; Comandé, Giovanni; Conti, Marco; Coté, Mark; Dignum, Frank; Dignum, Virginia; Domingo-Ferrer, Josep; Ferragina, Paolo; Giannotti, Fosca; Guidotti, Riccardo; Helbing, Dirk; Kaski, Kimmo; Kertesz, Janos; Lehmann, Sune; Lepri, Bruno; Lukowicz, Paul; Matwin, Stan; Jiménez, David Megías; Monreale, Anna; Morik, Katharina; Oliver, Nuria; Passarella, Andrea; Passerini, Andrea; Pedreschi, Dino; Pentland, Alex; Pianesi, Fabio; Pratesi, Francesca; Rinzivillo, Salvatore; Ruggieri, Salvatore; Siebes, Arno; Torra, Vicenc; Trasarti, Roberto; Hoven, Jeroen van den; Vespignani, Alessandro.

Ethics Inf Technol ; 23(Suppl 1): 1-6, 2021.

Article in English | MEDLINE | ID: mdl-33551673

ABSTRACT

The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the "phase 2" of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoiding location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively (e.g., with a backend system, but possibly also with other citizens), voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn this enables both contact tracing, and, the early detection of outbreak hotspots on more finely-granulated geographic scale. The decentralized approach is also scalable to large populations, in that only the data of positive patients need be handled at a central level. Our recommendation is two-fold. First to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates (if and when they want and for specific aims) with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.

12.

High Satellite Repeat Turnover in Great Apes Studied with Short- and Long-Read Technologies.

Cechova, Monika; Harris, Robert S; Tomaszkiewicz, Marta; Arbeithuber, Barbara; Chiaromonte, Francesca; Makova, Kateryna D.

Mol Biol Evol ; 36(11): 2415-2431, 2019 Nov 01.

Article in English | MEDLINE | ID: mdl-31273383

ABSTRACT

Satellite repeats are a structural component of centromeres and telomeres, and in some instances, their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: 1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and 2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However, clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males versus females; using Y chromosome assemblies or Fluorescent In Situ Hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59 kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results based on sequencing reads generated with three different technologies provide the first detailed characterization of great ape satellite repeats, and open new avenues for exploring their functions.
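The abstract above reports satellite arrays fully embedded in long reads, found with a novel tool whose algorithm is not described here. The hypothetical helper below shows only the basic idea for a known motif such as (AATGG)n: scan each reading frame of the motif for the longest run of exact, consecutive copies. Real arrays contain perfect repeats interspersed with similar sequences, which this exact-match sketch does not tolerate.

```python
def longest_tandem_run(read: str, motif: str) -> tuple[int, int]:
    """Return (start, copies) of the longest run of exact, consecutive copies of
    `motif` in `read` (0-based start), or (-1, 0) if the motif never occurs.

    Hypothetical helper for illustration; it ignores imperfect copies.
    """
    k = len(motif)
    if k == 0:
        raise ValueError("motif must be non-empty")
    best_start, best_copies = -1, 0
    for offset in range(k):  # consecutive copies always share one reading frame
        i, run_start, copies = offset, offset, 0
        while i + k <= len(read):
            if read[i:i + k] == motif:
                if copies == 0:
                    run_start = i  # a new run begins here
                copies += 1
                if copies > best_copies:
                    best_start, best_copies = run_start, copies
            else:
                copies = 0  # run broken by a non-matching unit
            i += k
    return best_start, best_copies
```

`longest_tandem_run("CCAATGGAATGGAATGGTT", "AATGG")` returns `(2, 3)`. A run can be treated as completely embedded in the read when `start > 0` and `start + copies * len(motif) < len(read)`, i.e. non-repeat sequence flanks it on both sides.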

13.

Functional data analysis for computational biology.

Cremona, Marzia A; Xu, Hongyan; Makova, Kateryna D; Reimherr, Matthew; Chiaromonte, Francesca; Madrigal, Pedro.

Bioinformatics ; 35(17): 3211-3213, 2019 09 01.

Article in English | MEDLINE | ID: mdl-30668667

ABSTRACT

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

14.

IWTomics: testing high-resolution sequence-based 'Omics' data at multiple locations and scales.

Cremona, Marzia A; Pini, Alessia; Cumbo, Fabio; Makova, Kateryna D; Chiaromonte, Francesca; Vantini, Simone.

Bioinformatics ; 34(13): 2289-2291, 2018 07 01.

Article in English | MEDLINE | ID: mdl-29474526

ABSTRACT

Summary: With increased generation of high-resolution sequence-based 'Omics' data, detecting statistically significant effects at different genomic locations and scales has become key to addressing several scientific questions. IWTomics is an R/Bioconductor package (integrated in Galaxy) that, exploiting sophisticated Functional Data Analysis techniques (i.e. statistical techniques that deal with the analysis of curves), allows users to pre-process, visualize and test these data at multiple locations and scales. The package provides a friendly, flexible and complete workflow that can be employed in many genomic and epigenomic applications. Availability and implementation: IWTomics is freely available at the Bioconductor website (http://bioconductor.org/packages/IWTomics) and on the main Galaxy instance (https://usegalaxy.org/). Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Databases, Factual , Genomics/methods , Software , Genome , Sequence Analysis , Workflow
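IWTomics, described above, is an R/Bioconductor package; the Python sketch below is not its API. It shows only the unadjusted building block of testing curves "at multiple locations": a two-sample permutation test of the difference in group means at each position (e.g. a genomic feature measured in consecutive windows around two sets of elements). Interval-Wise Testing goes further by also testing every interval of positions and adjusting each position's p-value using the intervals that contain it, which is what provides error control across locations and scales. Function and argument names are hypothetical.

```python
import random
from statistics import mean


def pointwise_permutation_pvalues(group_a: list[list[float]], group_b: list[list[float]],
                                  n_perm: int = 999, seed: int = 0) -> list[float]:
    """Unadjusted permutation p-values at each position of equal-length curves.

    Test statistic: absolute difference of the two group means at that position.
    """
    rng = random.Random(seed)
    curves = group_a + group_b
    length = len(curves[0])

    def abs_mean_diffs(labels: list[bool]) -> list[float]:
        a = [c for c, in_a in zip(curves, labels) if in_a]
        b = [c for c, in_a in zip(curves, labels) if not in_a]
        return [abs(mean(x[t] for x in a) - mean(x[t] for x in b)) for t in range(length)]

    labels = [True] * len(group_a) + [False] * len(group_b)
    observed = abs_mean_diffs(labels)
    exceed = [0] * length
    for _ in range(n_perm):
        rng.shuffle(labels)  # random relabeling of curves into the two groups
        for t, d in enumerate(abs_mean_diffs(labels)):
            if d >= observed[t]:
                exceed[t] += 1
    # add-one correction keeps Monte Carlo p-values strictly positive
    return [(1 + e) / (1 + n_perm) for e in exceed]
```

With five curves per group, a position where the groups are perfectly separated gets a p-value near 2/252 (its exact permutation value), while a position with the same value in every curve gets 1.0.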

15.

Reply to Geiger and Stomper: On capital intensity and observed increases in the economic damages of extreme natural disasters.

Coronese, Matteo; Lamperti, Francesco; Keller, Klaus; Chiaromonte, Francesca; Roventini, Andrea.

Proc Natl Acad Sci U S A ; 117(12): 6314-6315, 2020 03 24.

Article in English | MEDLINE | ID: mdl-32156737

Subjects

Disasters , Natural Disasters

16.

Integration and Fixation Preferences of Human and Mouse Endogenous Retroviruses Uncovered with Functional Data Analysis.

Campos-Sánchez, Rebeca; Cremona, Marzia A; Pini, Alessia; Chiaromonte, Francesca; Makova, Kateryna D.

PLoS Comput Biol ; 12(6): e1004956, 2016 06.

Article in English | MEDLINE | ID: mdl-27309962

ABSTRACT

Endogenous retroviruses (ERVs), the remnants of retroviral infections in the germ line, occupy ~8% and ~10% of the human and mouse genomes, respectively, and affect their structure, evolution, and function. Yet we still have a limited understanding of how the genomic landscape influences integration and fixation of ERVs. Here we conducted a genome-wide study of the most recently active ERVs in the human and mouse genome. We investigated 826 fixed and 1,065 in vitro HERV-Ks in human, and 1,624 fixed and 242 polymorphic ETns, as well as 3,964 fixed and 1,986 polymorphic IAPs, in mouse. We quantitated >40 human and mouse genomic features (e.g., non-B DNA structure, recombination rates, and histone modifications) in ±32 kb of these ERVs' integration sites and in control regions, and analyzed them using Functional Data Analysis (FDA) methodology. In one of the first applications of FDA in genomics, we identified genomic scales and locations at which these features display their influence, and how they work in concert, to provide signals essential for integration and fixation of ERVs. The investigation of ERVs of different evolutionary ages (young in vitro and polymorphic ERVs, older fixed ERVs) allowed us to disentangle integration vs. fixation preferences. As a result of these analyses, we built a comprehensive model explaining the uneven distribution of ERVs along the genome. We found that ERVs integrate in late-replicating AT-rich regions with abundant microsatellites, mirror repeats, and repressive histone marks. Regions favoring fixation are depleted of genes and evolutionarily conserved elements, and have low recombination rates, reflecting the effects of purifying selection and ectopic recombination removing ERVs from the genome. In addition to providing these biological insights, our study demonstrates the power of exploiting multiple scales and localization with FDA. These powerful techniques are expected to be applicable to many other genomic investigations.

Subjects

Endogenous Retroviruses/genetics , Virus Integration/genetics , Animals , Chromosome Mapping , Computational Biology , DNA Replication , Data Interpretation, Statistical , Epigenesis, Genetic , Genome, Human , Humans , Logistic Models , Mice , Models, Biological , Recombination, Genetic , Repetitive Sequences, Nucleic Acid , Selection, Genetic

17.

Structured Ordinary Least Squares: A Sufficient Dimension Reduction approach for regressions with partitioned predictors and heterogeneous units.

Liu, Yang; Chiaromonte, Francesca; Li, Bing.

Biometrics ; 73(2): 529-539, 2017 06.

Article in English | MEDLINE | ID: mdl-27649087

ABSTRACT

In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on experimental platforms generating them, or on information available about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). This combines ideas from existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package "sSDR," publicly available on CRAN, includes all procedures necessary to implement the sOLS approach.

Subjects

Least-Squares Analysis; Biometry; Genomics
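The idea that sOLS builds on can be shown with plain OLS used as a dimension-reduction step: the direction Sigma_X^{-1} Cov(X, Y), which under the usual linearity condition on the predictors lies in the central subspace even when the link to Y is nonlinear. The sketch below computes that direction for a full predictor set and, separately, within each predictor group. The per-group version is a simplified illustration assumed for exposition; it is not the sOLS estimator as implemented in the sSDR package, and all function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _col_means(X: Sequence[Sequence[float]]) -> List[float]:
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def covariance(X: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix of the predictor rows in X."""
    n, p = len(X), len(X[0])
    mu = _col_means(X)
    return [
        [sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in X) / (n - 1) for b in range(p)]
        for a in range(p)
    ]


def cross_covariance(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Sample covariance between each predictor column and the response."""
    n = len(X)
    mu = _col_means(X)
    my = sum(y) / n
    return [
        sum((row[a] - mu[a]) * (yi - my) for row, yi in zip(X, y)) / (n - 1)
        for a in range(len(mu))
    ]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is (numerically) singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols_direction(X, y) -> List[float]:
    """OLS reduction direction: Sigma_X^{-1} Cov(X, y)."""
    return solve(covariance(X), cross_covariance(X, y))


def groupwise_ols(X, y, groups: List[List[int]]) -> List[List[float]]:
    """Hypothetical per-group reduction: one OLS direction per predictor group."""
    return [ols_direction([[row[j] for j in g] for row in X], y) for g in groups]


if __name__ == "__main__":
    # 2^3 factorial design: the three predictor columns are exactly uncorrelated.
    X = [[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
    y = [2 * r[0] - r[1] + 3 * r[2] for r in X]
    print(groupwise_ols(X, y, [[0, 1], [2]]))  # ≈ [[2.0, -1.0], [3.0]]
```

Because the two groups are orthogonal in this toy design, the per-group directions match the corresponding pieces of the full OLS direction; with correlated groups they generally will not.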

18.

Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA.

Rebolledo-Jaramillo, Boris; Su, Marcia Shu-Wei; Stoler, Nicholas; McElhoe, Jennifer A; Dickins, Benjamin; Blankenberg, Daniel; Korneliussen, Thorfinn S; Chiaromonte, Francesca; Nielsen, Rasmus; Holland, Mitchell M; Paul, Ian M; Nekrutenko, Anton; Makova, Kateryna D.

Proc Natl Acad Sci U S A ; 111(43): 15474-9, 2014 Oct 28.

Article in English | MEDLINE | ID: mdl-25313049

ABSTRACT

The manifestation of mitochondrial DNA (mtDNA) diseases depends on the frequency of heteroplasmy (the presence of several alleles in an individual), yet its transmission across generations cannot be readily predicted owing to a lack of data on the size of the mtDNA bottleneck during oogenesis. For deleterious heteroplasmies, a severe bottleneck may abruptly transform a benign (low) frequency in a mother into a disease-causing (high) frequency in her child. Here we present a high-resolution study of heteroplasmy transmission conducted on blood and buccal mtDNA of 39 healthy mother-child pairs of European ancestry (a total of 156 samples, each sequenced at ~20,000× per site). On average, each individual carried one heteroplasmy, and one in eight individuals carried a disease-associated heteroplasmy, with minor allele frequency ≥1%. We observed frequent drastic heteroplasmy frequency shifts between generations and estimated the effective size of the germ-line mtDNA bottleneck at only ~30-35 (interquartile range from 9 to 141). Accounting for heteroplasmies, we estimated the mtDNA germ-line mutation rate at 1.3 × 10⁻⁸ (interquartile range from 4.2 × 10⁻⁹ to 4.1 × 10⁻⁸) mutations per site per year, an order of magnitude higher than for nuclear DNA. Notably, we found a positive association between the number of heteroplasmies in a child and maternal age at fertilization, likely attributable to oocyte aging. This study also took advantage of droplet digital PCR (ddPCR) to validate heteroplasmies and confirm a de novo mutation. Our results can be used to predict the transmission of disease-causing mtDNA variants and illuminate evolutionary dynamics of the mitochondrial genome.

Subjects

DNA, Mitochondrial/genetics; Germ Cells/metabolism; Inheritance Patterns/genetics; Maternal Age; Age Factors; Child; Disease/genetics; Female; Gene Frequency/genetics; Humans; INDEL Mutation/genetics; Reproducibility of Results; Sequence Analysis, DNA
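The link between heteroplasmy frequency shifts and bottleneck size can be illustrated with a simple moment estimator. Assuming, for illustration, that a child's heteroplasmy frequency is a binomial draw of N segregating mtDNA units at the mother's frequency p, the expected squared shift is p(1 - p)/N, so N can be estimated by pooling the sum of p(1 - p) over the sum of squared shifts. This textbook drift approximation is chosen here for clarity and is not necessarily the estimator used in the study; it ignores sequencing error and tissue-to-tissue variation. The second helper only spells out the arithmetic of converting a per-site, per-year mutation rate into expected germ-line mutations per mtDNA genome per generation; the 16,569-bp genome length (the revised Cambridge Reference Sequence) and the 30-year generation time are assumptions, and both function names are hypothetical.

```python
from typing import Sequence, Tuple


def bottleneck_size(pairs: Sequence[Tuple[float, float]]) -> float:
    """Pooled moment estimate of the effective bottleneck size N.

    Each pair is (mother_frequency, child_frequency) for one heteroplasmic
    site. Assumes the child's frequency is a binomial draw of N units at the
    mother's frequency p, so E[(child - mother)^2] = p * (1 - p) / N.
    """
    expected_var_times_n = sum(p * (1 - p) for p, _ in pairs)
    observed_sq_shift = sum((c - p) ** 2 for p, c in pairs)
    if observed_sq_shift == 0:
        return float("inf")  # no shifts observed: no evidence of drift
    return expected_var_times_n / observed_sq_shift


def mutations_per_generation(rate_per_site_per_year: float,
                             genome_length: int = 16_569,
                             generation_years: float = 30.0) -> float:
    """Expected new mutations per mtDNA genome per generation (illustrative)."""
    return rate_per_site_per_year * genome_length * generation_years


if __name__ == "__main__":
    # Hypothetical mother-child frequency pairs, not data from the study.
    example_pairs = [(0.50, 0.60), (0.20, 0.05), (0.10, 0.30)]
    print(round(bottleneck_size(example_pairs), 1))  # 6.9
    print(mutations_per_generation(1.3e-8))  # about 0.0065
```

Larger observed shifts relative to p(1 - p) push the estimate of N down, which is the sense in which frequent drastic shifts imply a severe bottleneck.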

19.

On the bias of H-scores for comparing biclusters, and how to correct it.

Di Iorio, Jacopo; Chiaromonte, Francesca; Cremona, Marzia A.

Bioinformatics ; 36(9): 2955-2957, 2020 05 01.

Article in English | MEDLINE | ID: mdl-31985794

Subjects

Algorithms; Cluster Analysis; Oligonucleotide Array Sequence Analysis
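For reference, the H-score (mean squared residue) of a bicluster, as defined by Cheng and Church, averages the squared residues a_ij - a_iJ - a_Ij + a_IJ over the submatrix given by row set I and column set J, where a_iJ, a_Ij, and a_IJ are the row, column, and overall means of that submatrix; a perfectly additive bicluster scores 0. The minimal sketch below computes only this uncorrected score; the bias correction proposed in this paper is not reproduced, and the function name is hypothetical.

```python
from typing import Sequence


def h_score(A: Sequence[Sequence[float]], rows: Sequence[int], cols: Sequence[int]) -> float:
    """Mean squared residue of the bicluster A[rows][cols] (uncorrected H-score)."""
    sub = [[A[i][j] for j in cols] for i in rows]
    n, m = len(rows), len(cols)
    row_means = [sum(r) / m for r in sub]
    col_means = [sum(sub[i][j] for i in range(n)) / n for j in range(m)]
    overall = sum(row_means) / n
    return sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n)
        for j in range(m)
    ) / (n * m)


if __name__ == "__main__":
    additive = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # row effect + column effect
    print(h_score(additive, [0, 1, 2], [0, 1, 2]))  # 0 up to rounding
    print(h_score([[1, 0], [0, 1]], [0, 1], [0, 1]))  # 0.25
```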

20.

Complete Khoisan and Bantu genomes from southern Africa.

Schuster, Stephan C; Miller, Webb; Ratan, Aakrosh; Tomsho, Lynn P; Giardine, Belinda; Kasson, Lindsay R; Harris, Robert S; Petersen, Desiree C; Zhao, Fangqing; Qi, Ji; Alkan, Can; Kidd, Jeffrey M; Sun, Yazhou; Drautz, Daniela I; Bouffard, Pascal; Muzny, Donna M; Reid, Jeffrey G; Nazareth, Lynne V; Wang, Qingyu; Burhans, Richard; Riemer, Cathy; Wittekindt, Nicola E; Moorjani, Priya; Tindall, Elizabeth A; Danko, Charles G; Teo, Wee Siang; Buboltz, Anne M; Zhang, Zhenhai; Ma, Qianyi; Oosthuysen, Arno; Steenkamp, Abraham W; Oostuisen, Hermann; Venter, Philippus; Gajewski, John; Zhang, Yu; Pugh, B Franklin; Makova, Kateryna D; Nekrutenko, Anton; Mardis, Elaine R; Patterson, Nick; Pringle, Tom H; Chiaromonte, Francesca; Mullikin, James C; Eichler, Evan E; Hardison, Ross C; Gibbs, Richard A; Harkins, Timothy T; Hayes, Vanessa M.

Nature ; 463(7283): 943-7, 2010 Feb 18.

Article in English | MEDLINE | ID: mdl-20164927

ABSTRACT

The genetic structure of the indigenous hunter-gatherer peoples of southern Africa, the oldest known lineage of modern human, is important for understanding human diversity. Studies based on mitochondrial and small sets of nuclear markers have shown that these hunter-gatherers, known as Khoisan, San, or Bushmen, are genetically divergent from other humans. However, until now, fully sequenced human genomes have been limited to recently diverged populations. Here we present the complete genome sequences of an indigenous hunter-gatherer from the Kalahari Desert and a Bantu from southern Africa, as well as protein-coding regions from an additional three hunter-gatherers from disparate regions of the Kalahari. We characterize the extent of whole-genome and exome diversity among the five men, reporting 1.3 million novel DNA differences genome-wide, including 13,146 novel amino acid variants. In terms of nucleotide substitutions, the Bushmen seem to be, on average, more different from each other than, for example, a European and an Asian. Observed genomic differences between the hunter-gatherers and others may help to pinpoint genetic adaptations to an agricultural lifestyle. Adding the described variants to current databases will facilitate inclusion of southern Africans in medical research efforts, particularly when family and medical histories can be correlated with genome-wide data.

Subjects

Black People/genetics; Ethnicity/genetics; Genome, Human/genetics; Asian People/genetics; Exons/genetics; Genetics, Medical; Humans; Phylogeny; Polymorphism, Single Nucleotide/genetics; South Africa/ethnology; White People/genetics
