Results 1 - 20 of 51
1.
Nature ; 622(7982): 348-358, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37794188

ABSTRACT

High-throughput proteomics platforms measuring thousands of proteins in plasma combined with genomic and phenotypic information have the power to bridge the gap between the genome and diseases. Here we performed association studies of Olink Explore 3072 data generated by the UK Biobank Pharma Proteomics Project [1] on plasma samples from more than 50,000 UK Biobank participants with phenotypic and genotypic data, stratifying on British or Irish, African and South Asian ancestries. We compared the results with those of a SomaScan v4 study on plasma from 36,000 Icelandic people [2], for 1,514 of whom Olink data were also available. We found modest correlation between the two platforms. Although cis protein quantitative trait loci were detected for a similar absolute number of assays on the two platforms (2,101 on Olink versus 2,120 on SomaScan), the proportion of assays with such supporting evidence for assay performance was higher on the Olink platform (72% versus 43%). A considerable number of proteins had genomic associations that differed between the platforms. We provide examples where differences between platforms may influence conclusions drawn from the integration of protein levels with the study of diseases. We demonstrate how leveraging the diverse ancestries of participants in the UK Biobank helps to detect novel associations and refine genomic location. Our results show the value of the information provided by the two most commonly used high-throughput proteomics platforms and demonstrate the differences between them that at times provide useful complementarity.


Subjects
Blood Proteins , Disease Susceptibility , Genomics , Genotype , Phenotype , Proteomics , Humans , Africa/ethnology , South Asia/ethnology , Biological Specimen Banks , Blood Proteins/analysis , Blood Proteins/genetics , Datasets as Topic , Genome, Human/genetics , Iceland/ethnology , Ireland/ethnology , Plasma/chemistry , Proteome/analysis , Proteome/genetics , Proteomics/methods , Quantitative Trait Loci , United Kingdom
2.
Nature ; 607(7920): 732-740, 2022 07.
Article in English | MEDLINE | ID: mdl-35859178

ABSTRACT

Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data [1,2]. Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank [3]. This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation.


Subjects
Biological Specimen Banks , Databases, Genetic , Genetic Variation , Genome, Human , Genomics , Whole Genome Sequencing , Africa/ethnology , Asia/ethnology , Cohort Studies , Conserved Sequence , Exons/genetics , Genome, Human/genetics , Haplotypes/genetics , Humans , INDEL Mutation , Ireland/ethnology , Microsatellite Repeats , Polymorphism, Single Nucleotide/genetics , United Kingdom
3.
Nature ; 584(7822): 619-623, 2020 08.
Article in English | MEDLINE | ID: mdl-32581359

ABSTRACT

Autoimmune thyroid disease is the most common autoimmune disease and is highly heritable [1]. Here, by using a genome-wide association study of 30,234 cases and 725,172 controls from Iceland and the UK Biobank, we find 99 sequence variants at 93 loci, of which 84 variants are previously unreported [2-7]. A low-frequency (1.36%) intronic variant in FLT3 (rs76428106-C) has the largest effect on risk of autoimmune thyroid disease (odds ratio (OR) = 1.46, P = 2.37 × 10⁻²⁴). rs76428106-C is also associated with systemic lupus erythematosus (OR = 1.90, P = 6.46 × 10⁻⁴), rheumatoid factor and/or anti-CCP-positive rheumatoid arthritis (OR = 1.41, P = 4.31 × 10⁻⁴) and coeliac disease (OR = 1.62, P = 1.20 × 10⁻⁴). FLT3 encodes fms-related tyrosine kinase 3, a receptor that regulates haematopoietic progenitor and dendritic cells. RNA sequencing revealed that rs76428106-C generates a cryptic splice site, which introduces a stop codon in 30% of transcripts that are predicted to encode a truncated protein lacking its tyrosine kinase domains. Each copy of rs76428106-C doubles the plasma levels of the FLT3 ligand. Activating somatic mutations in FLT3 are associated with acute myeloid leukaemia [8] with a poor prognosis, and rs76428106-C also predisposes individuals to acute myeloid leukaemia (OR = 1.90, P = 5.40 × 10⁻³). Thus, a predicted loss-of-function germline mutation in FLT3 causes a reduction in full-length FLT3, with a compensatory increase in the levels of its ligand and an increased disease risk, similar to that of a gain-of-function mutation.


Subjects
Codon, Nonsense/genetics , Genetic Predisposition to Disease/genetics , Ligands , Mutation , Thyroiditis, Autoimmune/genetics , fms-Like Tyrosine Kinase 3/genetics , fms-Like Tyrosine Kinase 3/metabolism , Alleles , Autoimmune Diseases/genetics , Databases, Factual , Genome-Wide Association Study , Germ-Line Mutation , Humans , Iceland , Introns/genetics , Leukemia, Myeloid, Acute , Loss of Function Mutation , RNA Splice Sites/genetics , United Kingdom
5.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37129540

ABSTRACT

SUMMARY: We describe a compression scheme for BUS files and an implementation of the algorithm in the BUStools software. Our compression algorithm yields smaller file sizes than gzip, at significantly faster compression and decompression speeds. We evaluated our algorithm on 533 BUS files from scRNA-seq experiments with a total size of 1 TB. Our compression is 2.2× faster than the fastest gzip option and 35% slower than the fastest zstd option, and it results in 1.5× smaller files than both methods. This amounts to an 8.3× reduction in the file size, resulting in a compressed size of 122 GB for the dataset. AVAILABILITY AND IMPLEMENTATION: A complete description of the format is available at https://github.com/BUStools/BUSZ-format and an implementation at https://github.com/BUStools/bustools. The code to reproduce the results of this article is available at https://github.com/pmelsted/BUSZ_paper.
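The BUSZ format itself is specified in the BUSZ-format repository linked above, and the sketch below does not reproduce it. As a hedged illustration only, it shows a generic technique that sorted integer columns such as barcodes lend themselves to: delta encoding followed by LEB128-style variable-length integers, so that long runs of equal or nearby values shrink to one byte each. The function names are hypothetical, and the assumption that the column is sorted is checked in the code.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as little-endian base-128 (LEB128-style) bytes."""
    if value < 0:
        raise ValueError("varint encoding requires a non-negative integer")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data: bytes) -> list:
    """Decode a concatenation of varints back into a list of integers."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def delta_encode_sorted(column: list) -> bytes:
    """Store the first value, then successive differences, each as a varint.

    Assumption: the column is sorted, so every difference is non-negative and small.
    """
    if any(b < a for a, b in zip(column, column[1:])):
        raise ValueError("column must be sorted in non-decreasing order")
    out, prev = bytearray(), 0
    for value in column:
        out += encode_varint(value - prev)
        prev = value
    return bytes(out)


def delta_decode(data: bytes) -> list:
    """Invert delta_encode_sorted with a running sum over the decoded deltas."""
    column, running = [], 0
    for delta in decode_varints(data):
        running += delta
        column.append(running)
    return column


barcodes = [10**12, 10**12, 10**12 + 3, 10**12 + 3, 10**12 + 40]
packed = delta_encode_sorted(barcodes)
print(len(packed), delta_decode(packed) == barcodes)  # 10 True
```

Stored as fixed-width 64-bit integers, these five barcodes would take 40 bytes; after delta encoding only the first value needs several bytes (6 here), and each of the four small differences fits in one.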


Subjects
Data Compression , High-Throughput Nucleotide Sequencing , High-Throughput Nucleotide Sequencing/methods , Algorithms , Software , Data Compression/methods , Exome Sequencing
6.
N Engl J Med ; 382(24): 2302-2315, 2020 06 11.
Article in English | MEDLINE | ID: mdl-32289214

ABSTRACT

BACKGROUND: During the current worldwide pandemic, coronavirus disease 2019 (Covid-19) was first diagnosed in Iceland at the end of February. However, data are limited on how SARS-CoV-2, the virus that causes Covid-19, enters and spreads in a population. METHODS: We targeted testing to persons living in Iceland who were at high risk for infection (mainly those who were symptomatic, had recently traveled to high-risk countries, or had contact with infected persons). We also carried out population screening using two strategies: issuing an open invitation to 10,797 persons and sending random invitations to 2283 persons. We sequenced SARS-CoV-2 from 643 samples. RESULTS: As of April 4, a total of 1221 of 9199 persons (13.3%) who were recruited for targeted testing had positive results for infection with SARS-CoV-2. Of those tested in the general population, 87 (0.8%) in the open-invitation screening and 13 (0.6%) in the random-population screening tested positive for the virus. In total, 6% of the population was screened. Most persons in the targeted-testing group who received positive tests early in the study had recently traveled internationally, in contrast to those who tested positive later in the study. Children under 10 years of age were less likely to receive a positive result than were persons 10 years of age or older, with percentages of 6.7% and 13.7%, respectively, for targeted testing; in the population screening, no child under 10 years of age had a positive result, as compared with 0.8% of those 10 years of age or older. Fewer females than males received positive results both in targeted testing (11.0% vs. 16.7%) and in population screening (0.6% vs. 0.9%). The haplotypes of the sequenced SARS-CoV-2 viruses were diverse and changed over time. The percentage of infected participants that was determined through population screening remained stable for the 20-day duration of screening. 
CONCLUSIONS: In a population-based study in Iceland, children under 10 years of age and females had a lower incidence of SARS-CoV-2 infection than adolescents or adults and males. The proportion of infected persons identified through population screening did not change substantially during the screening period, which was consistent with a beneficial effect of containment efforts. (Funded by deCODE Genetics-Amgen.).


Subjects
Coronavirus Infections/epidemiology , Epidemiological Monitoring , Pneumonia, Viral/epidemiology , Adolescent , Adult , Aged , Aged, 80 and over , Betacoronavirus/genetics , COVID-19 , Child , Child, Preschool , Contact Tracing , Female , Haplotypes , Humans , Iceland/epidemiology , Infant , Male , Mass Screening , Middle Aged , Pandemics , SARS-CoV-2 , Travel , Young Adult
7.
N Engl J Med ; 383(18): 1724-1734, 2020 10 29.
Article in English | MEDLINE | ID: mdl-32871063

ABSTRACT

BACKGROUND: Little is known about the nature and durability of the humoral immune response to infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). METHODS: We measured antibodies in serum samples from 30,576 persons in Iceland, using six assays (including two pan-immunoglobulin [pan-Ig] assays), and we determined that the appropriate measure of seropositivity was a positive result with both pan-Ig assays. We tested 2102 samples collected from 1237 persons up to 4 months after diagnosis by a quantitative polymerase-chain-reaction (qPCR) assay. We measured antibodies in 4222 quarantined persons who had been exposed to SARS-CoV-2 and in 23,452 persons not known to have been exposed. RESULTS: Of the 1797 persons who had recovered from SARS-CoV-2 infection, 1107 of the 1215 who were tested (91.1%) were seropositive; antiviral antibody titers assayed by two pan-Ig assays increased during 2 months after diagnosis by qPCR and remained on a plateau for the remainder of the study. Of quarantined persons, 2.3% were seropositive; of those with unknown exposure, 0.3% were positive. We estimate that 0.9% of Icelanders were infected with SARS-CoV-2 and that the infection was fatal in 0.3%. We also estimate that 56% of all SARS-CoV-2 infections in Iceland had been diagnosed with qPCR, 14% had occurred in quarantined persons who had not been tested with qPCR (or who had not received a positive result, if tested), and 30% had occurred in persons outside quarantine and not tested with qPCR. CONCLUSIONS: Our results indicate that antiviral antibodies against SARS-CoV-2 did not decline within 4 months after diagnosis. We estimate that the risk of death from infection was 0.3% and that 44% of persons infected with SARS-CoV-2 in Iceland were not diagnosed by qPCR.


Subjects
Coronavirus Infections/immunology , Immunity, Humoral , Pneumonia, Viral/immunology , Seroepidemiologic Studies , Adult , Aged , Antibodies, Viral/blood , Betacoronavirus , COVID-19 , Coronavirus Infections/mortality , Female , Humans , Iceland/epidemiology , Male , Middle Aged , Pandemics , Pneumonia, Viral/mortality , Polymerase Chain Reaction , Quarantine , SARS-CoV-2
8.
Nat Methods ; 16(2): 163-166, 2019 02.
Article in English | MEDLINE | ID: mdl-30664774

ABSTRACT

Single-cell RNA-seq makes it possible to characterize the transcriptomes of cell types across different conditions and to identify their transcriptional signatures via differential analysis. Our method detects changes in transcript dynamics and in overall gene abundance in large numbers of cells to determine differential expression. When applied to transcript compatibility counts obtained via pseudoalignment, our approach provides a quantification-free analysis of 3' single-cell RNA-seq that can identify previously undetectable marker genes.
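Transcript compatibility counts (TCCs) come from pseudoalignment, which assigns each read the set of transcripts it is compatible with; reads sharing the same set form an equivalence class, and the class itself is used as the feature. A minimal sketch of the counting step, assuming the pseudoalignment output is already available as (cell barcode, transcript set) pairs. The barcodes and transcript IDs are made up, and the paper's differential-analysis model is not reproduced.

```python
from collections import Counter, defaultdict


def transcript_compatibility_counts(pseudoalignments):
    """Count reads per (cell, equivalence class).

    `pseudoalignments` is an iterable of (cell_barcode, transcripts) pairs,
    where `transcripts` is the set of transcript IDs a read is compatible with.
    Returns {cell_barcode: Counter({frozenset_of_transcripts: read_count})}.
    """
    tcc = defaultdict(Counter)
    for cell, transcripts in pseudoalignments:
        if not transcripts:
            continue  # reads compatible with no transcript are not counted
        # The equivalence class is the feature: no per-gene or per-transcript
        # abundance is estimated, hence "quantification-free".
        tcc[cell][frozenset(transcripts)] += 1
    return dict(tcc)


reads = [
    ("AAAC", {"T1a", "T1b"}),
    ("AAAC", {"T1a"}),
    ("AAAC", {"T1b", "T1a"}),  # same class as the first read
    ("CCCG", {"T2"}),
    ("CCCG", set()),           # unassigned read
]
counts = transcript_compatibility_counts(reads)
print(counts["AAAC"][frozenset({"T1a", "T1b"})])  # 2
```

Keeping {T1a, T1b} and {T1a} as separate features is what lets a TCC-based test pick up isoform-level changes that would be averaged away if reads were first summed per gene.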


Subjects
Sequence Analysis, RNA , Single-Cell Analysis/instrumentation , Single-Cell Analysis/methods , Algorithms , Computer Simulation , Databases, Genetic , Gene Expression Profiling , Gene Expression Regulation , Genetic Markers , Humans , Leukocytes, Mononuclear/cytology , Protein Isoforms , RNA/genetics , Regression Analysis , Software , T-Lymphocytes, Cytotoxic/cytology , Transcriptome
9.
Ann Rheum Dis ; 81(8): 1085-1095, 2022 08.
Article in English | MEDLINE | ID: mdl-35470158

ABSTRACT

OBJECTIVES: To find causal genes for rheumatoid arthritis (RA) and its seropositive (RF and/or ACPA positive) and seronegative subsets. METHODS: We performed a genome-wide association study (GWAS) of 31 313 RA cases (68% seropositive) and ~1 million controls from Northwestern Europe. We searched for causal genes outside the HLA locus through effects on coding, mRNA expression in several tissues and/or levels of plasma proteins (SomaScan) and did network analysis (Qiagen). RESULTS: We found 25 sequence variants for RA overall, 33 for seropositive and 2 for seronegative RA, altogether 37 sequence variants at 34 non-HLA loci, of which 15 are novel. Genomic, transcriptomic and proteomic analysis of these yielded 25 causal genes in seropositive RA and two additional genes for RA overall. Most encode proteins in the network of interferon-alpha/beta and IL-12/23 that signal through the JAK/STAT pathway. Highlighting those with the largest effect on seropositive RA, a rare missense variant in STAT4 (rs140675301-A), independent of the reported non-coding STAT4 variants, increases the risk of seropositive RA 2.27-fold (p=2.1×10⁻⁹), more than the rs2476601-A missense variant in PTPN22 (OR=1.59, p=1.3×10⁻¹⁶⁰). STAT4 rs140675301-A replaces hydrophilic glutamic acid with hydrophobic valine (Glu128Val) in a conserved, surface-exposed loop. A stop mutation (rs76428106-C) in FLT3 increases seropositive RA risk (OR=1.35, p=6.6×10⁻¹¹). Independent missense variants in TYK2 (rs34536443-C, rs12720356-C, rs35018800-A, the latter two novel) associate with decreased risk of seropositive RA (ORs=0.63-0.87, p=10⁻⁹ to 10⁻²⁷) and decreased plasma levels of interferon-alpha/beta receptor 1 that signals through TYK2/JAK1/STAT4. CONCLUSION: Sequence variants pointing to causal genes in the JAK/STAT pathway have the largest effect on seropositive RA, while associations with seronegative RA remain scarce.


Subjects
Arthritis, Rheumatoid , Genome-Wide Association Study , Arthritis, Rheumatoid/genetics , Genetic Predisposition to Disease/genetics , Humans , Interferon-alpha , Janus Kinases/genetics , Protein Tyrosine Phosphatase, Non-Receptor Type 22/genetics , Proteomics , STAT Transcription Factors/genetics , Signal Transduction/genetics
10.
Nat Methods ; 14(7): 687-690, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28581496

ABSTRACT

We describe sleuth (http://pachterlab.github.io/sleuth), a method for the differential analysis of gene expression data that utilizes bootstrapping in conjunction with response error linear modeling to decouple biological variance from inferential variance. sleuth is implemented in an interactive shiny app that utilizes kallisto quantifications and bootstraps for fast and accurate analysis of data from RNA-seq experiments.
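sleuth separates inferential variance (uncertainty in estimating abundance from the reads, measured by bootstrapping) from biological variance between samples. The following is a simplified, moment-based sketch of that decomposition for a single transcript. It assumes each sample's point estimate is the mean of its bootstraps and uses no shrinkage or linear model, so it is not sleuth's implementation, only an illustration of the idea; the function name is hypothetical.

```python
import statistics


def decompose_variance(bootstraps_per_sample):
    """Split between-sample variance into inferential and biological parts.

    `bootstraps_per_sample` holds, for each sample, a list of at least two
    bootstrap abundance estimates for one transcript (e.g. log-scale values).
    Returns (total, inferential, biological).
    """
    # Point estimate per sample (assumption: the bootstrap mean).
    point_estimates = [statistics.fmean(b) for b in bootstraps_per_sample]
    # Total variance observed across samples.
    total = statistics.variance(point_estimates)
    # Inferential variance: average within-sample bootstrap variance.
    inferential = statistics.fmean(
        statistics.variance(b) for b in bootstraps_per_sample
    )
    # Biological variance: what remains, floored at zero.
    biological = max(total - inferential, 0.0)
    return total, inferential, biological


print(decompose_variance([[0.0, 2.0], [2.0, 4.0]]))  # (2.0, 2.0, 0.0)
```

In the example, all of the spread between the two samples is matched by the bootstrap spread within each sample, so none of it is attributed to biology; a test that ignored inferential variance would count it as biological signal.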


Subjects
Computer Simulation , Gene Expression/physiology , RNA/genetics , Software , Base Sequence , Models, Biological
11.
Bioinformatics ; 35(21): 4472-4473, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31073610

ABSTRACT

SUMMARY: We introduce the Barcode-UMI-Set format (BUS) for representing pseudoalignments of reads from single-cell RNA-seq experiments. The format can be used with all single-cell RNA-seq technologies, and we show that BUS files can be efficiently generated. BUStools is a suite of tools for working with BUS files and facilitates rapid quantification and analysis of single-cell RNA-seq data. The BUS format therefore makes possible the development of modular, technology-specific and robust workflows for single-cell RNA-seq analysis. AVAILABILITY AND IMPLEMENTATION: http://BUStools.github.io/ and http://pachterlab.github.io/kallisto/singlecell.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
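BUS records hold barcodes and UMIs as fixed-width integers rather than strings. A minimal sketch of 2-bit packing, one common way to turn a nucleotide string into such an integer; the A=0, C=1, G=2, T=3 mapping and the helper names are assumptions for illustration, and the BUS specification at BUStools.github.io should be consulted for the actual encoding and record layout.

```python
# Assumed 2-bit code; check the BUS specification for the actual mapping.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASES = "ACGT"


def pack_sequence(seq: str) -> int:
    """Pack an A/C/G/T string into an integer, 2 bits per base, first base most significant."""
    value = 0
    for base in seq.upper():
        if base not in _CODE:
            raise ValueError(f"cannot pack non-ACGT base {base!r}")
        value = (value << 2) | _CODE[base]
    return value


def unpack_sequence(value: int, length: int) -> str:
    """Inverse of pack_sequence.

    The length must be stored separately, because leading 'A's (code 0)
    leave no trace in the integer.
    """
    bases = []
    for _ in range(length):
        bases.append(_BASES[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


barcode = "ACGTACGTACGTACGT"  # hypothetical 16-nt barcode
packed = pack_sequence(barcode)
print(packed < 2**32, unpack_sequence(packed, len(barcode)) == barcode)  # True True
```

At 2 bits per base, a 16-nt barcode fits in 32 bits and sequences of up to 32 nt fit in a 64-bit integer; storing the barcode and UMI lengths once in a file header, rather than per record, is a natural consequence of the leading-'A' ambiguity noted in the docstring.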


Subjects
Software , Sequence Analysis, RNA , Single-Cell Analysis , Exome Sequencing
12.
Bioinformatics ; 32(14): 2202-4, 2016 07 15.
Article in English | MEDLINE | ID: mdl-27153590

ABSTRACT

UNLABELLED: Advances in sequencing capacity have led to the generation of unprecedented amounts of genomic data. The processing of this data frequently leads to I/O bottlenecks, e.g., when analyzing a small genomic region across a large number of samples. The largest I/O burden, however, is often imposed not by the amount of data needed for the analysis but by the index files that help retrieve this data. We have developed chopBAI, a program that can chop a BAM index (BAI) file into small pieces. The program outputs a list of BAI files, each indexing a specified genomic interval. The output files are much smaller in size but maintain compatibility with existing software tools. We show how preprocessing BAI files with chopBAI can reduce I/O by more than 95% during the analysis of 10 kb genomic regions, eventually enabling the joint analysis of more than 10 000 individuals. AVAILABILITY AND IMPLEMENTATION: The software is implemented in C++, GPL licensed and available at http://github.com/DecodeGenetics/chopBAI CONTACT: birte.kehr@decode.is.
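The BAI index described in the SAM/BAM format specification organizes alignments into a hierarchy of bins, and the specification gives a `reg2bins` routine (in C) listing the bins that may contain alignments overlapping a region. Below is a Python port of that routine, offered as background rather than as chopBAI's code: an index restricted to one interval presumably needs only the entries for these bins, plus the linear-index offsets covering the interval, which this sketch omits.

```python
def reg2bins(beg: int, end: int) -> list:
    """Bins of the BAI binning scheme overlapping the 0-based, half-open region [beg, end).

    Level 0 is one 512 Mbp bin; each deeper level splits bins 8-fold,
    down to 16 kbp bins (offset 4681, shift 14).
    """
    if not 0 <= beg < end:
        raise ValueError("require 0 <= beg < end")
    end -= 1  # work with an inclusive end coordinate, as the specification does
    bins = [0]
    for offset, shift in ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)):
        bins.extend(range(offset + (beg >> shift), offset + (end >> shift) + 1))
    return bins


# A 10 kb region that straddles a 16 kbp bin boundary touches two finest-level bins.
print(reg2bins(16_000, 26_000))  # [0, 1, 9, 73, 585, 4681, 4682]
```

For a 10 kb region, only a handful of the tens of thousands of possible bins per reference sequence are relevant, which is consistent with the abstract's point that most of the index is dead weight for small-region queries.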


Subjects
Computational Biology/methods , Genomics/methods , Software , Humans
13.
Bioinformatics ; 32(1): 140-1, 2016 Jan 01.
Article in English | MEDLINE | ID: mdl-26363028

ABSTRACT

SUMMARY: Large resequencing projects require a significant amount of storage for raw sequences, as well as alignment files. Because the raw sequences are redundant once the alignment has been generated, it is possible to keep only the alignment files. We present BamHash, a checksum-based method to ensure that the read pairs in FASTQ files match exactly the read pairs stored in BAM files, regardless of the ordering of reads. BamHash can be used to verify the integrity of the files stored and discover any discrepancies. Thus, BamHash can be used to determine whether it is safe to delete the FASTQ files storing raw sequencing reads after alignment, without loss of data. AVAILABILITY AND IMPLEMENTATION: The software is implemented in C++, GPL licensed and available at https://github.com/DecodeGenetics/BamHash CONTACT: pmelsted@hi.is.
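The key property the abstract relies on is a checksum that does not depend on read order. A minimal sketch of one way to get it, assuming a per-pair hash (here SHA-256 truncated to 64 bits, over hypothetical fields) combined by addition modulo 2**64; BamHash's actual hash function and field choice are not reproduced. A real tool would also have to undo BAM's reverse-complementing of reads mapped to the reverse strand before hashing, which this sketch skips.

```python
import hashlib

MASK64 = (1 << 64) - 1  # keep the running sum within 64 bits


def read_pair_hash(name, seq1, qual1, seq2, qual2):
    """Hash one read pair to a 64-bit integer.

    Which fields are hashed, and with which hash function, are assumptions.
    """
    payload = "\t".join((name, seq1, qual1, seq2, qual2)).encode()
    return int.from_bytes(hashlib.sha256(payload).digest()[:8], "little")


def order_independent_checksum(pairs):
    """Sum per-pair hashes modulo 2**64.

    Addition is commutative, so FASTQ order and BAM (coordinate) order
    give the same value when both files hold the same pairs.
    """
    total = 0
    for pair in pairs:
        total = (total + read_pair_hash(*pair)) & MASK64
    return total


fastq_pairs = [
    ("read1", "ACGT", "IIII", "TTGA", "IIII"),
    ("read2", "GGCC", "IIHI", "AATT", "IIII"),
]
bam_pairs = list(reversed(fastq_pairs))  # same pairs, different order
print(order_independent_checksum(fastq_pairs) == order_independent_checksum(bam_pairs))  # True
```

Because each file is reduced to a single 64-bit number in one streaming pass, neither file has to be sorted or held in memory; the trade-off is that a match shows the files agree with high probability rather than proving it.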


Subjects
Sequence Analysis , Software , Statistics as Topic , Humans , Reproducibility of Results
14.
Bioinformatics ; 32(7): 961-7, 2016 04 01.
Article in English | MEDLINE | ID: mdl-25926346

ABSTRACT

MOTIVATION: The detection of genomic structural variation (SV) has advanced tremendously in recent years due to progress in high-throughput sequencing technologies. Novel sequence insertions, insertions without similarity to a human reference genome, have received less attention than other types of SVs due to the computational challenges in their detection from short read sequencing data, which inherently involves de novo assembly. De novo assembly is not only computationally challenging, but also requires high-quality data. Although the reads from a single individual may not always meet this requirement, using reads from multiple individuals can increase power to detect novel insertions. RESULTS: We have developed the program PopIns, which can discover and characterize non-reference insertions of 100 bp or longer on a population scale. In this article, we describe the approach we implemented in PopIns. It takes as input a reads-to-reference alignment, assembles unaligned reads using a standard assembly tool, merges the contigs of different individuals into high-confidence sequences, anchors the merged sequences into the reference genome, and finally genotypes all individuals for the discovered insertions. Our tests on simulated data indicate that the merging step greatly improves the quality and reliability of predicted insertions and that PopIns shows significantly better recall and precision than the recent tool MindTheGap. Preliminary results on a dataset of 305 Icelanders demonstrate the practicality of the new approach. AVAILABILITY AND IMPLEMENTATION: The source code of PopIns is available from http://github.com/bkehr/popins CONTACT: birte.kehr@decode.is SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Genomic Structural Variation , Humans , Mutagenesis, Insertional , Reproducibility of Results
15.
BMC Bioinformatics ; 17(1): 490, 2016 Dec 01.
Article in English | MEDLINE | ID: mdl-27905880

ABSTRACT

Increased emphasis on reproducibility of published research in the last few years has led to the large-scale archiving of sequencing data. While this data can, in theory, be used to reproduce results in papers, it is difficult to use in practice. We introduce a series of tools for processing and analyzing RNA-Seq data in the Sequence Read Archive, that together have allowed us to build an easily extendable resource for analysis of data underlying published papers. Our system makes the exploration of data easily accessible and usable without technical expertise. Our database and associated tools can be accessed at The Lair: http://pachterlab.github.io/lair .


Subjects
Databases, Nucleic Acid , Sequence Analysis, RNA/methods , Software , High-Throughput Nucleotide Sequencing/methods , Humans , Reproducibility of Results
16.
Mol Ecol ; 25(2): 570-80, 2016 01.
Article in English | MEDLINE | ID: mdl-26607571

ABSTRACT

Tracking past population fluctuations can give insight into current levels of genetic variation present within species. Analysing population dynamics over larger timescales can be aligned to known climatic changes to determine the response of species to varying environments. Here, we applied the Pairwise Sequentially Markovian Coalescent (psmc) model to infer past population dynamics of three widespread grouse species: black grouse, willow grouse and rock ptarmigan. This allowed the tracking of the effective population size (Ne) of all three species beyond 1 Mya, revealing that (i) early Pleistocene cooling (~2.5 Mya) caused an increase in the willow grouse and rock ptarmigan populations, (ii) the mid-Brunhes event (~430 kya) and following climatic oscillations decreased the Ne of willow grouse and rock ptarmigan, but increased the Ne of black grouse and (iii) all three species reacted differently to the last glacial maximum (LGM): black grouse increased prior to it, rock ptarmigan experienced a severe bottleneck and willow grouse was maintained at large population size. We postulate that the varying psmc signal throughout the LGM depicts only the local history of the species. Nevertheless, the large population fluctuations in willow grouse and rock ptarmigan indicate that both species are opportunistic breeders, while black grouse tracks the climatic changes more slowly and is maintained at lower Ne. Our results highlight the usefulness of the psmc approach in investigating species' reaction to climate change in the deep past, but also that caution should be taken in drawing general conclusions about the recent past.


Subjects
Biological Evolution , Climate Change , Galliformes/genetics , Animals , Arctic Regions , Galliformes/classification , Genetic Variation , Mutation Rate , Population Density , Population Dynamics
17.
Genome Res ; 22(4): 602-10, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22207615

ABSTRACT

Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success.


Subjects
Genetic Variation , Primates/genetics , Sequence Analysis, RNA/methods , Transcriptome/genetics , Animals , Endangered Species , Evolution, Molecular , Genome/genetics , Humans , Liver/metabolism , Phylogeny , Primates/classification , Species Specificity
18.
Bioinformatics ; 30(24): 3541-7, 2014 Dec 15.
Article in English | MEDLINE | ID: mdl-25355787

ABSTRACT

MOTIVATION: Several applications in bioinformatics, such as genome assemblers and error-correction methods, rely on counting and keeping track of k-mers (substrings of length k). Histograms of k-mer frequencies can give valuable insight into the underlying distribution and indicate the error rate and genome size sampled in the sequencing experiment. RESULTS: We present KmerStream, a streaming algorithm for estimating the number of distinct k-mers present in high-throughput sequencing data. The algorithm runs in time linear in the size of the input, and its space requirement is logarithmic in the size of the input. We derive a simple model that allows us to estimate the error rate of the sequencing experiment, as well as the genome size, using only the aggregate statistics reported by KmerStream. As an application, we show how KmerStream can be used to compute the error rate of a DNA sequencing experiment. We run KmerStream on a set of 2656 whole-genome sequenced individuals and compare the error rate to quality values reported by the sequencing equipment. We discover that while the quality values alone are largely reliable as a predictor of error rate, there is considerable variability in the error rates between sequencing runs, even when accounting for reported quality values.
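KmerStream's own estimator is not reproduced here. As a stand-in, the sketch below uses a different, well-known streaming technique, the k-minimum-values (KMV) sketch, to estimate the number of distinct k-mers in a single pass: hash each k-mer to a pseudo-uniform value in [0, 1), keep only the `sketch_size` smallest distinct hashes, and estimate the count as (sketch_size - 1) divided by the largest retained hash. Unlike KmerStream, its memory grows with `sketch_size` rather than logarithmically, and it does not merge a k-mer with its reverse complement; both are simplifications for illustration, and the function names are hypothetical.

```python
import hashlib
import heapq


def kmers(sequence: str, k: int):
    """Yield every substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def _hash01(kmer: str) -> float:
    """Map a k-mer to a pseudo-uniform value in [0, 1)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


def estimate_distinct_kmers(reads, k: int, sketch_size: int = 1024) -> float:
    """K-minimum-values (KMV) estimate of the number of distinct k-mers."""
    heap = []        # negated hashes, so heap[0] holds the largest kept hash
    members = set()  # hashes currently kept, to ignore repeated k-mers
    for read in reads:
        for kmer in kmers(read, k):
            h = _hash01(kmer)
            if h in members:
                continue
            if len(heap) < sketch_size:
                heapq.heappush(heap, -h)
                members.add(h)
            elif h < -heap[0]:
                # Replace the largest kept hash with the new, smaller one.
                evicted = -heapq.heapreplace(heap, -h)
                members.discard(evicted)
                members.add(h)
    if len(heap) < sketch_size:
        return float(len(heap))  # fewer distinct k-mers than the sketch: exact count
    return (sketch_size - 1) / -heap[0]


print(estimate_distinct_kmers(["ACGTACGT"], k=4))  # 4.0
```

The example read contains five 4-mers but only four distinct ones (ACGT occurs twice), and since that is below the sketch size the count is exact; for large inputs the relative error of a KMV estimate is roughly 1/sqrt(sketch_size).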


Subjects
Algorithms , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Genome Size , Genome, Human , Genomics/methods , Humans , Software
19.
bioRxiv ; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38045414

ABSTRACT

The term "RNA-seq" refers to a collection of assays based on sequencing experiments that involve quantifying RNA species from bulk tissue, from single cells, or from single nuclei. The kallisto, bustools, and kb-python programs are free, open-source software tools for performing this analysis that together can produce gene expression quantification from raw sequencing reads. The quantifications can be individualized for multiple cells, multiple samples, or both. Additionally, these tools allow gene expression values to be classified as originating from nascent RNA species or mature RNA species, making this workflow amenable to both cell-based and nucleus-based assays. This protocol describes in detail how to use kallisto and bustools in conjunction with a wrapper, kb-python, to preprocess RNA-seq data.

20.
bioRxiv ; 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38617255

ABSTRACT

Standard single-cell RNA-sequencing (scRNA-seq) analysis workflows consist of converting raw read data into cell-gene count matrices through sequence alignment, followed by analyses including filtering, highly variable gene selection, dimensionality reduction, clustering, and differential expression analysis. Seurat and Scanpy are the most widely used packages implementing such workflows, and are generally thought to implement individual steps similarly. We investigate in detail the algorithms and methods underlying Seurat and Scanpy and find that there are, in fact, considerable differences in the outputs of Seurat and Scanpy. The extent of differences between the programs is approximately equivalent to the variability that would be introduced in benchmarking scRNA-seq datasets by sequencing less than 5% of the reads or analyzing less than 20% of the cell population. Additionally, distinct versions of Seurat and Scanpy can produce very different results, especially during parts of differential expression analysis. Our analysis highlights the need for users of scRNA-seq to carefully assess the tools on which they rely, and the importance of developers of scientific software prioritizing transparency, consistency, and reproducibility for their tools.
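The abstract calibrates the Seurat/Scanpy discrepancy against two kinds of data loss: sequencing fewer reads and analyzing fewer cells. One common way to simulate both from an existing cell-by-gene count matrix is sketched below, assuming binomial thinning of counts for reads and uniform sampling of rows for cells; the preprint may have downsampled differently (for example, at the FASTQ level), and the function names are hypothetical.

```python
import numpy as np


def downsample_reads(counts: np.ndarray, fraction: float, seed: int = 0) -> np.ndarray:
    """Binomially thin an integer cell-by-gene count matrix.

    Each counted molecule is kept independently with probability `fraction`,
    approximating what a shallower sequencing run would have observed.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = np.random.default_rng(seed)
    return rng.binomial(counts, fraction)


def subsample_cells(counts: np.ndarray, fraction: float, seed: int = 0) -> np.ndarray:
    """Keep a uniformly random subset of rows (cells), preserving their order."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]
    keep = rng.choice(n_cells, size=round(fraction * n_cells), replace=False)
    return counts[np.sort(keep)]


counts = np.array([[5, 0, 2], [3, 10, 1], [0, 4, 7], [8, 1, 0], [2, 2, 2]])
shallow = downsample_reads(counts, 0.05)  # about 5% of the reads
fewer_cells = subsample_cells(counts, 0.2)  # 20% of the cells
print(shallow.shape, fewer_cells.shape)  # (5, 3) (1, 3)
```

Running the same pipeline on the full matrix and on such downsampled versions gives a yardstick for how much variability is "normal", against which a tool-versus-tool difference can be compared.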
