Results 1 - 14 of 14
1.
bioRxiv ; 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39131277

ABSTRACT

We present haplotype-resolved reference genomes and comparative analyses of six ape species, namely: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. We achieve chromosome-level contiguity with unparalleled sequence accuracy (<1 error in 500,000 base pairs), completely sequencing 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, providing more in-depth evolutionary insights. Comparative analyses, including human, allow us to investigate the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference. This includes newly minted gene families within lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes, and subterminal heterochromatin. This resource should serve as a definitive baseline for all future evolutionary studies of humans and our closest living ape relatives.
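The accuracy figure quoted above (fewer than 1 error in 500,000 base pairs) can be put on the Phred scale that assembly reports commonly use. The sketch below applies the standard Phred conversion; the abstract itself does not state a quality value, so the function name and the resulting number are illustrative only.

```python
import math


def phred_qv(errors, bases):
    """Convert an error rate (errors per number of bases) to a Phred-scaled quality value."""
    if errors <= 0 or bases <= 0:
        raise ValueError("errors and bases must be positive")
    # Phred scale: QV = -10 * log10(error probability per base)
    return -10.0 * math.log10(errors / bases)


# One error in 500,000 bases, the upper bound quoted in the abstract
qv = phred_qv(1, 500_000)
```

Under this conversion, an error rate below 1 in 500,000 corresponds to a quality value of roughly 57 or higher.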

2.
Nature ; 630(8016): 401-411, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38811727

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility. The X chromosome is vital for reproduction and cognition. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.


Subjects
Hominidae , X Chromosome , Y Chromosome , Animals , Female , Male , Gorilla gorilla/genetics , Hominidae/genetics , Hominidae/classification , Hylobatidae/genetics , Pan paniscus/genetics , Pan troglodytes/genetics , Phylogeny , Pongo abelii/genetics , Pongo pygmaeus/genetics , Telomere/genetics , X Chromosome/genetics , Y Chromosome/genetics , Evolution, Molecular , DNA Copy Number Variations/genetics , Humans , Endangered Species , Reference Standards
4.
bioRxiv ; 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38077089

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution. These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.

5.
bioRxiv ; 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37425881

ABSTRACT

Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources and enhancing reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals).

6.
Genes (Basel) ; 13(8), 2022 Jul 27.
Article in English | MEDLINE | ID: mdl-36011253

ABSTRACT

Protein-protein functional interactions arise from either transitory or permanent biomolecular associations and often lead to the coevolution of the interacting residues. Although mutual information has traditionally been used to identify coevolving residues within the same protein, its application between coevolving proteins remains largely uncharacterized. Therefore, we developed the Protein Interactions Calculator (PIC) to efficiently identify coevolving residues between two protein sequences using mutual information. We verified the algorithm using 2102 known human protein interactions and 233 known bacterial protein interactions, with a respective 1975 and 252 non-interacting protein controls. The average PIC score for known human protein interactions was 4.5 times higher than for non-interacting proteins (p = 1.03 × 10^-108) and 1.94 times higher in bacteria (p = 1.22 × 10^-35). We then used the PIC scores to determine the probability that two proteins interact. Using those probabilities, we paired 37 Alzheimer's disease-associated proteins with 8608 other proteins and determined the likelihood that each pair interacts, which we report through a web interface. The PIC had significantly higher sensitivity and residue-specific resolution not available in other algorithms. Therefore, we propose that the PIC can be used to prioritize potential protein interactions, which can lead to a better understanding of biological processes and additional therapeutic targets belonging to protein interaction groups.


Subjects
Alzheimer Disease , Proteome , Alzheimer Disease/genetics , Evolution, Molecular , Humans , Internet , Software
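The abstract above scores coevolution with mutual information. As a rough illustration of that quantity only, the sketch below computes mutual information (in bits) between two alignment columns, one from each protein, where position i in both columns comes from the same species. It is a generic textbook calculation, not the PIC scoring scheme, and the function name and inputs are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information in bits between two equal-length alignment columns."""
    if len(col_a) != len(col_b) or len(col_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)               # residue counts in column A
    count_b = Counter(col_b)               # residue counts in column B
    count_ab = Counter(zip(col_a, col_b))  # joint counts of paired residues
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) simplifies to c * n / (count_a * count_b)
        mi += p_ab * log2(c * n / (count_a[a] * count_b[b]))
    return mi
```

Columns whose residues change together across species score high; columns that vary independently score near zero. Real tools typically add corrections (for example for phylogenetic background or small sample size) that this sketch omits.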
7.
GigaByte ; 2022: gigabyte67, 2022.
Article in English | MEDLINE | ID: mdl-36824527

ABSTRACT

Caranx ignobilis, commonly known as giant kingfish or giant trevally, is a large, reef-associated apex predator. It is a prized sportfish, targeted throughout its tropical and subtropical range in the Indian and Pacific Oceans. It has also gained significant interest in aquaculture due to its unusual freshwater tolerance. Here, we present a draft assembly of the estimated 625.92 Mbp nuclear genome of a C. ignobilis individual from Hawaiian waters, which host a genetically distinct population. Our 97.4% BUSCO-complete assembly has a contig NG50 of 7.3 Mbp and a scaffold NG50 of 46.3 Mbp. Twenty-five of the 203 scaffolds contain 90% of the genome. We also present noisy long-read DNA, Hi-C, and RNA-seq datasets; the RNA-seq data cover eight distinct tissues and can help with annotation and studies of freshwater tolerance. Our genome assembly and its supporting data are valuable tools for ecological and comparative genomics studies of kingfishes and other carangoid fishes.

8.
GigaByte ; 2022: gigabyte44, 2022.
Article in English | MEDLINE | ID: mdl-36968794

ABSTRACT

The roundjaw bonefish, Albula glossodonta, is the most widespread albulid in the Indo-Pacific and is vulnerable to extinction. We assembled the genome of a roundjaw bonefish from Hawai'i, USA, which will be instrumental for effective transboundary management and conservation when paired with population genomics datasets. The 1.05 gigabase pair (Gbp) contig-level assembly had a 4.75 megabase pair (Mbp) NG50 and a maximum contig length of 28.2 Mbp. Scaffolding yielded an LG50 of 20 and an NG50 of 14.49 Mbp, with the longest scaffold reaching 42.29 Mbp. The genome comprised 6.5% repetitive elements and was annotated with 28.3 K protein-coding genes. We then evaluated population genetic connectivity between six atolls in the Western Indian Ocean with 38,355 SNP loci across 66 A. glossodonta individuals. We discerned shallow population structure and observed genetic homogeneity between atolls in Seychelles and reduced gene flow between Seychelles and Mauritius. The South Equatorial Current might be the limiting mechanism of this reduced gene flow. The genome assembly will be useful for addressing taxonomic uncertainties of bonefishes globally.
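The NG50 and LG50 figures reported above follow a common convention: sort sequences from longest to shortest and accumulate lengths until half of the estimated genome size is covered. NG50 is the length of the sequence that crosses that threshold, and LG50 is how many sequences were needed. The sketch below assumes that convention; the function name is hypothetical and the authors' exact tooling is not given in the abstract.

```python
def ng50_lg50(lengths, genome_size):
    """Return (NG50, LG50) for contig or scaffold lengths against an estimated genome size.

    Returns (None, None) when the assembly covers less than half of the genome size.
    """
    target = genome_size / 2
    total = 0
    # Walk from the longest sequence down, accumulating covered bases
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        total += length
        if total >= target:
            return length, count
    return None, None
```

N50 and L50 are computed the same way but use the total assembly length instead of the estimated genome size, which is why NG50 is preferred when comparing assemblies of different completeness.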

9.
G3 (Bethesda) ; 11(10), 2021 Sep 27.
Article in English | MEDLINE | ID: mdl-34568914

ABSTRACT

The bluefin trevally, Caranx melampygus, also known as the bluefin kingfish or bluefin jack, is known for its remarkable, bright-blue fins. This marine teleost is a widely prized sportfish, but few resources have been devoted to the genomics and conservation of this species because it is not targeted by large-scale commercial fisheries. Population declines from recreational and artisanal overfishing have been observed in Hawai'i, USA, resulting in both an interest in aquaculture and concerns about the long-term conservation of this species. Most research to date has been performed in Hawai'i, raising questions about the status of bluefin trevally populations across its Indo-Pacific range. Genomic resources allow for expanded research on stock status, genetic diversity, and population demography. We present a high-quality, 711 Mb nuclear genome assembly of a Hawaiian bluefin trevally from noisy long reads with a contig NG50 of 1.2 Mb and longest contig length of 8.9 Mb. As measured by single-copy orthologs, the assembly was 95% complete, and the genome comprises 16.9% repetitive elements. The assembly was annotated with 33.1 K protein-coding genes, 71.4% of which were assigned putative functions, using RNA-seq data from eight tissues from the same individual. This is the first whole-genome assembly published for the carangoid genus Caranx. Using this assembled genome, a multiple sequentially Markovian coalescent model was implemented to assess population demography. Estimates of effective population size suggest population expansion has occurred since the Late Pleistocene. This genome will be a valuable resource for comparative phylogenomic studies of carangoid fishes and will help elucidate demographic history and delineate stock structure for bluefin trevally populations throughout the Indo-Pacific.


Subjects
Conservation of Natural Resources , Perciformes , Animals , Fisheries , Fishes/genetics , Genome , Perciformes/genetics
10.
Genome ; 62(12): 785-792, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31491336

ABSTRACT

Carbapenem-resistant bacteria have quickly become a worldwide concern in nosocomial infections. Of the seven known carbapenemases, four have been shown to be particularly problematic: KPC, NDM, IMP, and VIM. To date, many local and species- or carbapenemase-specific epidemiological studies have been performed, which often focus on the organism itself. This report attempts an inclusive epidemiological study, encompassing both species and carbapenemase, using publicly available plasmid sequences from NCBI. In this report, the gene content of these plasmids has been characterized, the replicon types of the plasmids identified, and the global spread and species promiscuity of the plasmids analyzed. Additionally, the results lend support to several groups that are targeting plasmid maintenance and transfer mechanisms to slow the spread of resistance plasmids.


Subjects
Bacterial Proteins/genetics , Drug Resistance, Bacterial/genetics , Plasmids/genetics , beta-Lactamases/genetics , Anti-Bacterial Agents , Carbapenem-Resistant Enterobacteriaceae/genetics , Carbapenems , China , Databases, Nucleic Acid , Plasmids/classification , Replicon , United States
11.
Bioinformatics ; 35(4): 546-552, 2019 Feb 15.
Article in English | MEDLINE | ID: mdl-30084941

ABSTRACT

MOTIVATION: Orthologous gene identification is fundamental to all aspects of biology. For example, ortholog identification between species can provide functional insights for genes of unknown function and is a necessary step in phylogenetic inference. Currently, most ortholog identification algorithms require all-versus-all BLAST comparisons, which are time-consuming and memory intensive. RESULTS: In contrast to existing approaches, JustOrthologs exploits the conservation of gene structure by using the lengths of coding sequence regions and dinucleotide percentages to identify orthologs. In comparison to OrthoMCL, OMA and OrthoFinder, JustOrthologs decreases ortholog identification runtime by more than 96% and achieves comparable precision and recall scores. The computational speedup allowed us to conduct pairwise comparisons of 1197 complete genomes (780 eukaryotes and 417 archaea). We confirmed gene annotations for 384 120 genes, grouped 1 675 415 genes in previously unreported ortholog groups, and identified 51 429 potentially mislabeled genes across 622 843 ortholog groups. AVAILABILITY AND IMPLEMENTATION: JustOrthologs is an open source collaborative software package available in the GitHub repository: https://github.com/ridgelab/JustOrthologs/. All test FASTA files used for comparisons are freely available at https://github.com/ridgelab/JustOrthologs/comparisonFastaFiles/. Reference genomes used in this work are available for download from the NCBI repository: ftp://ftp.ncbi.nih.gov/genomes/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Genomics , Software , Computational Biology , Molecular Sequence Annotation , Phylogeny
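The abstract above names dinucleotide percentages of coding regions as one of the features used to compare genes. The sketch below shows one way such a profile could be computed for a single sequence. It is an illustration only: the use of overlapping windows, the skipping of ambiguous bases, and the function name are all assumptions, and the JustOrthologs repository should be consulted for the actual procedure.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def dinucleotide_percentages(seq):
    """Percentage of each overlapping dinucleotide in a DNA sequence.

    Dinucleotides containing anything other than A, C, G, or T are skipped.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if set(p) <= VALID_BASES]
    total = len(pairs)
    if total == 0:
        return {}
    counts = Counter(pairs)
    return {pair: 100.0 * count / total for pair, count in counts.items()}
```

Two coding regions of similar length with similar profiles would be candidate matches under a structure-based comparison, which avoids the all-versus-all alignment step the abstract describes as the bottleneck.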
12.
Bioinformatics ; 33(24): 3922-3928, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-28968741

ABSTRACT

MOTIVATION: One of the main challenges with bioinformatics software is that the size and complexity of datasets necessitate trading speed for accuracy, or completeness. To combat this problem of computational complexity, a plethora of heuristic algorithms have arisen that report a 'good enough' solution to biological questions. However, in instances such as Simple Sequence Repeats (SSRs), a 'good enough' solution may not accurately portray results in population genetics, phylogenetics and forensics, which require accurate SSRs to calculate intra- and inter-species interactions. RESULTS: We present Kmer-SSR, which finds all SSRs faster than most heuristic SSR identification algorithms in a parallelized, easy-to-use manner. The exhaustive Kmer-SSR option has 100% precision and 100% recall and accurately identifies every SSR of any specified length. To identify more biologically pertinent SSRs, we also developed several filters that allow users to easily view a subset of SSRs based on user input. Kmer-SSR, coupled with the filter options, accurately and intuitively identifies SSRs quickly and in a more user-friendly manner than any other SSR identification algorithm. AVAILABILITY AND IMPLEMENTATION: The source code is freely available on GitHub at https://github.com/ridgelab/Kmer-SSR. CONTACT: perry.ridge@byu.edu.


Subjects
Algorithms , Microsatellite Repeats , Software , Computational Biology/methods
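To make concrete what identifying a simple sequence repeat involves, the sketch below scans a sequence for tandem repeats of short motifs. It is a naive, greedy illustration and not the Kmer-SSR algorithm: it reports non-overlapping repeats only, prefers the longest run at each start position (the shortest motif on ties), and its parameter names and defaults are assumptions.

```python
def find_ssrs(seq, min_period=1, max_period=6, min_repeats=3):
    """Return a list of (start, motif, repeat_count) for tandem repeats in seq."""
    seq = seq.upper()
    n = len(seq)
    results = []
    i = 0
    while i < n:
        best = None
        for period in range(min_period, max_period + 1):
            motif = seq[i:i + period]
            if len(motif) < period:
                break  # not enough sequence left for this motif size
            repeats = 1
            j = i + period
            # Extend while the next window is an exact copy of the motif
            while seq[j:j + period] == motif:
                repeats += 1
                j += period
            span = repeats * period
            if repeats >= min_repeats and (best is None or span > best[2] * len(best[1])):
                best = (i, motif, repeats)
        if best is not None:
            results.append(best)
            i += len(best[1]) * best[2]  # jump past the reported repeat
        else:
            i += 1
    return results
```

For example, scanning `GGACACACACTT` with the defaults reports a single repeat: the motif `AC` repeated four times starting at index 2.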
13.
Genome ; 60(5): 384-392, 2017 May.
Article in English | MEDLINE | ID: mdl-28177839

ABSTRACT

Species of the genus Poa are taxonomically and genetically difficult to delineate owing to high and variable polyploidy, aneuploidy, and challenging breeding systems. Approximately 5% of the proposed species in Poa are considered to include or comprise diploids, but very few of those diploids are represented in seed collections. Recent phylogenetic studies of Poa have included some diploid species to elucidate Poa genome relationships. In this study, we build upon that foundation of diploid Poa relationships with additional confirmed diploid species and accessions, and with additional chloroplast sequences. We also include samples of P. pratensis and P. arachnifera to home in on possible ancestral genomes in these two agronomic and highly polyploid species. Relative to most species of Poa, Poa section Dioicopoa (P. ligularis, P. iridifolia, and P. arachnifera) contained relatively large chromosomes. Phylogenies were constructed using the TLF gene region and five additional chloroplast genes, and the placement of new species and accessions fit previously reported chloroplast lineages better than taxonomic subgenera and sections. Low-ploidy species in the P chloroplast lineage, such as P. iberica and P. remota, grouped closest to P. pratensis.


Subjects
DNA, Chloroplast/genetics , Phylogeny , Ploidies , Poa/genetics , DNA, Chloroplast/chemistry , DNA, Chloroplast/classification , DNA, Plant/chemistry , DNA, Plant/genetics , Diploidy , Geography , Poa/classification , Polyploidy , RNA, Transfer/genetics , Sequence Analysis, DNA , Species Specificity
14.
BMC Bioinformatics ; 17 Suppl 7: 239, 2016 Jul 25.
Article in English | MEDLINE | ID: mdl-27454357

ABSTRACT

BACKGROUND: Analyzing next-generation sequencing data is difficult because datasets are large, second-generation sequencing platforms have high error rates, and each position in the target genome (exome, transcriptome, etc.) is sequenced multiple times. Given these challenges, numerous bioinformatic algorithms have been developed to analyze these data. These algorithms aim to find an appropriate balance between data loss, errors, analysis time, and memory footprint. Typical analysis pipelines require multiple steps. If one or more of these steps is unnecessary, removing it would significantly decrease compute time and data manipulation. One step in many pipelines is PCR duplicate removal, where PCR duplicates arise from multiple PCR products from the same template molecule binding on the flowcell. These are often removed because there is concern they can lead to false positive variant calls. Picard (MarkDuplicates) and SAMTools (rmdup) are the two main software tools used for PCR duplicate removal. RESULTS: Approximately 92% of the 17+ million variants called were called whether we removed duplicates with Picard or SAMTools, or left the PCR duplicates in the dataset. There were no significant differences between the unique variant sets when comparing the transition/transversion ratios (p = 1.0), percentage of novel variants (p = 0.99), average population frequencies (p = 0.99), and the percentage of protein-changing variants (p = 1.0). Results were similar for variants in the American College of Medical Genetics genes. Genotype concordance between NGS and SNP chips was above 99% for all genotype groups (e.g., homozygous reference). CONCLUSIONS: Our results suggest that PCR duplicate removal has minimal effect on the accuracy of subsequent variant calls.


Subjects
High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Software , Data Accuracy , Genome, Human , Genomics/methods , Humans , Polymerase Chain Reaction
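One of the metrics compared in the abstract above is the transition/transversion ratio of a variant set. The sketch below computes it from a list of (reference, alternate) single-nucleotide pairs. It is a minimal illustration of the standard definition; the input format and function name are assumptions, and the study's own pipeline is not described in the abstract.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def ts_tv_ratio(snvs):
    """Transition/transversion ratio for an iterable of (ref, alt) base pairs.

    Entries that are not single-base substitutions between A, C, G, and T are
    ignored. Returns None when there are no transversions.
    """
    transitions = 0
    transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        # Transition: purine <-> purine or pyrimidine <-> pyrimidine
        if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        return None
    return transitions / transversions
```

Comparing this ratio across variant sets called with and without duplicate removal is one way to check whether the extra calls look like real variation or like errors, since random errors tend to pull the ratio toward 0.5.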