Results 1-20 of 59
1.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.


Subject(s)
Epigenome , Quantitative Trait Loci , Genome-Wide Association Study , Genomics , Phenotype , Polymorphism, Single Nucleotide
2.
Cell ; 174(6): 1361-1372.e10, 2018 09 06.
Article in English | MEDLINE | ID: mdl-30193110

ABSTRACT

A key aspect of genomic medicine is to make individualized clinical decisions from personal genomes. We developed a machine-learning framework to integrate personal genomes and electronic health record (EHR) data and used this framework to study abdominal aortic aneurysm (AAA), a prevalent irreversible cardiovascular disease with unclear etiology. Performing whole-genome sequencing on AAA patients and controls, we demonstrated its predictive precision solely from personal genomes. By modeling personal genomes with EHRs, this framework quantitatively assessed the effectiveness of adjusting personal lifestyles given personal genome baselines, demonstrating its utility as a personal health management tool. We showed that this new framework agnostically identified genetic components involved in AAA, which were subsequently validated in human aortic tissues and in murine models. Our study presents a new framework for disease genome analysis, which can be used for both health management and understanding the biological architecture of complex diseases.


Subject(s)
Aortic Aneurysm, Abdominal/pathology , Genomics , Animals , Aortic Aneurysm, Abdominal/genetics , Area Under Curve , Disease Models, Animal , Gene Expression Regulation , Gene Regulatory Networks , Genome-Wide Association Study , Humans , Machine Learning , Mice , Polymorphism, Single Nucleotide , Protein Interaction Maps , ROC Curve , Whole Genome Sequencing
3.
Proc Natl Acad Sci U S A ; 117(35): 21364-21372, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32817564

ABSTRACT

A person's genome typically contains millions of variants which represent the differences between this personal genome and the reference human genome. The interpretation of these variants, i.e., the assessment of their potential impact on a person's phenotype, is currently of great interest in human genetics and medicine. We have developed a prioritization tool called OpenCausal which takes as inputs 1) a personal genome and 2) a reference context-specific TF expression profile and returns a list of noncoding variants prioritized according to their impact on chromatin accessibility for any given genomic region of interest. We applied OpenCausal to 6,430 samples across 18 tissues derived from the GTEx project and found that the variants prioritized by OpenCausal are highly enriched for eQTLs and caQTLs. We further propose a strategy to integrate the predicted open scores with genome-wide association studies (GWAS) data to prioritize putative causal variants and regulatory elements for a given risk locus (i.e., fine-mapping analysis). As an initial example, we applied this method to a GWAS dataset of human height and found that the prioritized putative variants and elements are correlated with the phenotype (i.e., heights of individuals) better than others.


Subject(s)
Genetic Techniques , Genetic Variation , Genome, Human , Models, Genetic , Regulatory Elements, Transcriptional , Body Height/genetics , Gene Expression Profiling , Genome-Wide Association Study , Humans , Quantitative Trait Loci , Software , Transcription Factors/metabolism
4.
RNA Biol ; 19(1): 1143-1152, 2022 01.
Article in English | MEDLINE | ID: mdl-36329613

ABSTRACT

Mutations that affect phenotypes have been identified primarily as those that directly alter amino acid sequences or disrupt splice sites. However, some mutations not located in functionally important sites can also affect phenotypes, such as splice-site-creating mutations (SCMs). To investigate how frequently exon extension/shrinkage events induced by SCMs occur in normal individuals, we used personal genome sequencing data and transcriptome data from the corresponding individuals and identified 371 exon extension/shrinkage events in normal individuals. This number was about three times higher than the number of pseudo-exon activation events identified in the previous study. The average numbers of exon extension and exon shrinkage events per sample were 3.3 and 11.2, respectively. We also evaluated the impact of exon extension/shrinkage events on the resulting transcripts and their protein products and found that 40.2% of the identified events may have functional impacts, either by generating premature termination codons in transcripts or by affecting protein domains. Our results indicate that a certain fraction of the SCMs identified in this study could be pathogenic by creating novel splice sites.


Subject(s)
Proteins , RNA Splicing , Exons , Mutation , Base Sequence , Proteins/genetics , RNA Splice Sites , Introns
5.
Int J Mol Sci ; 23(18)2022 Sep 08.
Article in English | MEDLINE | ID: mdl-36142316

ABSTRACT

The number of patients diagnosed with cancer continues to rise and has nearly doubled in 20 years. Predicting cancer occurrence can therefore substantially reduce medical costs, and early prevention can increase survival rates. In the data preprocessing step, since individual genome data are used as input data, they are classified as individual genome data. Subsequently, data embedding is performed in character units so that the data can be used in deep learning. In the deep learning network schema, a character-based deep learning network uses the preprocessed data to learn the correlation between individual feature data and predicts cancer occurrence. To evaluate the objective reliability of the proposed method, it was compared with various networks published in other studies using the TCGA dataset. In this comparison on the same data, the proposed method achieved excellent accuracy, sensitivity, and specificity, demonstrating the superior effectiveness of deep learning networks in predicting cancer occurrence from individual whole-genome data. The confusion matrix results confirmed the validity of the proposed model for predicting cancer from an individual's whole-genome data. In addition, the AUC (area under the ROC curve), used as the performance evaluation index of diagnostic efficiency, was 90% or more, indicating good classification. The objectives of this study were to use individual genome data for 12 cancers as input to analyze whole-genome patterns without separately using the reference genome sequence data of normal individuals. In addition, several mutation types, including SNV, DEL, and INS, were applied.


Subject(s)
Deep Learning , Neoplasms , Humans , Neoplasms/genetics , ROC Curve , Reproducibility of Results
6.
RNA Biol ; 18(3): 382-390, 2021 03.
Article in English | MEDLINE | ID: mdl-32865117

ABSTRACT

Causative mutations for human genetic disorders have mainly been identified in exonic regions that code for amino acid sequences. Recently, however, it has been reported that mutations in deep intronic regions can also cause certain human genetic disorders by creating novel splice sites, leading to pseudo-exon activation. To investigate how frequently pseudo-exon activation events occur in normal individuals, we conducted in silico identification of such events using personal genome data and corresponding high-quality transcriptome data. With rather stringent conditions, on average, 2.6 pseudo-exon activation events per individual were identified. More pseudo-exon activation events were found in 5' donor splice sites than in 3' acceptor splice sites. Although pseudo-exon activation events have sporadically been reported as causative mutations in genetic disorders, it is revealed in this study that such events can be observed in normal individuals at a certain frequency. We estimate that human genomes typically contain on average at least 10 pseudo-exon activation events. The actual number should be higher than this, because we used stringent criteria to identify pseudo-exon activation events. This suggests that it is worth considering the possibility of pseudo-exon activation when searching for causative mutations of genetic disorders if candidate mutations are not identified in coding regions or RNA splice sites.


Subject(s)
Computational Biology , Exons , Genomics , Pseudogenes , Transcriptional Activation , Transcriptome , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling , Genomics/methods , Humans , Mutation , Polymorphism, Single Nucleotide , RNA Splice Sites , RNA Splicing , Regulatory Sequences, Nucleic Acid
7.
BMC Bioinformatics ; 19(Suppl 17): 501, 2018 Dec 28.
Article in English | MEDLINE | ID: mdl-30591030

ABSTRACT

BACKGROUND: A range of rare and common genetic variants have been discovered to be potentially associated with mental diseases, but many more remain to be uncovered. Powerful integrative methods are needed to systematically prioritize both variants and genes that confer susceptibility to mental diseases in personal genomes of individual patients and to facilitate the development of personalized treatment or therapeutic approaches. METHODS: Leveraging a deep neural network on the TensorFlow framework, we developed a computational tool, the integrated Mental-disorder GEnome Score (iMEGES), for analyzing whole-genome/exome sequencing data on personal genomes. iMEGES takes as input the genetic mutations and phenotypic information of a patient with mental disorders, and outputs a ranking of whole-genome susceptibility variants and the prioritized disease-specific genes for mental disorders by integrating contributions from coding and non-coding variants, structural variants (SVs), known brain expression quantitative trait loci (eQTLs), and epigenetic information from PsychENCODE. RESULTS: iMEGES was evaluated on multiple datasets of mental disorders, and it outperformed competing approaches when a large training dataset was available. CONCLUSION: iMEGES can be used in population studies to help prioritize novel genes or variants that might be associated with susceptibility to mental disorders, and also in individual patients to help identify genes or variants related to mental diseases.


Subject(s)
Deep Learning , Genetic Predisposition to Disease , Genome, Human , Mental Disorders/genetics , Neural Networks, Computer , Software , Area Under Curve , Databases, Genetic , Humans , Polymorphism, Single Nucleotide/genetics
8.
Hum Mutat ; 38(9): 1266-1276, 2017 09.
Article in English | MEDLINE | ID: mdl-28544481

ABSTRACT

The advent of next-generation sequencing has dramatically decreased the cost of whole-genome sequencing and increased the viability of its application in research and clinical care. The Personal Genome Project (PGP) provides unrestricted access to genomes of individuals and their associated phenotypes. This resource enabled the Critical Assessment of Genome Interpretation (CAGI) to create a community challenge to assess the bioinformatics community's ability to predict traits from whole genomes. In the CAGI PGP challenge, researchers were asked to predict whether an individual had a particular trait or profile based on their whole genome. Several approaches were used to assess submissions, including ROC AUC (area under receiver operating characteristic curve), probability rankings, the number of correct predictions, and statistical significance simulations. Overall, we found that prediction of individual traits is difficult, relying on a strong knowledge of trait frequency within the general population, whereas matching genomes to trait profiles relies heavily upon a small number of common traits including ancestry, blood type, and eye color. When a rare genetic disorder is present, profiles can be matched when one or more pathogenic variants are identified. Prediction accuracy has improved substantially over the last 6 years due to improved methodology and a better understanding of features.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Area Under Curve , Genetic Predisposition to Disease , Human Genome Project , Humans , Phenotype , Quantitative Trait Loci
9.
BMC Med Inform Decis Mak ; 17(1): 100, 2017 Jul 06.
Article in English | MEDLINE | ID: mdl-28683736

ABSTRACT

BACKGROUND: With the goal of realizing genome-based personalized healthcare, we have developed a biobank that integrates personal health, genome, and omics data along with biospecimens donated by 150,000 volunteers. Such large-scale data integration involves obvious risks of privacy violation. The research use of personal genome and health information is a topic of global discussion with regard to the protection of privacy while promoting scientific advancement. The present paper reports on our plans, current attempts, and accomplishments in addressing security problems involved in data sharing to ensure donor privacy while promoting scientific advancement. METHODS: Biospecimens and data have been collected in prospective cohort studies under comprehensive agreement. A sample size of 150,000 participants was required for multiple research projects, including genome-wide screening of gene-by-environment interactions, haplotype phasing, and parametric linkage analysis. RESULTS: We established the Tohoku Medical Megabank (TMM) data sharing policy: a privacy protection rule that requires physical, personnel, and technological safeguards against privacy violation in the use and sharing of data. The proposed policy draws on those of NCBI and the Sanger Institute. It classifies shared data according to the strength of re-identification risk. Local committees organized by TMM evaluate re-identification risk and assign a security category to each dataset. Every dataset is stored in an assigned segment of a supercomputer in accordance with its security category. A security manager should be designated to handle all security problems at each data use location. The proposed policy requires closed networks and IP-VPN remote connections. CONCLUSION: The mission of the biobank is to distribute biological resources as productively as possible. This mission motivated us to collect biospecimens and health data and simultaneously analyze genome/omics data in-house. The biobank also has the mission of improving the quality and quantity of its contents, which motivated us to ask users to share the results of their research as feedback to the biobank. The TMM data sharing policy addresses every security problem arising from these missions. We believe our current implementation to be the best way to protect privacy in data sharing.


Subject(s)
Biological Specimen Banks/organization & administration , Computer Security , Health Policy , Information Dissemination/methods , Precision Medicine/standards , Privacy , Biological Specimen Banks/standards , Biometric Identification , Confidentiality , Genome , Humans , Japan , Precision Medicine/methods , Privacy/legislation & jurisprudence , Prospective Studies , Research Design , Tissue Donors
10.
Appl Microbiol Biotechnol ; 100(3): 1319-1331, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26711277

ABSTRACT

In developing countries, livestock are often fed a high-lignin, low-nutrient diet that is rich in aromatic compounds. It is therefore important to understand the structure of the microbial community responsible for the metabolism of these substances. A metagenomic analysis was carried out to assess the microbial communities associated with the liquid and solid fractions of rumen biomaterial from domestic Mehsani buffalo (Bubalus bubalis) fed with varying proportions of roughage. The experimental design consisted of three feeding regimes (50, 75 and 100 % roughage) and two roughage types (green and dry). Genes associated with aromatic compound degradation were assessed via high-throughput DNA sequencing. A total of 3914.94 Mb of data were generated from all treatment groups. Genes coding for functional responses associated with aromatic compound metabolism were more prevalent in the liquid fraction of rumen samples than in the solid fraction. Statistically significant differences (p < 0.05) were also observed between treatment groups. These differences were dependent on the proportion of roughage fed to the animal, with the type of roughage having little effect. The genes present in the highest abundance in all treatment groups were those related to aromatic compound catabolism. At the phylum level, Bacteroidetes were dominant in all treatments, closely followed by the Firmicutes. This study demonstrates the use of feed type to selectively enrich microbial communities capable of metabolizing aromatic compounds in the rumen of domestic buffalo. The results may help to improve nutrient utilization efficiency in livestock and are thus of interest to farming industries worldwide, particularly in developing countries.


Subject(s)
Animal Feed/analysis , Bacteria/metabolism , Buffaloes/microbiology , Dietary Fiber/metabolism , Gastrointestinal Microbiome , Rumen/microbiology , Volatile Organic Compounds/metabolism , Animals , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Biodiversity , Buffaloes/metabolism , Metagenomics , Phylogeny , Rumen/metabolism
11.
Regul Toxicol Pharmacol ; 74: 178-86, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26387931

ABSTRACT

Next-Generation Sequencing is a rapidly advancing technology that has research and clinical applications. For many cancers, it is important to know the precise mutation(s) present, as specific mutations could indicate or contra-indicate certain treatments as well as be indicative of prognosis. Using the Ion Torrent Personal Genome Machine and the AmpliSeq Cancer Hotspot panel v2, we sequenced two pancreatic cancer cell lines, BxPC-3 and HPAF-II, alone or in mixtures, to determine the error rate, sensitivity, and reproducibility of this system. The system resulted in coverage averaging 2000× across the various amplicons and was able to reliably and reproducibly identify mutations present at a rate of 5%. Identification of mutations present at a lower rate was possible by altering the parameters by which calls were made, but with an increase in erroneous, low-level calls. The panel was able to identify known mutations in these cell lines that are present in the COSMIC database. In addition, other, novel mutations were also identified that may prove clinically useful. The system was assessed for systematic errors such as homopolymer effects, end of amplicon effects and patterns in NO CALL sequence. Overall, the system is adequate for identifying the known, targeted mutations in the panel.


Subject(s)
Biomarkers, Tumor/genetics , DNA Mutational Analysis , Gene Expression Profiling , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing , Mutation , Pancreatic Neoplasms/genetics , Cell Line, Tumor , Computational Biology , Databases, Genetic , Genetic Predisposition to Disease , Humans , Oligonucleotide Array Sequence Analysis , Pancreatic Neoplasms/pathology , Phenotype , Reproducibility of Results , Software
12.
Bioethics ; 28(7): 343-51, 2014 Sep.
Article in English | MEDLINE | ID: mdl-23137034

ABSTRACT

Broad genome-wide testing is increasingly finding its way to the public through the online direct-to-consumer marketing of so-called personal genome tests. Personal genome tests estimate genetic susceptibilities to multiple diseases and other phenotypic traits simultaneously. Providers commonly make use of Terms of Service agreements rather than informed consent procedures. However, to protect consumers from the potential physical, psychological and social harms associated with personal genome testing and to promote autonomous decision-making with regard to the testing offer, we argue that current practices of information provision are insufficient and that there is a place--and a need--for informed consent in personal genome testing, also when it is offered commercially. The increasing quantity, complexity and diversity of most testing offers, however, pose challenges for information provision and informed consent. Both specific and generic models for informed consent fail to meet its moral aims when applied to personal genome testing. Consumers should be enabled to know the limitations, risks and implications of personal genome testing and should be given control over the genetic information they do or do not wish to obtain. We present the outline of a new model for informed consent which can meet both the norm of providing sufficient information and the norm of providing understandable information. The model can be used for personal genome testing, but will also be applicable to other, future forms of broad genetic testing or screening in commercial and clinical settings.


Subject(s)
Disclosure , Genetic Testing/ethics , Genome, Human , Informed Consent/ethics , Models, Theoretical , Morals , Personal Autonomy , Commerce , Comprehension , Decision Making , Genetic Predisposition to Disease , Humans
13.
Genes (Basel) ; 15(2)2024 02 19.
Article in English | MEDLINE | ID: mdl-38397248

ABSTRACT

Genotypic testing is often recommended to improve the management of patients infected with human immunodeficiency virus (HIV). To help combat this major pandemic, next-generation sequencing (NGS) techniques are widely used to analyse resistance to antiretroviral drugs. In this study, we used a Vela Sentosa kit (Vela Diagnostics, Kendall, Singapore), which is usually used for the Ion Torrent personal genome machine (PGM) platform, to sequence HIV using the Illumina MiSeq platform. After RNA extraction and reverse transcriptase-polymerase chain reaction (RT-PCR), minor modifications were applied to the Vela Sentosa kit to adapt it to the Illumina MiSeq platform. Analysis of the results showed the same mutations present in the samples using both sequencing platforms. The total number of reads varied from 185,069 to 752,343 and from 642,162 to 2,074,028 in the Ion Torrent PGM platform and the Illumina MiSeq platform, respectively. The average depth was 21,955 and 46,856 for Ion Torrent PGM and Illumina MiSeq platforms, respectively. The cost of sequencing a run of eight samples was quite similar between the two platforms (about USD 1790 for Illumina MiSeq and about USD 1833 for Ion Torrent PGM platform). We have shown for the first time that it is possible to adapt and use the Vela Sentosa kit for the Illumina MiSeq platform to obtain high-quality results with a similar cost.


Subject(s)
HIV Infections , HIV , Humans , HIV/genetics , Mutation , Genotype , High-Throughput Nucleotide Sequencing/methods , HIV Infections/drug therapy , HIV Infections/genetics
14.
Indian Pacing Electrophysiol J ; 12(2): 54-64, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22557843

ABSTRACT

Disorders of the cardiac rhythm are quite prevalent in clinical practice. Though the variability in drug response between individuals has been extensively studied, this information has not been widely used in clinical practice. Rapid advances in the field of pharmacogenomics have provided us with crucial insights into inter-individual genetic variability and its impact on drug metabolism and action. Technologies for faster and cheaper genetic testing and even personal genome sequencing would enable clinicians to optimize prescriptions based on the genetic makeup of the individual, opening up new avenues in personalized medicine. We have systematically reviewed the literature evidence on pharmacogenomic markers for anti-arrhythmic agents from the OpenPGx consortium collection and reason about the applicability of genetics in the management of arrhythmia. We also discuss potential issues that need to be resolved before personalized pharmacogenomics becomes a reality in regular clinical practice.

15.
Genome Biol ; 23(1): 134, 2022 06 29.
Article in English | MEDLINE | ID: mdl-35765079

ABSTRACT

There are major efforts underway to make genome sequencing a routine part of clinical practice. A critical barrier to these is achieving practical solutions for data ownership and integrity. Blockchain provides solutions to these challenges in other realms, such as finance. However, its use in genomics is stymied due to the difficulty in storing large-scale data on-chain, slow transaction speeds, and limitations on querying. To overcome these roadblocks, we developed a private blockchain network to store genomic variants and reference-aligned reads on-chain. It uses nested database indexing with an accompanying tool suite to rapidly access and analyze the data.


Subject(s)
Blockchain , Genome , Genomics
16.
Front Microbiol ; 13: 1021955, 2022.
Article in English | MEDLINE | ID: mdl-36274710

ABSTRACT

Diabetic foot infections (DFIs) represent a frequent complication of diabetes and a major cause of amputations. This study aimed to evaluate the utility of 16S rRNA gene sequencing for the rapid microbiological diagnosis of DFIs and to consistently characterize the microbiome of chronic diabetic foot ulcers (DFUs) and intact skin. Wound samples were collected by ulcer swabbing and tissue biopsy, and paired swabs of intact skin were collected from 10 patients with DFIs (five were moderately infected, and the other five were severely infected). Samples were analyzed by conventional culture and using Personal Genome Machine (PGM) 16S rRNA sequencing technology. The results showed that PGM technology detected significantly more bacterial genera (66.1 vs. 1.5 per wound sample, p < 0.001), more obligate anaerobes (52.5 vs. 0%, p < 0.001) and more polymicrobial infections (100.0 vs. 55.0%, p < 0.01) than conventional cultures. There was no statistically significant difference in bacterial richness, diversity or composition between the wound swabs and tissues (p > 0.05). The bacterial community on intact skin was significantly more diverse than that in DFUs (Chao1 value, p < 0.05; Shannon index value, p < 0.001). Gram-positive bacteria (67.6%) and aerobes (59.2%) were predominant in contralateral intact skin, while Gram-negative bacteria (63.3%) and obligate anaerobes (50.6%) were the most ubiquitous in DFUs. The most differentially abundant taxon in skin was Bacillales, while Bacteroidia was the bacterial taxon most representative of DFUs. Moreover, Fusobacterium (ρ = 0.80, p < 0.01) and Proteus (ρ = 0.78, p < 0.01) were significantly correlated with the duration of DFIs. In conclusion, PGM 16S rRNA sequencing technology could be a potentially useful technique for the rapid microbiological diagnosis of DFIs. Compared with biopsy, which is an invasive technique, wound swabbing may be sufficient for sampling bacterial pathogens in DFIs. The empirical use of broad-spectrum antibiotics covering Gram-negative obligate anaerobes should be considered for the treatment of moderate or severe DFIs.

17.
Gene ; 769: 145237, 2021 Feb 15.
Article in English | MEDLINE | ID: mdl-33127537

ABSTRACT

Egyptians are at a crossroads between Africa and Eurasia, providing useful genomic resources for analyzing both genetic and environmental factors for future personalized medicine. We previously published two personal Egyptian whole genomes; here, nine female whole-genome sequences with clinical information have been added to expand the genomic resource of Egyptian personal genomes. We report the analysis of whole genomes of nine Egyptian females from different regions, sequenced using Illumina short-read sequencers. At 30x sequencing coverage, we identified 12 obesity-associated SNPs shared by most of the subjects, concordant with their clinical diagnosis. We also found that the mtDNA mutation A4282G, which is associated with chronic progressive external ophthalmoplegia (CPEO), is common to all the samples. Haplogroup and admixture analyses revealed that most Egyptian samples are close to North Mediterranean, Middle Eastern, and European populations, in that order, possibly reflecting the into-Africa influx of human migration. In conclusion, we present whole-genome sequences of nine Egyptian females with personal clinical information that cover the diverse regions of Egypt. Although limited in sample size, the whole-genome data provide possible genotype-phenotype candidate markers that are relevant to the region's diseases.


Subject(s)
Computational Biology , Genome, Human , Phylogeography , Adult , DNA, Mitochondrial/genetics , Egypt , Female , Humans , Middle Aged , Polymorphism, Single Nucleotide , Whole Genome Sequencing
18.
Comput Struct Biotechnol J ; 19: 3747-3754, 2021.
Article in English | MEDLINE | ID: mdl-34285776

ABSTRACT

Two major forces have contributed to the fast growth of human genetic data: medical research supported by governments and academic institutes, and direct-to-consumer (DTC) sequencing companies. While the former benefits from meticulously designed sequencing standards and quality control procedures, the latter comes in various formats and sequencing methods, which are subject to change over time and to the particular needs of different companies. Thanks to members of the general public who shared their DNA data without constraint, here we provide a review of over 7000 genomes made public between 2011 and 2020 and produced by over six DTC sequencing companies. An open-source toolkit to systematically parse, quality check, and filter genome files and statistically problematic alleles is provided to prepare consumer DNA datasets for research. The GenomePrep output is available in two common DNA data file formats to enable further analysis with other tools. We also provide for download the combined output for all OpenSNP array genomes processed in this paper in a single data freeze file.

19.
Front Genet ; 12: 633731, 2021.
Article in English | MEDLINE | ID: mdl-33633791

ABSTRACT

The Welfare Genome Project (WGP) provided 1,000 healthy Korean volunteers with detailed genetic and health reports to test the social perception of integrating personal genetic and healthcare data at a large scale. WGP was launched in 2016 in the Ulsan Metropolitan City as the first large-scale genome project with public participation in Korea. The project produced a set of genetic materials, genotype information, clinical data, and lifestyle survey answers from participants aged 20-96. As compensation, the participants received a free general health check-up on 110 clinical traits, accompanied by a genetic report of their genotypes followed by genetic counseling. In a follow-up survey, 91.0% of the participants indicated that their genetic reports motivated them to improve their health. Overall, WGP expanded not only the general awareness of genomics, DNA sequencing technologies, bioinformatics, and bioethics regulations among all the parties involved, but also the general public's understanding of how genome projects can indirectly benefit their health and lifestyle management. WGP established a data construction framework for not only scientific research but also the welfare of participants. In the future, the WGP framework can help lay the groundwork for a new personalized healthcare system that is seamlessly integrated with existing public medical infrastructure.

20.
Auris Nasus Larynx ; 48(3): 530-534, 2021 Jun.
Article in English | MEDLINE | ID: mdl-32389511

ABSTRACT

Sinonasal teratocarcinosarcoma (SNTCS) is a rare and histologically heterogeneous tumor of uncertain origin and unknown molecular pathogenesis. Its location and aggressiveness, with frequent recurrences, a high rate of metastasis and short mean survival, make SNTCS a tumor that is highly difficult to treat. Thus, the identification of underlying genetic changes could potentially provide successful adjuvant or alternative precision medicine treatment options for patients with this tumor. We report here a 55-year-old male with a naso-ethmoidal SNTCS that invaded the right maxillary sinus, the orbital cavity and the anterior cranial fossa; the tumor was treated with surgery followed by radiotherapy and chemotherapy, and we evaluated its mutational profile by multigene panel sequencing. Tumor and adjacent normal mucosa were screened for hotspots and targeted regions of 22 cancer-related genes by multigene panel sequencing. The analysis revealed a somatic pathogenic mutation in the PIK3CA gene (p.His1047Leu) and a germline alteration in the DDR2 gene (p.Pro476Leu) whose oncogenic function is unknown. This study suggests the involvement of PIK3CA gene mutation in SNTCS tumorigenesis, highlighting a potential target for individualized molecular therapy for patients with this tumor.


Subject(s)
Carcinosarcoma/genetics , Class I Phosphatidylinositol 3-Kinases/genetics , Mutation , Nose Neoplasms/genetics , Paranasal Sinus Neoplasms/genetics , Teratoma/genetics , Discoidin Domain Receptor 2/genetics , Ethmoid Sinus , Germ-Line Mutation , Humans , Male , Middle Aged