Results 1 - 20 of 48
1.
Genome Biol ; 25(1): 176, 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-38965568

ABSTRACT

Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979 .


Subject(s)
Genetic Variation , Genome, Human , Tandem Repeat Sequences , Humans , Software , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Nanopore Sequencing/methods
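The sketch below shows, in a deliberately simplified way, what genotyping a TR from long reads involves. It assumes that each read spanning a locus has already been reduced to a repeat-unit count, and it calls a diploid genotype from the two best-supported counts. This is a hypothetical illustration: `call_diploid_genotype` and its `min_reads` threshold are invented names, and LongTR's actual model (realignment, haplotype handling, quality scoring) is not reproduced here.

```python
from collections import Counter


def call_diploid_genotype(read_counts: list[int], min_reads: int = 2) -> tuple[int, int]:
    """Naive diploid call from per-read repeat counts (hypothetical helper, not LongTR)."""
    if not read_counts:
        raise ValueError("no spanning reads")
    tally = Counter(read_counts)
    # Keep alleles supported by at least `min_reads` reads, most frequent first.
    supported = [allele for allele, n in tally.most_common() if n >= min_reads]
    if not supported:
        # Fall back to the single most common count when nothing meets the threshold.
        supported = [tally.most_common(1)[0][0]]
    if len(supported) == 1:
        return (supported[0], supported[0])  # homozygous call
    a, b = supported[0], supported[1]
    return (min(a, b), max(a, b))  # heterozygous call, smaller allele first


print(call_diploid_genotype([12, 12, 12, 15, 15, 14]))  # (12, 15)
```

The read with 14 copies is treated as noise because only one read supports it. A real caller would instead model sequencing error explicitly rather than rely on a fixed read-count cutoff.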
2.
Nat Biotechnol ; 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38671154

ABSTRACT

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 'truth-set' TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies.
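The two percentages above imply that TR regions carry far more variation than their share of the genome would suggest. The arithmetic below divides the fraction of variants by the fraction of the genome. This fold-enrichment ratio is an illustration derived from the reported figures, not a statistic stated in the abstract.

```python
def enrichment(variant_fraction: float, genome_fraction: float) -> float:
    """Fold enrichment of variants in a region relative to its share of the genome."""
    return variant_fraction / genome_fraction


fold = enrichment(0.249, 0.081)  # ~24.9% of variants in 8.1% of the genome
print(f"{fold:.2f}")  # 3.07
```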

3.
bioRxiv ; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38328152

ABSTRACT

Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve TR analysis, especially for long or complex repeats. Here we introduce LongTR, which accurately genotypes tandem repeats from high fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr.

4.
bioRxiv ; 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37961319

ABSTRACT

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants from the GIAB HG002 individual to create a tandem repeat benchmark. We also present a variant comparison method that handles small and large alleles and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ∼24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 TR benchmark. We work with the GIAB community to demonstrate the utility of this benchmark across short and long read technologies.

5.
Nat Commun ; 14(1): 6711, 2023 10 23.
Article in English | MEDLINE | ID: mdl-37872149

ABSTRACT

Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.


Subject(s)
Polymorphism, Single Nucleotide , Tandem Repeat Sequences , Humans , Genotype , Whole Genome Sequencing
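The abstract does not spell out how EnsembleTR merges its four input methods. The following is a hedged sketch of the general idea of integrating genotype calls from several tools. It uses a plain majority vote with a quorum, which is a swapped-in technique and not EnsembleTR's algorithm. The method names and the `quorum` rule are hypothetical.

```python
from collections import Counter
from typing import Optional

Genotype = tuple[int, int]  # pair of repeat-unit counts; allele order is irrelevant


def ensemble_call(calls: dict[str, Optional[Genotype]], quorum: int = 2) -> Optional[Genotype]:
    """Majority vote across methods; None when no genotype reaches the quorum."""
    # Normalise allele order so (12, 10) and (10, 12) count as the same genotype.
    votes = Counter(tuple(sorted(gt)) for gt in calls.values() if gt is not None)
    if not votes:
        return None
    best, n = votes.most_common(1)[0]
    return best if n >= quorum else None


calls = {
    "method_a": (10, 12),
    "method_b": (12, 10),
    "method_c": (10, 11),
    "method_d": None,  # this method made no call at the locus
}
print(ensemble_call(calls))  # (10, 12)
```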
6.
Bioinform Adv ; 3(1): vbad058, 2023.
Article in English | MEDLINE | ID: mdl-37168281

ABSTRACT

Summary: TRviz is an open-source Python library for decomposing, encoding, aligning and visualizing tandem repeat (TR) sequences. TRviz takes a collection of alleles (TR-containing sequences) and one or more motifs as input and generates a plot showing the motif composition of the TR sequences. Availability and implementation: TRviz is an open-source Python library and freely available at https://github.com/Jong-hun-Park/trviz. Detailed documentation is available at https://trviz.readthedocs.io. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
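TRviz's own API is documented at the links above and is not reproduced here. As a hypothetical illustration of what "decomposing and encoding" a TR allele means, the sketch below greedily matches the given motifs from left to right, labels each exact copy with a letter, and marks unmatched bases with a dot. It is a simplification: it only finds exact motif copies, and the function name `decompose` is invented.

```python
def decompose(allele: str, motifs: list[str]) -> str:
    """Encode an allele as motif labels (A, B, ...); '.' marks bases no motif matches."""
    labels = {m: chr(ord("A") + i) for i, m in enumerate(motifs)}
    # Try longer motifs first so a short motif does not pre-empt a longer exact match.
    ordered = sorted(motifs, key=len, reverse=True)
    out, i = [], 0
    while i < len(allele):
        for m in ordered:
            if allele.startswith(m, i):
                out.append(labels[m])
                i += len(m)
                break
        else:
            out.append(".")  # no motif starts here; skip one base
            i += 1
    return "".join(out)


print(decompose("CAGCAGCAACAGT", ["CAG", "CAA"]))  # AABA.
```

Encoding each allele as a label string like this is what makes alleles comparable and alignable as sequences of motifs rather than sequences of bases.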

7.
Sci Rep ; 13(1): 8605, 2023 05 27.
Article in English | MEDLINE | ID: mdl-37244974

ABSTRACT

Continuous, comfortable, convenient (C3), and accurate blood pressure (BP) measurement and monitoring are needed for early diagnosis of various cardiovascular diseases. Existing cuff-based BP technologies may achieve reliable accuracy but offer only limited C3 measurement; to supplement them, cuffless BP measurement technologies, such as pulse transit/arrival time, pulse wave analysis, and image processing, have been studied. Among recent cuffless approaches, innovative machine-learning and artificial intelligence-based technologies that estimate BP by extracting BP-related features from photoplethysmography (PPG)-based waveforms have attracted interdisciplinary attention from medical and computer scientists owing to their convenience and effectiveness for both C3 and accurate (C3A) BP measurement. However, C3A BP measurement still remains unattainable because the accuracy of existing PPG-based BP methods has not been sufficiently validated for subject-independent and highly varying BP, which is typical in practice. To circumvent this issue, a novel convolutional neural network (CNN)- and calibration-based model (PPG2BP-Net) was designed using a comparative paired one-dimensional CNN structure to estimate highly varying intrasubject BP. To this end, approximately [Formula: see text], [Formula: see text], and [Formula: see text] of 4185 cleaned, independent subjects from 25,779 surgical cases were used exclusively for training, validating, and testing PPG2BP-Net, respectively (i.e., subject-independent modelling). To quantify intrasubject BP variation from an initial calibration BP, a novel 'standard deviation of subject-calibration centring (SDS)' metric is proposed, wherein a high SDS represents high intrasubject BP variation from the calibration BP and vice versa.
PPG2BP-Net accurately estimated systolic and diastolic BP values despite high intrasubject variability. In data from 629 subjects acquired after 20 minutes following A-line (arterial line) insertion, it achieved a low error mean and standard deviation of [Formula: see text] and [Formula: see text] for highly varying A-line systolic and diastolic BP values, respectively, whose SDSs were 15.375 and 8.745. This study moves one step forward in developing C3A cuffless BP estimation devices that enable push and agile pull services.


Subject(s)
Hypertension , Photoplethysmography , Humans , Blood Pressure/physiology , Photoplethysmography/methods , Artificial Intelligence , Blood Pressure Determination/methods , Hypertension/diagnosis , Pulse Wave Analysis/methods
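The abstract names the SDS metric but does not give its formula. One plausible reading of "standard deviation of subject-calibration centring" is sketched below, and it is an assumption rather than the authors' definition. Each subject's readings are centred on that subject's calibration BP, all centred values are pooled, and their population standard deviation is taken. The subject IDs and BP values are invented.

```python
import statistics


def sds(readings: dict[str, list[float]], calibration: dict[str, float]) -> float:
    """Assumed SDS: std of pooled readings after subtracting each subject's calibration BP."""
    centred = [
        bp - calibration[subject]  # centre each reading on that subject's calibration value
        for subject, values in readings.items()
        for bp in values
    ]
    # Population standard deviation of the pooled, calibration-centred values (an assumption).
    return statistics.pstdev(centred)


readings = {"s1": [120.0, 130.0, 110.0], "s2": [95.0, 105.0]}
calibration = {"s1": 120.0, "s2": 100.0}
print(round(sds(readings, calibration), 3))  # 7.071
```

Under this reading, a subject whose BP never moves from the calibration value contributes zeros, so SDS grows only with intrasubject drift. This matches the abstract's description of high SDS as high variation from the calibration BP.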
8.
bioRxiv ; 2023 Mar 12.
Article in English | MEDLINE | ID: mdl-36945429

ABSTRACT

Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3,550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.

9.
IEEE Trans Cybern ; 53(6): 3518-3531, 2023 Jun.
Article in English | MEDLINE | ID: mdl-34860658

ABSTRACT

Reinforcement learning (RL) has emerged as a promising approach for scheduling semiconductor operations. Yet it remains challenging to solve large-scale scheduling problems with RL, since learning complexity grows quickly as the size of the shop floor increases. This challenge becomes more apparent in scheduling problems with a diverse number of job types, which make exploration and function approximation in RL difficult. This article presents a scheduling method for semiconductor packaging facilities using deep RL, in which an agent allocates a job to one of the machines in a centralized manner. Specifically, a novel state representation is introduced to effectively accommodate variations in the number of available machines and in the production requirements. Furthermore, we propose a continuous action representation that keeps the size of the action space unchanged even when the numbers of jobs, machines, and operation types change. Extensive experiments on large-scale datasets demonstrate that the proposed method mostly outperforms the metaheuristics and rule-based methods, as well as the other RL approaches considered, in terms of makespan, while requiring much less computation time than the metaheuristics.

10.
Eur J Hum Genet ; 31(2): 216-222, 2023 02.
Article in English | MEDLINE | ID: mdl-36434258

ABSTRACT

Despite substantial efforts in identifying both rare and common variants affecting disease risk, in the majority of diseases, a large proportion of unexplained genetic risk remains. We propose that variable number tandem repeats (VNTRs) may explain a proportion of the missing genetic risk. Herein, in a pilot study with a retrospective cohort design, we tested whether VNTRs are causal modifiers of breast cancer risk in 347 female carriers of the BRCA1 185delAG pathogenic variant, an important group given their high risk of developing breast cancer. We performed targeted-capture to sequence VNTRs, called genotypes with adVNTR, tested the association of VNTRs and breast cancer risk using Cox regression models, and estimated the effect size using a retrospective likelihood approach. Of 303 VNTRs that passed quality control checks, 4 VNTRs were significantly associated with risk to develop breast cancer at false discovery rate [FDR] < 0.05 and an additional 4 VNTRs had FDR < 0.25. After determining the specific risk alleles, there was a significantly earlier age at diagnosis of breast cancer in carriers of the risk alleles compared to those without the risk alleles for seven of eight VNTRs. One example is a VNTR in exon 2 of LINC01973 with a per-allele hazard ratio of 1.58 (1.07-2.33) and 5.28 (2.79-9.99) for the homozygous risk-allele genotype. Results from this first systematic study of VNTRs demonstrate that VNTRs may explain a proportion of the unexplained genetic risk for breast cancer.


Subject(s)
Breast Neoplasms , Minisatellite Repeats , Female , Humans , Breast Neoplasms/genetics , Retrospective Studies , Likelihood Functions , Pilot Projects , Risk Factors , Alleles , BRCA1 Protein/genetics
11.
Eur J Hum Genet ; 30(12): 1413-1422, 2022 12.
Article in English | MEDLINE | ID: mdl-36100708

ABSTRACT

Hereditary chronic kidney disease (CKD) appears to be more frequent than clinically perceived. Exome sequencing (ES) studies in CKD cohorts could identify pathogenic variants in ~10% of individuals. Tubulointerstitial kidney diseases, which show no typical clinical or histologic findings other than tubulointerstitial fibrosis, are particularly difficult to diagnose. We used a targeted panel (29 genes) and MUC1-SNaPshot to sequence 271 DNAs, selected by defined disease entities and age cutoffs from 5217 individuals in the German Chronic Kidney Disease cohort. We identified 33 pathogenic variants. Of these, 27 (81.8%) were in COL4A3/4/5, the largest group being 15 COL4A5 variants, with nine unrelated individuals carrying c.1871G>A, p.(Gly624Asp). We found three cysteine variants in UMOD, a novel missense and a novel splice variant in HNF1B, and the homoplasmic MTTF variant m.616T>C. Copy-number analysis identified a heterozygous COL4A5 deletion and an HNF1B duplication/deletion. Overall, pathogenic variants were present in 12.5% (34/271) and variants of unknown significance in 9.6% (26/271) of selected individuals. Bioinformatic predictions paired with gold standard diagnostics for MUC1 (SNaPshot) did not identify the typical cytosine duplication ("c.428dupC") in any individual, implying that ADTKD-MUC1 is rare. Our study shows that >10% of selected individuals carry disease-causing variants in genes partly associated with tubulointerstitial kidney diseases. COL4A3/4/5 genes constitute the largest fraction, implying that they are regularly overlooked when clinical Alport syndrome criteria are used and revealing the existence of phenocopies. We identified variants easily missed by some ES pipelines. The clinical filtering criteria applied enriched for an underlying genetic disorder.


Subject(s)
Nephritis, Hereditary , Nephritis, Interstitial , Renal Insufficiency, Chronic , Humans , Prevalence , Nephritis, Hereditary/genetics , Nephritis, Interstitial/epidemiology , Nephritis, Interstitial/genetics , Nephritis, Interstitial/diagnosis , Renal Insufficiency, Chronic/diagnosis , Renal Insufficiency, Chronic/epidemiology , Renal Insufficiency, Chronic/genetics , Mutation
12.
iScience ; 25(8): 104785, 2022 Aug 19.
Article in English | MEDLINE | ID: mdl-35982790

ABSTRACT

The human genome contains more than one million tandem repeats (TRs), DNA sequences containing multiple approximate copies of a motif repeated contiguously. TRs account for significant genetic variation, with 50+ diseases attributed to changes in motif number. A few diseases have been shown to be caused by small indels in variable number tandem repeats (VNTRs), including medullary cystic kidney disease type 1 (MCKD1) and monogenic type 1 diabetes. However, small indels in VNTRs are largely unexplored, mainly because of the long and complex multi-motif structure of VNTRs. We developed a method, code-adVNTR, that uses multi-motif hidden Markov models to detect both motif count variation and small indels within VNTRs. In simulated data, code-adVNTR outperformed GATK-HaplotypeCaller in calling small indels within large VNTRs. We used code-adVNTR to characterize coding VNTRs in the 1000 Genomes data, identifying many population-specific variants, and to reliably call MUC1 mutations for MCKD1.
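The sketch below shows why small indels inside coding VNTRs matter, though it is far simpler than code-adVNTR's multi-motif hidden Markov models. It assumes the VNTR has already been split into motif copies. Under that assumption, any copy whose length differs from the reference motif by a non-multiple of three implies a frameshift. The function name `frameshift_copies` and the 6-bp example motif are hypothetical.

```python
def frameshift_copies(copies: list[str], motif_len: int) -> list[int]:
    """Indices of motif copies whose length change is not a multiple of 3."""
    flagged = []
    for idx, copy in enumerate(copies):
        delta = len(copy) - motif_len
        if delta % 3 != 0:
            # A net indel of 1 or 2 bases (mod 3) shifts the downstream reading frame.
            flagged.append(idx)
    return flagged


motif = "GGGCCC"  # hypothetical 6-bp coding motif
copies = ["GGGCCC", "GGGGCCC", "GGGCCC", "GGCCC"]
print(frameshift_copies(copies, len(motif)))  # [1, 3]
```

A length-only check like this cannot say where in the copy the indel lies, or distinguish an indel from motif variation of compatible length. Resolving those cases requires a probabilistic model of the kind the abstract describes.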

13.
Bioresour Technol ; 344(Pt A): 126205, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34715337

ABSTRACT

This study aimed to achieve high-rate dark fermentative H2 production from xylose using a dynamic membrane module bioreactor (DMBR) with a 444-µm pore polyester mesh. A feed of 20 g xylose/L was supplied continuously to the DMBR at hydraulic retention times (HRTs) ranging from 12 to 3 h at 37 °C. The maximum average H2 yield (HY) and H2 production rate (HPR), found at 3 h HRT, were 1.40 ± 0.07 mol H2/mol xylose consumed and 30.26 ± 1.19 L H2/L-d, respectively. The short HRT resulted in the maximum suspended biomass concentration (8.92 ± 0.40 g VSS/L) along with significant attached biomass retention (7.88 ± 0.22 g VSS/L). H2 was produced via both the butyric and acetic acid pathways. Low HY was concurrent with lactic acid production. The bacterial population shifted from non-H2 producers, such as Lactobacillus and Sporolactobacillus spp., to Clostridium sp. as HY increased. Thus, xylose from lignocellulose is a feasible substrate for dark fermentative H2 production using a DMBR.


Subject(s)
Hydrogen , Xylose , Bioreactors , Clostridium , Fermentation
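For a continuously fed reactor, the organic loading rate follows from the feed concentration and the hydraulic retention time: OLR = S0 / HRT, with HRT expressed in days. The sketch below applies this standard relation to the 20 g xylose/L feed at the two ends of the tested HRT range. The resulting OLR values are derived arithmetic, not figures reported in the abstract.

```python
def olr_g_per_l_day(feed_g_per_l: float, hrt_hours: float) -> float:
    """Organic loading rate (g/L-d) = feed concentration / HRT converted to days."""
    return feed_g_per_l / (hrt_hours / 24.0)


for hrt in (12, 3):  # endpoints of the 12-3 h range in the abstract
    print(hrt, "h ->", olr_g_per_l_day(20.0, hrt), "g xylose/L-d")
```

Shortening the HRT from 12 h to 3 h at a fixed feed concentration therefore quadruples the loading, from 40 to 160 g xylose/L-d. This is why biomass retention at short HRTs is central to the reactor design.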
14.
Materials (Basel) ; 14(21)2021 Nov 03.
Article in English | MEDLINE | ID: mdl-34772138

ABSTRACT

Unit loads consisting of a pallet, packages, and a product securement system are the dominant way of shipping products across the United States. The most common packaging type used in unit loads is the corrugated box. Because of the large stresses created during unit load stacking, accurately predicting the compression strength of corrugated boxes is critical to preventing unit load failure. Although many variables affect the compression strength of corrugated boxes, it was recently found that changing the pallet's top deck stiffness can significantly affect compression strength. However, how these different factors influence this phenomenon is still poorly understood. This study investigated the effect of the pallet's top deck stiffness on corrugated box compression strength as a function of initial top deck thickness, pallet wood species, box size, and board grade. The increase in top deck thickness needed to lower the board grade of corrugated boxes by one level from the initial unit load scenario was determined using PDS™. The benefits of increasing top deck thickness diminish as the initial top deck thickness increases, because pallet deflection is less severe from the start. The benefits were more pronounced when higher board grade boxes were initially used and when smaller boxes were used, owing to the heavier weights of these unit loads. Therefore, this study suggests that a company using lower-stiffness pallets or heavy corrugated boxes for its unit loads will find more opportunities to optimize them by increasing the pallet's top deck thickness.

15.
Bioresour Technol ; 342: 125942, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34563827

ABSTRACT

This study examined the effect of various inocula on mixed-culture dark fermentative H2 production from food waste. Heat-treated and frozen H2-producing granular sludge (HPG) grown with monomeric sugars showed a higher H2 yield, production rate, and acidogenic efficiency, along with a shorter lag phase, than heat-treated methanogenic sludge. Among three different methods of methanogenic sludge inoculation, inoculation after centrifugation showed better H2 production performance. Propionic acid production and homoacetogenesis were regarded as the major H2-consuming pathways when methanogenic sludge was used, whereas only homoacetogenesis was found in HPG-inoculated fermentation. During fermentation, the abundance of Clostridium increased more than 48-fold with methanogenic sludge and more than 108-fold with HPG. The initial abundance of Clostridium showed a linear relationship with the H2 production rate and lag-phase time. The use of an inoculum with a high abundance of Clostridium is essential for H2 production from food waste.


Subject(s)
Food , Refuse Disposal , Bioreactors , Clostridium , Fermentation , Hydrogen , Sewage
16.
Folia Microbiol (Praha) ; 66(6): 1039-1046, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34346036

ABSTRACT

The JS7 strain, isolated from an old forest tree, produces extracellular enzymes that decolorize synthetic melanin and natural melanin from human hair. Phylogenetic analysis based on the internal transcribed spacer (ITS) sequence indicated that JS7 belongs to the genus Irpex. The JS7 strain has laccase activity but lacks manganese and lignin peroxidase activity, which suggests that its melanin decolorization activity originates from laccase. Laccase production by Irpex sp. JS7 increased three-fold in the presence of veratryl alcohol compared with no inducer. The optimum pH and temperature for melanin decolorization were 7.5 and 40 °C, respectively. The crude enzyme had a half-life of about 100 days at 25 °C, indicating high storage stability. The melanin decolorization reaction rate of the crude enzyme conformed to typical enzyme kinetic principles. In the presence of syringaldehyde as a redox mediator, melanin decolorization reached 75% within 5 days, similar to the decolorization percentage obtained using the enzyme alone. Based on these results, the Irpex sp. JS7 enzyme is suitable for melanin decolorization in whitening agents for the cosmetics industry.


Subject(s)
Laccase , Polyporales , Humans , Laccase/genetics , Laccase/metabolism , Melanins/metabolism , Oxidation-Reduction , Phylogeny , Polyporales/metabolism
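The abstract reports only a half-life, not a decay model. If we assume the crude enzyme loses activity by first-order decay, the fraction of activity remaining after t days is 0.5^(t / t½). The sketch below applies that assumed model to the reported half-life of about 100 days at 25 °C.

```python
def remaining_activity(days: float, half_life_days: float = 100.0) -> float:
    """Fraction of initial activity left after `days`, assuming first-order decay."""
    return 0.5 ** (days / half_life_days)


for t in (30, 100, 200):
    print(t, "days:", round(remaining_activity(t), 3))
```

Under this assumption, roughly four-fifths of the activity would remain after a month at room temperature. Real storage losses could deviate from a single exponential.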
17.
Bioresour Technol ; 340: 125562, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34325392

ABSTRACT

This study aimed to achieve continuous biohydrogen production from red algal biomass using a dynamic membrane bioreactor (DMBR). The DMBR was continuously fed with pretreated Eucheuma spinosum containing 20 g/L hexose. The highest average hydrogen production rate (HPR), 21.58 ± 1.59 L/L-d, was observed at an HRT of 3 h, which was higher than previous reports for continuous H2 production from biomass feedstock. Metabolic flux analysis revealed that butyric acid and propionic acid were the major by-products of the H2-producing and H2-consuming pathways, respectively, in the algal biomass fermentation. Hydrogen consumption via the propionic acid pathway could not be completely prevented by heat treatment. PICRUSt2 analysis predicted that Clostridium sp., Anaerostipes sp., and Caproiciproducens sp. might contribute significantly to the expression of both ferredoxin hydrogenase and propionate CoA-transferase. This study provides design and operational information on high-rate bioreactors for continuous hydrogen production from biomass.


Subject(s)
Bioreactors , Hydrogen , Biomass , Clostridium , Fermentation
18.
Sensors (Basel) ; 21(7)2021 Apr 02.
Article in English | MEDLINE | ID: mdl-33918116

ABSTRACT

As unmanned aerial vehicles have become popular, the number of accidents caused by operator inattention has increased. To prevent such accidents, operators should remain attentive. However, limited research has been conducted on brain-computer interface (BCI)-based systems with an alerting module that helps operators of unmanned aerial vehicles recover their attention. Therefore, we introduce a detection and alerting system that prevents an unmanned aerial vehicle operator from becoming inattentive by using the operator's electroencephalogram signal. The proposed system consists of three components: a signal processing module, which collects and preprocesses the operator's electroencephalogram signal; an inattention detection module, which determines whether inattention has occurred based on the preprocessed signal; and an alert providing module, which presents a stimulus to the operator when inattention is detected. An evaluation on a real-world dataset showed that the proposed system successfully contributed to the recovery of operator attention, although statistical significance could not be established because of the small number of subjects.


Subject(s)
Electroencephalography , Signal Processing, Computer-Assisted , Cognition , Humans
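The abstract names the three modules but not the EEG features or classifier they use. The hypothetical sketch below wires three stand-in stages together to show the data flow, and it requires NumPy. Preprocessing is just mean removal. Detection flags inattention when the theta/beta band-power ratio exceeds a threshold, a feature chosen here only as an example. The alert stage returns a message. The sampling rate, band limits, threshold, and synthetic signals are all assumptions, not the authors' method.

```python
import numpy as np

FS = 128  # sampling rate in Hz (assumed)


def preprocess(raw: np.ndarray) -> np.ndarray:
    """Signal processing module: remove the DC offset (stand-in for real filtering)."""
    return raw - raw.mean()


def band_power(x: np.ndarray, lo: float, hi: float) -> float:
    """Sum of squared FFT magnitudes for frequencies in [lo, hi) Hz."""
    freqs = np.fft.rfftfreq(x.size, d=1.0 / FS)
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    mask = (freqs >= lo) & (freqs < hi)
    return float(spectrum[mask].sum())


def is_inattentive(x: np.ndarray, threshold: float = 3.0) -> bool:
    """Detection module: theta (4-8 Hz) / beta (13-30 Hz) power ratio above a threshold."""
    return band_power(x, 4, 8) / band_power(x, 13, 30) > threshold


def alert(inattentive: bool) -> str:
    """Alert module: return a stimulus message when inattention is detected."""
    return "ALERT: refocus" if inattentive else "ok"


t = np.arange(2 * FS) / FS  # synthetic 2-second window
drowsy = 3.0 * np.sin(2 * np.pi * 6 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)
focused = 0.5 * np.sin(2 * np.pi * 6 * t) + 3.0 * np.sin(2 * np.pi * 20 * t)
print(alert(is_inattentive(preprocess(drowsy))), alert(is_inattentive(preprocess(focused))))
```

Keeping the stages as separate functions mirrors the modular design in the abstract. Any one stage, such as a trained classifier in place of the threshold rule, could be swapped without touching the others.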
19.
Nat Commun ; 12(1): 2075, 2021 04 06.
Article in English | MEDLINE | ID: mdl-33824302

ABSTRACT

Variable number tandem repeats (VNTRs) account for significant genetic variation in many organisms. In humans, VNTRs have been implicated in both Mendelian and complex disorders, but are largely ignored by genomic pipelines due to the complexity of genotyping and the computational expense. We describe adVNTR-NN, a method that uses shallow neural networks to genotype a VNTR in 18 seconds on 55X whole genome data, while maintaining high accuracy. We use adVNTR-NN to genotype 10,264 VNTRs in 652 GTEx individuals. Associating VNTR length with gene expression in 46 tissues, we identify 163 "eVNTRs". Of the 22 eVNTRs in blood where independent data is available, 21 (95%) are replicated in terms of significance and direction of association. 49% of the eVNTR loci show a strong and likely causal impact on the expression of genes and 80% have maximum effect size at least 0.3. The impacted genes are involved in diseases including Alzheimer's, obesity and familial cancers, highlighting the importance of VNTRs for understanding the genetic basis of complex diseases.


Subject(s)
Gene Expression Regulation , Minisatellite Repeats/genetics , Alleles , Cerebral Cortex/metabolism , Cohort Studies , Genetic Loci , Genotype , Humans , Reproducibility of Results
20.
Bioresour Technol ; 320(Pt A): 124279, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33152682

ABSTRACT

This study examined the feasibility of dark fermentative biohydrogen production from food waste using hybrid immobilization under mesophilic conditions. Among four different organic loading rates (OLRs), the highest average hydrogen production rate (HPR), 9.82 ± 0.30 L/L-d, was found at an OLR of 74.7 g hexose/L-d, which was higher than reported values for particulate feedstock under mesophilic conditions. The average hydrogen yield (HY) under this condition was 1.25 ± 0.04 mol H2/mol hexose consumed. In contrast, the average HPR and HY at an OLR of 80 g hexose/L-d were 5.82 ± 0.12 L/L-d and 0.64 ± 0.02 mol H2/mol hexose consumed, respectively. Metabolic flux analysis showed that the low HY was concurrent with the highest levels of propionic acid production and homoacetogenesis. The bacterial population shifted from Clostridium sp. to non-hydrogen producers, including Bifidobacterium, Bacteroides, Olsenella, Dysgonomonas, and Dialister spp.


Subject(s)
Microbiota , Refuse Disposal , Bioreactors , Fermentation , Food , Hydrogen