Results 1 - 12 of 12
1.
Plant Commun ; : 101002, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38872306

ABSTRACT

Despite considerable advances in extracting crucial insights from bio-omics data to unravel the intricate mechanisms underlying complex traits, the absence of a universal multi-modal computational tool with robust interpretability for accurate phenotype prediction and identification of trait-associated genes remains a challenge. This study introduces the dual-extraction modeling (DEM) approach, a multi-modal deep-learning architecture designed to extract representative features from heterogeneous omics datasets, enabling the prediction of complex trait phenotypes. Through comprehensive benchmarking experiments, we demonstrate the efficacy of DEM in classification and regression prediction of complex traits. DEM consistently exhibits superior accuracy, robustness, generalizability, and flexibility. Notably, we establish its effectiveness in predicting pleiotropic genes that influence both flowering time and rosette leaf number, underscoring its commendable interpretability. In addition, we have developed user-friendly software to facilitate seamless utilization of DEM's functions. In summary, this study presents a state-of-the-art approach with the ability to effectively predict qualitative and quantitative traits and identify functional genes, confirming its potential as a valuable tool for exploring the genetic basis of complex traits.

2.
bioRxiv ; 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38260665

ABSTRACT

Individualized phenotypic prediction based on structural MRI is an important goal in neuroscience. Prediction performance increases with larger samples, but small-scale datasets with fewer than 200 participants are often unavoidable. We have previously proposed a "meta-matching" framework to translate models trained from large datasets to improve the prediction of new unseen phenotypes in small collection efforts. Meta-matching exploits correlations between phenotypes, yielding large improvement over classical machine learning when applied to prediction models using resting-state functional connectivity as input features. Here, we adapt the two best performing meta-matching variants ("meta-matching finetune" and "meta-matching stacking") from our previous study to work with T1-weighted MRI data by changing the base neural network architecture to a 3D convolution neural network. We compare the two meta-matching variants with elastic net and classical transfer learning using the UK Biobank (N = 36,461), Human Connectome Project Young Adults (HCP-YA) dataset (N = 1,017) and HCP-Aging dataset (N = 656). We find that meta-matching outperforms elastic net and classical transfer learning by a large margin, both when translating models within the same dataset, as well as translating models across datasets with different MRI scanners, acquisition protocols and demographics. For example, when translating a UK Biobank model to 100 HCP-YA participants, meta-matching finetune yielded a 136% improvement in variance explained over transfer learning, with an average absolute gain of 2.6% (minimum = -0.9%, maximum = 17.6%) across 35 phenotypes. Overall, our results highlight the versatility of the meta-matching framework.

3.
G3 (Bethesda) ; 13(4)2023 04 11.
Article in English | MEDLINE | ID: mdl-36625555

ABSTRACT

Accurate prediction of the phenotypic outcomes produced by different combinations of genotypes, environments, and management interventions remains a key goal in biology with direct applications to agriculture, research, and conservation. The past decades have seen an expansion of new methods applied toward this goal. Here we predict maize yield using deep neural networks, compare the efficacy of 2 model development methods, and contextualize model performance using conventional linear and machine learning models. We examine the usefulness of incorporating interactions between disparate data types. We find deep learning and best linear unbiased predictor (BLUP) models with interactions had the best overall performance. BLUP models achieved the lowest average error, but deep learning models performed more consistently with similar average error. Optimizing deep neural network submodules for each data type improved model performance relative to optimizing the whole model for all data types at once. Examining the effect of interactions in the best-performing model revealed that including interactions altered the model's sensitivity to weather and management features, including a reduction of the importance scores for timepoints expected to have a limited physiological basis for influencing yield: those at the extreme end of the season, nearly 200 days post planting. Based on these results, deep learning provides a promising avenue for the phenotypic prediction of complex traits in complex environments and a potential mechanism to better understand the influence of environmental and genetic factors.


Subject(s)
Deep Learning , Neural Networks, Computer , Machine Learning , Genotype , Multifactorial Inheritance
4.
J Dairy Sci ; 104(7): 8107-8121, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33865589

ABSTRACT

Fourier-transform infrared (FTIR) spectroscopy is a powerful high-throughput phenotyping tool for predicting traits that are expensive and difficult to measure in dairy cattle. Calibration equations are often developed using standard methods, such as partial least squares (PLS) regression. Methods that employ penalization, rank-reduction, and variable selection, and that can model the nonlinear relations between phenotype and FTIR, might offer improvements in predictive ability and model robustness. This study aimed to compare the predictive ability of 2 machine learning methods, namely random forest (RF) and gradient boosting machine (GBM), and penalized regression against PLS regression for predicting 3 phenotypes differing in terms of biological meaning and relationships with milk composition (i.e., phenotypes measurable directly and not directly in milk, reflecting different biological processes which can be captured using milk spectra) in Holstein-Friesian cattle under 2 cross-validation scenarios. The data set comprised phenotypic information from 471 Holstein-Friesian cows, and 3 target phenotypes were evaluated: (1) body condition score (BCS), (2) blood β-hydroxybutyrate (BHB, mmol/L), and (3) κ-casein expressed as a percentage of nitrogen (κ-CN, % N). The data set was split considering 2 cross-validation scenarios: samples-out random, in which the population was randomly split into 10 folds (8 folds for training and 1 fold each for validation and testing); and herd/date-out, in which the population was randomly assigned to training (70% herd), validation (10%), and testing (20% herd) based on the herd and date on which the samples were collected. The random grid search was performed using the training subset for the hyperparameter optimization and the validation set was used for the generalization of prediction error. The trained model was then used to assess the final prediction in the testing subset.
The grid search for penalized regression showed that the elastic net (EN) was the best regularization, with an increase in predictive ability of 5%. The performance of PLS (standard model) was compared against 2 machine learning techniques and penalized regression using 2 cross-validation scenarios. Machine learning methods showed a greater predictive ability for BCS (0.63 for GBM and 0.61 for RF), BHB (0.80 for GBM and 0.79 for RF), and κ-CN (0.81 for GBM and 0.80 for RF) in samples-out cross-validation. Considering a herd/date-out cross-validation these values were 0.58 (GBM and RF) for BCS, 0.73 (GBM and RF) for BHB, and 0.77 (GBM and RF) for κ-CN. The GBM model tended to outperform other methods in predictive ability by around 4%, 1%, and 7% for EN, RF, and PLS, respectively. The prediction accuracies of the GBM and RF models were similar, and differed statistically from the PLS model in samples-out random cross-validation. Although machine learning techniques outperformed PLS in herd/date-out cross-validation, no significant differences were observed in terms of predictive ability due to the large standard deviation observed for predictions. Overall, GBM achieved the highest accuracy of FTIR-based prediction of the different phenotypic traits across the cross-validation scenarios. These results indicate that GBM is a promising method for obtaining more accurate FTIR-based predictions for different phenotypes in dairy cattle.


Subject(s)
Machine Learning , Milk , 3-Hydroxybutyric Acid , Animals , Cattle , Female , Phenotype , Spectroscopy, Fourier Transform Infrared/veterinary
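
Note on the herd/date-out scenario in record 4: the idea of keeping whole groups together when splitting can be sketched as below. This is a minimal illustration, not the authors' code. It groups by herd only (the abstract also uses sampling date), and the function name `herd_out_split`, the herd labels, and the seed are hypothetical.

```python
import random

def herd_out_split(herd_ids, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign whole herds to train/validation/test so no herd spans two subsets."""
    herds = sorted(set(herd_ids))
    rng = random.Random(seed)
    rng.shuffle(herds)
    n_train = round(fractions[0] * len(herds))
    n_val = round(fractions[1] * len(herds))
    subset_of = {}
    for i, herd in enumerate(herds):
        if i < n_train:
            subset_of[herd] = "train"
        elif i < n_train + n_val:
            subset_of[herd] = "validation"
        else:
            subset_of[herd] = "test"
    # Map each sample index to the subset its herd was assigned to.
    split = {"train": [], "validation": [], "test": []}
    for idx, herd in enumerate(herd_ids):
        split[subset_of[herd]].append(idx)
    return split

# Hypothetical data: 20 herds with 5 samples each.
herd_ids = [f"herd{h}" for h in range(20) for _ in range(5)]
split = herd_out_split(herd_ids)
```

Because entire herds are held out, test performance reflects prediction for herds the model has never seen, which is typically a harder task than a purely random sample split.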
5.
Front Microbiol ; 12: 748779, 2021.
Article in English | MEDLINE | ID: mdl-35046909

ABSTRACT

Rice wine koji, a traditional homemade starter culture in China, is nutritious and delicious. The final quality of rice wine koji is closely related to the structure of its microbial community. However, the diversity of natural microorganisms in rice wine koji from different regions has not been evaluated. In this study, the microbial population of 92 naturally fermented rice koji samples collected from Hubei, Guangxi, and Sichuan was systematically analyzed by high-throughput sequencing. From all the rice wine koji samples, 22 phyla and 479 bacterial genera were identified. Weissella, Pediococcus, Lactobacillus, Enterobacter, Lactococcus, Pantoea, Bacillus, Staphylococcus, and Leuconostoc were the dominant genera in rice wine koji. The bacterial community structure of rice wine koji samples from different regions was significantly different (p < 0.05). The bacterial community composition of the samples from Hubei and Guangxi was similar, but significantly different from that of the Sichuan (SC) samples (p < 0.05). These differences may be caused by variations in geography, environment, or manufacturing. In addition, the results of microbial phenotype prediction by BugBase and bacterial functional potential prediction by PICRUSt showed that eight of the nine predicted phenotypic functions of rice wine koji samples from different regions were significantly different (p < 0.05) and that vigorous bacterial metabolism occurred in rice wine koji samples.

6.
J Dairy Sci ; 102(10): 9409-9421, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31447154

ABSTRACT

In this study, we compared multiple logistic regression, a linear method, to naive Bayes and random forest, 2 nonlinear machine-learning methods. We used all 3 methods to predict individual survival to second lactation in dairy heifers. The data set used for prediction contained 6,847 heifers born between January 2012 and June 2013, and had known survival outcomes. Each animal had 50 genomic estimated breeding values available at birth and up to 65 phenotypic variables that accumulated over time. Survival was predicted at 5 moments in life: at birth, at 18 mo, at first calving, at 6 wk after first calving, and at 200 d after first calving. The data sets were randomly split into 70% training and 30% testing sets to evaluate model performance for 20-fold validation. The methods were compared for accuracy, sensitivity, specificity, area under the curve (AUC) value, contrasts between groups for the prediction outcomes, and increase in surviving animals in a practical scenario. At birth and 18 mo, all methods had overlapping performance; no method significantly outperformed the others. At first calving, 6 wk after first calving, and 200 d after first calving, random forest and naive Bayes had overlapping performance, and both machine-learning methods outperformed multiple logistic regression. Overall, naive Bayes had the highest average AUC at all decision points up to 200 d after first calving. Random forest had the highest AUC at 200 d after first calving. All methods obtained similar increases in survival in the practical scenario. Despite this, the methods appeared to predict the survival of individual heifers differently. All methods improved over time, but the changes in mean model outcomes for surviving and non-surviving animals differed by method. Furthermore, the correlations of individual predictions between methods ranged from r = 0.417 to r = 0.700; the lowest correlations were at first calving for all methods.
In short, all 3 methods were able to predict survival at a population level, because all methods improved survival in a practical scenario. However, predictions for individual animals differed considerably between methods.


Subject(s)
Cattle/physiology , Genome/genetics , Machine Learning , Animals , Animals, Newborn , Bayes Theorem , Breeding , Cattle/genetics , Female , Lactation , Parturition/genetics , Pregnancy
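
Note on the evaluation in record 6: the AUC metric and the repeated 70%/30% splitting can be sketched as follows. This is an illustration only. It assumes that "20-fold validation" means 20 repeated random splits, which the abstract does not state explicitly, and the function names `auc` and `repeated_holdout` are hypothetical.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a random
    positive case scores higher than a random negative (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def repeated_holdout(n_samples, n_repeats=20, train_fraction=0.7, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random 70/30 splits."""
    rng = random.Random(seed)
    n_train = round(train_fraction * n_samples)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[:n_train], idx[n_train:]

# Hypothetical predicted survival probabilities and true outcomes.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]
example_auc = auc(scores, labels)
```

An AUC of 0.5 corresponds to ranking survivors and non-survivors no better than chance, which is why overlapping AUC values across methods are read as equivalent performance.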
7.
Soc Sci Med ; 198: 46-52, 2018 02.
Article in English | MEDLINE | ID: mdl-29275275

ABSTRACT

Public and scientific conceptions of identity are changing alongside advances in biotechnology, with important relevance to health and medicine. In particular, biological identity, once predominantly conceived as static (e.g., related to DNA, dental records, fingerprints), is now being recognized as dynamic or fluid, mirroring contemporary understandings of psychological and social identity. The dynamism of biological identity comes from the individual body's unique relationship with the world surrounding it, and therefore may best be described as biosocial. This paper reviews advances in scientific understandings of identity and presents a model that contrasts prior static approaches to biological identity with more recent dynamically-relational ones. This emerging viewpoint is of broad significance to health and medicine, particularly as medicine recognizes the significance of biography - i.e. the multiple, dense interactions imparted on a body across spatio-temporal dimensions - to phenotypic prediction, especially disease risk.


Subject(s)
Models, Theoretical , Social Identification , Geography , Humans , Spatial Analysis , Time
8.
Pak J Biol Sci ; 20(7): 343-349, 2017.
Article in English | MEDLINE | ID: mdl-29023066

ABSTRACT

BACKGROUND AND OBJECTIVE: α-Thalassemia is an inherited blood disorder affecting the quality and quantity of hemoglobin. It is caused mostly by deletion of one or two α-globin genes and is characterized by deficient production of the α-globin chain in hemoglobin, with outcomes ranging from mild anemia to lethality. An α-globin gene with a partial deletion could reduce chain production or produce an abnormal chain. Its effect depends on which mechanism of chain production is affected. This study aimed to analyze how partial deletion in the α-globin gene influences the mechanisms that produce a functional α-globin chain in α-thalassemia cases. MATERIALS AND METHODS: Three mutant genes from GenBank were selected and processed. The analysis covered determination of the deleted sequences, mRNA sequences, protein structures, and the protein chain interactions that form hemoglobin, using SWISS MODEL, CHIMERA and SABLE Polyview 2D. RESULTS: The results showed that 76 amino acids were deleted in one mutant α-globin gene (V00516.1). The mutation affected every mechanism of α-globin chain conformation and production. It altered protein conformation through the loss of over half the helical chains. It abolished function completely, disrupting hemoglobin A (HbA) production, with the emergence of β-sheet conformation. CONCLUSION: The analysis concluded that the protein produced by the α-globin gene with partial deletion lost its function and was unable to form hemoglobin.


Subject(s)
Mutation , alpha-Globins/genetics , alpha-Thalassemia/genetics , Hemoglobins , Humans , RNA, Messenger , alpha-Thalassemia/diagnosis
9.
Pediatr Cardiol ; 37(5): 962-70, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27041096

ABSTRACT

Long QT syndrome (LQTS) can cause syncope, ventricular fibrillation, and death. Recently, several disease-causing mutations in ion channel genes have been identified, and compound mutations have also been detected. It is unclear whether children who are carriers of compound mutations exhibit a more severe phenotype than those with single mutations. Although predicting phenotypic severity is clinically important, the availability of prediction tools for LQTS is unknown. It is therefore necessary to determine whether the severity of the LQTS phenotype in children can be predicted by the presence of compound mutations. We detected 97 single mutations (Group S) and 13 compound mutations (Group C) between 1998 and 2012; age at diagnosis ranged from 0 to 19 years (median 9.0 years), and the follow-up period was 18.0 years. The phenotypes and Kaplan-Meier event-free rates of the two groups were compared for cardiac events. This study investigated phenotypic severity in relation to the location of mutations in the protein sequence, which was analyzed using two sequence homology-based tools. Compound mutations in children were associated with a high incidence of syncope within the first decade (Group S: 32% vs. Group C: 61%) and with a requirement for an ICD in the second decade (Group S: 3% vs. Group C: 56%). Mortality in these patients was high within 5 years of birth (23%). Phenotypic prediction tools correctly predicted the phenotypic severity in both Groups S and C, especially when their coupling method was used. The coupling prediction method is useful in the initial evaluation of phenotypes in LQTS patients with either single or compound mutations. However, it should be noted that compound mutations produce a more severe phenotype.


Subject(s)
Long QT Syndrome , Mutation , Adolescent , Arrhythmias, Cardiac , Child , Child, Preschool , Humans , Infant , Infant, Newborn , KCNQ1 Potassium Channel , Phenotype , Sequence Homology , Young Adult
10.
Genetics ; 201(2): 779-93, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26253546

ABSTRACT

Prediction of complex traits using molecular genetic information is an active area in quantitative genetics research. In the postgenomic era, many types of -omic (e.g., transcriptomic, epigenomic, methylomic, and proteomic) data are becoming increasingly available. Therefore, evaluating the utility of this massive amount of information in prediction of complex traits is of interest. DNA methylation, the covalent change of a DNA molecule without affecting its underlying sequence, is one quantifiable form of epigenetic modification. We used methylation information for predicting plant height (PH) in Arabidopsis thaliana nonparametrically, using reproducing kernel Hilbert spaces (RKHS) regression. Also, we used different criteria for selecting smaller sets of probes, to assess how representative probes could be used in prediction instead of using all probes, which may lessen computational burden and lower experimental costs. Methylation information was used for describing epigenetic similarities between individuals through a kernel matrix, and the performance of predicting PH using this similarity matrix was reasonably good. The predictive correlation reached 0.53 and the same value was attained when only preselected probes were used for prediction. We created a kernel that mimics the genomic relationship matrix in genomic best linear unbiased prediction (G-BLUP) and estimated that, in this particular data set, epigenetic variation accounted for 65% of the phenotypic variance. Our results suggest that methylation information can be useful in whole-genome prediction of complex traits and that it may help to enhance understanding of complex traits when epigenetics is under examination.


Subject(s)
Arabidopsis/genetics , DNA Methylation/genetics , Epigenesis, Genetic/genetics , Quantitative Trait Loci/genetics , Arabidopsis/anatomy & histology , Phenotype , Proteomics
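
Note on the kernel approach in record 10: the step of turning methylation profiles into a similarity (kernel) matrix and predicting a trait from it can be sketched as below. This is a simplified stand-in, not the authors' method. It uses kernel ridge regression, the penalized form of RKHS regression, and the Gaussian kernel, the `bandwidth` and `lam` values, and the toy data are all assumptions.

```python
import math

def gaussian_kernel(X, bandwidth):
    """K[i][j] = exp(-||x_i - x_j||^2 / bandwidth) for rows x_i of X."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-d2 / bandwidth)
    return K

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_kernel_ridge(K, y, lam):
    """Return alpha solving (K + lam * I) alpha = y; fitted values are K alpha."""
    n = len(y)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, y)

# Hypothetical methylation levels (rows = plants, columns = probes) and heights.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
y = [1.0, 1.2, 3.0]
K = gaussian_kernel(X, bandwidth=1.0)
alpha = fit_kernel_ridge(K, y, lam=1e-8)
fitted = [sum(K[i][j] * alpha[j] for j in range(len(y))) for i in range(len(y))]
```

With a penalty this small the model nearly interpolates the training data; in practice `lam` would be tuned, and predictive ability would be judged on held-out individuals rather than on fitted values.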
11.
Proteins ; 83(3): 428-35, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25546381

ABSTRACT

Recent studies have shown that the protein interface sites between individual monomeric units in biological assemblies are enriched in disease-associated non-synonymous single nucleotide variants (nsSNVs). To elucidate the mechanistic underpinning of this observation, we investigated the conformational dynamic properties of protein interface sites through a site-specific structural dynamic flexibility metric (dfi) for 333 multimeric protein assemblies. dfi measures the dynamic resilience of a single residue to perturbations that occurred in the rest of the protein structure and identifies sites contributing the most to functionally critical dynamics. Analysis of dfi profiles of over a thousand positions harboring variation revealed that amino acid residues at interfaces have lower average dfi (31%) than those present at non-interfaces (50%), which means that protein interfaces have less dynamic flexibility. Interestingly, interface sites with disease-associated nsSNVs have significantly lower average dfi (23%) as compared to those of neutral nsSNVs (42%), which directly relates structural dynamics to functional importance. We found that less conserved interface positions show much lower dfi for disease nsSNVs as compared to neutral nsSNVs. In this case, dfi is better as compared to the accessible surface area metric, which is based on the static protein structure. Overall, our proteome-wide conformational dynamic analysis indicates that certain interface sites play a critical role in functionally related dynamics (i.e., those with low dfi values), therefore mutations at those sites are more likely to be associated with disease.


Subject(s)
Disease/genetics , Polymorphism, Single Nucleotide , Protein Conformation , Proteins/chemistry , Proteins/genetics , Databases, Protein , Humans , Models, Molecular , Pliability , Polymorphism, Single Nucleotide/genetics , Polymorphism, Single Nucleotide/physiology , Proteomics , Surface Properties
12.
Anim Genet ; 45(1): 12-9, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24134470

ABSTRACT

Residual feed intake (RFI) has been adopted in Australia for the purpose of genetic improvement in feed efficiency in beef cattle. RFI is the difference between the observed feed intake of an animal and the predicted feed intake based on its size and growth rate over a test period. Gene expression of eight candidate genes (AHSG, GHR, GSTM1, INHBA, PCDH19, S100A10, SERPINI2 and SOD3), previously identified as differentially expressed between divergent lines of high- and low-RFI animals, was measured in an unselected population of 60 steers from the Angus Society Elite Progeny Test Program using quantitative real-time PCR. Results showed that the levels of gene expression were significantly correlated with RFI. The genes explain around 33.2% of the phenotypic variance in RFI, and prediction equations using the expression data are reasonably accurate estimators of RFI. The association of these genes with economically important traits, such as other feed efficiency-related traits and fat, growth and carcass traits, was investigated as well. The expression of these candidate genes was significantly correlated with feed conversion ratio and daily feed intake, which are highly associated with RFI, suggesting a functional role for these genes in modulating feed utilisation. The expression of these genes did not show any association with average daily gain, eye muscle area and carcass composition.


Subject(s)
Breeding , Cattle/growth & development , Cattle/genetics , Eating/genetics , Animal Feed , Animals , Body Composition/genetics , Male , Meat/analysis , Phenotype , Weight Gain/genetics
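
Note on the RFI definition in record 12: residual feed intake as "observed minus predicted intake" can be sketched as the residual of a linear regression of intake on size and growth rate. This is a minimal illustration under assumptions. The plain least-squares model, the function names, and the toy numbers are hypothetical and are not taken from the study.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def residual_feed_intake(intake, size, gain):
    """RFI = observed intake minus intake predicted from size and growth rate."""
    n = len(intake)
    mean_size = sum(size) / n
    mean_gain = sum(gain) / n
    # Centering the predictors leaves the residuals unchanged but keeps
    # the normal equations well conditioned.
    X = [[1.0, s - mean_size, g - mean_gain] for s, g in zip(size, gain)]
    # Normal equations: (X^T X) beta = X^T y.
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(X, intake)) for i in range(3)]
    beta = solve(XtX, Xty)
    predicted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    return [y - p for y, p in zip(intake, predicted)]

# Hypothetical steers: body weight (kg), daily gain (kg/d), feed intake (kg/d).
size = [400.0, 450.0, 500.0, 550.0, 600.0]
gain = [1.0, 1.4, 1.1, 1.6, 1.3]
intake = [9.0, 10.8, 10.9, 12.9, 12.4]
rfi = residual_feed_intake(intake, size, gain)
```

A negative RFI marks an animal that eats less than expected for its size and growth, which is the more feed-efficient direction.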