Results 1 - 20 of 72
1.
J Pathol ; 263(4-5): 454-465, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38845115

ABSTRACT

Gastric cancer (GC) is one of the most heterogeneous tumors. However, research on normal tissue adjacent to the tumor (NAT) is very limited. We performed multi-regional omics sequencing on 150 samples to assess the genetic basis and immune microenvironment in NAT and matched primary tumor or lymph node metastases. NATs demonstrated different mutated genes compared with GC, and NAT genomes underwent independent evolution with low variant allele frequency. Mutation profiles were predominated by aging and smoking-associated signatures in NAT instead of signatures associated with genetic instability. Although the immune microenvironment within NATs shows substantial intra-patient heterogeneity, the proportion of shared TCR clones among NATs is five times higher than that of tumor regions. These findings support the notion that subclonal expansion is not pronounced in NATs. We also demonstrated remarkable intra-patient heterogeneity of GCs and revealed heterogeneity of focal amplification of CD274 (encoding PD-L1) that leads to differential expression. Finally, we identified that monoclonal seeding is predominant in GC, which is followed by metastasis-to-metastasis dissemination in individual lymph nodes. These results provide novel insights into GC carcinogenesis. © 2024 The Pathological Society of Great Britain and Ireland.


Subject(s)
B7-H1 Antigen , Mutation , Stomach Neoplasms , Tumor Microenvironment , Humans , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology , Stomach Neoplasms/immunology , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology , B7-H1 Antigen/genetics , Genetic Heterogeneity , Lymphatic Metastasis , Male , Female , Aged , Middle Aged , Biomarkers, Tumor/genetics
2.
Nucleic Acids Res ; 51(D1): D1417-D1424, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36399488

ABSTRACT

Deciphering the cell-type composition in the tumor immune microenvironment (TIME) can significantly increase the efficacy of cancer treatment and improve the prognosis of cancer. Such a task has benefited from microarrays and RNA sequencing technologies, which have been widely adopted in cancer studies, resulting in extensive expression profiles with clinical phenotypes across multiple cancers. Current state-of-the-art tools can infer cell-type composition from bulk expression profiles, providing the possibility of investigating the inter-heterogeneity and intra-heterogeneity of TIME across cancer types. Much can be gained from these tools in conjunction with a well-curated database of TIME cell-type composition data, accompanied by the corresponding clinical information. However, currently available databases fall short in data volume, multi-platform dataset integration, and tool integration. In this work, we introduce TIMEDB (https://timedb.deepomics.org), an online database for human tumor immune microenvironment cell-type composition estimated from bulk expression profiles. TIMEDB stores manually curated expression profiles, cell-type composition profiles, and the corresponding clinical information of a total of 39,706 samples from 546 datasets across 43 cancer types. TIMEDB comes readily equipped with online tools for automatic analysis and interactive visualization, and aims to serve the community as a convenient tool for investigating the human tumor microenvironment.


Subject(s)
Neoplasms , Humans , Databases, Factual , Neoplasms/genetics , Neoplasms/immunology , Sequence Analysis, RNA , Tumor Microenvironment/genetics
3.
PLoS Genet ; 17(11): e1009910, 2021 11.
Article in English | MEDLINE | ID: mdl-34780471

ABSTRACT

Natural and artificial directional selection has resulted in significant genetic and phenotypic differences across breeds of domestic animals. However, the molecular regulation of skeletal muscle diversity remains largely unknown. Here, we conducted transcriptome profiling of skeletal muscle across 27 time points, and performed whole-genome re-sequencing in Landrace (lean-type) and Tongcheng (obese-type) pigs. Transcription activity decreased with development, and the high-resolution transcriptome precisely captured the characteristics of skeletal muscle, with distinct biological events in four developmental phases: Embryonic, Fetal, Neonatal, and Adult. A divergence in developmental timing and asynchronous development between the two breeds was observed; Landrace showed a developmental lag and stronger abilities of myoblast proliferation and cell migration, whereas Tongcheng had higher ATP synthase activity in postnatal periods. The miR-24-3p-driven network targeting the insulin signaling pathway regulated glucose metabolism. Notably, integrated analysis suggested that SATB2 and XLOC_036765 contributed to skeletal muscle diversity by regulating myoblast migration and proliferation, respectively. Overall, our results provide insights into the molecular regulation of skeletal muscle development and diversity in mammals.


Subject(s)
Matrix Attachment Region Binding Proteins/genetics , MicroRNAs/genetics , Muscle, Skeletal/growth & development , RNA, Long Noncoding/genetics , Swine/embryology , Transcriptome/genetics , Animals , Cell Differentiation/genetics , Cell Movement/genetics , Cell Proliferation/genetics , Gene Expression Regulation, Developmental/genetics , Genetic Drift , Genome/genetics , Muscle Development/genetics , Muscle, Skeletal/metabolism , Myoblasts/metabolism , RNA, Long Noncoding/metabolism , Swine/genetics , Swine/metabolism
4.
BMC Bioinformatics ; 24(1): 40, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36755234

ABSTRACT

BACKGROUND: Distance functions are fundamental for evaluating the differences between gene expression profiles. Such a function would output a low value if the profiles are strongly correlated (either negatively or positively), and vice versa. One popular distance function is the absolute correlation distance, [Formula: see text], where [Formula: see text] is a similarity measure, such as Pearson or Spearman correlation. However, the absolute correlation distance fails to fulfill the triangle inequality, which would guarantee better performance in vector quantization, allow fast data localization, and accelerate data clustering. RESULTS: In this work, we propose [Formula: see text] as an alternative. We prove that [Formula: see text] satisfies the triangle inequality when [Formula: see text] represents Pearson correlation, Spearman correlation, or cosine similarity. We show [Formula: see text] to be better than [Formula: see text], another variant of [Formula: see text] that satisfies the triangle inequality, both analytically and experimentally. We empirically compared [Formula: see text] with [Formula: see text] in gene clustering and sample clustering experiments on real-world biological data. The two distances performed similarly in both gene clustering and sample clustering with hierarchical clustering and PAM (partitioning around medoids) clustering. However, [Formula: see text] demonstrated more robust clustering. According to the bootstrap experiment, [Formula: see text] generated more robust sample pair partitions more frequently (P-value [Formula: see text]). The statistics on the time at which a class "dissolved" also support the advantage of [Formula: see text] in robustness. CONCLUSION: [Formula: see text], as a variant of the absolute correlation distance, satisfies the triangle inequality and is capable of more robust clustering.


Subject(s)
Transcriptome , Cluster Analysis
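
The claim in the abstract above that the absolute correlation distance can violate the triangle inequality can be checked on a small example. The sketch below assumes the common form of that distance, 1 - |r| with r the Pearson correlation; the formulas in the abstract are elided, so this form, the function names, and the three toy vectors are illustrative assumptions rather than the authors' definitions.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def abs_corr_distance(a, b):
    """Assumed form of the absolute correlation distance: 1 - |r|."""
    return 1.0 - abs(pearson(a, b))

# Toy expression profiles (hypothetical values, chosen to expose the violation).
x = [1.0, -1.0, 0.0]
y = [2.0, 0.0, -2.0]
z = [1.0, 1.0, -2.0]

d_xy = abs_corr_distance(x, y)  # r = 0.5, so the distance is 0.5
d_yz = abs_corr_distance(y, z)  # r is about 0.866, so the distance is about 0.134
d_xz = abs_corr_distance(x, z)  # r = 0, so the distance is 1.0

# The triangle inequality would require d_xz <= d_xy + d_yz.
triangle_holds = d_xz <= d_xy + d_yz
print(d_xy, d_yz, d_xz, triangle_holds)
```

Here y correlates with both x and z, yet x and z are uncorrelated, so the direct distance exceeds the sum of the two detours.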
5.
Microb Pathog ; 183: 106309, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37586463

ABSTRACT

The etiology of endometriosis (EMS), which adversely affects the health of about 10% of reproductive-age women globally, remains to be understood. Emerging studies have found associations between EMS and dysbiosis of the genital microbiota. However, the role of the vaginal and cervical microbiota in Chinese women is not fully understood. This study recruited forty Chinese women (21 healthy women and 19 EMS patients) and analyzed their vaginal and cervical microbiota using 16S rRNA amplicon sequencing. At both sites, the distribution of microbial samples did not differ significantly between the control and EMS groups, consistent with the dominance of Lactobacillus in both groups. In contrast, we observed an accumulation of several low-abundance genera in the vaginal and cervical microbiota of EMS patients, such as Fannyhessea, Prevotella, Streptococcus, Bifidobacterium, Veillonella, Megasphaera and Sneathia. Random forest analysis found that translocation of these genera was of significant importance in differentiating EMS patients from controls. In addition, the cervix/vagina ratio of these genera was associated with EMS severity. These genera also had notable associations with ascending infection-related functional pathways, including flagellar assembly, bacterial motility proteins, bacterial toxins and epithelial cell signaling in Helicobacter pylori infection. These findings suggest that translocation of specific genera between vaginal and cervical sites plays a role in EMS.


Subject(s)
Endometriosis , Helicobacter Infections , Helicobacter pylori , Humans , Female , Cervix Uteri , Pilot Projects , Lactobacillus/genetics , RNA, Ribosomal, 16S/genetics , Helicobacter pylori/genetics , Vagina/microbiology , Bacterial Proteins
6.
BMC Bioinformatics ; 23(Suppl 3): 426, 2022 Oct 14.
Article in English | MEDLINE | ID: mdl-36241969

ABSTRACT

BACKGROUND: Age estimation from panoramic radiographs is a fundamental task in forensic sciences. Previous age assessment studies mainly focused on juvenile rather than elderly populations (> 25 years old). Most proposed studies were statistical or scoring-based, requiring wet-lab experiments and professional skills, and suffering from low reliability. RESULT: Based on Soft Stagewise Regression Network (SSR-Net), we developed DENSEN to estimate the chronological age for both juvenile and older adults, based on their orthopantomograms (OPTs, also known as orthopantomographs, pantomograms, or panoramic radiographs). We collected 1903 clinical panoramic radiographs of individuals between 3 and 85 years old to train and validate the model. We evaluated the model by the mean absolute error (MAE) between the estimated age and ground truth. For different age groups, 3-11 (children), 12-18 (teens), 19-25 (young adults), and 25+ (adults), DENSEN produced MAEs as 0.6885, 0.7615, 1.3502, and 2.8770, respectively. Our results imply that the model works in situations where genders are unknown. Moreover, DENSEN has lower errors for the adult group (> 25 years) than other methods. The proposed model is memory compact, consuming about 1.0 MB of memory overhead. CONCLUSIONS: We introduced a novel deep learning approach DENSEN to estimate a subject's age from a panoramic radiograph for the first time. Our approach required less laboratory work compared with existing methods. The package we developed is an open-source tool and applies to all different age groups.


Subject(s)
Laboratories , Neural Networks, Computer , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Female , Humans , Male , Middle Aged , Radiography, Panoramic , Reproducibility of Results , Young Adult
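
The per-group evaluation described in the abstract above (mean absolute error reported separately for children, teens, young adults, and adults) can be sketched as follows. This is a minimal illustration, not the DENSEN code: the function names, the exact bin boundaries (in particular treating ages above 25 as "adults"), and the toy ages are assumptions.

```python
def age_group(age):
    """Map a chronological age to one of the four groups named in the abstract.

    Boundary handling is an assumption: ages above 25 count as adults.
    """
    if age < 12:
        return "children"
    if age < 19:
        return "teens"
    if age <= 25:
        return "young adults"
    return "adults"

def mae_by_group(true_ages, predicted_ages):
    """Mean absolute error between true and predicted ages, per age group."""
    errors = {}
    for truth, estimate in zip(true_ages, predicted_ages):
        errors.setdefault(age_group(truth), []).append(abs(truth - estimate))
    return {group: sum(vals) / len(vals) for group, vals in errors.items()}

# Hypothetical ground-truth and estimated ages (not data from the study).
true_ages = [5, 10, 15, 22, 40, 60]
predicted_ages = [5.5, 9.0, 16.0, 21.0, 43.0, 58.0]

group_mae = mae_by_group(true_ages, predicted_ages)
print(group_mae)
```

Grouping by the true age rather than the predicted age keeps each subject in a fixed bin regardless of how far off the estimate is.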
7.
BMC Genomics ; 23(Suppl 4): 827, 2022 Dec 14.
Article in English | MEDLINE | ID: mdl-36517735

ABSTRACT

BACKGROUND: Inferring historical population admixture events yields essential insights into a species' demographic history. Methods are available to infer admixture events in demographic history from extant genetic data from multiple sources. Because ancient population genetic data are scarce, a method for admixture inference from a single source is still lacking. Pairwise Sequentially Markovian Coalescent (PSMC) estimates the historical effective population size from lineage genomes of a single individual, based on the distribution of the most recent common ancestor between the diploid's alleles. However, PSMC does not infer admixture events. RESULTS: Here, we propose eSMC, an extended PSMC model for admixture inference from a single source. We evaluated our model's performance on both in silico data and real data. We simulated population admixture events over an admixture time range from 5 kya to 100 kya (5 years/generation) with population admixture ratios of 1:1, 2:1, 3:1, and 4:1, respectively. The root mean square error is [Formula: see text] kya for all experiments. We then applied our method to infer historical admixture events in human, donkey and goat populations. The estimated admixture times for both Han and Tibetan individuals range from 60 kya to 80 kya (25 years/generation), while the estimated admixture times for domesticated donkeys and goats range from 40 kya to 60 kya (8 years/generation) and from 40 kya to 100 kya (6 years/generation), respectively. The estimated admixture times were concordant with the time at which domestication occurred in human history. CONCLUSION: Our eSMC effectively infers the time of the most recent admixture event in history from a single individual's genomic data. The source code of eSMC is hosted at https://github.com/zachary-zzc/eSMC.


Subject(s)
Genetics, Population , Genomics , Humans , Population Density , Alleles , Models, Statistical
8.
Bioinformatics ; 36(22-23): 5499-5506, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33346799

ABSTRACT

MOTIVATION: The microbial community plays an essential role in human diseases and physiological activities. The functions of microbes can differ due to strain-level differences in the genome sequences. Shotgun metagenomic sequencing allows us to profile the strains in microbial communities practically. However, current methods are underdeveloped because the sequences of different strains are highly similar. We observe that the strain genotypes at the same single nucleotide variant (SNV) locus can be inferred from the genotype frequencies. Also, variants at different loci covered by the same reads provide evidence that they reside on the same strain. RESULTS: These insights inspired us to design PStrain, an optimization method that utilizes genotype frequencies and the reads that cover multiple SNV loci to profile strains iteratively based on SNVs in a set of MetaPhlAn2 marker genes. Compared to state-of-the-art methods, PStrain, on average, improved the performance of inferring strain abundances and genotypes by 87.75% and 59.45%, respectively. We applied the PStrain package to a dataset with two cohorts of colorectal cancer (CRC) and found that the sequences of Bacteroides coprocola strains are significantly different between CRC and control samples; this is the first report of the potential role of B. coprocola in the gut microbiota of CRC. AVAILABILITY AND IMPLEMENTATION: https://github.com/wshuai294/PStrain. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

9.
J Chem Inf Model ; 62(18): 4319-4328, 2022 09 26.
Article in English | MEDLINE | ID: mdl-36097394

ABSTRACT

The quantitative description between chemical reaction rates and nucleophilicity parameters plays a crucial role in organic chemistry. In this regard, the formula proposed by Mayr et al. and the constructed reactivity database are important representatives. However, the determination of Mayr's nucleophilicity parameter N often requires time-consuming experiments with reference electrophiles in the solvent. Several machine learning (ML)-based models have been proposed to realize the data-driven prediction of N in recent years. However, in addition to DFT-calculated electronic descriptors, most of them also use a set of artificially predefined structural descriptors as input, which may result in a biased representation of the nucleophile's structural information depending on descriptors' definition preference. Compared with traditional ML algorithms, graph neural networks (GNNs) can naturally take the molecule's structural information into account by applying the message passing technique. We herein proposed a SchNet-based GNN model that only takes the molecular conformation and solvent type as input. The model achieves a comparable performance to the previous benchmark study on 10-fold cross-validation of 894 data points (R2 = 0.91, RMSE = 2.25). To enhance the model's ability to capture the molecule's electronic information, some DFT-calculated parameters are then incorporated into the model via graph global features, and substantial improvement is achieved in the prediction precision (R2 = 0.95, RMSE = 1.63). These results demonstrate that both structural and electronic information are important for the prediction of N, and GNN can integrate these two kinds of information more effectively.


Subject(s)
Algorithms , Neural Networks, Computer , Machine Learning , Solvents
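
The two figures of merit quoted in the abstract above, R2 and RMSE, can be computed from paired measured and predicted values as in the sketch below. It assumes R2 means the coefficient of determination (the abstract does not say whether it is that or a squared correlation), and the function names and toy values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical nucleophilicity values (not taken from the reactivity database).
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]

print(rmse(measured, predicted), r_squared(measured, predicted))
```

In a 10-fold cross-validation these metrics would typically be computed on the held-out fold of each split; whether the study pools predictions or averages per-fold scores is not stated in the abstract.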
10.
J Neurosci Res ; 99(11): 2860-2873, 2021 11.
Article in English | MEDLINE | ID: mdl-34510511

ABSTRACT

The gut-brain axis provides a pathway for the interaction between gut microbiota and methamphetamine (METH) addiction. However, the gut microbial signatures during different phases of METH use remain unclear. In the present study, we established models of acquisition, extinction, and reinstatement of METH-induced conditioned place preference (CPP) in male mice and detected the gut microbiome profiles of the fecal samples at the three phases by 16S rRNA gene sequencing. Our results revealed that the richness of the gut microbiome increased following repeated METH administration, and it decreased after 4 weeks of abstinence. The microbial richness remained at a low level after one METH challenge at the reinstatement phase. The abundance of several genera including Prevotella, Bacteroides, and Lactobacillus differentially altered among phases of METH-induced CPP. The co-occurrence networks of the gut microbiome became weaker and more unstable during the development of METH-induced CPP at the extinction and reinstatement phases. Notably, the predicted gene functions of short-chain fatty acid metabolism, which were correlated with the abundance of Prevotella, Bacteroides, and Lactobacillus, were found differentially enriched among phases of METH-induced CPP. Our findings highlight a potential association between perturbations of the gut microbiome and different phases of METH use.


Subject(s)
Central Nervous System Stimulants , Gastrointestinal Microbiome , Methamphetamine , Animals , Central Nervous System Stimulants/pharmacology , Conditioning, Operant , Extinction, Psychological , Male , Methamphetamine/pharmacology , Mice , RNA, Ribosomal, 16S/genetics
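
The abstract above compares microbial richness across phases. As a rough illustration of what such a comparison rests on, the sketch below computes observed richness (the number of taxa with a non-zero count) and, as an extra not mentioned in the abstract, the Shannon index. The study's actual richness estimator is not stated, so the functions and the toy count vector are assumptions.

```python
import math

def observed_richness(counts):
    """Number of taxa detected at least once in a sample."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon diversity (natural log) of a vector of taxon counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

# Hypothetical read counts for four taxa in one fecal sample.
sample_counts = [10, 0, 5, 5]

print(observed_richness(sample_counts), shannon_index(sample_counts))
```

Richness ignores how evenly reads are spread across taxa, which is why it is often reported alongside an index such as Shannon.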
11.
BMC Bioinformatics ; 21(Suppl 21): 570, 2020 Dec 28.
Article in English | MEDLINE | ID: mdl-33371875

ABSTRACT

BACKGROUND: Genome assembly is fundamental for de novo genome analysis. Hybrid assembly, which utilizes various sequencing technologies, increases both contiguity and accuracy. While such approaches require extra, costly sequencing efforts, the information provided by millions of existing whole-genome sequencing datasets has not been fully utilized to resolve the task of scaffolding. Genetic recombination patterns in population data indicate non-random association among alleles at different loci and can provide physical distance signals to guide scaffolding. RESULTS: In this paper, we propose LDscaff for draft genome assembly incorporating linkage disequilibrium information in population data. We evaluated the performance of our method with both simulated data and real data. We simulated scaffolds by splitting the pig reference genome and reassembled them. Gaps between scaffolds were introduced ranging from 0 to 100 KB. The genome misassembly rate is 2.43% when there is no gap. We then applied our method to refine the Giant Panda genome and the donkey genome, which were assembled purely from NGS data. After LDscaff treatment, the resulting Panda assembly has a scaffold N50 of 3.6 MB, 2.5 times larger than the original N50 (1.3 MB). The re-assembled donkey assembly has an improved N50 length of 32.1 MB, up from 23.8 MB. CONCLUSIONS: Our method effectively improves assemblies using existing re-sequencing data, and is a potential alternative to existing assemblers that require the collection of new data.


Subject(s)
Linkage Disequilibrium , Whole Genome Sequencing/methods , Alleles , Animals , High-Throughput Nucleotide Sequencing , Swine
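
The scaffold N50 values reported in the abstract above follow the standard definition: the length of the scaffold at which the cumulative length, counted from the longest scaffold downward, first reaches half of the total assembly length. A minimal sketch of that calculation is given below; the function name and the toy lengths are illustrative and are not taken from LDscaff.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

# Hypothetical scaffold lengths in kilobases.
scaffolds_kb = [80, 70, 50, 40, 30, 20]
print(n50(scaffolds_kb))
```

Joining scaffolds raises N50 without changing the total length, which is why the statistic is commonly used to summarize the effect of a scaffolding step.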
12.
Microb Pathog ; 144: 104189, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32278696

ABSTRACT

BACKGROUND: Mycoplasma pneumoniae (MP) and influenza virus are two common pathogens causing pediatric acute respiratory tract infection. Although emerging reports have demonstrated imbalanced respiratory microbiota in respiratory infection, the differences in respiratory microbiota between MP and influenza virus infection remain to be explored. METHODS: We collected paired nasopharyngeal (NP) and oropharyngeal (OP) microbial samples from 165 children, including 40 patients with MP pneumonia, 66 patients with influenza virus infection and 59 age-matched healthy children. RESULTS: NP and OP microbial diversity decreased in MP infection and increased in influenza infection compared with healthy children. Staphylococcus dominated the NP microbiota of Mycoplasma pneumoniae pneumonia (MPP) patients, while five representative patterns remained in influenza patients. In the OP microbiota, Streptococcus was significantly enriched in the MPP group and decreased in the influenza group. Decision tree analysis indicated that Ralstonia and Acidobacteria could discriminate microbial samples in the healthy (59/67), MP (35/38) and influenza (55/60) groups with high accuracy. CONCLUSIONS: This study revealed that the dominant bacterial structure in the airway was niche- and disease-specific. It could facilitate the stratification of respiratory microbial samples with different infectious agents.


Subject(s)
Influenza, Human/microbiology , Microbiota , Mycoplasma pneumoniae , Nasopharynx/microbiology , Oropharynx/microbiology , Pneumonia, Mycoplasma/microbiology , Child , DNA, Bacterial , Humans , Influenza, Human/virology , Mycoplasma pneumoniae/pathogenicity , Orthomyxoviridae , Respiratory Tract Infections/microbiology
13.
Dermatol Ther ; 33(6): e14215, 2020 11.
Article in English | MEDLINE | ID: mdl-32827193

ABSTRACT

The cutaneous microbiota responds to skin health as well as to atopic dermatitis. To reveal the microbiota effect in children with atopic dermatitis under therapy with topical corticosteroids and antibiotics, 59 atopic dermatitis patients were randomized to two treatment groups (corticosteroids or combination therapy) in Beijing Children's Hospital. Microbial samples from the lesions were collected for 16S rDNA sequencing and bioinformatics analysis. After treatment, 57 patients recovered significantly. Although topical antibiotic application blocked the restoration of commensal Streptococcus, no remarkable differences in cutaneous microbiota were identified between the two groups over the course of treatment. In subject 1081, who received the combination therapy, Streptococcus, Pseudomonas and Chryseobacterium increased dramatically. In contrast, Staphylococcus aureus decreased sharply in subject 1107, who received topical corticosteroid treatment. Our preliminary study suggests the need to consider the cutaneous microbiota profile when prescribing antibiotics.


Subject(s)
Dermatitis, Atopic , Administration, Topical , Adrenal Cortex Hormones/adverse effects , Anti-Bacterial Agents/adverse effects , Child , Dermatitis, Atopic/diagnosis , Dermatitis, Atopic/drug therapy , Humans , Infant , Prescriptions
14.
BMC Pediatr ; 20(1): 532, 2020 11 25.
Article in English | MEDLINE | ID: mdl-33238955

ABSTRACT

BACKGROUND: The initialization of the neonatal gut microbiota (GM) is affected by diverse factors and is associated with infant development and health outcomes. METHODS: In this study, we collected 207 faecal samples from 41 infants at 6 time points (1, 3, and 7 days and 1, 3, and 6 months after birth). The infants were assigned to four groups according to delivery mode (caesarean section (CS) or vaginal delivery (VD)) and feeding pattern (breastfeeding or formula milk). RESULTS: The meconium bacterial diversity was slightly higher in CS than in VD. Three GM patterns were identified, including Escherichia/Shigella-Streptococcus-dominated, Bifidobacterium-Escherichia/Shigella-dominated and Bifidobacterium-dominated patterns, and they gradually changed over time. In CS infants, Bifidobacterium was less abundant, and the delay in GM establishment could be partially restored by breastfeeding. The frequency of respiratory tract infection and diarrhoea consequently decreased. CONCLUSION: This study fills some gaps in the understanding of the restoration of the GM in CS towards that in VD.


Subject(s)
Gastrointestinal Microbiome , Bifidobacterium , Breast Feeding , Cesarean Section/adverse effects , Child , Feces , Female , Humans , Infant , Infant, Newborn , Pregnancy
15.
BMC Bioinformatics ; 20(Suppl 23): 702, 2019 Dec 27.
Article in English | MEDLINE | ID: mdl-31881904

ABSTRACT

BACKGROUND: Horizontal Gene Transfer (HGT) refers to the transfer of genetic material between organisms through mechanisms other than parent-offspring inheritance. HGTs may affect human health through the large number of microorganisms that the human body harbors, especially the gut microbiome. The transferred segments may lead to complicated local genome structural variations. Details of the local genome structure can elucidate the effects of the HGTs. RESULTS: In this work, we propose a graph-based method to reconstruct the local strains from gut metagenomics data at the HGT sites. The method is implemented in a package named LEMON. The simulated results indicate that the method can identify transferred segments accurately on reference sequences of the microbiome. Simulation results illustrate that LEMON can recover local strains with complicated structural variations. Furthermore, the gene fusion points detected in real data near HGT breakpoints validate the accuracy of LEMON. Some strains reconstructed by LEMON have a replication time profile with lower standard error, which demonstrates that the HGT events recovered by LEMON are reliable. CONCLUSIONS: Through LEMON we can reconstruct the sequence structure of bacteria that harbor HGT events. This helps us to study gene flow among different microbial species.


Subject(s)
Gastrointestinal Tract/microbiology , Gene Transfer, Horizontal/genetics , Metagenomics , Software , Bacteria/genetics , Computer Simulation , Databases, Genetic , Gene Fusion , Humans , Transcriptome/genetics
16.
BMC Bioinformatics ; 20(Suppl 23): 652, 2019 Dec 27.
Article in English | MEDLINE | ID: mdl-31881982

ABSTRACT

BACKGROUND: Identifying splice sites is a necessary step in analyzing the location and structure of genes. Two dinucleotides, GT and AG, are highly frequent at splice sites, and many other patterns also occur at splice sites with important biological functions. Meanwhile, these dinucleotides occur frequently in sequences without splice sites, which makes prediction prone to false positives. Most existing tools select all the sequences with the two dimers and then focus on distinguishing the true splice sites from the pseudo ones. Such an approach decreases false positives; however, it misses non-canonical splice sites. RESULT: We have designed SpliceFinder, based on a convolutional neural network (CNN), to predict splice sites. To achieve ab initio prediction, we used human genomic data to train our neural network. An iterative approach is adopted to reconstruct the dataset, which tackles the data imbalance problem and forces the model to learn more features of splice sites. The proposed CNN obtains a classification accuracy of 90.25%, which is 10% higher than the existing algorithms. The method outperforms other existing methods in terms of area under the receiver operating characteristic curve (AUC), recall, precision, and F1 score. Furthermore, SpliceFinder can find the exact position of splice sites on long genomic sequences with a sliding window. Compared with other state-of-the-art splice site prediction tools, SpliceFinder generates about half as many false positives while keeping recall higher than 0.8. SpliceFinder also captures non-canonical splice sites. In addition, SpliceFinder performs well on the genomic sequences of Drosophila melanogaster, Mus musculus, Rattus, and Danio rerio without retraining. CONCLUSION: Based on a CNN, we have proposed a new ab initio splice site prediction tool, SpliceFinder, which generates fewer false positives and can detect non-canonical splice sites. Additionally, SpliceFinder is transferable to other species without retraining. The source code and additional materials are available at https://gitlab.deepomics.org/wangruohan/SpliceFinder.


Subject(s)
Computational Biology/methods , Neural Networks, Computer , RNA Splice Sites/genetics , Software , Algorithms , Animals , Base Sequence , Databases, Genetic , Genome , Humans , RNA Splicing/genetics
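
Two ideas from the abstract above lend themselves to a small sketch: scanning for the canonical GT and AG dinucleotides, which yields many candidates that are not real splice sites, and cutting a long sequence into windows that are one-hot encoded as input for a classifier. The code below is not taken from SpliceFinder; the function names, window size, and toy sequence are assumptions, and the CNN itself is omitted.

```python
BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string; unknown bases such as N become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

def sliding_windows(seq, size, step=1):
    """Yield (start, window) pairs covering the sequence."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]

def canonical_candidates(seq):
    """Positions of GT (donor-like) and AG (acceptor-like) dinucleotides."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors

# Hypothetical 12-base sequence.
toy_seq = "CAGGTAAGTCAG"
donors, acceptors = canonical_candidates(toy_seq)
windows = list(sliding_windows(toy_seq, size=4, step=4))
encoded = [one_hot(window) for _, window in windows]
print(donors, acceptors, windows)
```

Even this short sequence contains five dinucleotide hits, which illustrates why filtering on GT and AG alone produces many pseudo sites; a window-based classifier does not depend on that filter and can therefore also score non-canonical sites.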
17.
BMC Bioinformatics ; 20(Suppl 24): 671, 2019 Dec 20.
Article in English | MEDLINE | ID: mdl-31861983

ABSTRACT

BACKGROUND: Short tandem repeats (STRs) serve as genetic markers in forensic settings due to their high polymorphism in eukaryotic genomes. A variety of STR profiling systems have been developed for species including human, dog, cat, cattle, etc. Maintaining these systems simultaneously can be costly. These mammals share many highly similar regions along their genomes. With the availability of massive amounts of whole-genome data for these species, it is possible to develop a unified STR profiling system. In this study, our objective is to propose and develop a unified set of STR loci that can be applied simultaneously to multiple species. RESULT: To find a unified STR set, we collected the whole-genome sequence data of the species concerned and mapped them to the human reference genome. We then extracted the STR loci across the species. From these loci, we proposed an algorithm that selects a subset of loci by incorporating the optimized combined power of discrimination. Our results show that the unified set of loci has a high combined power of discrimination, > 1 - 10^-9, for both individual species and the mixed population, as well as a random-match probability < 10^-7 for all the species involved, indicating that the identified set of STR loci can be applied to multiple species. CONCLUSIONS: We identified a set of STR loci that are shared by multiple species. This implies that a unified STR profiling system is possible for these species in forensic settings. The system can be applied to individual identification or paternity testing in each of ten common species: Sus scrofa (pig), Bos taurus (cattle), Capra hircus (goat), Equus caballus (horse), Canis lupus familiaris (dog), Felis catus (cat), Ovis aries (sheep), Oryctolagus cuniculus (rabbit), Bos grunniens (yak), and Homo sapiens (human). Our loci selection algorithm employs a greedy approach. The algorithm can generate the loci under different forensic parameters and for a specific combination of species.


Subject(s)
Whole Genome Sequencing , Algorithms , Animals , Genome , Humans
18.
BMC Genomics ; 20(Suppl 2): 186, 2019 Apr 04.
Artículo en Inglés | MEDLINE | ID: mdl-30967119

RESUMEN

BACKGROUND: Recent advances in genome analysis have established that chromatin has preferred 3D conformations, which bring distant loci into contact. Identifying these contacts is important for understanding possible interactions between these loci. This has motivated the creation of the Hi-C technology, which detects long-range chromosomal interactions. Distance geometry-based algorithms, such as ChromSDE and ShRec3D, have been able to use Hi-C data to infer 3D chromosomal structures. However, these algorithms, being matrix-based, are space- and time-consuming on very large datasets. A human genome at 100-kilobase resolution would involve ∼30,000 loci, requiring gigabytes just to store the matrices. RESULTS: We propose a succinct representation of the distance matrices that greatly reduces the space requirement. We give a complete solution, called SuperRec, for the inference of chromosomal structures from Hi-C data by iteratively solving the large-scale weighted multidimensional scaling problem. CONCLUSIONS: SuperRec runs faster than earlier systems without compromising the accuracy of the results. The SuperRec package can be obtained from http://www.cs.cityu.edu.hk/~shuaicli/SuperRec .
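The "gigabytes" figure for ∼30,000 loci can be checked with a short calculation. The sketch below assumes a dense all-pairs matrix stored as 8-byte double-precision values; the helper names are hypothetical, and this describes only the dense baseline, not SuperRec's succinct representation, which the abstract does not detail.

```python
def dense_matrix_bytes(n_loci, bytes_per_entry=8):
    """Bytes needed for a full n x n matrix (assumes 8-byte doubles by default)."""
    return n_loci * n_loci * bytes_per_entry


def upper_triangle_bytes(n_loci, bytes_per_entry=8):
    """Bytes needed if only the strict upper triangle of a symmetric matrix is kept."""
    return n_loci * (n_loci - 1) // 2 * bytes_per_entry


n = 30_000  # approximate locus count at 100-kilobase resolution, from the abstract
print(dense_matrix_bytes(n))    # one dense matrix, in bytes
print(upper_triangle_bytes(n))  # symmetric storage, in bytes
```

Under these assumptions one dense matrix takes about 7.2 GB, and exploiting symmetry only halves that. A weighted formulation needs a weight matrix alongside the distance matrix, which doubles the footprint again.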


Subject(s)
Algorithms , Chromatin/chemistry , Chromosomes, Human/chemistry , Computational Biology/methods , Genome, Human , Chromatin/genetics , Chromosomes, Human/genetics , Computer Simulation , Humans , Models, Molecular , Nucleic Acid Conformation
19.
Am J Hum Genet ; 98(2): 256-74, 2016 Feb 04.
Article in English | MEDLINE | ID: mdl-26833333

ABSTRACT

Comprehensive identification of somatic structural variations (SVs) and understanding their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. However, complex SVs across the whole genome, and the mutational mechanisms underlying esophageal squamous cell carcinoma (ESCC), remain largely uncharacterized. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs, using the Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signatures to be the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) through different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes, which occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogenes through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations reveal a high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7, which might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in the malignant transformation of ESCCs and demonstrate diverse models of SV-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms have preventive, diagnostic, and therapeutic implications for ESCCs.


Subject(s)
Carcinoma, Squamous Cell/genetics , Esophageal Neoplasms/genetics , Genetic Association Studies/methods , Genetic Variation , Cell Line , Cyclin D1/genetics , DNA Copy Number Variations , ErbB Receptors/genetics , Esophageal Squamous Cell Carcinoma , Gene Deletion , Gene Rearrangement , Genes, p16 , Genome, Human , Genomics , Humans , In Situ Hybridization, Fluorescence , Receptor, ErbB-2/genetics , Receptor, Fibroblast Growth Factor, Type 1/genetics , Receptor, Notch1/genetics , Reproducibility of Results , Sequence Analysis, RNA , Translocation, Genetic