Results 1 - 20 of 29
1.
Leuk Res ; 141: 107499, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38640632

ABSTRACT

Acute myeloid leukemia (AML) is a hematopoietic malignancy with a high relapse rate and progressive drug resistance. Alternative polyadenylation (APA) contributes to post-transcriptional dysregulation, but little is known about the association between APA and AML. Mapping APA quantitative trait loci (apaQTLs) is a powerful approach for investigating the relationship between APA and single nucleotide polymorphisms (SNPs). We quantified APA usage in 195 Chinese AML patients and identified 4922 cis-apaQTLs related to 1875 genes, most of which were newly reported. Cis-apaQTLs may modulate poly(A) site selection in 115 genes through poly(A) signals. Colocalization analysis revealed that cis-apaQTLs colocalized with cis-eQTLs may regulate gene expression by affecting miRNA binding sites or RNA secondary structures. By comparing genotype frequencies with those of East Asian healthy controls from the 1000 Genomes Project, we identified 207 cis-apaQTLs related to AML risk. According to the PharmGKB and MGI databases, genes with cis-apaQTLs were associated with hematological phenotypes and tumor incidence. Collectively, we profiled an atlas of cis-apaQTLs in Asian AML patients and found associations with APA selection, gene expression, AML risk, and complex traits. Cis-apaQTLs may provide insights into the APA-related regulatory mechanisms underlying AML occurrence, progression, and prognosis.


Subject(s)
Acute Myeloid Leukemia , Polyadenylation , Single Nucleotide Polymorphism , Quantitative Trait Loci , Humans , Acute Myeloid Leukemia/genetics , Male , Female , Middle Aged , Genetic Predisposition to Disease , Adult , Leukemic Gene Expression Regulation , Aged , Asian People/genetics
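In its simplest form, an apaQTL test regresses an APA usage index on the genotype dosage (0, 1 or 2 copies of the alternative allele) of a nearby SNP. A common choice of index is the PDUI (percentage of distal poly(A) site usage index). The sketch below runs that regression for a single SNP-gene pair. It assumes an additive model with no covariates, and the helper name and data are hypothetical. The study's own pipeline (covariates, multiple-testing control) is not described in the abstract.

```python
import math
from typing import Sequence, Tuple


def apaqtl_regression(dosage: Sequence[float], usage: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical additive apaQTL test for one SNP-gene pair.

    Fits usage = a + b * dosage by ordinary least squares and returns
    (slope b, t-statistic of b). Covariates are ignored in this sketch.
    """
    n = len(dosage)
    if n != len(usage) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(dosage) / n
    mean_u = sum(usage) / n
    sxx = sum((g - mean_g) ** 2 for g in dosage)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (u - mean_u) for g, u in zip(dosage, usage))
    slope = sxy / sxx
    intercept = mean_u - slope * mean_g
    # Residual variance uses n - 2 degrees of freedom.
    rss = sum((u - intercept - slope * g) ** 2 for g, u in zip(dosage, usage))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, t_stat


# Toy data (hypothetical): PDUI falls with each copy of the alternative allele.
genotypes = [0, 0, 0, 1, 1, 1, 2, 2, 2]
pdui = [0.82, 0.78, 0.80, 0.61, 0.65, 0.60, 0.41, 0.44, 0.39]
slope, t = apaqtl_regression(genotypes, pdui)
print(f"slope={slope:.3f}, t={t:.1f}")
```

A negative slope means that the alternative allele shifts usage toward the proximal site. A genome-wide scan repeats this test for every SNP-gene pair within a cis window.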
2.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38340093

ABSTRACT

Shotgun sequencing is a high-throughput method used to detect copy number variants (CNVs). Although there are numerous shotgun-sequencing-based CNV detection tools, their quality varies significantly, which leads to discrepancies in performance. We therefore conducted a comprehensive analysis of next-generation sequencing-based CNV detection tools published over the past decade. We found that most mainstream tools share the same detection rationale: they calculate a so-called read depth signal from aligned sequencing reads and then segment that signal using either circular binary segmentation (CBS) or a hidden Markov model (HMM). We therefore compared the performance of these two core segmentation algorithms in CNV detection across varying sequencing depths, segment lengths, and complex CNV types. To ensure a fair comparison, we designed a parametric model based on mainstream statistical distributions, which makes it possible to exclude bias corrections, such as correction for guanine-cytosine (GC) content, from the preprocessing step. The results indicate the following key points: (1) Under ideal conditions, CBS achieves high precision, whereas HMM achieves a high recall rate. (2) Under practical conditions, HMM is advantageous at lower sequencing depths, whereas CBS is more competitive than HMM at detecting small variant segments. (3) In cases involving complex CNVs resembling real sequencing data, HMM is more robust than CBS. (4) On large-scale sequencing data, HMM takes less time than CBS, while their memory usage is approximately equal. These findings provide guidance and a reference for researchers developing new CNV detection tools.


Subject(s)
Algorithms , DNA Copy Number Variations , High-Throughput Nucleotide Sequencing/methods
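Most of the tools compared in this study segment a read-depth signal with either CBS or an HMM. The sketch below shows the HMM side in its most reduced form: a three-state Viterbi decoder (loss, neutral, gain) with Poisson emissions over binned read counts. The state means, the "sticky" self-transition probability and the function names are illustrative assumptions, not parameters taken from the study.

```python
import math
from typing import List, Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_cnv(counts: Sequence[int], means: Sequence[float], stay: float = 0.99) -> List[int]:
    """Hypothetical decoder: most likely state per bin (0=loss, 1=neutral, 2=gain)."""
    n_states = len(means)
    switch = (1.0 - stay) / (n_states - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n_states)]
                 for i in range(n_states)]
    # Uniform initial state distribution.
    score = [math.log(1.0 / n_states) + poisson_logpmf(counts[0], m) for m in means]
    back: List[List[int]] = []
    for k in counts[1:]:
        new_score, pointers = [], []
        for j, m in enumerate(means):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j] + poisson_logpmf(k, m))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Assumed diploid baseline of about 100 reads per bin; one-copy loss ~50, one-copy gain ~150.
bins = [98, 103, 101, 52, 48, 50, 47, 99, 102, 148, 155, 151]
print(viterbi_cnv(bins, means=[50.0, 100.0, 150.0]))
```

The self-transition probability is what makes an HMM favour long segments. Raising it suppresses short calls, which is one intuition for why the study finds CBS more competitive on small segments.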
3.
Ann Med ; 55(1): 146-154, 2023 12.
Article in English | MEDLINE | ID: mdl-36519234

ABSTRACT

OBJECTIVE: To assess the trends in non-melanoma skin cancer (NMSC) incidence in Hong Kong from 1990 to 2019 and the associations of age, calendar period, and birth cohort, to make projections to 2030, and to examine the drivers of NMSC incidence. METHODS: We assessed the age, calendar period, and birth cohort effects of NMSC incidence in Hong Kong between 1990 and 2019 using an age-period-cohort model. Using Bayesian age-period-cohort analysis with integrated nested Laplace approximations, we projected the incidence of NMSC in Hong Kong to 2030. RESULTS: From 1990 to 2019, the age-standardized incidence rate of NMSC increased from 6.7 per 100,000 population to 8.6 per 100,000 population in men and from 5.4 per 100,000 to 5.9 per 100,000 population in women, among the 19,568 patients in the study (9812 male patients [50.14%]). The annual net drift was 2.00% (95% confidence interval [CI]: 1.50-2.50%) for men and 1.53% (95% CI: 0.95-2.11%) for women. Local drifts increased for both sexes above the 35-39-year age group. The period and cohort risk of developing NMSC tended to rise but slowed gradually in the most recent period and post-1975 birth cohort. From 2019 to 2030, it is projected that the number of newly diagnosed NMSC cases in Hong Kong will increase from 564 to 829 in men and from 517 to 863 in women. Population aging, population growth, and epidemiologic changes contributed to the increase in incident NMSCs, with population aging being the most significant contributor. CONCLUSION: The slowing of the period and cohort effects suggests that the rising incidence of NMSC is partly attributable to increased awareness and diagnosis. The increasing prevalence of NMSC among the elderly and an aging population will significantly impact the clinical workload associated with NMSC for the foreseeable future.


Subject(s)
Skin Neoplasms , Humans , Male , Female , Aged , Incidence , Hong Kong/epidemiology , Bayes Theorem , Skin Neoplasms/epidemiology
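The age-standardized incidence rates quoted above come from direct standardization. Each age-specific rate is weighted by a fixed standard population, so that populations with different age structures can be compared over time. The sketch below assumes three made-up age bands, case counts and standard-population weights. These are not Hong Kong data, and the abstract does not state which standard population the study used.

```python
from typing import Sequence


def age_standardized_rate(cases: Sequence[int], person_years: Sequence[float],
                          standard_pop: Sequence[float], per: float = 100_000) -> float:
    """Direct standardization: sum_i w_i * r_i / sum_i w_i, expressed per `per` people."""
    if not (len(cases) == len(person_years) == len(standard_pop)):
        raise ValueError("all inputs need one entry per age band")
    weighted = sum(w * c / py for c, py, w in zip(cases, person_years, standard_pop))
    return per * weighted / sum(standard_pop)


# Hypothetical age bands <40, 40-69, 70+ with illustrative standard weights.
asr = age_standardized_rate(cases=[20, 150, 400],
                            person_years=[2_000_000, 1_500_000, 500_000],
                            standard_pop=[0.6, 0.3, 0.1])
print(f"{asr:.1f} per 100,000")
```

Because the weights stay fixed, population aging alone does not change the ASR. This is why the study attributes part of the rise in case counts to aging separately from changes in age-specific risk.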
4.
Nanoscale ; 14(45): 16787-16796, 2022 Nov 24.
Article in English | MEDLINE | ID: mdl-36342384

ABSTRACT

Reticular aggregates of 2-bromo-2-methylpropionic acid (BMPA)-modified Fe3O4 nanoparticles were prepared by a laser-induced irradiation method. These aggregates show a novel acoustic property, the photoexcited audible sound (PEAS) effect. Their morphology was observed by Lorentz transmission electron microscopy. Their chemical structure, crystal composition and magnetic properties were analyzed by infrared spectroscopy, X-ray diffraction and a magnetic property measurement instrument, respectively. The aggregates were reticular, and the BMPA-modified Fe3O4 nanoparticles measured 5.5 ± 0.4 nm. The saturation magnetization values of the BMPA-modified Fe3O4 nanoparticles and of the corresponding aggregates were 59.99 and 63.51 emu g⁻¹, respectively. The reticular aggregates produced strong PEAS signals under very weak laser irradiation, with high stability and repeatability. At very weak laser power, the emitted PEAS signals showed strong specificity, a suitable decay time and a large information content. They could also be heard by the human ear without any special detection equipment. A heat transfer model was then built in COMSOL to simulate the possible mechanism of the PEAS effect. The simulation showed that heat transfer in the aggregates is fast: the temperature rose to 480 K in only 0.25 s and to 600 K in 5 s, which meets the requirements of the vapor explosion mechanism. We therefore conclude that the likely mechanism of the PEAS effect in these aggregates is laser-induced fast heat transfer followed by in situ vapor explosion, which produces the observed audible sound. This novel PEAS effect has potential applications in materials science, biomedical engineering and other fields.

5.
PLoS One ; 17(8): e0271596, 2022.
Article in English | MEDLINE | ID: mdl-35925979

ABSTRACT

Atrial fibrillation (AF) is a common type of arrhythmia. Clinical diagnosis of AF is based on detecting abnormal R-R intervals (RRIs) in an electrocardiogram (ECG). Previous studies treated this detection task as a classification problem and focused on extracting a number of features. In this study, we show that the probability density of RRIs from the ECG preserves comprehensive statistical information. It is therefore a natural and efficient input feature for AF detection, in place of any specific numerical characteristic. With a support vector machine as the classifier, results on the MIT-BIH database indicate that the proposed method is a simple and accurate approach to AF detection in terms of accuracy, sensitivity and specificity.


Subject(s)
Atrial Fibrillation , Algorithms , Atrial Fibrillation/diagnosis , Factual Databases , Electrocardiography/methods , Humans , Support Vector Machine
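The key idea is to use the empirical probability density of the R-R intervals as the feature vector, rather than a handful of summary statistics. The sketch below converts a sequence of RRIs (in seconds) into a normalized fixed-bin histogram, which could then be passed to any classifier such as an SVM. The bin range, the bin count and the function name are illustrative assumptions, and the classifier is omitted.

```python
from typing import List, Sequence


def rri_density(rri: Sequence[float], lo: float = 0.2, hi: float = 2.0, n_bins: int = 18) -> List[float]:
    """Hypothetical feature extractor: histogram density of RRIs on [lo, hi).

    Values outside the range are clipped into the first or last bin.
    """
    if not rri:
        raise ValueError("empty RRI sequence")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in rri:
        idx = int((x - lo) // width)
        idx = min(max(idx, 0), n_bins - 1)  # clip into the end bins
        counts[idx] += 1
    total = len(rri)
    # Dividing by the bin width makes the histogram integrate to 1 (a density).
    return [c / (total * width) for c in counts]


regular = [0.80, 0.81, 0.79, 0.80, 0.82, 0.80]    # narrow, peaked distribution
irregular = [0.45, 0.95, 0.62, 1.30, 0.55, 0.88]  # spread-out, AF-like distribution
print(max(rri_density(regular)), max(rri_density(irregular)))
```

Sinus rhythm concentrates the mass in a few bins, while AF spreads it out. A classifier trained on the full vector can learn this difference without any hand-picked statistic.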
6.
Front Genet ; 13: 928862, 2022.
Article in English | MEDLINE | ID: mdl-36035147

ABSTRACT

Background: Hematologic malignancies, such as acute promyelocytic leukemia (APL) and acute myeloid leukemia (AML), are cancers that start in blood-forming tissues and can affect the blood, bone marrow, and lymph nodes. They are often caused by genetic and molecular alterations such as mutations and gene expression changes. Alternative polyadenylation (APA) is a post-transcriptional process that regulates gene expression, and dysregulation of APA contributes to hematological malignancies. RNA-sequencing-based bioinformatic methods can identify APA sites and quantify APA usage as molecular indexes for studying the roles of APA in disease development, diagnosis, and treatment. Unfortunately, APA data preprocessing, analysis, and visualization are time-consuming, inconsistent, and laborious. A comprehensive, user-friendly tool would greatly simplify APA feature screening and mining. Results: Here, we present APAview, a web-based platform for exploring APA features in hematological cancers and performing APA statistical analysis. The APAview server runs on Python 3 with the Flask framework and the Jinja2 templating engine. For visualization, the APAview client is built on Bootstrap and Plotly. Multimodal data, such as APA quantified by QAPA/DaPars, gene expression data, and clinical information, can be uploaded to APAview and analyzed interactively. Correlation, survival, and differential analyses among user-defined groups can be performed via the web interface. Using APAview, we explored APA features in two hematological cancers, APL and AML. APAview can also be applied to other diseases by uploading different experimental data.
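APAview accepts APA usage quantified by QAPA or DaPars. DaPars summarizes each transcript with a PDUI (percentage of distal poly(A) site usage index), the share of 3' UTR expression attributable to the distal site. A correlation analysis of the kind APAview offers can then relate PDUI to gene expression. The sketch below assumes that proximal and distal abundance estimates are already available. Its function names are hypothetical, and it is not APAview's code.

```python
import math
from typing import List, Sequence


def pdui(distal: Sequence[float], proximal: Sequence[float]) -> List[float]:
    """PDUI = distal / (distal + proximal) per sample; NaN when both are zero."""
    return [d / (d + p) if (d + p) > 0 else math.nan for d, p in zip(distal, proximal)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical abundances for one gene across five samples.
usage = pdui(distal=[80, 60, 45, 30, 20], proximal=[20, 40, 55, 70, 80])
expression = [12.0, 10.5, 9.0, 7.2, 6.1]
print([round(u, 2) for u in usage], round(pearson(usage, expression), 3))
```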

7.
Article in English | MEDLINE | ID: mdl-31180897

ABSTRACT

Accurate detection of copy number variations (CNVs) from short-read sequencing data is challenging because reads are unevenly distributed and the amplitudes of gains and losses are unbalanced. Using read depths directly to measure CNVs tends to limit performance. Robust computational approaches with appropriate statistics are therefore required to detect CNV regions and boundaries. This study proposes a new method, CNV_IFTV, to address this need. CNV_IFTV assigns an anomaly score to each genome bin through a collection of isolation trees, which are trained with the isolation forest algorithm on subsamples of the measured read depths. CNV_IFTV then applies a total variation model to the anomaly scores to smooth adjacent bins, which yields a denoised score profile. Finally, a statistical model tests the denoised scores to call CNVs. CNV_IFTV was tested on both simulated and real data and compared with several peer methods, and the results indicate that it outperforms them. CNV_IFTV is a reliable tool for detecting CNVs from short-read sequencing data, even at low coverage and low tumor purity. Its detection results on tumor samples can help evaluate known cancer genes and predict target drugs for disease diagnosis.


Subject(s)
Algorithms , Computational Biology/methods , DNA Copy Number Variations/genetics , Statistical Models , Genetic Databases , Decision Trees , Human Genome/genetics , High-Throughput Nucleotide Sequencing , Humans
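CNV_IFTV scores each genome bin with an isolation forest trained on subsamples of the read depths. The sketch below is a bare one-dimensional isolation forest written from the standard definition: trees split at random until a point is isolated, and the score is 2^(-E[h(x)]/c(psi)). The subsample size, the number of trees and the function names are assumptions. The total variation smoothing and statistical calling steps are not shown.

```python
import math
import random
from typing import List, Sequence

EULER_GAMMA = 0.5772156649


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(sample: List[float], rng: random.Random, depth: int, max_depth: int) -> tuple:
    """Grow one isolation tree by random splits between the sample's min and max."""
    if depth >= max_depth or len(sample) <= 1 or min(sample) == max(sample):
        return ("leaf", len(sample))
    split = rng.uniform(min(sample), max(sample))
    left = [v for v in sample if v < split]
    right = [v for v in sample if v >= split]
    return ("node", split, build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))


def path_length(x: float, node: tuple, depth: int = 0) -> float:
    if node[0] == "leaf":
        # Unresolved leaves get the expected extra depth for their size.
        return depth + c_factor(node[1])
    _, split, left, right = node
    return path_length(x, left if x < split else right, depth + 1)


def isolation_scores(depths: Sequence[float], n_trees: int = 100,
                     subsample: int = 64, seed: int = 0) -> List[float]:
    """Hypothetical helper: anomaly score per bin; values near 1 mean easily isolated bins."""
    if len(depths) < 2:
        raise ValueError("need at least two bins")
    rng = random.Random(seed)
    psi = min(subsample, len(depths))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(list(depths), psi), rng, 0, max_depth) for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / (n_trees * norm)) for x in depths]


# Hypothetical binned read depths with one deletion-like bin at index 40.
bin_depths = [100 + (i % 5) for i in range(40)] + [30] + [100 + (i % 5) for i in range(40)]
scores = isolation_scores(bin_depths)
print(scores.index(max(scores)), round(max(scores), 2))
```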
8.
IEEE/ACM Trans Comput Biol Bioinform ; 18(5): 1893-1901, 2021.
Article in English | MEDLINE | ID: mdl-31751246

ABSTRACT

Next-generation sequencing technology has led to the development of methods for detecting novel sequence insertions (nsINSs). Multiple signatures from short reads are usually extracted to improve nsINS detection performance. However, characterizing nsINSs larger than the mean insert size remains challenging. This article presents a new method, ERINS, that detects nsINS contents and genotypes across the full size spectrum. ERINS integrates the features of structural variations and the mapping states of split reads to find nsINS breakpoints. It then adopts a left-most mapping strategy to infer nsINS content by iteratively extending the standard reference at each breakpoint. Finally, it realigns all reads to the extended reference and infers nsINS genotypes through statistical testing on read counts. We tested and validated the performance of ERINS on simulated and real sequencing datasets. The simulation results demonstrate that it outperforms several peer methods in sensitivity and precision. In the real-data application, ERINS produced results highly consistent with those previously reported, and it detected nsINSs over 200 base pairs that many other methods fail to detect. In conclusion, ERINS can be used as a supplement to existing tools and will become a routine approach for characterizing nsINSs.


Subject(s)
Algorithms , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , INDEL Mutation/genetics , DNA Sequence Analysis/methods , Human Genome/genetics , Humans
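ERINS infers insertion genotypes with a statistical test on read counts after realigning reads to the extended reference. The abstract does not give the exact test. One common way to do this, assumed here, is a binomial genotype likelihood. Given the reads supporting the reference and the insertion allele, it compares 0/0, 0/1 and 1/1 under a small per-read error rate. The function names and the error rate are hypothetical.

```python
import math
from typing import Dict, Tuple


def genotype_likelihoods(ref_reads: int, alt_reads: int, error: float = 0.01) -> Dict[str, float]:
    """Log-likelihood of each diploid genotype under a binomial read-sampling model."""
    # Expected fraction of insertion-supporting reads for each genotype.
    alt_fraction = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    n = ref_reads + alt_reads
    log_choose = math.lgamma(n + 1) - math.lgamma(ref_reads + 1) - math.lgamma(alt_reads + 1)
    return {
        gt: log_choose + alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
        for gt, p in alt_fraction.items()
    }


def call_genotype(ref_reads: int, alt_reads: int, error: float = 0.01) -> Tuple[str, float]:
    """Return the maximum-likelihood genotype and its log-likelihood margin over the runner-up."""
    ll = genotype_likelihoods(ref_reads, alt_reads, error)
    ranked = sorted(ll.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[0][0], ranked[0][1] - ranked[1][1]


print(call_genotype(ref_reads=14, alt_reads=12))  # roughly balanced support
print(call_genotype(ref_reads=1, alt_reads=25))   # nearly all reads support the insertion
```

The returned margin can serve as a genotype quality. A small margin flags sites where read depth is too low to separate heterozygous from homozygous insertions.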
9.
BMC Bioinformatics ; 21(1): 97, 2020 Mar 05.
Article in English | MEDLINE | ID: mdl-32138645

ABSTRACT

BACKGROUND: With the rapid development of whole exome sequencing (WES), an increasing number of tools have been proposed for WES-based copy number variation (CNV) detection. However, no comprehensive guide exists for using these tools in clinical settings, which limits their use in practice. To address this problem, we evaluated the performance of four WES-based CNV tools and established a guideline for recommending a suitable tool according to application requirements. RESULTS: We first selected four WES-based CNV detection tools: CoNIFER, cn.MOPS, CNVkit and exomeCopy. We then evaluated their performance in three respects: sensitivity and specificity, overlapping consistency, and computational cost. This evaluation yielded four main results: (1) Sensitivity increases and then stabilizes as coverage or CNV size increases, while specificity decreases. (2) CoNIFER performs better for CNV insertions than for CNV deletions, whereas the remaining tools show the opposite trend. (3) CoNIFER, cn.MOPS and CNVkit achieve satisfactory overlapping consistency, which indicates that their results are trustworthy. (4) Among the four tools, CoNIFER has the best space complexity and cn.MOPS the best time complexity. Finally, we established a guideline for tool usage based on these results. CONCLUSION: No available tool performs excellently under all conditions, but some tools perform excellently in specific scenarios. Users can obtain a CNV tool recommendation from our paper according to the targeted CNV size, CNV type or computational cost of their projects, as presented in Table 1. This guidance is helpful even for users with limited knowledge of computer science.


Subject(s)
DNA Copy Number Variations , Exome Sequencing/methods , Algorithms , Exome/genetics , Humans , Software/economics
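Benchmarks like this one usually compute sensitivity by matching called CNVs to a truth set through interval overlap. The sketch below uses a 50% reciprocal-overlap rule and reports sensitivity (recall) and precision. This rule is a common convention assumed here, not one taken from the paper. Intervals are half-open (chrom, start, end) tuples, and all names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """True if a and b overlap by at least min_frac of each interval's length."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return overlap >= min_frac * (a[2] - a[1]) and overlap >= min_frac * (b[2] - b[1])


def evaluate(calls: List[Interval], truth: List[Interval], min_frac: float = 0.5) -> Tuple[float, float]:
    """Return (sensitivity, precision) under the reciprocal-overlap matching rule."""
    hit_truth = sum(any(reciprocal_overlap(t, c, min_frac) for c in calls) for t in truth)
    hit_calls = sum(any(reciprocal_overlap(c, t, min_frac) for t in truth) for c in calls)
    sensitivity = hit_truth / len(truth) if truth else 0.0
    precision = hit_calls / len(calls) if calls else 0.0
    return sensitivity, precision


truth = [("chr1", 1000, 5000), ("chr1", 20000, 22000), ("chr2", 500, 900)]
calls = [("chr1", 1200, 4800), ("chr1", 21500, 30000), ("chr3", 100, 400)]
print(evaluate(calls, truth))
```

The same overlap test, applied between two tools' call sets instead of against a truth set, gives a simple measure of the "overlapping consistency" discussed above.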
10.
IEEE/ACM Trans Comput Biol Bioinform ; 17(3): 1082-1091, 2020.
Article in English | MEDLINE | ID: mdl-30334804

ABSTRACT

Structural variation accounts for a major fraction of mutations in the human genome and confers susceptibility to complex diseases. Next-generation sequencing, together with the rapid development of computational methods, provides a cost-effective way to detect such variations. Simulating structural variations and sequencing reads with realistic characteristics is essential for benchmarking these computational methods. Here, we develop a new program, SVSR, that simulates SNPs and five types of structural variations (indels, tandem duplications, CNVs, inversions and translocations) in the human genome. SVSR also generates sequencing reads with the features of popular platforms (Illumina, SOLiD, 454 and Ion Torrent). We adopt a selection model trained on real data to predict copy number states from the first site of a given genome to the end. In addition, we use microbial genome references to produce insertion fragments and design probabilistic models to imitate inversions and translocations. We also create platform-specific errors and base quality profiles to generate normal, tumor, or normal-tumor mixture reads. Experimental results show that SVSR captures more realistic features and generates datasets with satisfactory quality scores. SVSR can be used to evaluate the performance of structural variation detection methods and to guide the development of new computational methods.


Subject(s)
Genomic Structural Variation/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Human Genome/genetics , Humans , INDEL Mutation/genetics , Single Nucleotide Polymorphism/genetics , DNA Sequence Analysis/methods
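A simulator of this kind has two stages: edit the reference so that it carries a structural variant, then sample reads from the edited genome with platform-like errors. The sketch below shows both stages for a single tandem duplication with a uniform substitution error rate. It does not reproduce SVSR's trained copy-number selection model, its platform-specific error and quality profiles, or its other variant types. The function names are hypothetical.

```python
import random
from typing import List

BASES = "ACGT"


def tandem_duplicate(ref: str, start: int, end: int, copies: int = 2) -> str:
    """Replace ref[start:end] with `copies` adjacent copies of itself."""
    if not (0 <= start < end <= len(ref)) or copies < 1:
        raise ValueError("invalid duplication")
    return ref[:start] + ref[start:end] * copies + ref[end:]


def simulate_reads(genome: str, n_reads: int, read_len: int, error_rate: float, seed: int = 0) -> List[str]:
    """Sample single-end reads uniformly and add random substitutions (no indels, no qualities)."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        pos = rng.randrange(0, len(genome) - read_len + 1)
        read = list(genome[pos:pos + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Substitute with one of the three other bases.
                read[i] = rng.choice([b for b in BASES if b != base])
        reads.append("".join(read))
    return reads


rng = random.Random(42)
reference = "".join(rng.choice(BASES) for _ in range(200))
sv_genome = tandem_duplicate(reference, 50, 80, copies=3)
reads = simulate_reads(sv_genome, n_reads=5, read_len=30, error_rate=0.02)
print(len(sv_genome), reads[0])
```

Because the variant and the reads come from known coordinates, every simulated call can be scored exactly, which is what makes simulation useful for benchmarking.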
11.
IEEE/ACM Trans Comput Biol Bioinform ; 17(4): 1141-1153, 2020.
Article in English | MEDLINE | ID: mdl-30489272

ABSTRACT

Characterizing copy number variations (CNVs) from sequenced genomes is both a feasible and a cost-effective way to search for driver genes in cancer diagnosis. Many existing CNV detection algorithms explore only part of the features underlying sequence data and copy number structures, which limits their performance. Here, we describe CONDEL, a method for detecting CNVs from single tumor samples using high-throughput sequence data. CONDEL combines a novel statistic with a peel-off scheme to assess the statistical significance of genome bins. It then adopts a Bayesian approach based on statistical mixture models to infer copy number gains, losses and deletion zygosity. We compared CONDEL with six peer methods on a large number of simulated datasets and observed improved true positive and false positive rates. We further validated CONDEL on three real datasets derived from the 1000 Genomes Project and the EGA archive. CONDEL produced more consistent results than three other single-sample-based methods, and it exclusively identified a number of CNVs previously associated with cancers. We conclude that CONDEL is a powerful tool for detecting copy number variations in single tumor samples, even when these are sequenced at low coverage.


Subject(s)
DNA Copy Number Variations/genetics , High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , DNA Sequence Analysis/methods , Algorithms , Gene Deletion , Neoplasm Genes/genetics , Genotyping Techniques/methods , Humans , Statistical Models
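CONDEL infers gains, losses and deletion zygosity with a Bayesian approach built on statistical mixture models. The sketch below illustrates that idea in its simplest form: a posterior over integer copy numbers for one bin, given the bin's read count. It assumes Poisson counts whose mean scales with copy number, a small floor for copy number 0 and a prior favoring the diploid state. It does not reproduce CONDEL's own statistic, its peel-off scheme or its mixture components, and the function names are hypothetical.

```python
import math
from typing import Dict


def copy_number_posterior(count: int, diploid_mean: float,
                          prior: Dict[int, float], floor: float = 0.05) -> Dict[int, float]:
    """Posterior P(copy number | count) with Poisson(mean = cn/2 * diploid_mean) likelihoods.

    `floor` keeps the copy-number-0 mean positive (an assumption standing in for mis-mapped reads).
    """
    log_post = {}
    for cn, p in prior.items():
        lam = max(cn / 2.0, floor) * diploid_mean
        log_post[cn] = math.log(p) + count * math.log(lam) - lam - math.lgamma(count + 1)
    # Normalize in log space for numerical stability.
    top = max(log_post.values())
    weights = {cn: math.exp(v - top) for cn, v in log_post.items()}
    total = sum(weights.values())
    return {cn: w / total for cn, w in weights.items()}


def label(cn: int) -> str:
    return {0: "homozygous deletion", 1: "heterozygous deletion", 2: "neutral"}.get(cn, "gain")


prior = {0: 0.05, 1: 0.1, 2: 0.7, 3: 0.1, 4: 0.05}
post = copy_number_posterior(count=3, diploid_mean=60.0, prior=prior)
best = max(post, key=post.get)
print(best, label(best), round(post[best], 3))
```

Under this model, zygosity follows directly from the copy number: 0 copies is a homozygous deletion and 1 copy a heterozygous one.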
12.
Biomed Res Int ; 2019: 8420547, 2019.
Article in English | MEDLINE | ID: mdl-31080831

ABSTRACT

Next-generation sequencing is an emerging technology that has been widely used to detect genomic variants. Depth of coverage, a main signature used for variant calling, is strongly affected by biases such as GC content and mappability, so some calls are false positives. In this study, we used paired-end read mapping, another signature that these biases do not affect, to detect false-positive deletions in the Database of Genomic Variants. We first identified 1923 suspicious variants that might be false positives and then conducted validation studies on each of them, which detected 583 false-positive deletions. Finally, we analysed the distribution of these false positives by chromosome, sample and size. Correcting these false positives in public repositories should help downstream studies avoid incorrect documentation and annotations.


Subject(s)
Computational Biology/methods , Genetic Databases , Human Genome/genetics , Genomics , Sequence Deletion , Base Composition , Chromosome Mapping , False Positive Reactions , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis/methods
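The validation relies on a simple fact: when a deletion is real, read pairs that span it map farther apart than the library's insert size allows. The sketch below counts such discordant pairs for one candidate deletion. The mean-plus-three-standard-deviations threshold, the coordinate convention and the function name are illustrative assumptions, not the paper's exact criteria.

```python
from typing import Iterable, Tuple

Pair = Tuple[int, int]  # (leftmost mapped position, rightmost mapped end) of a read pair


def supporting_pairs(pairs: Iterable[Pair], del_start: int, del_end: int,
                     insert_mean: float, insert_sd: float, n_sd: float = 3.0) -> int:
    """Hypothetical check: count discordant pairs straddling [del_start, del_end)."""
    deletion_len = del_end - del_start
    support = 0
    for left, right in pairs:
        span = right - left
        straddles = left < del_start and right > del_end
        too_long = span > insert_mean + n_sd * insert_sd
        # Removing the deleted segment should bring the span back near the expected insert size.
        consistent = abs((span - deletion_len) - insert_mean) <= n_sd * insert_sd
        if straddles and too_long and consistent:
            support += 1
    return support


pairs = [(9_800, 11_200), (9_750, 11_180), (9_900, 10_300), (9_700, 11_400)]
n = supporting_pairs(pairs, del_start=10_000, del_end=11_000, insert_mean=400, insert_sd=40)
print(n)
```

A catalogued deletion with adequate local coverage but no supporting pairs is a candidate false positive of the kind the study set out to find.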
13.
Meas Sci Technol ; 29(3)2018 Mar.
Article in English | MEDLINE | ID: mdl-30250357

ABSTRACT

Interpretation of experimental data from micro- and nano-scale indentation testing depends heavily on the constitutive model selected to relate measurements to mechanical properties. The Kelvin-Voigt Fractional Derivative (KVFD) model offers a compact set of viscoelastic features appropriate for characterizing soft biological materials. This paper provides a set of KVFD solutions for converting indentation testing data acquired at different geometries and scales into viscoelastic properties of soft materials. These solutions, most of which are closed-form, apply to ramp-hold relaxation, load-unload and ramp-load creep testing protocols. We report applications of these model solutions to macro- and nano-indentation testing of hydrogels, gastric cancer cells and ex vivo breast tissue samples using an atomic force microscope (AFM). We also applied KVFD models to clinical ultrasonic breast data acquired with a compression plate, as elasticity imaging requires. Together, the results show that KVFD models fit a broad range of experimental data, typically with a correlation coefficient R² > 0.99. For hydrogel samples, the KVFD model parameters estimated by spherical indentation and by plate compression agree within one standard deviation. The same holds for parameters estimated by ramp relaxation and by load-unload compression. Results from macro- and nano-scale indentation measurements agree in trend. For gastric cell and ex vivo breast tissue measurements, the KVFD moduli are, respectively, 1/3-1/2 and 1/6 of the elasticity modulus obtained from the Sneddon model. In vivo breast tissue measurements yield model parameters consistent with literature results. The consistency of the results across a broad range of experimental parameters suggests that the KVFD model is a reliable tool for exploring intrinsic features of cell and tissue microenvironments.
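The KVFD constitutive law writes stress as an elastic term plus a fractional-derivative term: sigma(t) = E0 [epsilon(t) + tau^alpha D^alpha epsilon(t)]. For an ideal step strain, the Riemann-Liouville derivative of a unit step is D^alpha H(t) = t^(-alpha) / Gamma(1 - alpha). The relaxation modulus is therefore G(t) = E0 [1 + (t/tau)^(-alpha) / Gamma(1 - alpha)]. The sketch below evaluates that closed form. It is the textbook step response under these assumptions, not the paper's ramp-hold, load-unload or creep indentation solutions, and the parameter values are invented.

```python
import math


def kvfd_relaxation_modulus(t: float, e0: float, alpha: float, tau: float) -> float:
    """G(t) = E0 * (1 + (t / tau)**(-alpha) / Gamma(1 - alpha)) for an ideal step strain.

    Valid for t > 0 and 0 < alpha < 1; not the paper's ramp-hold solution.
    """
    if t <= 0:
        raise ValueError("relaxation modulus is defined here only for t > 0")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return e0 * (1.0 + (t / tau) ** (-alpha) / math.gamma(1.0 - alpha))


# Hypothetical soft-tissue-like parameters: E0 = 2 kPa, alpha = 0.2, tau = 1 s.
for t in (0.1, 1.0, 10.0, 100.0):
    print(f"t={t:>6} s  G={kvfd_relaxation_modulus(t, 2_000.0, 0.2, 1.0):8.1f} Pa")
```

The order alpha interpolates between solid-like and fluid-like behaviour. As alpha approaches 0, the fractional term tends to a constant and G approaches 2 E0. As alpha approaches 1, the term vanishes for t > 0 and G relaxes to E0.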

14.
Article in English | MEDLINE | ID: mdl-28796605

ABSTRACT

Pulse-inversion subharmonic (PISH) imaging can display information from cavitation bubbles alone while excluding that of tissue. Plane-wave-based ultrafast active cavitation imaging (UACI) can monitor the transient activity of cavitation bubbles, but its resolution and cavitation-to-tissue ratio (CTR) are barely satisfactory. Both can be significantly improved by introducing eigenspace-based (ESB) adaptive beamforming. PISH and UACI are a natural combination for imaging pure cavitation activity in tissue, but the combination raises two problems. 1) ESB beamforming is hard to implement in real time because of the enormous computation required for covariance matrix inversion and eigendecomposition. 2) The narrowband characteristic of the subharmonic filter causes a drastic degradation in resolution. To address both problems jointly, we propose a new PISH-UACI method that uses novel fast ESB (F-ESB) beamforming and cavitation deconvolution for nonlinear signals. F-ESB beamforming greatly reduces computational complexity through dimensionality reduction based on principal component analysis, while maintaining the high quality of ESB beamforming. The degraded resolution is recovered by cavitation deconvolution, which uses a modified convolution model and compressive deconvolution. Both simulations and in vitro experiments were performed to verify the effectiveness of the proposed method. Compared with ESB-based PISH-UACI, the proposed approach reduced total computation by 99% and increased the axial resolution gain threefold and the CTR by 2 dB. These results confirm that the method achieves satisfactory performance for monitoring pure cavitation bubbles in tissue erosion.


Subject(s)
Computer-Assisted Image Processing/methods , Computer-Assisted Signal Processing , Ultrasonography/methods , Algorithms , Animals , Liver/diagnostic imaging , Muscles/diagnostic imaging , Imaging Phantoms , Swine
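Pulse-inversion methods transmit a pulse and its polarity-inverted copy and then sum the two echoes. Consider a memoryless response y = a*x + b*x^2, a deliberately simplified model assumed here. Summing the two echoes cancels the linear term and keeps only the even-order nonlinear term, which is why pulse inversion suppresses linear tissue echoes. The sketch below demonstrates only this cancellation. It does not model subharmonic generation by cavitation bubbles, F-ESB beamforming or cavitation deconvolution, and the names and signal parameters are hypothetical.

```python
import math
from typing import Callable, List


def pulse_inversion_sum(response: Callable[[float], float], pulse: List[float]) -> List[float]:
    """Sum the echoes of a pulse and its inverted copy, sample by sample."""
    return [response(x) + response(-x) for x in pulse]


def quadratic_medium(a: float, b: float) -> Callable[[float], float]:
    """Hypothetical memoryless medium: linear gain a plus quadratic nonlinearity b."""
    return lambda x: a * x + b * x * x


fs, f0, n = 50e6, 2e6, 100  # sample rate (Hz), center frequency (Hz), number of samples
pulse = [math.sin(2 * math.pi * f0 * i / fs) * math.exp(-((i - n / 2) / 15) ** 2) for i in range(n)]

linear_only = pulse_inversion_sum(quadratic_medium(a=1.0, b=0.0), pulse)
nonlinear = pulse_inversion_sum(quadratic_medium(a=1.0, b=0.2), pulse)
print(max(abs(v) for v in linear_only), max(abs(v) for v in nonlinear))
```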
15.
Article in English | MEDLINE | ID: mdl-27913325

ABSTRACT

The axial resolution of ultrasonic imaging is limited by the temporal width of the acoustic pulse generated by the transducer, which has a limited bandwidth. Deconvolution can remove this effect and thereby improve resolution. However, most ultrasonic imaging methods perform deconvolution one scan line at a time. The information embedded in neighboring scan lines therefore goes unexploited, especially for materials with layered structures such as blood vessels. In this paper, a joint sparse representation model is proposed to increase the axial resolution of ultrasonic imaging. The proposed model combines sparse deconvolution along the axial direction with a sparsity-favoring constraint along the lateral direction. The constraint exploits the information in neighboring scan lines by connecting nearby pixels in the ultrasound image, so the axial resolution of the image improves after deconvolution. Results on simulated data showed that the proposed method can increase resolution and reveal layered structures. Results on real data showed that the proposed method can also measure carotid intima-media thickness automatically with good quality (0.56 ± 0.03 mm versus 0.60 ± 0.06 mm measured manually).


Subject(s)
Common Carotid Artery/diagnostic imaging , Carotid Intima-Media Thickness , Computer-Assisted Image Processing/methods , Humans
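The axial part of this model is a sparse deconvolution. The RF line y is modeled as a known pulse h convolved with a sparse reflectivity x, and x is recovered by minimizing 0.5*||h * x - y||^2 + lambda*||x||_1. The sketch below solves this single-scan-line problem with ISTA (iterative soft thresholding), a standard solver chosen here for simplicity. It does not include the paper's lateral sparsity constraint linking neighboring scan lines. The pulse, lambda, the iteration count and the function names are assumptions.

```python
from typing import List, Sequence


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full linear convolution, length len(x) + len(h) - 1."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def convolve_adjoint(r: Sequence[float], h: Sequence[float], n: int) -> List[float]:
    """Adjoint of `convolve` (a correlation), mapping length n + len(h) - 1 back to n."""
    return [sum(h[j] * r[i + j] for j in range(len(h))) for i in range(n)]


def soft_threshold(v: float, t: float) -> float:
    return v - t if v > t else v + t if v < -t else 0.0


def ista_deconvolve(y: Sequence[float], h: Sequence[float], lam: float, n_iter: int = 2000) -> List[float]:
    """Hypothetical helper: l1-regularized deconvolution of one scan line by ISTA."""
    n = len(y) - len(h) + 1
    # Step size 1/L, where L = (l1 norm of h)^2 upper-bounds the squared operator norm.
    step = 1.0 / sum(abs(v) for v in h) ** 2
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [a - b for a, b in zip(convolve(x, h), y)]
        grad = convolve_adjoint(residual, h, n)
        x = [soft_threshold(xi - step * g, step * lam) for xi, g in zip(x, grad)]
    return x


pulse = [0.2, 0.6, 1.0, 0.6, 0.2]  # hypothetical band-limited pulse
true_x = [0.0] * 20
true_x[5], true_x[12] = 1.0, -0.8  # two reflectors
rf_line = convolve(true_x, pulse)
x_hat = ista_deconvolve(rf_line, pulse, lam=0.01)
print([round(v, 2) for v in x_hat])
```

The joint model described above would add a penalty that couples the same depth across adjacent lines, so that layered boundaries reinforce one another.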
16.
Signal Processing ; 127: 239-246, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27346902

ABSTRACT

This paper studies the intrinsic connection between a generalized LASSO and a basic LASSO formulation. The generalized LASSO extends the basic one by applying a regularization matrix to the coefficients. We show that when the regularization matrix is even-determined (square) or underdetermined and satisfies full-rank conditions, the generalized LASSO can be transformed into the basic LASSO form via the Lagrangian framework. We also show that some published results on the LASSO extend to the generalized LASSO. Some LASSO variants, e.g., the robust LASSO, can be rewritten in generalized LASSO form and therefore transformed into a basic LASSO. Through this connection, many existing results on the LASSO, e.g., efficient LASSO solvers, can be used for the generalized LASSO.
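In the simplest case, the connection is a change of variables. Assume that the regularization matrix D is square and invertible, a special case of the full-rank conditions above. (The underdetermined case needs the paper's Lagrangian argument and is not shown here.) Substituting theta = D beta then turns the generalized LASSO into a basic LASSO with design matrix X D^{-1}:

```latex
\begin{aligned}
\hat\beta &= \arg\min_{\beta}\; \tfrac12\lVert y - X\beta\rVert_2^2 + \lambda\lVert D\beta\rVert_1,
  \qquad D\in\mathbb{R}^{p\times p}\ \text{invertible},\\
\theta &= D\beta \;\Longrightarrow\; \beta = D^{-1}\theta,\\
\hat\theta &= \arg\min_{\theta}\; \tfrac12\lVert y - (XD^{-1})\theta\rVert_2^2 + \lambda\lVert\theta\rVert_1,
  \qquad \hat\beta = D^{-1}\hat\theta .
\end{aligned}
```

Any basic LASSO solver applied to the design X D^{-1} therefore returns theta-hat, and mapping it back through D^{-1} gives the generalized-LASSO estimate.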

17.
IEEE Trans Biomed Eng ; 63(3): 496-505, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26258935

ABSTRACT

GOAL: Whole-exome sequencing provides a more cost-effective way than whole-genome sequencing to detect genetic variants such as copy number variations (CNVs). A number of approaches have been proposed to detect CNVs from whole-genome sequencing, but applying them directly to whole-exome sequencing often fails because exons are scattered along the genome. A method is therefore needed that targets the specific features of exome sequencing data. METHODS: In this paper, a novel sparse-model-based method is proposed to discover CNVs from multiple exome sequencing datasets. First, the exome sequencing data are represented with a penalized matrix approximation, and technical variability and random sequencing errors are assumed to follow a generalized Gaussian distribution. Second, an iteratively reweighted least squares algorithm is used to estimate the solution. RESULTS: The method was tested and validated on both synthetic and real data and compared with other approaches, including CoNIFER, XHMM and cn.MOPS. The tests demonstrate that the proposed method outperforms the other approaches. CONCLUSION: The proposed sparse model can detect CNVs from exome sequencing data with high power and precision. SIGNIFICANCE: The sparse model can target the specific features of exome sequencing data. The software code is freely available at http://www.tulane.edu/ wyp/software/Exon_CNV.m.


Subject(s)
Computational Biology/methods , DNA Copy Number Variations/genetics , Exome/genetics , DNA Sequence Analysis/methods , Algorithms , Humans , Reproducibility of Results
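Iteratively reweighted least squares handles a generalized Gaussian noise model with shape p by solving a sequence of weighted least-squares problems with weights |r_i|^(p-2). The sketch below applies that reweighting to the simplest possible model, a single location parameter, not to the paper's penalized matrix approximation. With p = 1 it converges to the median. This shows the robustness to outlying read depths that a heavy-tailed noise model provides. The function name and the tolerances are assumptions.

```python
from typing import Sequence


def irls_location(data: Sequence[float], p: float = 1.0, n_iter: int = 100, eps: float = 1e-9) -> float:
    """Hypothetical helper: minimize sum |x_i - mu|^p over mu by IRLS (1 <= p <= 2)."""
    if not 1.0 <= p <= 2.0:
        raise ValueError("this sketch assumes 1 <= p <= 2")
    mu = sum(data) / len(data)  # least-squares (p = 2) starting point
    for _ in range(n_iter):
        # Weight w_i = |r_i|^(p - 2); eps guards against division by zero at a data point.
        weights = [max(abs(x - mu), eps) ** (p - 2.0) for x in data]
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return mu


depths = [98.0, 101.0, 100.0, 99.0, 400.0]  # one outlying bin
print(irls_location(depths, p=2.0), irls_location(depths, p=1.0))
```

With p = 2 the weights are all 1 and the result is the mean, 159.6, which the single outlier drags upward. With p = 1 the estimate settles at 100.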
18.
Article in English | MEDLINE | ID: mdl-26737773

ABSTRACT

Next-generation sequencing data are often analyzed by segmenting the so-called read depth signal with standard segmentation tools. However, these tools usually assume a piecewise constant signal contaminated with zero-mean Gaussian noise, which introduces modeling error. This paper models the read depth signal with a piecewise Poisson distribution, which better reflects the next-generation sequencing mechanism. Based on this model, an optimal dynamic programming algorithm with parallel computing is proposed to segment the piecewise signal and thereby detect copy number variations.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/methods , Animals , DNA/analysis , DNA/genetics , DNA Copy Number Variations , Genome , Humans , Normal Distribution , Poisson Distribution , DNA Sequence Analysis
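Under a piecewise Poisson model, a segment with n bins and total count S has a maximized log-likelihood of S log(S/n) - S, up to terms that do not depend on the segmentation. The best segmentation under a per-segment penalty can therefore be found exactly by dynamic programming. The sketch below is a plain O(n^2) optimal-partitioning recursion written under that assumption. The penalty value and the function names are illustrative, and the paper's parallel implementation is not reproduced.

```python
import math
from typing import List, Sequence


def poisson_segment_cost(total: float, length: int) -> float:
    """Negative maximized Poisson log-likelihood of a segment, dropping log(k!) terms."""
    if total == 0:
        return 0.0
    return total - total * math.log(total / length)


def segment_poisson(counts: Sequence[int], penalty: float) -> List[int]:
    """Hypothetical helper: optimal partitioning; returns change points (segment starts after 0)."""
    n = len(counts)
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)
    best = [0.0] + [math.inf] * n  # best[t]: optimal cost of counts[:t]
    last = [0] * (n + 1)           # last[t]: start of the final segment in that optimum
    for t in range(1, n + 1):
        for s in range(t):
            cost = best[s] + poisson_segment_cost(prefix[t] - prefix[s], t - s) + penalty
            if cost < best[t]:
                best[t], last[t] = cost, s
    # Backtrack from the end to recover the change points.
    change_points, t = [], n
    while t > 0:
        s = last[t]
        if s > 0:
            change_points.append(s)
        t = s
    return sorted(change_points)


read_depth = [10] * 20 + [30] * 20 + [10] * 20
print(segment_poisson(read_depth, penalty=10.0))
```

The inner loop over s is independent for each candidate start, which is one natural place to introduce parallelism.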
19.
J Bioinform Comput Biol ; 12(4): 1450021, 2014 Aug.
Artículo en Inglés | MEDLINE | ID: mdl-25152046

RESUMEN

Copy number variations (CNVs) can serve as significant biomarkers, and next-generation sequencing (NGS) enables high-resolution detection of these CNVs. However, extracting features from CNVs and applying them to genomic studies such as population clustering remains a major challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. This feature matrix is then decomposed into a source matrix and a weight matrix by non-negative matrix factorization (NMF). The source matrix consists of common CNVs shared by all samples from the same group, and the weight matrix indicates the corresponding level of these CNVs in each sample. NMF of CNVs can therefore differentiate samples from different ethnic groups, i.e., perform population clustering. To validate the approach, we applied it to simulated data and to two real datasets from the 1000 Genomes Project. The results on simulated data demonstrate that the proposed method can recover the true common CNVs with high quality. In the first real-data analysis, the method clustered two family trios with different ancestries into two ethnic groups. The second real-data analysis showed that it can be applied genome-wide to a large sample consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.


Subject(s)
DNA Copy Number Variations , Population Genetics/methods , High-Throughput Nucleotide Sequencing , Cluster Analysis , Female , Genome , Human Genome , Homozygote , Humans , Male , Genetic Models , Pedigree
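The clustering step factors a nonnegative sample-by-CNV feature matrix V into a weight matrix W (samples by components) and a source matrix H (components by CNVs). The sketch below uses the Lee-Seung multiplicative updates for the Frobenius-norm objective, a standard NMF solver assumed here because the abstract does not name one. Each sample is then assigned to the component with the largest weight. The matrix sizes, the data, the iteration count and the function names are illustrative.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def nmf(v: Matrix, rank: int, n_iter: int = 500, seed: int = 0, eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Lee-Seung multiplicative updates minimizing ||V - WH||_F^2 with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


def assign_clusters(w: Matrix) -> List[int]:
    """Assign each sample (row of W) to its dominant component."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Hypothetical feature matrix: rows are samples, columns are CNV regions.
features = [
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [2.0, 2.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 3.0, 3.0, 3.0],
]
W, H = nmf(features, rank=2)
print(assign_clusters(W))
```

In this picture, each row of H is a "source" profile of CNVs shared by a group. Samples that carry the same shared CNVs load on the same component even when their CNV levels differ.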
20.
IEEE Trans Biomed Eng ; 61(3): 928-37, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24557694

ABSTRACT

Common copy number variations (CNVs) are small regions of genomic variation at the same loci across multiple samples, and next-generation sequencing (NGS) can detect them at high resolution. Genomic studies often provide multiple sequencing samples, for example sequences from multiple platforms or from multiple individuals. Integrating complementary information from multiple samples can potentially improve detection power. However, most current CNV detection methods process a single sample, or two samples in an abnormal versus matched-normal design. Research on detecting common CNVs across multiple samples has been very limited but is much needed. In this paper, we propose a novel method that detects common CNVs from multiple sequencing samples by exploiting the concurrency of genomic variations in read depth signals derived from multiple NGS datasets. We fit multiple read depth profiles with a penalized sparse regression model and, on that basis, formulate common CNV identification as a change-point detection problem. Finally, we validate the proposed method on both simulated and real data. It achieves higher detection power and better breakpoint estimation than several published CNV detection methods.


Subject(s)
DNA Copy Number Variations/genetics , Genomics/methods , DNA Sequence Analysis/methods , Algorithms , Computer Simulation , Genetic Databases , HapMap Project , Humans , Reproducibility of Results
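The premise is that a common CNV shifts the read depth at the same breakpoint in every sample, so the evidence can be pooled across samples. The sketch below illustrates that pooling with the simplest possible detector: a single shared change point that maximizes the between-segment sum of squares summed over samples. It stands in for the paper's penalized sparse regression and does not implement it. The names and data are hypothetical.

```python
from typing import List, Sequence


def split_statistic(x: Sequence[float], t: int) -> float:
    """Between-segment sum of squares for splitting x into x[:t] and x[t:]."""
    n = len(x)
    left = sum(x[:t]) / t
    right = sum(x[t:]) / (n - t)
    return t * (n - t) / n * (left - right) ** 2


def common_change_point(samples: Sequence[Sequence[float]]) -> int:
    """Hypothetical helper: index t maximizing the split statistic summed over all samples."""
    n = len(samples[0])
    if n < 2 or any(len(s) != n for s in samples):
        raise ValueError("samples must share a length of at least 2")
    totals: List[float] = [sum(split_statistic(s, t) for s in samples) for t in range(1, n)]
    return 1 + max(range(len(totals)), key=totals.__getitem__)


# Three hypothetical log-ratio profiles sharing a breakpoint at bin 30 with different amplitudes.
profiles = [[0.0] * 30 + [delta] * 20 for delta in (0.4, 0.6, -0.8)]
print(common_change_point(profiles))
```

Summing the statistic across samples adds the signal at the shared breakpoint. Noise that is independent from sample to sample tends to average out, which is the intuition behind the detection-power gain claimed above.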