Results 1 - 11 of 11
1.
BMC Bioinformatics ; 12: 450, 2011 Nov 18.
Article in English | MEDLINE | ID: mdl-22093447

ABSTRACT

BACKGROUND: Successfully modeling high-dimensional data involving thousands of variables is challenging. This is especially true for gene expression profiling experiments, given the large number of genes involved and the small number of samples available. Random Forests (RF) is a popular and widely used approach to feature selection for such "small n, large p problems." However, Random Forests suffers from instability, especially in the presence of noisy and/or unbalanced inputs. RESULTS: We present RKNN-FS, an innovative feature selection procedure for "small n, large p problems." RKNN-FS is based on Random KNN (RKNN), a novel generalization of traditional nearest-neighbor modeling. RKNN consists of an ensemble of base k-nearest neighbor models, each constructed from a random subset of the input variables. To rank the importance of the variables, we define a criterion on the RKNN framework, using the notion of support. A two-stage backward model selection method is then developed based on this criterion. Empirical results on microarray data sets with thousands of variables and relatively few samples show that RKNN-FS is an effective feature selection approach for high-dimensional data. RKNN is similar to Random Forests in terms of classification accuracy without feature selection. However, RKNN provides much better classification accuracy than RF when each method incorporates a feature-selection step. Our results show that RKNN is significantly more stable and more robust than Random Forests for feature selection when the input data are noisy and/or unbalanced. Further, RKNN-FS is much faster than the Random Forests feature selection method (RF-FS), especially for large scale problems, involving thousands of variables and multiple classes. 
CONCLUSIONS: Given the superiority of Random KNN in classification performance when compared with Random Forests, RKNN-FS's simplicity and ease of implementation, and its superiority in speed and stability, we propose RKNN-FS as a faster and more stable alternative to Random Forests in classification problems involving feature selection for high-dimensional datasets.
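The ensemble idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the leave-one-out scoring, and the reading of "support" as the mean accuracy of the base models that include a feature are all assumptions made for illustration, and the two-stage backward selection step is omitted.

```python
import random
from collections import Counter, defaultdict


def knn_predict(train_X, train_y, x, features, k):
    """Majority vote among the k nearest training rows, using only `features`."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k):
    """Leave-one-out accuracy of one base k-NN model on a feature subset."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], features, k) == y[i]
    return hits / len(X)


def rknn_support(X, y, n_models=200, subset_size=2, k=1, seed=0):
    """Assumed support criterion: mean accuracy of the base models using a feature."""
    rng = random.Random(seed)
    n_features = len(X[0])
    acc_by_feature = defaultdict(list)
    for _ in range(n_models):
        # Each base model sees only a random subset of the input variables.
        features = rng.sample(range(n_features), subset_size)
        acc = loo_accuracy(X, y, features, k)
        for f in features:
            acc_by_feature[f].append(acc)
    return {f: sum(a) / len(a) for f, a in acc_by_feature.items()}


# Hypothetical toy data: feature 0 separates the two classes, features 1-5 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(20)]
X = [[label * 5.0 + data_rng.random()] + [data_rng.random() for _ in range(5)]
     for label in y]

support = rknn_support(X, y)
ranking = sorted(support, key=support.get, reverse=True)
print("features ranked by support:", ranking)
```

In this toy setting every base model that draws feature 0 classifies perfectly, so its support is highest; a backward selection procedure would then drop the lowest-ranked features and repeat.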


Subject(s)
Gene Expression Profiling , Models, Genetic , Neoplasms/genetics , Cluster Analysis , Humans , Oligonucleotide Array Sequence Analysis
2.
J Biomed Inform ; 43(1): 51-9, 2010 Feb.
Article in English | MEDLINE | ID: mdl-19699317

ABSTRACT

Problems of haplotyping and block partitioning have been extensively studied with regard to the regular genotype data, but more cost-efficient data called XOR-genotypes remain under-investigated. Previous studies developed methods for haplotyping of short-sequence partial XOR-genotypes. In this paper we propose a new algorithm that performs haplotyping of long-range partial XOR-genotype data with possibility of missing entries, and in addition simultaneously finds the block structure for the given data. Our method is implemented as a fast and practical algorithm. We also investigate the effect of the percentage of fully genotyped individuals in a sample on the accuracy of results with and without the missing data. The algorithm is validated by testing on the HapMap data. Obtained results show good prediction rates both for samples with and without missing data. The accuracy of prediction of XOR sites is not significantly affected by the presence of 10% or less missing data.


Subject(s)
Computational Biology/methods , Genotype , Algorithms , Alleles , Chromosome Mapping/methods , Chromosomes , Data Interpretation, Statistical , Electronic Data Processing , Female , Genome, Human , Haplotypes , Humans , Male , Models, Genetic , Polymorphism, Single Nucleotide , Reproducibility of Results
3.
Int J Oncol ; 34(1): 107-15, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19082483

ABSTRACT

New computational approaches are needed to integrate both protein expression and gene expression profiles, extending beyond the correlation analyses of gene and protein expression profiles in current practice. Here, we developed an algorithm to classify cell line chemosensitivity based on integrated transcriptional and proteomic profiles. We sought to determine whether a combination of gene and protein expression profiles of untreated cells was able to enhance the performance of chemosensitivity prediction. An integrative feature selection scheme was employed to identify chemosensitivity determinants from genome-wide transcriptional profiles and 52 protein expression levels in 60 human cancer cell lines (the NCI-60). A set of 118 anti-cancer drugs whose mechanisms of action were putatively understood was evaluated. Classifiers of the complete range of drug response (sensitive, intermediate, or resistant) were generated for the evaluated anti-cancer drugs, one for each agent. The classifiers were designed to be independent of the cells' tissue origins. The classification accuracy for all 118 evaluated agents was remarkably better (P<0.001) than would be achieved by chance. Furthermore, 76 out of the 118 classifiers identified from integrated genomic and protein profiles significantly (P<0.05) improved the accuracy of protein expression-based classifiers identified previously. These results demonstrate that our integrated genomic and proteomic approach enhances the performance of chemosensitivity prediction. This study presents a new analytical framework to identify integrated gene and protein expression signatures for predicting cellular behavior and clinical outcome in general.


Subject(s)
Antineoplastic Agents/pharmacology , Drug Resistance, Neoplasm/genetics , Gene Expression Profiling , Proteomics/methods , Drug Screening Assays, Antitumor , Humans , Neoplasm Proteins/genetics , Oligonucleotide Array Sequence Analysis , Predictive Value of Tests , Transcription, Genetic , Tumor Cells, Cultured/drug effects
4.
J Bioinform Comput Biol ; 6(6): 1177-92, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19090023

ABSTRACT

Multi-population haplotype inference and block partitioning is a difficult task when dealing with mixed genotype samples. A number of studies have shown that the haplotype block structures, as well as the collections of common haplotypes and their frequencies, vary significantly among world populations. These differences are more extreme when the geographical locations for the populations are more distant. Some of the previous studies performed haplotype inference in multi-population samples with known population assignment. Others developed algorithms for clustering of the mixed haplotype or genotype samples with different block structures or genetic marker profiles. We present a new algorithm that performs haplotype inference and block partitioning in a mixed sample of genotypes from two populations when the population assignments are not known. Given a mixed genotype sample, the proposed algorithm (HAPLOCLUST) extracts two clusters of genotypes with different block structures in addition to performing haplotype inference on each of these clusters. When tested on a set of unrelated individuals, our algorithm provides correct assignments comparable to those of two state-of-the-art algorithms for population stratification. The contribution of HAPLOCLUST consists of performing haplotype/block-based population stratification and simultaneously finding the haplotype resolution and block partitioning for the extracted clusters.


Subject(s)
Algorithms , Genetics, Population/statistics & numerical data , Haplotypes , Computational Biology , Databases, Genetic , Genome, Human , Genomics/statistics & numerical data , Humans
5.
Ecol Appl ; 18(8 Suppl): A107-27, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19475921

ABSTRACT

Clear Lake is the site of the abandoned Sulphur Bank Mercury Mine, active periodically from 1873 to 1957, resulting in approximately 100 Mg of mercury (Hg) being deposited into the lake's ecosystem. Concentrations of total (primarily inorganic) Hg (TotHg) in Clear Lake are some of the highest reported worldwide for sediments (up to 4.4 × 10^5 ng/g [ppb dry mass]) and water (up to 4 × 10^-1 µg/L [= ppb]). However, the ratio of methylmercury (MeHg) to TotHg at Clear Lake indicates that the methylation process is mostly decoupled from bulk inorganic Hg loading, with Hg in lower trophic level biota significantly less than anticipated compared with other Hg-contaminated sites worldwide. This may be due to several factors, including: (1) reduced bioavailability of Hg derived from the mine (i.e., cinnabar, metacinnabar, and corderoite), (2) the alkaline nature of the lake water, (3) the shallow depth of the lake, which prevents stratification and subsequent methylation in a stratified hypolimnion, and (4) possible dilution of MeHg by a highly productive system. However, while bulk inorganic Hg loading to the lake may not contribute significantly to the bioaccumulation of Hg, acid mine drainage (AMD) from the mine likely promotes Hg methylation by sulfate-reducing and iron-reducing bacteria, making AMD a vehicle for the production of highly bioavailable Hg. If Clear Lake were deeper, less productive, or less alkaline, biota would likely contain much more MeHg than they do presently. Comparisons of MeHg:TotHg ratios in sediments, water, and biota from sites worldwide suggest that the highest production of MeHg may be found at sites influenced by chloralkali plants, followed by sites influenced by gold and silver mines, with the lowest production of MeHg observed at cinnabar and metacinnabar Hg mines. 
These data also suggest that the total maximum daily load (TMDL) process for Hg at Clear Lake, as currently implemented to reduce contamination in fishes for the protection of wildlife and humans, may be flawed because the metric used to implement Hg load reduction (i.e., TotHg) is not directly proportional to the critical form of Hg that is being bioaccumulated (i.e., MeHg).


Subject(s)
Ecosystem , Fresh Water/chemistry , Mercury/metabolism , Methylmercury Compounds/chemistry , Water Pollutants, Chemical/metabolism , Animals , California , Geologic Sediments/chemistry , Invertebrates/chemistry , Invertebrates/metabolism , Mercury/chemistry , Mining , Plankton/chemistry , Plankton/metabolism , Time Factors , Water Pollutants, Chemical/chemistry
6.
Ecol Appl ; 18(8 Suppl): A128-57, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19475922

ABSTRACT

Mercury (Hg) from Hg mining at Clear Lake, California, USA, has contaminated water and sediments for over 130 years and has the potential to affect human and environmental health. With total mercury (TotHg) concentrations up to 438 mg/kg (dry mass) in surficial sediments and up to 399 ng/L in lake water, Clear Lake is one of the most Hg-contaminated lakes worldwide. Particulate Hg in surface water near the mine ranges from 10,000 to 64,000 ng/g; TotHg declines exponentially with distance from the Sulphur Bank Mercury Mine. From 1992 to 1998, no significant long-term trends for TotHg or methylmercury (MeHg) in sediments or water were observed, but peaks of both TotHg and MeHg occurred following a 1995 flooding event. Sediments and water exhibit summer/fall maxima and winter/spring minima for MeHg, but not TotHg. Sediment TotHg has not declined significantly a decade after remediation in 1992. At the mine site, aqueous TotHg reached 374,000 ng/L in unfiltered groundwater. Pore water sulfate in sediments varies seasonally from 112 mg/L in summer/fall (when Hg methylation is highest) to 3300 mg/L in winter. While TotHg is exceptionally high in both sediments and water, MeHg is substantially lower than would be expected based on the bulk Hg loading to the lake and in comparison with other sites worldwide. Total mercury in Clear Lake water does not exceed the Safe Drinking Water Act criteria, but it sometimes greatly exceeds human health criteria established by the Great Lakes Initiative, U.S. Environmental Protection Agency water quality guidelines, and the California Toxics Rule criterion. Methylmercury concentrations exceed the Great Lakes Initiative criterion for MeHg in water at some sites only during summer/fall. 
Relative to ecological health, Clear Lake sediments greatly exceed the National Oceanic and Atmospheric Administration's benthic fauna Sediment Quality Guidelines for toxic effects, as well as the more consensus-based Threshold Effects Concentration criteria. Based on these criteria, Hg-contaminated sediments and water from Clear Lake are predicted to have some lethal and sublethal effects on specific resident aquatic species. However, based on unique physical and chemical characteristics of the Clear Lake environment, MeHg toxicity may be significantly less than anticipated from the large inorganic Hg loading.


Subject(s)
Ecosystem , Fresh Water/chemistry , Mercury/chemistry , Mercury/toxicity , Water Pollutants, Chemical/chemistry , Water Pollutants, Chemical/toxicity , California , Geologic Sediments/chemistry , Humans , Mining , Time Factors
7.
Ecol Appl ; 18(8 Suppl): A158-76, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19475923

ABSTRACT

Considerable ecological research on mercury (Hg) has focused on higher trophic level species (e.g., fishes and birds), but less on lower trophic species. Clear Lake, site of the abandoned Sulphur Bank Mercury Mine, provides a unique opportunity to study a system influenced by mine-derived Hg. An exponentially decreasing gradient of total Hg (TotHg) away from the mine allowed us to evaluate Hg bioaccumulation in planktonic and benthic invertebrates and evaluate population- and community-level parameters that might be influenced by Hg. Studies from 1992-1998 demonstrated that TotHg in lower trophic species typically decreased exponentially away from the mine, similar to trends observed in water and sediments. However, a significant amount of invertebrate TotHg (approximately 60% for sediment-dwelling chironomid insect larvae) likely derives from Hg-laden particles in their guts. Spatially, whole-body methylmercury (MeHg) did not typically exhibit a significant decrease with increasing distance from the mine. Temporally, TotHg concentrations in plankton and chironomids did not exhibit any short-term (seasonal or annual) or long-term (multiyear) trends. Methylmercury, however, was elevated during late summer/fall in both plankton and chironomids, but it exhibited no long-term increase or decrease during this study. Although data from a 50-yr monitoring program for benthic chaoborid and chironomid larvae documented significant population fluctuations, they did not demonstrate population-level trends with respect to Hg concentrations. Littoral invertebrates also exhibited no detectable population- or community-level trends associated with the steep Hg gradient. Although sediment TotHg concentrations (1-1200 mg/kg dry mass) exceed sediment quality guidelines by up to 7000 times, it is notable that no population- or community-level effects were detected for benthic and planktonic taxa. 
In comparison with other sites worldwide, Clear Lake's lower trophic species typically have significantly higher TotHg concentrations, but comparable or lower MeHg concentrations, which may be responsible for the discrepancy between highly elevated TotHg concentrations and the general lack of observed population- or community-level effects. These data suggest that MeHg, as well as TotHg, should be used when establishing sediment quality guidelines. In addition, site-specific criteria should be established using the observed relationship between MeHg and observed ecological responses.


Subject(s)
Food Chain , Fresh Water/chemistry , Invertebrates/drug effects , Mercury/toxicity , Plankton/drug effects , Water Pollutants, Chemical/toxicity , Animals , California , Demography , Mercury/chemistry , Mining , Time , Water Movements , Water Pollutants, Chemical/chemistry
8.
Ecol Appl ; 18(8 Suppl): A177-95, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19475924

ABSTRACT

Clear Lake, California, USA, receives acid mine drainage and mercury (Hg) from the Sulphur Bank Mercury Mine, a U.S. Environmental Protection Agency (U.S. EPA) Superfund Site that was active intermittently from 1873 to 1957 and partially remediated in 1992. Mercury concentrations were analyzed primarily in four species of Clear Lake fishes: inland silversides (Menidia beryllina, planktivore), common carp (Cyprinus carpio, benthic scavenger/omnivore), channel catfish (Ictalurus punctatus, benthic omnivorous predator), and largemouth bass (Micropterus salmoides, piscivorous top predator). These data represent one of the largest fish Hg data sets for a single site, especially in California. Spatially, total Hg (TotHg) in silversides and bass declined with distance from the mine, indicating that the mine site represents a point source for Hg loading to Clear Lake. Temporally, fish Hg has not declined significantly over 12 years since mine site remediation. Mercury concentrations were variable throughout the study period, with no monotonic trends of increase or decrease, except those correlated with boom and bust cycles of an introduced fish, threadfin shad (Dorosoma petenense). However, stochastic events such as storms also influence juvenile largemouth bass Hg as evidenced during an acid mine drainage overflow event in 1995. Compared to other sites regionally and nationally, most fish in Clear Lake exhibit Hg concentrations similar to other Hg-contaminated sites, up to approximately 2.0 mg/kg wet mass (WM) TotHg in largemouth bass. However, even these elevated concentrations are less than would be anticipated from such high inorganic Hg loading to the lake. Mercury in some Clear Lake largemouth bass exceeded all human health fish consumption guidelines established over the past 25 years by the U.S. Food and Drug Administration (1.0 mg/kg WM), the National Academy of Sciences (0.5 mg/kg WM), and the U.S. EPA (0.3 mg/kg WM). 
Mercury in higher trophic level fishes exceeds ecotoxicological risk assessment estimates for concentrations that would be safe for wildlife, specifically the nonlisted Common Merganser and the recently delisted Bald Eagle. Fish populations of 11 out of 18 species surveyed exhibited a significant decrease in abundance with increasing proximity to the mine; this decrease is correlated with increasing water and sediment Hg. These trends may be related to Hg or other lake-wide gradients such as distribution of submerged aquatic vegetation.


Subject(s)
Ecosystem , Fishes/metabolism , Fresh Water/chemistry , Mercury/metabolism , Mining , Water Pollutants, Chemical/metabolism , Animals , California , Mercury/chemistry , Methylmercury Compounds/chemistry , Methylmercury Compounds/metabolism , Time Factors , Water Pollutants, Chemical/chemistry
9.
Ecol Appl ; 18(8 Suppl): A12-28, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19475916

ABSTRACT

Clear Lake is the site of an abandoned mercury (Hg) mine (active intermittently from 1873 to 1957), now a U.S. Environmental Protection Agency Superfund Site. Mining activities, including bulldozing waste rock and tailings into the lake, resulted in approximately 100 Mg of Hg entering the lake's ecosystem. This series of papers represents the culmination of approximately 15 years of Hg-related studies on this ecosystem, following Hg from the ore body to the highest trophic levels. A series of physical, chemical, biological, and limnological studies elucidate how ongoing Hg loading to the lake is influenced by acid mine drainage and how wind-driven currents and baroclinic circulation patterns redistribute Hg throughout the lake. Methylmercury (MeHg) production in this system is controlled by both sulfate-reducing bacteria and newly identified iron-reducing bacteria. Sediment cores (dated with dichlorodiphenyldichloroethane [DDD], 210Pb, and 14C) to approximately 250 cm depth (representing up to approximately 3000 years before present) elucidate a record of total Hg (TotHg) loading to the lake from natural sources and mining and demonstrate how MeHg remains stable at depth within the sediment column for decades to millennia. Core data also identify other stresses that have influenced the Clear Lake Basin, especially over the past 150 years. Although Clear Lake is one of the most Hg-contaminated lakes in the world, biota do not exhibit MeHg concentrations as high as would be predicted based on the gross level of Hg loading. We compare Clear Lake's TotHg and MeHg concentrations with other sites worldwide and suggest several hypotheses to explain why this discrepancy exists. Based on our data, together with state and federal water and sediment quality criteria, we predict potential resulting environmental and human health effects and provide data that can assist remediation efforts.


Subject(s)
Ecosystem , Fresh Water/chemistry , Mercury/metabolism , Mining/history , Water Pollutants, Chemical/metabolism , California , Chemical Precipitation , History, 19th Century , History, 20th Century , Human Activities , Humans , Mercury/chemistry , Mercury Poisoning , Time Factors , Water Pollutants, Chemical/chemistry , Wind
10.
Clin Cancer Res ; 13(7): 2014-22, 2007 Apr 01.
Article in English | MEDLINE | ID: mdl-17404081

ABSTRACT

PURPOSE: The purpose of this study is to predict breast cancer recurrence and metastases and to identify gene signatures indicative of clinicopathologic characteristics using gene expression patterns derived from cDNA microarray. EXPERIMENTAL DESIGN: Expression profiles of 7,650 genes were investigated on an unselected group of 99 node-negative and node-positive breast cancer patients to identify a prognostic gene signature of recurrence and metastases. The identified gene signature was validated on 78 independent patients with primary invasive carcinoma (T1/T2 and N0) and on 58 patients with locally advanced breast cancer (T3/T4 and/or N2). The gene predictors were identified using a combination of random forests and linear discriminant analysis function. RESULTS: This study identified a new 28-gene signature that achieved highly accurate disease-free survival and overall survival (both at P < 0.001, time-dependent receiver operating characteristic analysis) in individual breast cancer patients. Patients categorized into high-risk, intermediate-risk, and low-risk groups had distinct disease-free survival (P < 0.005, Kaplan-Meier analysis, log-rank test) in three patient cohorts. A strong association (P < 0.05) was identified between risk groups and tumor size, tumor grade, estrogen receptor and progesterone receptor status, and HER2/neu overexpression in the studied cohorts. We also identified 14-gene predictors of nodal status and 9-gene predictors of tumor grade. CONCLUSIONS: This study has established a population-based approach to predicting breast cancer outcomes at the individual level exclusively based on gene expression patterns. The 28-gene recurrence signature has been validated as quantifying the probability of recurrence and metastases in patients with heterogeneous histology and disease stage.


Subject(s)
Breast Neoplasms/genetics , Carcinoma, Ductal, Breast/genetics , Gene Expression Profiling , Neoplasm Metastasis/genetics , Neoplasm Recurrence, Local/genetics , Oligonucleotide Array Sequence Analysis , Adult , Breast Neoplasms/mortality , Breast Neoplasms/pathology , Female , Gene Expression , Humans , Middle Aged , Prognosis , ROC Curve
11.
Clin Cancer Res ; 12(15): 4583-9, 2006 Aug 01.
Article in English | MEDLINE | ID: mdl-16899605

ABSTRACT

PURPOSE: Accurate prediction of an individual patient's drug response is an important prerequisite of personalized medicine. Recent pharmacogenomics research in chemosensitivity prediction has studied the gene-drug correlation based on transcriptional profiling. However, proteomic profiling will more directly solve the current functional and pharmacologic problems. We sought to determine whether proteomic signatures of untreated cells were sufficient for the prediction of drug response. EXPERIMENTAL DESIGN: In this study, a machine learning model system was developed to classify cell line chemosensitivity exclusively based on proteomic profiling. Using reverse-phase protein lysate microarrays, protein expression levels were measured by 52 antibodies in a panel of 60 human cancer cell lines (NCI-60). The model system combined several well-known algorithms, including random forests, Relief, and the nearest neighbor methods, to construct the protein expression-based chemosensitivity classifiers. The classifiers were designed to be independent of the tissue origin of the cells. RESULTS: A total of 118 classifiers of the complete range of drug responses (sensitive, intermediate, and resistant) were generated for the evaluated anticancer drugs, one for each agent. The accuracy of chemosensitivity prediction for all 118 evaluated agents was significantly higher (P < 0.02) than that of random prediction. Furthermore, our study found that the proteomic determinants for chemosensitivity of 5-fluorouracil were also potential diagnostic markers of colon cancer. CONCLUSIONS: The results showed that it was feasible to accurately predict chemosensitivity by proteomic approaches. This study provides a basis for the prediction of drug response based on protein markers in the untreated tumors.


Subject(s)
Antineoplastic Agents/therapeutic use , Neoplasms/drug therapy , Neoplasms/genetics , Proteomics/methods , Antineoplastic Agents/adverse effects , Antineoplastic Agents/pharmacology , Cell Line, Tumor , Cluster Analysis , Drug Resistance, Neoplasm/drug effects , Drug Resistance, Neoplasm/genetics , Humans , Neoplasms/diagnosis , Predictive Value of Tests , Sensitivity and Specificity