Results 1 - 20 of 30
1.
Biometrics ; 71(4): 1042-9, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26038228

ABSTRACT

We wish to estimate the total number of classes in a population based on sample counts, especially in the presence of high latent diversity. Drawing on probability theory that characterizes distributions on the integers by ratios of consecutive probabilities, we construct a nonlinear regression model for the ratios of consecutive frequency counts. This allows us to predict the unobserved count and hence estimate the total diversity. We believe that this is the first approach to depart from the classical mixed Poisson model in this problem. Our method is geometrically intuitive and yields good fits to data with reasonable standard errors. It is especially well-suited to analyzing high diversity datasets derived from next-generation sequencing in microbial ecology. We demonstrate the method's performance in this context and via simulation, and we present a dataset for which our method outperforms all competitors.


Subject(s)
Biodiversity ; Microbial Consortia ; Biometry/methods ; Computer Simulation ; Databases, Genetic ; High-Throughput Nucleotide Sequencing ; Nonlinear Dynamics ; Probability Theory
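
The extrapolation idea in the abstract above can be illustrated with a short sketch. The ratios of consecutive frequency counts, f(j+1)/f(j), are regressed on j, the fitted curve is extended back to j = 0, and the predicted ratio f(1)/f(0) gives the unobserved count f(0). The abstract describes a nonlinear regression model; the straight-line fit below is a simplifying assumption chosen for brevity, and the function name and toy counts are hypothetical.

```python
def ratio_regression_richness(freq_counts):
    """Estimate total richness from frequency counts {j: f_j}.

    Simplifying assumption: the ratios f_{j+1}/f_j are modelled as a
    straight line in j (ordinary least squares), not the nonlinear
    model described in the abstract.
    """
    if freq_counts.get(1, 0) <= 0:
        raise ValueError("the singleton count f_1 must be positive")

    # Collect (j, f_{j+1}/f_j) for every pair of consecutive counts.
    xs, ys = [], []
    for j in sorted(freq_counts):
        if j + 1 in freq_counts and freq_counts[j] > 0:
            xs.append(j)
            ys.append(freq_counts[j + 1] / freq_counts[j])
    if len(xs) < 2:
        raise ValueError("need at least two consecutive ratios")

    # Ordinary least-squares line through the ratios.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x

    # The fitted value at j = 0 is the predicted ratio f_1/f_0.
    if intercept <= 0:
        raise ValueError("fitted ratio at j = 0 is not positive")
    unobserved = freq_counts[1] / intercept

    return sum(freq_counts.values()) + unobserved


# Hypothetical toy data: 512 classes seen once, 256 seen twice, and so on.
toy_counts = {1: 512, 2: 256, 3: 128, 4: 64}
print(ratio_regression_richness(toy_counts))
```

With these toy counts every ratio equals 0.5, so the sketch predicts 1024 unobserved classes on top of the 960 observed ones. Real data would call for weighting and a richer ratio model, as the abstract indicates.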
2.
J Eukaryot Microbiol ; 62(3): 338-45, 2015.
Article in English | MEDLINE | ID: mdl-25312509

ABSTRACT

High-throughput sequencing platforms are continuing to increase resulting read lengths, which is allowing for a deeper and more accurate depiction of environmental microbial diversity. With the nascent Reagent Kit v3, Illumina MiSeq now has the ability to sequence the eukaryotic hyper-variable V4 region of the SSU-rDNA locus with paired-end reads. Using DNA collected from soils with analyses of strictly- and nearly identical amplicons, here we ask how the new Illumina MiSeq data compares with what we can obtain with Roche/454 GS FLX with regard to quantity and quality, presence and absence, and abundance perspectives. We show that there is an easy qualitative transition from the Roche/454 to the Illumina MiSeq platforms. The ease of this transition is more nuanced quantitatively for low-abundant amplicons, although estimates of abundances are known to also vary within platforms.


Subject(s)
Biota ; Environmental Microbiology ; High-Throughput Nucleotide Sequencing/methods ; RNA, Ribosomal, 18S/genetics
3.
Brief Bioinform ; 13(4): 420-9, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22308073

ABSTRACT

This article reviews recent advances in 'microbiome studies': molecular, statistical and graphical techniques to explore and quantify how microbial organisms affect our environments and ourselves given recent increases in sequencing technology. Microbiome studies are moving beyond mere inventories of specific ecosystems to quantifications of community diversity and descriptions of their ecological function. We review the last 24 months of progress in this sort of research, and anticipate where the next 2 years will take us. We hope that bioinformaticians will find this a helpful springboard for new collaborations with microbiologists.


Subject(s)
DNA/chemistry ; Genomics/methods ; Metagenome ; Ecosystem
4.
Bioinformatics ; 28(7): 1045-7, 2012 Apr 01.
Article in English | MEDLINE | ID: mdl-22333246

ABSTRACT

MOTIVATION: The massive data produced by next-generation sequencing require advanced statistical tools. We address estimating the total diversity or species richness in a population. To date, only relatively simple methods have been implemented in available software. There is a need for software employing modern, computationally intensive statistical analyses including error, goodness-of-fit and robustness assessments. RESULTS: We present CatchAll, a fast, easy-to-use, platform-independent program that computes maximum likelihood estimates for finite-mixture models, weighted linear regression-based analyses and coverage-based non-parametric methods, along with outlier diagnostics. Given sample 'frequency count' data, CatchAll computes 12 different diversity estimates and applies a model-selection algorithm. CatchAll also derives discounted diversity estimates to adjust for possibly uncertain low-frequency counts. It is accompanied by an Excel-based graphics program. AVAILABILITY: Free executable downloads for Linux, Windows and Mac OS, with manual and source code, at www.northeastern.edu/catchall. CONTACT: jab18@cornell.edu.


Subject(s)
Genetics, Population/methods ; Genetics, Population/statistics & numerical data ; Models, Statistical ; Software ; Algorithms ; Bacteriophages/genetics ; Computational Biology/methods ; Likelihood Functions ; Linear Models
5.
J Eukaryot Microbiol ; 59(2): 185-7, 2012.
Article in English | MEDLINE | ID: mdl-22236102

ABSTRACT

The hyper-variable V4 and V9 regions of the small subunit (SSU) rDNA have been targeted for assessing environmental diversity of microbial eukaryotes using next generation sequencing technologies. Here, we explore how the genetic distances among these short fragments compare with the distances obtained from near full-length SSU-rDNA sequences by comparing all pairwise estimates, as well as within and among species of ciliates. Results show that pairwise distances from V4 more closely match the near full-length SSU-rDNA and are more comparable with previous studies based on much longer SSU-rDNA fragments than pairwise distances from V9. Thus, studies that use the V4 will estimate similar values of phylotype richness and community structure as would have been estimated using the full-length SSU-rDNA.


Subject(s)
Biodiversity ; Ciliophora/genetics ; DNA, Protozoan/chemistry ; DNA, Protozoan/genetics ; DNA, Ribosomal/chemistry ; DNA, Ribosomal/genetics ; Ciliophora/chemistry ; Ciliophora/classification ; Nucleic Acid Conformation ; Sequence Homology, Nucleic Acid
6.
Mol Ecol ; 19 Suppl 1: 54-66, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20331770

ABSTRACT

Over the past 100 years, Arctic temperatures have increased at almost twice the global average rate. One consequence is the acceleration of glacier retreat, exposing new habitats that are colonized by microorganisms whose diversity and function are unknown. Here, we characterized bacterial diversity along two approximately parallel chronosequences in an Arctic glacier forefield that span six time points following glacier retreat. We assessed changes in phylotype richness, evenness and turnover rate through the analysis of 16S rRNA gene sequences recovered from 52 samples taken from surface layers along the chronosequences. An average of 4500 sequences was obtained from each sample by 454 pyrosequencing. Using parametric methods, it was estimated that bacterial phylotype richness was high, and that it increased significantly from an average of 4000 (at a threshold of 97% sequence similarity) at locations exposed for 5 years to an average of 7050 phylotypes per 0.5 g of soil at sites that had been exposed for 150 years. Phylotype evenness also increased over time, with an evenness of 0.74 for 150 years since glacier retreat reflecting large proportions of rare phylotypes. The bacterial species turnover rate was especially high between sites exposed for 5 and 19 years. The level of bacterial diversity present in this High Arctic glacier foreland was comparable with that found in temperate and tropical soils, raising the question whether global patterns of bacterial species diversity parallel that of plants and animals, which have been found to form a latitudinal gradient and be lower in polar regions compared with the tropics.


Subject(s)
Bacteria/genetics ; Biodiversity ; Ice Cover/microbiology ; Soil Microbiology ; Arctic Regions ; Bacteria/classification ; DNA, Bacterial/analysis ; DNA, Bacterial/genetics ; RNA, Ribosomal, 16S/genetics ; Sequence Analysis, DNA/methods ; Time Factors
7.
BMC Genomics ; 10 Suppl 2: S9, 2009 Jul 14.
Article in English | MEDLINE | ID: mdl-19607660

ABSTRACT

BACKGROUND: Crocodilians (Order Crocodylia) are an ancient vertebrate group of tremendous ecological, social, and evolutionary importance. They are the only extant reptilian members of Archosauria, a monophyletic group that also includes birds, dinosaurs, and pterosaurs. Consequently, crocodilian genomes represent a gateway through which the molecular evolution of avian lineages can be explored. To facilitate comparative genomics within Crocodylia and between crocodilians and other archosaurs, we have constructed a bacterial artificial chromosome (BAC) library for the Australian saltwater crocodile, Crocodylus porosus. This is the first BAC library for a crocodile and only the second BAC resource for a crocodilian. RESULTS: The C. porosus BAC library consists of 101,760 individually archived clones stored in 384-well microtiter plates. NotI digestion of random clones indicates an average insert size of 102 kb. Based on a genome size estimate of 2778 Mb, the library affords 3.7 fold (3.7x) coverage of the C. porosus genome. To investigate the utility of the library in studying sequence distribution, probes derived from CR1a and CR1b, two crocodilian CR1-like retrotransposon subfamilies, were hybridized to C. porosus macroarrays. The results indicate that there are a minimum of 20,000 CR1a/b elements in C. porosus and that their distribution throughout the genome is decidedly non-random. To demonstrate the utility of the library in gene isolation, we probed the C. porosus macroarrays with an overgo designed from a C-mos (oocyte maturation factor) partial cDNA. A BAC containing C-mos was identified and the C-mos locus was sequenced. Nucleotide and amino acid sequence alignment of the C. porosus C-mos coding sequence with avian and reptilian C-mos orthologs reveals greater sequence similarity between C. porosus and birds (specifically chicken and zebra finch) than between C. porosus and squamates (green anole). 
CONCLUSION: We have demonstrated the utility of the Crocodylus porosus BAC library as a tool in genomics research. The BAC library should expedite complete genome sequencing of C. porosus and facilitate detailed analysis of genome evolution within Crocodylia and between crocodilians and diverse amniote lineages including birds, mammals, and other non-avian reptiles.


Subject(s)
Alligators and Crocodiles/genetics ; Gene Library ; Genomics/methods ; Animals ; Chromosomes, Artificial, Bacterial/genetics ; Genes, mos ; Male ; Retroelements ; Sequence Analysis, DNA
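
The 3.7x coverage figure in the abstract above follows directly from the reported clone count, mean insert size and genome size. The sketch below reproduces that arithmetic. It also evaluates the standard Clarke-Carbon expression for the chance that a given single-copy sequence is present in the library; that second calculation is not stated in the abstract and is included here as an assumption-laden illustration. Function names are hypothetical.

```python
def library_coverage(num_clones, mean_insert_kb, genome_size_mb):
    """Genome equivalents represented by the library (fold coverage)."""
    return num_clones * mean_insert_kb / (genome_size_mb * 1000.0)


def probability_sequence_present(num_clones, mean_insert_kb, genome_size_mb):
    """Clarke-Carbon estimate: P = 1 - (1 - insert/genome) ** clones.

    Assumes inserts are drawn uniformly at random from the genome,
    which real libraries only approximate.
    """
    fraction = mean_insert_kb / (genome_size_mb * 1000.0)
    return 1.0 - (1.0 - fraction) ** num_clones


# Values reported in the abstract: 101,760 clones, 102 kb inserts, 2778 Mb genome.
coverage = library_coverage(101_760, 102, 2778)
print(f"{coverage:.1f}x")  # prints "3.7x"

probability = probability_sequence_present(101_760, 102, 2778)
print(f"{probability:.3f}")
```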
8.
Environ Microbiol ; 11(2): 360-81, 2009 Feb.
Article in English | MEDLINE | ID: mdl-18826436

ABSTRACT

The frontiers of eukaryote life in nature are still unidentified. In this study, we analysed protistan communities in the hypersaline (up to 365 g l⁻¹ NaCl) anoxic L'Atalante deep-sea basin located in the eastern Mediterranean Sea. Targeting 18S ribosomal RNA retrieved from the basin's lower halocline (3501 m depth) we detected 279 protistan sequences that grouped into 42 unique phylotypes (99% sequence similarity). Statistical analyses revealed that these phylotypes account only for a proportion of the protists inhabiting this harsh environment with as much as 50% missed by this survey. Most phylotypes were affiliated with ciliates (45%), dinoflagellates (21%), choanoflagellates (10%) and uncultured marine alveolates (6%). Sequences from other taxonomic groups like stramenopiles, Polycystinea, Acantharea and Euglenozoa, all of which are typically found in non-hypersaline deep-sea systems, are either missing or very rare in our cDNA clone library. Although many deep hypersaline anoxic basin (DHAB) sequences fell within previously identified environmental clades, a large number branched relatively deeply. Phylotype richness, community membership and community structure differ significantly from a deep seawater reference community (3499 m depth). Also, the protistan community in the L'Atalante basin is distinctively different from any previously described hypersaline community. In conclusion, we hypothesize that extreme environments may exert a high selection pressure possibly resulting in the evolution of an exceptional and distinctive assemblage of protists. The deep hypersaline anoxic basins in the Mediterranean Sea provide an ideal platform to test this hypothesis and are promising targets for the discovery of undescribed protists with unknown physiological capabilities.


Subject(s)
Biodiversity ; Eukaryota/classification ; Eukaryota/isolation & purification ; Geologic Sediments ; Animals ; DNA, Protozoan/chemistry ; DNA, Protozoan/genetics ; DNA, Ribosomal/chemistry ; DNA, Ribosomal/genetics ; Eukaryota/genetics ; Gene Library ; Genes, rRNA ; Hypertonic Solutions ; Hypoxia ; Mediterranean Sea ; Molecular Sequence Data ; Phylogeny ; RNA, Protozoan/genetics ; RNA, Ribosomal, 18S/genetics ; Sequence Analysis, DNA
9.
Anal Biochem ; 388(2): 322-30, 2009 May 15.
Article in English | MEDLINE | ID: mdl-19285476

ABSTRACT

Cot analysis (DNA reassociation kinetics) has long been used to explore genome structure in individual species, estimate genome similarity among organisms, and evaluate diversity in ecological samples, yet the algorithms and computational tools designed for analyzing Cot data are outdated, difficult to use, and prone to error. We report a new nonlinear regression procedure for analysis of Cot data and describe our algorithms in detail. Our procedure is implemented as CotQuest, a suite of scripts designed for use with the statistics package SAS. Unlike previous programs, CotQuest does not require users to input guesses as to the final values of parameters; rather, it employs a novel algorithm to step through a sequence of progressively more complex models, with the results from a given analysis being used to generate starting values for the next model. Moreover, CotQuest returns a statistical comparison of potential models and provides a variety of model assessment and selection diagnostics to help users in model selection. In situations where two models possess similar goodness-of-fit assessments, visual analysis of the Cot curves and comparison of CotQuest-generated graphs and statistics reflecting the normality and homoscedasticity of residuals can be employed to make educated choices between models.


Subject(s)
Algorithms ; DNA/chemistry ; Regression Analysis ; Software ; Kinetics
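
For readers unfamiliar with Cot curves, the kind of model that such a nonlinear regression fits can be sketched as follows. Under ideal second-order reassociation kinetics, the fraction of DNA still single-stranded at a given Cot value is 1 / (1 + k * Cot), and a genome with several kinetic components is modelled as a weighted sum of such terms. The sketch only evaluates the model; it is not CotQuest (which the abstract describes as a suite of SAS scripts), and the function name and component values are hypothetical.

```python
def single_stranded_fraction(cot, components):
    """Fraction of DNA remaining single-stranded at a given Cot value.

    components: list of (fraction, rate_constant) pairs, one per kinetic
    component; the fractions are assumed to sum to at most 1.
    """
    return sum(fraction / (1.0 + rate * cot) for fraction, rate in components)


# One component with k = 2: half of the DNA has reassociated at Cot = 1/k.
print(single_stranded_fraction(0.5, [(1.0, 2.0)]))  # prints 0.5

# Hypothetical two-component genome: a fast repetitive and a slow single-copy class.
two_component = [(0.4, 10.0), (0.6, 0.1)]
for cot in (0.01, 1.0, 100.0):
    print(cot, round(single_stranded_fraction(cot, two_component), 3))
```

Stepping from a one-component to a two-component model, as in the loop above, mirrors the progression through increasingly complex models that the abstract describes, although the fitting itself is left out of this sketch.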
10.
Extremophiles ; 13(1): 151-67, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19057844

ABSTRACT

Environmental factors restrict the distribution of microbial eukaryotes but the exact boundaries for eukaryotic life are not known. Here, we examine protistan communities at the extremes of salinity and osmotic pressure, and report rich assemblages inhabiting Bannock and Discovery, two deep-sea superhaline anoxic basins in the Mediterranean. Using a rRNA-based approach, we detected 1,538 protistan rRNA gene sequences from water samples with total salinity ranging from 39 to 280 g/kg, and obtained evidence that this DNA was endogenous to the extreme habitat sampled. Statistical analyses indicate that the discovered phylotypes represent only a fraction of species actually inhabiting both the brine and the brine-seawater interface, with as much as 82% of the actual richness missed by our survey. Jaccard indices (e.g., for a comparison of community membership) suggest that the brine/interface protistan communities are unique to Bannock and Discovery basins, and share little (0.8-2.8%) in species composition with overlying waters with typical marine salinity and oxygen tension. The protistan communities from the basins' brine and brine/seawater interface appear to be particularly enriched with dinoflagellates, ciliates and other alveolates, as well as fungi, and are conspicuously poor in stramenopiles. The uniqueness and diversity of brine and brine-interface protistan communities make them promising targets for protistan discovery.


Subject(s)
Oxygen/analysis ; Seawater/microbiology ; Sodium Chloride/analysis ; Water Microbiology ; Mediterranean Sea ; Phylogeny ; RNA, Ribosomal/genetics ; Species Specificity
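
The Jaccard index mentioned in the abstract above compares community membership as the number of shared taxa divided by the number of taxa found in either community. A minimal sketch of the classic incidence-based form follows; the abstract does not say which variant the authors computed, so this choice is an assumption, and the phylotype labels are hypothetical.

```python
def jaccard_index(community_a, community_b):
    """Classic Jaccard similarity between two collections of taxa."""
    set_a, set_b = set(community_a), set(community_b)
    union = set_a | set_b
    if not union:
        raise ValueError("both communities are empty")
    return len(set_a & set_b) / len(union)


# Hypothetical phylotype labels for a brine sample and overlying seawater.
brine = {"ciliate_1", "dinoflagellate_3", "fungus_2"}
seawater = {"dinoflagellate_3", "fungus_2", "stramenopile_7"}
print(jaccard_index(brine, seawater))  # prints 0.5
```

Values near zero, such as the 0.8-2.8% reported in the abstract, indicate that two communities share very few taxa.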
11.
BMC Microbiol ; 8: 222, 2008 Dec 16.
Article in English | MEDLINE | ID: mdl-19087295

ABSTRACT

BACKGROUND: The main tool to discover novel microbial eukaryotes is the rRNA approach. This approach has important biases, including PCR discrimination against certain rRNA gene species, which makes molecular inventories skewed relative to the source communities. The degree of this bias has not been quantified, and it remains unclear whether species missed from clone libraries could be recovered by increasing sequencing efforts, or whether they cannot be detected in principle. Here we attempt to discriminate between these possibilities by statistically analysing four protistan inventories obtained using different general eukaryotic PCR primers. RESULTS: We show that each PCR primer set-specific clone library is not a sample from the community diversity but rather from a fraction of this diversity. Therefore, even sequencing such clone libraries to saturation would only recover that fraction, which, according to the parametric models, varies from 17 ± 4% to 49 ± 10%, depending on the set of primers. The pooled data are thus qualitatively richer than individual libraries, even if normalized to the same sequencing effort. CONCLUSION: The use of a single pair of primers leads to significant underestimation of the true community richness at all levels of taxonomic hierarchy. The majority of available protistan rRNA gene surveys likely sampled less than half of the target diversity, and might have completely missed the rest. The use of multiple PCR primers reduces this bias but does not necessarily eliminate it.


Subject(s)
DNA, Protozoan/genetics ; Eukaryota/genetics ; Genetic Variation ; RNA, Ribosomal/genetics ; Animals ; DNA Primers/genetics ; DNA, Ribosomal/genetics ; Phylogeny ; Sequence Analysis, DNA
12.
Biom J ; 50(6): 1064-76, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19035547

ABSTRACT

Consider a sample of animal abundances collected from one sampling occasion. Our focus is on estimating the number of species in a closed population. In order to conduct a noninformative Bayesian inference when modeling these data, we derive Jeffreys and reference priors from the full likelihood. We assume that the species' abundances are randomly distributed according to a distribution indexed by a finite-dimensional parameter. We consider two specific cases which assume that the mean abundances are constant or exponentially distributed. The Jeffreys and reference priors are functions of the Fisher information for the model parameters; the information is calculated in part using the linear difference score for integer parameter models (Lindsay & Roeder 1987). The Jeffreys and reference priors perform similarly in a data example we consider. The posteriors based on the Jeffreys and reference priors are proper.


Subject(s)
Bayes Theorem ; Models, Biological ; Models, Statistical ; Population Density ; Biodiversity ; Computer Simulation ; Water Microbiology
13.
Biom J ; 50(6): 971-82, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19067331

ABSTRACT

We consider parametric distributions intended to model heterogeneity in population size estimation, especially parametric stochastic abundance models for species richness estimation. We briefly review (conditional) maximum likelihood estimation of the number of species, and summarize the results of fitting 7 candidate models to frequency-count data, from a database of >40000 such instances, mostly arising from microbial ecology. We consider error estimation, goodness-of-fit assessment, data subsetting, and other practical matters. We find that, although the array of candidate models can be improved, finite mixtures of a small number of components (point masses or simple diffuse distributions) represent a promising direction. Finally we consider the connections between parametric models for abundance and incidence data, again noting the usefulness of finite mixture models.


Subject(s)
Biodiversity ; Ecosystem ; Epidemiologic Methods ; Models, Biological ; Models, Statistical ; Numerical Analysis, Computer-Assisted
14.
J Adolesc Health ; 60(2): 176-183, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28109451

ABSTRACT

PURPOSE: To assess if adolescent dating violence was associated with physical intimate partner violence victimization in adulthood, using a comprehensive propensity score to create a matched group of victims and nonvictims. METHODS: Secondary analysis of waves 1 (1994-1995), 2 (1996), 3 (2001-2002) and 4 (2007-2008) of the National Longitudinal Study of Adolescent to Adult Health, a nationally representative sample of US high schools and middle schools. Individuals aged 12-18 reporting adolescent dating violence between the wave 1 and 2 interviews (n = 732) were matched to nonvictimized participants of the same sex (n = 1,429) using propensity score matching. These participants were followed up approximately 5 (wave 3) and 12 (wave 4) years later. At both follow-up points, physical violence victimization by a current partner was assessed. Data were analyzed using path models. RESULTS: Compared with the matched no victimization group, individuals reporting adolescent dating violence were more likely to experience physical intimate partner violence approximately 12 years later (wave 4), through the experience of 5-year (wave 3) victimization. This path held for males and females. CONCLUSIONS: Results from this sample matched on key risk variables suggest that violence first experienced in adolescent relationships may become chronic, confirming adolescent dating violence as an important risk factor for adult partner violence. Findings from this study underscore the critical role of primary and secondary prevention for adolescent dating violence.


Subject(s)
Adolescent Health ; Crime Victims/psychology ; Intimate Partner Violence/statistics & numerical data ; Sexual Partners/psychology ; Adolescent ; Case-Control Studies ; Child ; Crime Victims/statistics & numerical data ; Female ; Humans ; Intimate Partner Violence/prevention & control ; Longitudinal Studies ; Male ; Propensity Score ; Risk Factors
15.
Nat Ecol Evol ; 1(4): 91, 2017 Mar 20.
Article in English | MEDLINE | ID: mdl-28812652

ABSTRACT

High animal and plant richness in tropical rainforest communities has long intrigued naturalists. It is unknown if similar hyperdiversity patterns are reflected at the microbial scale with unicellular eukaryotes (protists). Here we show, using environmental metabarcoding of soil samples and a phylogeny-aware cleaning step, that protist communities in Neotropical rainforests are hyperdiverse and dominated by the parasitic Apicomplexa, which infect arthropods and other animals. These host-specific parasites potentially contribute to the high animal diversity in the forests by reducing population growth in a density-dependent manner. By contrast, too few operational taxonomic units (OTUs) of Oomycota were found to broadly drive high tropical tree diversity in a host-specific manner under the Janzen-Connell model. Extremely high OTU diversity and high heterogeneity between samples within the same forests suggest that protists, not arthropods, are the most diverse eukaryotes in tropical rainforests. Our data show that protists play a large role in tropical terrestrial ecosystems long viewed as being dominated by macroorganisms.

16.
FEMS Microbiol Ecol ; 58(3): 476-91, 2006 Dec.
Article in English | MEDLINE | ID: mdl-17117990

ABSTRACT

Microbial communities of extreme environments have often been assumed to have low species richness. We analysed 18S rRNA gene signatures in a sample collected below the chemocline of the anoxic Mariager Fjord in Denmark, and from these data we computed novel parametric and standard nonparametric estimates of protistan phylotype richness. Our results indicate unexpectedly high richness in this environment: at the 99.5% phylotype definition, our most conservative estimate was 568 phylotypes (±114, standard error). Phylogenetic analyses revealed that the sequences collected cover the majority of described lineages in the eukaryotic domain. Out of 384 sequences analysed, 307 were identified as protistan targets, none of which was identical to known sequences. However, based on what is known about species that are phylogenetically related to the Mariager sequences, most of the latter seem to belong to strictly or facultatively anaerobic organisms. We also found signatures that together with other environmental 18S rRNA gene sequences represent environmental clades of possibly high taxonomic levels (class to kingdom level). One of these clades, consisting exclusively of sequences from anoxic sampling sites, branches at the base of the eukaryotic evolutionary tree among the earliest eukaryotic lineages. Assuming eukaryotic evolution under oxygen-depleted conditions, these sequences may represent immediate descendants of early eukaryotic ancestors.


Subject(s)
Eukaryotic Cells/classification ; Genetic Variation ; Phylogeny ; Plankton/classification ; Plankton/genetics ; Colony Count, Microbial ; Denmark ; Eukaryotic Cells/chemistry ; Evolution, Molecular ; Marine Biology/methods ; Seawater/microbiology
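
The abstract above refers to "standard nonparametric estimates" of richness without naming them. One widely used estimator of this kind is Chao1, which adds a correction based on the numbers of singletons and doubletons to the observed richness. Whether the authors used Chao1 is an assumption; the sketch below is offered only as an example of the class of method, with hypothetical input values.

```python
def chao1(observed_richness, singletons, doubletons):
    """Chao1 lower-bound estimate of total richness.

    observed_richness: number of distinct phylotypes seen in the sample.
    singletons / doubletons: phylotypes seen exactly once / exactly twice.
    """
    if doubletons > 0:
        return observed_richness + singletons ** 2 / (2.0 * doubletons)
    # Bias-corrected form used when no doubletons were observed.
    return observed_richness + singletons * (singletons - 1) / 2.0


# Hypothetical sample: 30 phylotypes observed, 10 singletons, 5 doubletons.
print(chao1(30, 10, 5))  # prints 40.0
```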
17.
Microbiome ; 1(1): 5, 2013 Feb 04.
Article in English | MEDLINE | ID: mdl-24451229

ABSTRACT

BACKGROUND: Viruses are important drivers of ecosystem functions, yet little is known about the vast majority of viruses. Viral shotgun metagenomics enables the investigation of broad ecological questions in phage communities. One ecological characteristic is species richness, which is the number of different species in a community. Viruses do not have a phylogenetic marker analogous to the bacterial 16S rRNA gene with which to estimate richness, and so contig spectra are employed to measure the number of virus taxa in a given community. A contig spectrum is generated from a viral shotgun metagenome by assembling the random sequence reads into groups of sequences that overlap (contigs) and counting the number of sequences that group within each contig. Current tools available to analyze contig spectra to estimate phage richness are limited by relying on rank-abundance data. RESULTS: We present statistical estimates of virus richness from contig spectra. The program CatchAll (http://www.northeastern.edu/catchall/) was used to analyze contig spectra in terms of frequency count data rather than rank-abundance, thus enabling formal statistical analyses. Also, the influence of potentially spurious low-frequency counts on richness estimates was minimized by two methods, empirical and statistical. The results show greater estimates of viral richness than previous calculations in nearly all environments analyzed, including swine feces and reclaimed fresh water. CONCLUSIONS: CatchAll yielded consistent estimates of richness across viral metagenomes from the same or similar environments. Additionally, analysis of pooled viral metagenomes from different environments via mixed contig spectra resulted in greater richness estimates than those of the component metagenomes. Using CatchAll to analyze contig spectra will improve estimations of richness from viral shotgun metagenomes, particularly from large datasets, by providing statistical measures of richness.

18.
Pac Symp Biocomput ; : 203-12, 2012.
Article in English | MEDLINE | ID: mdl-22174276

ABSTRACT

We consider the classical population diversity estimation scenario based on frequency count data (the number of classes or taxa represented once, twice, etc. in the sample), but with the proviso that the lowest frequency counts, especially the singletons, may not be reliably observed. This arises especially in data derived from modern high-throughput DNA sequencing, where errors may cause sequences to be incorrectly assigned to new taxa instead of being matched to existing, observed taxa. We look at a spectrum of methods for addressing this issue, focusing in particular on fitting a parametric mixture model and deleting the highest-diversity component; we also consider regarding the data as left-censored and effectively pooling two or more low frequency counts. We find that these purely statistical "downstream" corrections will depend strongly on their underlying assumptions, but that such methods can be useful nonetheless.


Subject(s)
Biodiversity ; Microbiota/genetics ; Animals ; Bayes Theorem ; Computational Biology ; Feces/virology ; High-Throughput Nucleotide Sequencing ; Models, Statistical ; Swine/virology
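
Frequency count data, as defined in the abstract above, record how many taxa were seen once, twice, and so on. The sketch below tabulates such counts from per-taxon abundances and then pools the lowest counts into a single total, which is the data-preparation step behind treating the sample as left-censored. It does not perform the subsequent estimation, and the function names and the example abundances are hypothetical.

```python
from collections import Counter


def frequency_counts(abundances):
    """Map each abundance j to f_j, the number of taxa observed j times."""
    return dict(Counter(a for a in abundances if a > 0))


def pool_low_counts(freq_counts, threshold):
    """Pool all f_j with j <= threshold into one total.

    Returns (pooled_total, remaining_counts); the pooled total says only
    that this many taxa were seen between 1 and `threshold` times.
    """
    pooled = sum(f for j, f in freq_counts.items() if j <= threshold)
    remaining = {j: f for j, f in freq_counts.items() if j > threshold}
    return pooled, remaining


# Hypothetical per-taxon read counts from one sample.
reads_per_taxon = [1, 1, 1, 2, 2, 5]
counts = frequency_counts(reads_per_taxon)
print(counts)                      # prints {1: 3, 2: 2, 5: 1}
print(pool_low_counts(counts, 2))  # prints (5, {5: 1})
```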
19.
Pac Symp Biocomput ; : 121-30, 2011.
Article in English | MEDLINE | ID: mdl-21121040

ABSTRACT

In many situations we are faced with the need to estimate the number of classes in a population from observed count data: this arises not only in biology, where we are interested in the number of taxa such as species, but also in many other fields such as public health, criminal justice, software engineering, etc. This problem has a rich history in theoretical statistics, dating back at least to 1943, and many approaches have been proposed and studied. However, to date only one approach has been implemented in readily available software, namely a relatively simple nonparametric method which, while straightforward to program, is not flexible and can be prone to information loss. Here we present CatchAll, a new, platform-independent, user-friendly, computationally optimized software package which calculates a powerful and flexible suite of parametric models (based on current statistical research) in addition to all existing nonparametric procedures. We briefly describe the software and its mathematical underpinnings (which are treated in depth elsewhere), and we work through an applied example from microbial ecology in detail.


Subject(s)
Biodiversity ; Software ; Algorithms ; Computational Biology ; Databases, Factual/statistics & numerical data ; Ecosystem ; Finite Element Analysis ; Likelihood Functions ; Models, Statistical ; Seawater/microbiology ; Species Specificity ; Statistics, Nonparametric
20.
ISME J ; 5(2): 184-95, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20631807

ABSTRACT

Despite the ecological importance of marine pico-size eukaryotes, the study of their in situ diversity using molecular tools started just a few years ago. These studies have revealed that marine picoeukaryotes are very diverse and include many novel taxa. However, the amount and structure of their phylogenetic diversity and the extent of their sequence novelty still remains poorly known, as a systematic analysis has been seldom attempted. In this study, we use a coherent and carefully curated data set of 500 published 18S ribosomal DNA sequences to quantify the diversity and novelty patterns of picoeukaryotes in the Indian Ocean. Our phylogenetic tree showed many distant lineages. We grouped sequences in OTUs (operational taxonomic units) at discrete values delineated by pair-wise Jukes-Cantor (JC) distances and tree patristic distances. At a distance of 0.01, the number of OTUs observed (237/242; using JC or patristic distances, respectively) was half the number of sequences analyzed, indicating the existence of microdiverse clusters of highly related sequences. At this distance level, we estimated 600-800 OTUs using several statistical methods. The number of OTUs observed was still substantial at higher distances (39/82 at 0.20 distance) suggesting a large diversity at high-taxonomic ranks. Most sequences were related to marine clones from other sites and many were distant to cultured organisms, highlighting the huge culturing gap within protists. The novelty analysis indicated the putative presence of pseudogenes and of truly novel high-rank phylogenetic lineages. The identified diversity and novelty patterns among marine picoeukaryotes are of great importance for understanding and interpreting their ecology and evolution.


Subject(s)
Biodiversity ; Eukaryota/classification ; Eukaryota/genetics ; Genetic Variation ; Phylogeny ; Indian Ocean ; RNA, Ribosomal, 18S/genetics
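
The Jukes-Cantor distances used in the abstract above to delineate OTUs correct the observed proportion of differing sites, p, for multiple substitutions: d = -(3/4) ln(1 - 4p/3). A minimal sketch follows. It assumes the two sequences are already aligned, of equal length and free of gaps, which the abstract does not specify; the function names and the example sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and equal in length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance between two aligned sequences."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# One mismatch in eight sites: p = 0.125, corrected distance about 0.137.
print(round(jukes_cantor_distance("ACGTACGT", "ACGTACGA"), 3))
```

At the 0.01 threshold mentioned in the abstract, the correction is negligible and the distance is close to the raw proportion of mismatches; it matters more at higher thresholds such as 0.20.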