Results 1 - 20 of 45
1.
PLoS Biol ; 17(1): e3000125, 2019 01.
Article in English | MEDLINE | ID: mdl-30695030

ABSTRACT

Over the past decade, biology has undergone a data revolution in how researchers collect data and the amount of data being collected. An emerging challenge that has received limited attention in biology is managing, working with, and providing access to data under continual active collection. Regularly updated data present unique challenges in quality assurance and control, data publication, archiving, and reproducibility. We developed a workflow for a long-term ecological study that addresses many of the challenges associated with managing this type of data. We do this by leveraging existing tools to 1) perform quality assurance and control; 2) import, restructure, version, and archive data; 3) rapidly publish new data in ways that ensure appropriate credit to all contributors; and 4) automate most steps in the data pipeline to reduce the time and effort required by researchers. The workflow leverages tools from software development, including version control and continuous integration, to create a modern data management system that automates the pipeline.


Subjects
Data Curation/methods, Data Curation/trends, Animals, Big Data, Computational Biology/methods, Humans, Publications, Reproducibility of Results, Software, Workflow
2.
PLoS Comput Biol ; 17(7): e1009180, 2021 07.
Article in English | MEDLINE | ID: mdl-34214077

ABSTRACT

Broad scale remote sensing promises to build forest inventories at unprecedented scales. A crucial step in this process is to associate sensor data into individual crowns. While dozens of crown detection algorithms have been proposed, their performance is typically not compared based on standard data or evaluation metrics. There is a need for a benchmark dataset to minimize differences in reported results as well as support evaluation of algorithms across a broad range of forest types. Combining RGB, LiDAR and hyperspectral sensor data from the USA National Ecological Observatory Network's Airborne Observation Platform with multiple types of evaluation data, we created a benchmark dataset to assess crown detection and delineation methods for canopy trees covering dominant forest types in the United States. This benchmark dataset includes an R package to standardize evaluation metrics and simplify comparisons between methods. The benchmark dataset contains over 6,000 image-annotated crowns, 400 field-annotated crowns, and 3,000 canopy stem points from a wide range of forest types. In addition, we include over 10,000 training crowns for optional use. We discuss the different evaluation data sources and assess the accuracy of the image-annotated crowns by comparing annotations among multiple annotators as well as overlapping field-annotated crowns. We provide an example submission and score for an open-source algorithm that can serve as a baseline for future methods.


Subjects
Databases, Factual, Environmental Monitoring/methods, Forests, Image Processing, Computer-Assisted/methods, Trees, Algorithms, Benchmarking, Ecosystem, Optical Imaging, Trees/classification, Trees/physiology
3.
Ecol Appl ; 32(8): e2694, 2022 12.
Article in English | MEDLINE | ID: mdl-35708073

ABSTRACT

Advances in artificial intelligence for computer vision hold great promise for increasing the scales at which ecological systems can be studied. The distribution and behavior of individuals are central to ecology, and computer vision using deep neural networks can learn to detect individual objects in imagery. However, developing supervised models for ecological monitoring is challenging because it requires large amounts of human-labeled training data, requires advanced technical expertise and computational infrastructure, and is prone to overfitting. This limits application across space and time. One solution is developing generalized models that can be applied across species and ecosystems. Using over 250,000 annotations from 13 projects from around the world, we develop a general bird detection model that achieves over 65% recall and 50% precision on novel aerial data without any local training despite differences in species, habitat, and imaging methodology. Fine-tuning this model with only 1000 local annotations increases these values to an average of 84% recall and 69% precision by building on the general features learned from other data sources. Retraining from the general model improves local predictions even when moderately large annotation sets are available and makes model training faster and more stable. Our results demonstrate that general models for detecting broad classes of organisms using airborne imagery are achievable. These models can reduce the effort, expertise, and computational resources necessary for automating the detection of individual organisms across large scales, helping to transform the scale of data collection in ecology and the questions that can be addressed.
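Recall and precision, the two detection metrics quoted above, reduce to ratios of matched and unmatched detections. A minimal Python sketch, assuming predicted boxes have already been matched to annotated birds (the matching rule is not described in the abstract); the function name and the counts in the usage line are hypothetical and are not data from the study:

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int) -> tuple:
    """Return (precision, recall) for an object-detection task.

    precision = TP / (TP + FP): share of predicted birds that are real birds.
    recall    = TP / (TP + FN): share of annotated birds that were detected.
    """
    predicted = true_positives + false_positives
    annotated = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / annotated if annotated else 0.0
    return precision, recall


# Hypothetical counts: 100 annotated birds, 84 detected, plus 38 spurious boxes.
p, r = precision_recall(true_positives=84, false_positives=38, false_negatives=16)
print(f"precision={p:.2f} recall={r:.2f}")
```

With these illustrative counts the sketch prints `precision=0.69 recall=0.84`, showing how a model can find most birds while still producing a sizeable share of false detections.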


Subjects
Deep Learning, Humans, Animals, Ecosystem, Artificial Intelligence, Neural Networks, Computer, Birds
4.
PLoS Comput Biol ; 16(5): e1007809, 2020 05.
Article in English | MEDLINE | ID: mdl-32379759

ABSTRACT

Postdocs are a critical transition for early-career researchers. This transient period, between finishing a PhD and finding a permanent position, is when early-career researchers develop independent research programs and establish collaborative relationships that can make a successful career. Traditionally, postdocs physically relocate (sometimes multiple times) for these short-term appointments, which creates challenges that can disproportionately affect members of traditionally underrepresented groups in science, technology, engineering, and mathematics (STEM). However, many research activities involving analytical and quantitative work do not require a physical presence in a lab and can be accomplished remotely. Other fields have embraced remote work, yet many academics have been hesitant to hire remote postdocs. In this article, we present advice to both principal investigators (PIs) and postdocs for successfully navigating a remote position. Using the combined experience of the authors (as either remote postdocs or employers of remote postdocs), we provide a road map to overcome the real (and perceived) obstacles associated with remote work. With planning, communication, and creativity, a remote postdoc can be a fully functioning and productive member of a research lab. Further, our rules can be useful for research labs generally and can help foster a more flexible and inclusive environment.


Subjects
Education, Distance/methods, Preceptorship/methods, Research Personnel/education, Career Choice, Education, Distance/trends, Engineering/education, Humans, Mathematics/education, Science/education, Technology/education
5.
Ecol Appl ; 31(4): e02300, 2021 06.
Article in English | MEDLINE | ID: mdl-33480058

ABSTRACT

Functional ecology has increasingly focused on describing ecological communities based on their traits (measurable features affecting individuals' fitness and performance). Analyzing trait distributions within and among forests could significantly improve understanding of community composition and ecosystem function. Historically, data on trait distributions have been generated by (1) collecting a small number of leaves from a small number of trees, which suffers from limited sampling but produces information at the fundamental ecological unit (the individual), or (2) using remote-sensing images to infer traits, producing information continuously across large regions, but as plots (containing multiple trees of different species) or pixels, not individuals. Remote-sensing methods that identify individual trees and estimate their traits would provide the benefits of both approaches, producing continuous large-scale data linked to biological individuals. We used data from the National Ecological Observatory Network (NEON) to develop a method to scale up functional traits from 160 trees to the millions of trees within the spatial extent of two NEON sites. The pipeline consists of three stages: (1) image segmentation, to identify individual trees and estimate structural traits; (2) an ensemble of models to infer leaf mass area (LMA), nitrogen, carbon, and phosphorus content using hyperspectral signatures, and diameter at breast height (DBH) from allometry; and (3) predictions for segmented crowns for the full remote-sensing footprint at the NEON sites. R² values on held-out test data ranged from 0.41 to 0.75. The ensemble approach performed better than single partial least-squares models. Carbon performed poorly compared to other traits (R² of 0.41). The crown segmentation step contributed the most uncertainty in the pipeline, due to over-segmentation. The pipeline produced good estimates of DBH (R² of 0.62 on held-out data).
Trait predictions for crowns performed significantly better than comparable predictions on pixels, improving R² on test data by between 0.07 and 0.26. We used the pipeline to produce individual-level trait data for ~5 million individual crowns, covering a total extent of ~360 km². This large data set allows testing ecological questions on landscape scales, revealing that foliar traits are correlated with structural traits and environmental conditions.
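The abstract reports model skill as R² on held-out test data. A common definition is R² = 1 - SS_res / SS_tot computed on the test set; it is an assumption that this is the variant used here (some studies instead report the squared correlation between observed and predicted values, which can differ). A minimal sketch with a hypothetical function name and made-up trait values:

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    On held-out data this can be negative if the model predicts
    worse than simply using the mean of the observations.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out trait values (e.g. nitrogen content) and model predictions.
observed = [1.8, 2.1, 2.6, 3.0, 3.4]
predicted = [2.0, 2.0, 2.5, 3.2, 3.1]
print(round(r_squared(observed, predicted), 2))
```

Under this definition, the crown-versus-pixel comparison in the abstract corresponds to the difference between two such R² values computed on the same test crowns.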


Subjects
Ecosystem, Forests, Humans, Plant Leaves, Plants, Trees
6.
Proc Natl Acad Sci U S A ; 115(7): 1424-1432, 2018 02 13.
Article in English | MEDLINE | ID: mdl-29382745

ABSTRACT

Two foundational questions about sustainability are "How are ecosystems and the services they provide going to change in the future?" and "How do human decisions affect these trajectories?" Answering these questions requires an ability to forecast ecological processes. Unfortunately, most ecological forecasts focus on centennial-scale climate responses, therefore neither meeting the needs of near-term (daily to decadal) environmental decision-making nor allowing comparison of specific, quantitative predictions to new observational data, one of the strongest tests of scientific theory. Near-term forecasts provide the opportunity to iteratively cycle between performing analyses and updating predictions in light of new evidence. This iterative process of gaining feedback, building experience, and correcting models and methods is critical for improving forecasts. Iterative, near-term forecasting will accelerate ecological research, make it more relevant to society, and inform sustainable decision-making under high uncertainty and adaptive management. Here, we identify the immediate scientific and societal needs, opportunities, and challenges for iterative near-term ecological forecasting. Over the past decade, data volume, variety, and accessibility have greatly increased, but challenges remain in interoperability, latency, and uncertainty quantification. Similarly, ecologists have made considerable advances in applying computational, informatic, and statistical methods, but opportunities exist for improving forecast-specific theory, methods, and cyberinfrastructure. Effective forecasting will also require changes in scientific training, culture, and institutions. The need to start forecasting is now; the time for making ecology more predictive is here, and learning by doing is the fastest route to drive the science forward.


Subjects
Ecology/education, Ecology/methods, Bayes Theorem, Climate Change, Ecology/trends, Ecosystem, Forecasting, Humans, Models, Theoretical
7.
Ecol Appl ; 30(1): e02025, 2020 01.
Article in English | MEDLINE | ID: mdl-31630468

ABSTRACT

Phenology, the timing of cyclical and seasonal natural phenomena such as flowering and leaf out, is an integral part of ecological systems with impacts on human activities like environmental management, tourism, and agriculture. As a result, there are numerous potential applications for actionable predictions of when phenological events will occur. However, despite the availability of phenological data with large spatial, temporal, and taxonomic extents, and numerous phenology models, there have been no automated species-level forecasts of plant phenology. This is due in part to the challenges of building a system that integrates large volumes of climate observations and forecasts, uses that data to fit models and make predictions for large numbers of species, and consistently disseminates the results of these forecasts in interpretable ways. Here, we describe a new near-term phenology-forecasting system that makes predictions for the timing of budburst, flowers, ripe fruit, and fall colors for 78 species across the United States up to 6 months in advance and is updated every four days. We use the lessons learned in developing this system to provide guidance for developing large-scale near-term ecological forecast systems more generally, to help advance the use of automated forecasting in ecology.


Subjects
Climate Change, Climate, Ecosystem, Flowers, Plants, Temperature, United States
8.
Ecology ; 100(2): e02568, 2019 02.
Article in English | MEDLINE | ID: mdl-30499218

ABSTRACT

Large-scale observational data from citizen science efforts are becoming increasingly common in ecology, and researchers often choose between these and data from intensive local-scale studies for their analyses. This choice has potential trade-offs related to spatial scale, observer variance, and interannual variability. Here we explored this issue with phenology by comparing models built using data from the large-scale, citizen science USA National Phenology Network (USA-NPN) effort with models built using data from more intensive studies at Long Term Ecological Research (LTER) sites. We built statistical and process-based phenology models for species common to each data set. From these models, we compared parameter estimates, estimates of phenological events, and out-of-sample errors between models derived from both USA-NPN and LTER data. We found that model parameter estimates for the same species were most similar between the two data sets when using simple models, but parameter estimates varied widely as model complexity increased. Despite this, estimates for the date of phenological events and out-of-sample errors were similar, regardless of the model chosen. Predictions for USA-NPN data had the lowest error when using models built from the USA-NPN data, while LTER predictions were best made using LTER-derived models, confirming that models perform best when applied at the same scale they were built. This difference in the cross-scale model comparison is likely due to variation in phenological requirements within species. Models using the USA-NPN data set can integrate parameters over a large spatial scale while those using an LTER data set can only estimate parameters for a single location. Accordingly, the choice of data set depends on the research question. Inferences about species-specific phenological requirements are best made with LTER data, and if USA-NPN or similar data are all that is available, then analyses should be limited to simple models.
Large-scale predictive modeling is best done with the larger-scale USA-NPN data, which has high spatial representation and a large regional species pool. LTER data sets, on the other hand, have high site fidelity and thus characterize inter-annual variability extremely well. Future research aimed at forecasting phenology events for particular species over larger scales should develop models that integrate the strengths of both data sets.


Subjects
Climate Change, Models, Theoretical, Longitudinal Studies, Seasons
9.
Ecology ; 99(8): 1825-1835, 2018 08.
Article in English | MEDLINE | ID: mdl-29802772

ABSTRACT

Transient species occur infrequently in a community over time and do not maintain viable local populations. Because transient species interact differently than non-transients with their biotic and abiotic environment, it is important to characterize the prevalence of these species and how they impact our understanding of ecological systems. We quantified the prevalence and impact of transient species in communities using data on over 19,000 community time series spanning an array of ecosystems, taxonomic groups, and spatial scales. We found that transient species are a general feature of communities regardless of taxa or ecosystem. The proportion of these species decreases with increasing spatial scale, leading to a need to control for scale in comparative work. Removing transient species from analyses influences the form of a suite of commonly studied ecological patterns including species-abundance distributions, species-energy relationships, species-area relationships, and temporal turnover. Careful consideration should be given to whether transient species are included in analyses depending on the theoretical and practical relevance of these species for the question being studied.


Subjects
Biota, Ecosystem, Prevalence
10.
Bioscience ; 67(6): 546-557, 2017 Jun 01.
Article in English | MEDLINE | ID: mdl-28584342

ABSTRACT

The scale and magnitude of complex and pressing environmental issues lend urgency to the need for integrative and reproducible analysis and synthesis, facilitated by data-intensive research approaches. However, the recent pace of technological change has been such that appropriate skills to accomplish data-intensive research are lacking among environmental scientists, who more than ever need greater access to training and mentorship in computational skills. Here, we provide a roadmap for raising data competencies of current and next-generation environmental researchers by describing the concepts and skills needed for effectively engaging with the heterogeneous, distributed, and rapidly growing volumes of available data. We articulate five key skills: (1) data management and processing, (2) analysis, (3) software skills for science, (4) visualization, and (5) communication methods for collaboration and dissemination. We provide an overview of the current suite of training initiatives available to environmental scientists and models for closing the skill-transfer gap.

11.
Ecology ; 97(5): 1228-38, 2016 May.
Article in English | MEDLINE | ID: mdl-27349099

ABSTRACT

Ecological patterns arise from the interplay of many different processes, and yet the emergence of consistent phenomena across a diverse range of ecological systems suggests that many patterns may in part be determined by statistical or numerical constraints. Determining the extent to which patterns in a given system arise from statistical constraints rather than from explicit ecological processes has been difficult. We tackled this challenge by directly comparing models from a constraint-based theory, the Maximum Entropy Theory of Ecology (METE), and models from a process-based theory, the size-structured neutral theory (SSNT). Models from both theories were capable of characterizing the distribution of individuals among species and the distribution of body size among individuals across 76 forest communities. However, the SSNT models consistently yielded higher overall likelihood, as well as more realistic characterizations of the relationship between species abundance and average body size of conspecific individuals. This suggests that the details of the biological processes contain additional information for understanding community structure that is not fully captured by the METE constraints in these systems. Our approach provides a first step towards differentiating between process- and constraint-based models of ecological systems and a general methodology for comparing ecological models that make predictions for multiple patterns.


Subjects
Ecosystem, Models, Biological, Animals, Body Size, Models, Statistical, Plants/classification, Plants/metabolism, Population Density
12.
Am Nat ; 186(2): E51-60, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26655161

ABSTRACT

Taylor's law (TL) describes the scaling relationship between the mean and variance of populations as a power law. TL is widely observed in ecological systems across space and time, with exponents mostly between 1 and 2. Many ecological explanations have been proposed for TL, but it is also commonly observed outside ecology. We propose that TL arises from the constraining influence of two primary variables: the number of individuals and the number of censuses or sites. We show that most possible configurations of individuals among censuses or sites produce the power-law form of TL, with exponents between 1 and 2. This "feasible set" approach suggests that TL is a statistical pattern driven by two constraints, providing an a priori explanation for this ubiquitous pattern. However, the exact form of any specific mean-variance relationship cannot be predicted in this way; that is, this approach does a poor job of predicting variation in the exponent, suggesting that TL may still contain ecological information.
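Taylor's law is usually written V = a·M^b, where M and V are the mean and variance of a population across censuses or sites, and b is the exponent discussed above. A common way to estimate b from data is ordinary least squares on log-transformed means and variances. The sketch below shows only that empirical estimate, not the feasible-set analysis the abstract proposes; the function name, the use of population variance (`pvariance`), and the choice to drop zero-variance populations are assumptions made for illustration:

```python
import math
from statistics import fmean, pvariance


def taylor_fit(populations):
    """Estimate a and b in Taylor's law, V = a * M**b.

    `populations` holds one count series per population (one count per
    census or site). The fit is ordinary least squares on
    log(V) = log(a) + b * log(M).
    """
    log_means, log_vars = [], []
    for counts in populations:
        mean, var = fmean(counts), pvariance(counts)
        if mean > 0 and var > 0:  # logarithms need strictly positive values
            log_means.append(math.log(mean))
            log_vars.append(math.log(var))
    n = len(log_means)
    if n < 2:
        raise ValueError("need at least two populations with positive mean and variance")
    x_bar = sum(log_means) / n
    y_bar = sum(log_vars) / n
    sxx = sum((x - x_bar) ** 2 for x in log_means)
    if sxx == 0:
        raise ValueError("all populations have the same mean; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_means, log_vars))
    b = sxy / sxx
    a = math.exp(y_bar - b * x_bar)
    return a, b


# Hypothetical counts for four populations over four censuses.
counts = [[0, 2, 1, 3], [4, 9, 2, 7], [10, 25, 3, 18], [40, 95, 12, 60]]
a, b = taylor_fit(counts)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Whether a fitted b is ecologically informative is exactly what the feasible-set comparison addresses: an exponent that most random configurations would also produce carries little ecological signal.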


Subjects
Models, Biological, Population Dynamics, Ecology, Empirical Research
13.
Am Nat ; 185(3): E70-80, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25821878

ABSTRACT

The maximum entropy theory of ecology (METE) is a unified theory of biodiversity that predicts a large number of macroecological patterns using information on only species richness, total abundance, and total metabolic rate of the community. We evaluated four major predictions of METE simultaneously at an unprecedented scale using data from 60 globally distributed forest communities including more than 300,000 individuals and nearly 2,000 species. METE successfully captured 96% and 89% of the variation in the rank distribution of species abundance and individual size, respectively, but performed poorly when characterizing the size-density relationship and intraspecific distribution of individual size. Specifically, METE predicted a negative correlation between size and species abundance, which is weak in natural communities. By evaluating multiple predictions with large quantities of data, our study not only identifies a mismatch between abundance and body size in METE but also demonstrates the importance of conducting strong tests of ecological theories.


Subjects
Biodiversity, Body Size, Ecosystem, Entropy, Demography, Forests, Models, Biological, Population Density, Population Dynamics
14.
Ecol Lett ; 16(9): 1177-85, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23848604

ABSTRACT

The species abundance distribution (SAD) is one of the most intensively studied distributions in ecology and its hollow-curve shape is one of ecology's most general patterns. We examine the SAD in the context of all possible forms having the same richness (S) and total abundance (N), i.e. the feasible set. We find that feasible sets are dominated by similarly shaped hollow curves, most of which are highly correlated with empirical SADs (most R² values > 75%), revealing a strong influence of N and S on the form of the SAD and an a priori explanation for the ubiquitous hollow curve. Empirical SADs are often more hollow and less variable than the majority of the feasible set, revealing exceptional unevenness and relatively low natural variability among ecological communities. We discuss the importance of the feasible set in understanding how general constraints determine observable variation and influence the forms of predicted and empirical patterns.
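The feasible set described above can be made concrete for small communities. A minimal sketch, assuming each possible SAD form is an unordered split of N individuals among S species with at least one individual each (an integer partition of N into exactly S parts, listed as a rank-abundance vector); the function name is hypothetical, and exhaustive enumeration is only practical for small N and S, since the number of partitions grows rapidly (the abstract does not say how the study handled realistic sizes):

```python
from typing import Iterator, Optional, Tuple


def feasible_set(n: int, s: int, max_part: Optional[int] = None) -> Iterator[Tuple[int, ...]]:
    """Yield every way to split n individuals among s species.

    Each result is a partition of n into exactly s positive integers in
    non-increasing order, i.e. a rank-abundance vector. Species identity
    is ignored, so (3, 2, 1) and (1, 2, 3) count once.
    """
    if max_part is None:
        max_part = n
    if s == 0:
        if n == 0:
            yield ()
        return
    if n < s:  # every species needs at least one individual
        return
    # The most abundant remaining species holds at least ceil(n / s)
    # individuals and at most n - (s - 1), leaving one for each other species.
    lowest = -(-n // s)
    highest = min(max_part, n - (s - 1))
    for first in range(highest, lowest - 1, -1):
        for rest in feasible_set(n - first, s - 1, first):
            yield (first,) + rest


# Small example: all possible SAD forms with N = 10 individuals and S = 4 species.
for sad in feasible_set(10, 4):
    print(sad)
```

Even this tiny example yields nine forms, most of them uneven (a few abundant species and several with one individual), which illustrates why hollow curves dominate the feasible set.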


Subjects
Biodiversity, Models, Biological, Population Density
15.
Am Nat ; 181(4): E83-90, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23535624

ABSTRACT

Studies of biodiversity typically assume that all species are equivalent. However, some species in a community maintain viable populations in the study area, while others occur only occasionally as transient individuals. Here we show that North American bird communities can reliably be divided into core and transient species groups and that the richness of each group is driven by different processes. The richness of core species is influenced primarily by local environmental conditions, while the richness of transient species is influenced primarily by the heterogeneity of the surrounding landscape. This demonstrates that the well-known effects of the local environment and landscape heterogeneity on overall species richness are the result of two sets of processes operating differentially on core and transient species. Models of species richness should focus on explaining two distinct patterns, those of core and transient species, rather than a single pattern for the community as a whole.
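The abstract divides bird communities into core and transient groups but does not state the criterion here. One common approach classifies species by temporal occupancy, the fraction of survey years in which a species was recorded. The sketch below assumes that approach with illustrative thresholds (core above 2/3, transient at or below 1/3); the thresholds, function name, and survey data are all hypothetical:

```python
def classify_species(presence, core_above=2 / 3, transient_at_most=1 / 3):
    """Label each species as core, transient, or intermediate.

    `presence` maps species name -> list of booleans, one per survey year.
    Temporal occupancy is the fraction of years in which the species was seen.
    The default thresholds are illustrative assumptions.
    """
    labels = {}
    for species, years in presence.items():
        occupancy = sum(years) / len(years)
        if occupancy > core_above:
            labels[species] = "core"
        elif occupancy <= transient_at_most:
            labels[species] = "transient"
        else:
            labels[species] = "intermediate"
    return labels


# Hypothetical 10-year survey at one site.
survey = {
    "species_a": [True] * 9 + [False],
    "species_b": [True, False] * 5,
    "species_c": [True, True] + [False] * 8,
}
labels = classify_species(survey)
core_richness = sum(1 for v in labels.values() if v == "core")
transient_richness = sum(1 for v in labels.values() if v == "transient")
print(labels, core_richness, transient_richness)
```

The two richness counts are the separate response variables the abstract argues for: core richness would be modeled against local environmental conditions and transient richness against landscape heterogeneity.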


Subjects
Animal Migration/physiology, Biodiversity, Birds/classification, Birds/physiology, Models, Biological, Animals, Environment, Multivariate Analysis
16.
PeerJ ; 11: e16578, 2023.
Article in English | MEDLINE | ID: mdl-38144190

ABSTRACT

Data on individual tree crowns from remote sensing have the potential to advance forest ecology by providing information about forest composition and structure with a continuous spatial coverage over large spatial extents. Classifying individual trees to their taxonomic species over large regions from remote sensing data is challenging. Methods to classify individual species are often accurate for common species, but perform poorly for less common species and when applied to new sites. We ran a data science competition to help identify effective methods for the task of classification of individual crowns to species identity. The competition included data from three sites to assess each method's ability to generalize patterns across two sites simultaneously and apply methods to an untrained site. Three different metrics were used to assess and compare model performance. Six teams participated, representing four countries and nine individuals. The highest performing method from a previous competition in 2017 was applied and used as a baseline to understand advancements and changes in successful methods. The best species classification method was based on a two-stage fully connected neural network that significantly outperformed the baseline random forest and gradient boosting ensemble methods. All methods generalized well by showing relatively strong performance on the trained sites (accuracy = 0.46-0.55, macro F1 = 0.09-0.32, cross entropy loss = 2.4-9.2), but generally failed to transfer effectively to the untrained site (accuracy = 0.07-0.32, macro F1 = 0.02-0.18, cross entropy loss = 2.8-16.3). Classification performance was influenced by the number of samples with species labels available for training, with most methods predicting common species at the training sites well (maximum F1 score of 0.86) relative to uncommon species, for which none were predicted.
Classification errors were most common between species in the same genus and different species that occur in the same habitat. Most methods performed better than the baseline in detecting if a species was not in the training data by predicting an untrained mixed-species class, especially in the untrained site. This work has highlighted that data science competitions can encourage advancement of methods, particularly by bringing in new people from outside the focal discipline, and by providing an open dataset and evaluation criteria from which participants can learn.
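The gap between accuracy (0.46-0.55) and macro F1 (0.09-0.32) on the trained sites follows from how macro F1 averages: every species counts equally, so rare species that are never predicted correctly pull the score down even when common species are classified well. A minimal sketch of the usual macro-averaged F1 (the definition behind scikit-learn's `average="macro"`); that the competition computed it exactly this way, the function name, and the labels in the example are assumptions:

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    F1 for one class is 2*TP / (2*TP + FP + FN); a class that is never
    predicted correctly scores 0, no matter how rare it is.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1
            fn[truth] += 1
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denominator = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Hypothetical site: 8 crowns of a common species, 2 of a rare one,
# and a classifier that always predicts the common species.
truth = ["common_sp"] * 8 + ["rare_sp"] * 2
guess = ["common_sp"] * 10
accuracy = sum(t == g for t, g in zip(truth, guess)) / len(truth)
print(f"accuracy={accuracy:.2f}, macro F1={macro_f1(truth, guess):.2f}")
```

Here accuracy is 0.80 while macro F1 is about 0.44, because the rare species contributes an F1 of 0 to the average.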


Subjects
Data Science, Remote Sensing Technology, Humans, Neural Networks, Computer, Ecosystem
17.
Ecology ; 93(8): 1772-8, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22928405

ABSTRACT

The species abundance distribution (SAD) is one of the most studied patterns in ecology due to its potential insights into commonness and rarity, community assembly, and patterns of biodiversity. It is well established that communities are composed of a few common and many rare species, and numerous theoretical models have been proposed to explain this pattern. However, no attempt has been made to determine how well these theoretical characterizations capture observed taxonomic and global-scale spatial variation in the general form of the distribution. Here, using data of a scope unprecedented in community ecology, we show that a simple maximum entropy model produces a truncated log-series distribution that can predict between 83% and 93% of the observed variation in the rank abundance of species across 15,848 globally distributed communities including birds, mammals, plants, and butterflies. This model requires knowledge of only the species richness and total abundance of the community to predict the full abundance distribution, which suggests that these factors are sufficient to understand the distribution for most purposes. Since geographic patterns in richness and abundance can often be successfully modeled, this approach should allow the distribution of commonness and rarity to be characterized, even in locations where empirical data are unavailable.
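To show how richness S and total abundance N alone can pin down a truncated log-series, here is a sketch assuming the maximum-entropy form p(n) = e^(-βn) / (n·Z) for abundances n = 1..N, with β chosen so the mean abundance equals N/S. The function names are hypothetical, and this is not the authors' code; turning the distribution into S predicted ranked abundances is a further step not shown here:

```python
import math


def mean_abundance(beta: float, n_total: int) -> float:
    """Mean of the truncated log-series p(n) proportional to exp(-beta*n)/n, n = 1..N."""
    weights = [math.exp(-beta * n) for n in range(1, n_total + 1)]
    numerator = sum(weights)  # sum over n of n * exp(-beta*n) / n
    denominator = sum(w / n for n, w in enumerate(weights, start=1))
    return numerator / denominator


def solve_beta(s: int, n_total: int, tol: float = 1e-12) -> float:
    """Find beta > 0 so the distribution's mean equals N / S (bisection)."""
    target = n_total / s
    lo, hi = 1e-12, 50.0
    # The mean falls monotonically as beta grows, so a root is bracketed
    # only when the target lies between the values at the two ends.
    if not mean_abundance(hi, n_total) < target < mean_abundance(lo, n_total):
        raise ValueError("no positive beta reproduces N/S for these inputs")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_abundance(mid, n_total) > target:
            lo = mid  # mean still too large: the root lies at a larger beta
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_series_pmf(beta: float, n_total: int) -> list:
    """Probability that a species has abundance n, for n = 1..N."""
    weights = [math.exp(-beta * n) / n for n in range(1, n_total + 1)]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical community: S = 20 species, N = 1000 individuals.
beta = solve_beta(20, 1000)
pmf = log_series_pmf(beta, 1000)
print(f"beta = {beta:.5f}, P(abundance = 1) = {pmf[0]:.3f}")
```

Because the probabilities decrease with n for any positive β, the implied SAD is a hollow curve with many rare and few common species, matching the pattern the abstract describes.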


Subjects
Birds/physiology, Butterflies/physiology, Ecosystem, Entropy, Mammals/physiology, Models, Biological, Animals, Computer Simulation, Demography, Population Density, Trees/physiology
19.
Ecology ; 92(10): 1887-94, 2011 Oct.
Article in English | MEDLINE | ID: mdl-22073779

ABSTRACT

Power-law relationships are among the most well-studied functional relationships in biology. Recently the common practice of fitting power laws using linear regression (LR) on log-transformed data has been criticized, calling into question the conclusions of hundreds of studies. It has been suggested that nonlinear regression (NLR) is preferable, but no rigorous comparison of these two methods has been conducted. Using Monte Carlo simulations, we demonstrate that the error distribution determines which method performs better, with NLR better characterizing data with additive, homoscedastic, normal error and LR better characterizing data with multiplicative, heteroscedastic, lognormal error. Analysis of 471 biological power laws shows that both forms of error occur in nature. While previous analyses based on log-transformation appear to be generally valid, future analyses should choose methods based on a combination of biological plausibility and analysis of the error distribution. We provide detailed guidelines and associated computer code for doing so, including a model averaging approach for cases where the error structure is uncertain.
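The contrast above, linear regression (LR) on log-transformed data versus nonlinear regression (NLR) on the original scale, can be illustrated with simulated data. The following sketch uses only the Python standard library and does not reproduce the authors' published guidelines or code (including their model-averaging approach). `fit_lr` and `fit_nlr` are hypothetical names. For NLR, `fit_nlr` uses the closed-form best a for each candidate b and a golden-section search over b, assuming a single minimum in the bracket [-3, 3]; this is a choice made here for self-containment, and a real analysis would normally use a library optimizer. The simulation parameters are illustrative.

```python
import math
import random


def fit_lr(xs, ys):
    """Power law via linear regression on logs: log y = log a + b log x."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sum((x - mx) ** 2 for x in lx)
    return math.exp(my - b * mx), b


def fit_nlr(xs, ys, b_lo=-3.0, b_hi=3.0, iters=80):
    """Power law via nonlinear least squares on the original scale.

    For a fixed b the best a is closed-form, so only b is searched
    (golden-section search, assuming one minimum in [b_lo, b_hi]).
    """
    def best_a(b):
        return sum(y * x ** b for x, y in zip(xs, ys)) / sum(x ** (2 * b) for x in xs)

    def sse(b):
        a = best_a(b)
        return sum((y - a * x ** b) ** 2 for x, y in zip(xs, ys))

    inv_phi = (math.sqrt(5) - 1) / 2
    lo, hi = b_lo, b_hi
    for _ in range(iters):
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if sse(c) < sse(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    b = 0.5 * (lo + hi)
    return best_a(b), b


# Simulated data from y = 2 * x**0.75 under the two error structures.
rng = random.Random(1)
xs = [rng.uniform(5, 100) for _ in range(2000)]
additive = [2 * x ** 0.75 + rng.gauss(0, 1) for x in xs]
multiplicative = [2 * x ** 0.75 * math.exp(rng.gauss(0, 0.1)) for x in xs]
lr_fit = fit_lr(xs, multiplicative)
nlr_fit = fit_nlr(xs, additive)
print("LR on multiplicative-error data:", lr_fit)
print("NLR on additive-error data:", nlr_fit)
```

Each method is paired with the error structure it assumes. Fitting the mismatched combinations and inspecting residuals on both scales is one way to explore the point the abstract makes about choosing a method from the error distribution.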


Subjects
Ecosystem, Environmental Monitoring/methods, Models, Biological, Nonlinear Dynamics