ABSTRACT
BACKGROUND: Waterborne diseases are among the leading causes of mortality in developing countries; diarrhea alone is responsible for over 1.5 million deaths annually. These illnesses most often affect impoverished rural communities that rely on rivers for their drinking water, and deaths are most common among infants and the elderly. Without knowing which communities lie upstream of a given community, upstream sanitary and bathing behaviors cannot be directly linked to downstream health outcomes, including disease outbreaks. Although current GIS technologies can answer the upstream question for a limited number of downstream communities, no systematic method has existed for labeling each downstream village with all of its upstream contributing villages along river networks or within basins at a large national scale, such as in Indonesia. This limitation prevents macro-level analyses of waterborne illness across communities in the developing world.
RESULTS: This novel approach combines parallel computing, big data, community data, and open source GIS to create a database of upstream communities for 50,000-70,000 villages in Indonesia across four different periods. The resulting village database can be tied to the Indonesian PODES health and behavior surveys in each village to connect upstream sanitary behaviors to downstream health outcomes. We find that the approximately 250,000 communities analyzed across the four periods in Indonesia have a combined total of 13.7 million upstream villages. The average number of upstream villages per village was almost 55, and the maximum number of upstream villages for any single village was over 5,300.
CONCLUSIONS: Advances in big-data availability (particularly high-resolution elevation data), the falling cost of parallel computing, mass survey data, and open source GIS algorithms that can exploit parallel processing and big data open new opportunities for studying human health at micro granularity across entire nations. The database generated has already been used by health researchers to compute the influence of upstream behaviors on downstream diarrhea outbreaks and to monitor avoidance responses to upstream water behaviors across all 250,000 downstream Indonesian villages over 4 years, and further waterborne health analyses are underway.
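
The upstream-labeling task described in the RESULTS can be illustrated with a small sketch. It is not the pipeline used to build the database, which derives flow paths from elevation data and runs in parallel. It only shows the core idea of collecting, for every village, all villages whose water eventually reaches it. The input format (`flows_to`, a mapping from each village to its immediately downstream villages), the function name `upstream_sets`, and the toy village labels are assumptions made for illustration.

```python
from collections import defaultdict, deque


def upstream_sets(flows_to):
    """Map each village to the set of all villages upstream of it.

    flows_to: dict mapping a village id to a list of the village ids
    immediately downstream of it (hypothetical input format).
    """
    # Invert the flow edges: for each village, which villages drain
    # directly into it?
    drains_from = defaultdict(set)
    villages = set(flows_to)
    for src, dsts in flows_to.items():
        for dst in dsts:
            drains_from[dst].add(src)
            villages.add(dst)

    result = {}
    for village in villages:
        # Breadth-first search against the direction of flow.
        seen = set()
        queue = deque(drains_from[village])
        while queue:
            current = queue.popleft()
            if current in seen or current == village:
                continue
            seen.add(current)
            queue.extend(drains_from[current])
        result[village] = seen
    return result


# Toy river network (illustrative labels only): A and B drain into C,
# and C drains into D.
toy = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
upstream = upstream_sets(toy)

# Summary statistics analogous to those reported in the RESULTS.
total_upstream = sum(len(s) for s in upstream.values())
mean_upstream = total_upstream / len(upstream)
max_upstream = max(len(s) for s in upstream.values())
```

Each village's search is independent of the others, so the outer loop is the natural place to split work across processes when the number of villages is large.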