Use of Unstructured Event-Based Reports for Global Infectious Disease Surveillance
Volume 15, Number 5, May 2009
CDC
Abstract
Free or low-cost sources of unstructured information, such as Internet news and online discussion sites, provide detailed local and near real-time data on disease outbreaks, even in countries that lack traditional public health surveillance. To improve public health surveillance and, ultimately, interventions, we examined 3 primary systems that process event-based outbreak information: Global Public Health Intelligence Network, HealthMap, and EpiSPIDER. Despite similarities among them, these systems are highly complementary because they monitor different data types, rely on varying levels of automation and human analysis, and distribute distinct information. Future development should focus on linking these systems more closely to public health practitioners in the field and establishing collaborative networks for alert verification and dissemination. Such development would further establish event-based monitoring as an invaluable public health resource that provides critical context and an alternative to traditional indicator-based outbreak reporting.
International travel and movement of goods increasingly facilitate the spread of pathogens across and among nations, enabling pathogens to invade new territories and adapt to new environments and hosts (1-3). Officials now need to consider worldwide disease outbreaks when determining what potential threats might affect the health and welfare of their nations (4).
In industrialized countries, unprecedented efforts have built on indicator-based public health surveillance, and monitoring of clinically relevant data sources now provides early indication of outbreaks (5). In many countries where public health infrastructure is rudimentary, deteriorating, or nonexistent, efforts to improve the ability to conduct electronic disease surveillance include more robust data collection methods and enhanced analysis capability (6,7).
However, in these parts of the world, basing timely and sensitive reporting of public health threats on conventional surveillance sources remains challenging. Lack of resources and trained public health professionals poses a substantial roadblock (8-10). Furthermore, reporting emerging infectious diseases has certain constraints, including fear of repercussions on trade and tourism, delays in clearance through multiple levels of government, tendency to err on the conservative side, and inadequately functioning or nonexistent surveillance infrastructure (11).
In many countries, free or low-cost sources of unstructured information, including Internet news and online discussion sites (Figure), could provide detailed local and near real-time data on potential and confirmed disease outbreaks and other public health events (9,10,13-18). These event-based informal data sources provide insight into new and ongoing public health challenges in areas that have limited or no public health reporting infrastructure but have the highest risk for emerging diseases (19). In fact, event-based informal surveillance now represents a critical source of epidemic intelligence: almost all major outbreaks investigated by the World Health Organization (WHO) are first identified through these informal sources (9,13).
With a goal of improving public health surveillance and, ultimately, intervention efforts, we (the architects, developers, and methodologists for the information systems described herein) reviewed 3 of the primary active systems that process unstructured (free-text), event-based information on disease outbreaks: the Global Public Health Intelligence Network (GPHIN), the HealthMap system, and the EpiSPIDER project (Semantic Processing and Integration of Distributed Electronic Resources for Epidemics [and disasters]; http://www.epispider.net/). Our report is the result of a joint symposium at the American Medical Informatics Association Annual Conference in 2007. Despite key differences, all 3 systems face similar technologic challenges, including 1) topic detection and data acquisition from a high-volume stream of event reports (not all related to disease outbreaks); 2) data characterization, categorization, or information extraction; 3) information formatting and integration with other sources; and 4) information dissemination to clients or, more broadly, to the public.
Each system tackles these challenges in unique ways, highlighting the diversity of possible approaches and public health objectives. Our goal was to draw lessons from these early experiences to advance overall progress in this recently established field of event-based public health surveillance. After summarizing these systems, we compared them within the context of this new surveillance framework and outlined goals for future development and research.
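The 4 challenges listed above (topic detection and acquisition, categorization, integration, and dissemination) can be pictured as stages of a pipeline. The following is a minimal sketch for illustration only and does not represent how GPHIN, HealthMap, or EpiSPIDER are implemented. The keyword lists, the Event record, the report dictionary layout, and all function names are hypothetical assumptions. Operational systems rely on much richer taxonomies, multilingual processing, and human review.

```python
from dataclasses import dataclass

# Hypothetical keyword lists (assumptions for illustration only).
DISEASE_TERMS = {"cholera", "influenza", "measles", "ebola"}
LOCATIONS = {"kenya", "peru", "vietnam"}


@dataclass
class Event:
    """Hypothetical structured record extracted from one free-text report."""
    disease: str
    location: str
    source: str


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    return [w.strip(".,;:!?()\"'").lower() for w in text.split()]


def detect(reports):
    """Stage 1: keep only reports that mention a known disease term."""
    return [r for r in reports if DISEASE_TERMS & set(tokenize(r["text"]))]


def categorize(report):
    """Stage 2: extract disease and location by naive keyword lookup."""
    words = tokenize(report["text"])
    disease = next(w for w in words if w in DISEASE_TERMS)
    location = next((w for w in words if w in LOCATIONS), "unknown")
    return Event(disease, location, report["source"])


def integrate(events):
    """Stage 3: group events by (disease, location), collecting their sources."""
    merged = {}
    for e in events:
        merged.setdefault((e.disease, e.location), []).append(e.source)
    return merged


def disseminate(merged):
    """Stage 4: format one alert line per grouped event, in sorted order."""
    return [
        f"{disease} in {location}: {len(sources)} report(s)"
        for (disease, location), sources in sorted(merged.items())
    ]


def run_pipeline(reports):
    """Chain the 4 stages over a list of {'source': ..., 'text': ...} dicts."""
    events = [categorize(r) for r in detect(reports)]
    return disseminate(integrate(events))
```

Calling run_pipeline on a list of dictionaries with "source" and "text" keys returns one alert string per disease and location pair. Reports that mention no listed disease term are dropped at the first stage, which mirrors the point that not all incoming event reports relate to disease outbreaks.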
Even with the recent enactment of the International Health Regulations in 2005, no guarantee yet exists that broad compliance will be feasible, given the challenges associated with reporting mechanisms and multilateral coordination (12).

