Extract
Multivariate analysis and geovisualization with an integrated geographic knowledge discovery approach.
Introduction
Scientific understanding of complex geographic problems often depends on the discovery, interpretation, and presentation of multivariate spatial patterns, e.g., detection of unknown multivariate spatial patterns or relationships between the incidence of various cancers and socioeconomic, demographic, and/ or environmental factors can lead to important hypotheses about unexpected cancer risk factors. However, identifying such patterns becomes ever more challenging, as powerful data collection and distribution techniques produce geographic datasets of unprecedented size in many application and research areas. These datasets are not only large in data volume (i.e., number of observations) but also characterized by a high number of attributes or dimensions (Guo et al. 2003a; National Research Council 2003). It is an extremely challenging and yet urgent research problem to effectively and efficiently detect and understand relationships and patterns in such voluminous and high-dimensional data (Fayyad et al. 1996; Miller and Han 2001; Guo 2003; Guo et al. 2003b; National Research Council 2003). There are several major challenges that are associated with multivariate spatial analysis in large and high-dimensional geographic datasets. First, the high dimensionality of a dataset can cause serious problems for most analysis methods. One typical problem to address is that it is unlikely for all variables to interrelate meaningfully. Analysts need to find interesting subspaces (subsets of variables) out of a combinatorially explosive number of possible subspaces in a high-dimensional dataset. Second, even when a selected multivariate data space is given as the starting point for analysis (which may be a subspace from a higher-dimensional dataset), it is still a challenging problem to discover the hidden relationships among those variables, as potential patterns may take various forms, linear or non-linear, spatial or non-spatial. Third, attribution of meaning to discovered patterns typically requires input from experts who have domain knowledge and the subsequent presentation of the patterns identified to a broader audience (e.g., other experts who will try to replicate the results, or policy makers who need to act on the results). Fourth, large and high-dimensional datasets demand that all analysis methods are computationally efficient in terms of execution time. Existing methods for multivariate spatial analysis span a continuum between computational and visual approaches. At the computational end, methods typically exploit the computational power and the formalisms of statistical inference to search for patterns. The more visually based methods capitalize instead on the ability of human vision to identify patterns and facilitate this process by presenting the data from different perspectives. Although computational methods can search large volumes of data for a specific type of pattern very quickly, they have very limited pattern interpretation ability. In contrast, visualization methods can help analysts to visually...See the full content of this document
Sponsored links
