
CLOUDS: Online Database Visualization


Chris Olston1, Tali Roth2, and Andy Chou3
(Professor Joseph M. Hellerstein)

Data visualization is a hot topic in the database community because of its potential to make databases easier to use. Visualization systems present graphical representations of large data sets in order to make the data easy to understand [1]. Typically, these images can be produced only after processing every row in a large data table, forcing the visualization system to wait for a long time before it can display a useful picture.

Most visualization systems can produce the graphical representation one row at a time in an online, constantly updating fashion. This improves interactivity by displaying a progressive sample of the data table in graphical form. We are researching a new visualization algorithm called CLOUDS [2], which improves upon this notion. The goal of CLOUDS is to make visualization more interactive by quickly displaying an accurate approximation of the final image. In addition to displaying a progressive sample, CLOUDS uses various statistical techniques to predict in advance what the completed picture will look like. As rows are retrieved from the database table, CLOUDS displays the graphical representations of the rows retrieved so far, along with translucent "clouds" that indicate where the remaining graphical objects are predicted to lie. This produces an approximate image that is refined as more data is processed.
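As a rough illustration of the prediction step, the sketch below estimates how many not-yet-retrieved points fall in each cell of a screen grid. It is not the CLOUDS algorithm itself: the statistical techniques CLOUDS uses are not described here, so a simple histogram-based estimator is substituted. The sketch assumes that rows arrive in random order and that the total row count is known in advance. The names cell_of and predict_remaining and the cell_size parameter are hypothetical.

```python
from collections import Counter


def cell_of(point, cell_size):
    """Map an (x, y) point to the integer grid cell that contains it."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))


def predict_remaining(seen_points, total_rows, cell_size=1.0):
    """Estimate how many not-yet-retrieved points fall in each grid cell.

    Assumption (not from the source): rows arrive in random order, so the
    points seen so far are an unbiased sample of the whole table.
    """
    seen = len(seen_points)
    if seen == 0:
        return {}  # nothing retrieved yet, so there is nothing to extrapolate
    remaining = total_rows - seen
    counts = Counter(cell_of(p, cell_size) for p in seen_points)
    # Scale each cell's share of the sample up to the rows still to come.
    return {cell: remaining * n / seen for cell, n in counts.items()}


# Hypothetical data: 4 of 12 rows retrieved so far.
seen_points = [(0.5, 0.5), (0.2, 0.1), (1.5, 0.5), (0.9, 0.9)]
clouds = predict_remaining(seen_points, total_rows=12)
```

A renderer could then draw each retrieved point in black and shade each cell with an opacity proportional to its predicted count, so that the "clouds" fade as the remaining row count shrinks.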

Figure 1 shows an example of an in-progress visualization of US cities. Each row in the database is displayed as one black point on the screen. The same visualization is shown in Figure 2 using the CLOUDS algorithm. Note that the "clouds" indicate the density of cities that have not yet been plotted.

As data points are being retrieved from the database, CLOUDS approximates the final picture more closely than does the conventional display algorithm (see Figure 3). Consequently, visualization systems using CLOUDS can display closer approximations to the final image faster than conventional visualization systems.


Figure 1: A partially completed visualization of US cities without CLOUDS.


Figure 2: A partially completed visualization of US cities with CLOUDS. Note that the "clouds" indicate the density of cities that have not yet been plotted.


Figure 3: A graph of the mean squared error over time. The CLOUDS visualization algorithms (pink and red lines) have lower error than the non-CLOUDS algorithm (blue line) for the first half of the rendering time. Eventually, the non-CLOUDS algorithm "catches up" with CLOUDS. Nevertheless, CLOUDS achieves its goal of displaying a more accurate approximate visualization sooner than the conventional algorithm.
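The error measure plotted in Figure 3 can be sketched as follows. How the images were actually compared is not stated here, so this sketch assumes that both the approximate and the final image are represented as equal-sized grids of per-cell values. The function name mean_squared_error and the example grids are hypothetical.

```python
def mean_squared_error(approx, final):
    """Mean squared per-cell difference between two equal-sized 2D grids."""
    same_shape = len(approx) == len(final) and all(
        len(a) == len(f) for a, f in zip(approx, final)
    )
    if not same_shape:
        raise ValueError("grids must have the same shape")
    total = 0.0
    cells = 0
    for row_a, row_f in zip(approx, final):
        for a, f in zip(row_a, row_f):
            total += (a - f) ** 2
            cells += 1
    return total / cells if cells else 0.0


# Hypothetical 2x2 images, stored as point counts per cell.
final_image = [[4, 0], [0, 0]]
points_only = [[1, 0], [0, 0]]          # retrieved points alone
with_clouds = [[3.5, 0], [0, 0.5]]      # retrieved points plus predicted density

error_plain = mean_squared_error(points_only, final_image)
error_clouds = mean_squared_error(with_clouds, final_image)
```

Evaluating this measure after each batch of retrieved rows would give one curve per rendering method, in the manner of Figure 3.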

[1] Olston, C., Woodruff, A., Aiken, A., Chu, M., Ercegovac, V., Lin, M., Spalding, M., Stonebraker, M. DataSplash. SIGMOD 1998, Seattle, Washington, June 1998, pp. 550-552.
[2] Avnur, R., Hellerstein, J., Lo, B., Olston, C., Raman, B., Raman, V., Roth, T., Wylie, K. CONTROL: Continuous Output and Navigation Technology with Refinement On-Line. SIGMOD 1998, Seattle, Washington, June 1998, pp. 567-569.
1Undergraduate (EECS)
2Undergraduate (EECS)
3Undergraduate (EECS)

Send mail to the author: cao@cs.berkeley.edu

Copyright © 1998, Regents of the University of California
Comments to: control@postgres.berkeley.edu
Document: CLOUDS: Online Data Visualization
Date Last Revised: December 1st, 1998