VUDM
Online Tutorial
Introduction
This webpage describes VUDM, the Visual User-model Data Mining tool, and provides a guide to applying it to user model data from NDLTD, the Networked Digital Library of Theses and Dissertations (http://www.ndltd.org/). The goals of VUDM are to provide both an overview and details of the users, user communities, and usage trends of digital libraries.
The distinctive approach of this research is its focus on the analysis and visualization of users' implicit rating data, which is generated from user tracking information such as sending queries and browsing result sets, rather than on explicit data obtained from a user survey, such as major, specialties, years of experience, and demographics. The VUDM interface uses spirals to portray virtual interest communities, positioned according to inter-community relationships. The small face icons on the spirals are the users in each community. The distances between user icons, or between spirals, represent how similar they are. Each spiral has a tag that shows the items (queries or anchor texts) most frequently accessed by the community in the digital library. Viewing the overall distribution of spirals and user icons therefore provides insight into social networks and their interests, as well as into usage trends of digital libraries. The visualization strategy of VUDM follows Ben Shneiderman's Visual Information-Seeking Mantra: "Overview first, zoom and filter, then details-on-demand". This tutorial aims to make VUDM easier to understand and use.
System Requirements & Installation
- Windows XP, 2000, Server 2003, or Vista
- Java Runtime Environment (JRE) or Java Development Kit (JDK), version 1.6 or later
- 512 MB of memory
- Mouse
- XGA (1024x768) or higher resolution monitor
Tutorial
Download and Installation
- Check the Java version on your computer. In a Command Prompt window, run "java -version", as shown in Figure 1. If the version is lower than 1.6.x, download a Java Runtime Environment (JRE) or Java Development Kit (JDK) of version 1.6 or higher from http://java.sun.com/javase/downloads/index.jsp. Download "JDK update 4" for the JDK or "Java Runtime Environment (JRE) 6 Update 4" for the JRE.
Figure 1: Java version checking.
- Once a Java Runtime of 1.6 or higher version is installed, download VUDM from http://rocky.dlib.vt.edu/~shkim/VUDM_tutorial/files/VUDM_demo.zip.
- The zipped file VUDM_demo.zip contains an executable jar file, "VUDM_demo.jar", and two data sets, "200601" and "200602", in sub-directories. The name of each data set gives the year and month in which the data were collected. For example, 200602 means that the data were collected in February 2006. The executable jar file, VUDM_demo.jar, is an archive of Java libraries and environment setting files that runs VUDM without tedious configuration of the Java environment on your system.
- Move the executable jar and the data sets into any folder; this tutorial uses "C:\VUDM". After a successful installation you will have C:\VUDM\VUDM_demo.jar, C:\VUDM\200601, and C:\VUDM\200602 (see Figure 2).
Figure 2: VUDM demo is installed under C:\VUDM.
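The version check in the first installation step can also be done from Java itself. The following is a minimal sketch, not part of VUDM: the class name JavaVersionCheck and its methods are hypothetical, and the only platform feature it relies on is the standard "java.version" system property. It accepts both the old numbering ("1.6.0_04") and the newer one ("17.0.2").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of VUDM: checks the running JVM against the 1.6 minimum.
class JavaVersionCheck {

    // Matches the leading "major" and optional "minor" fields, e.g. "1.6" in "1.6.0_04".
    private static final Pattern LEADING_FIELDS = Pattern.compile("^(\\d+)(?:\\.(\\d+))?");

    /** Returns {major, minor} parsed from a version string such as "1.6.0_04" or "17.0.2". */
    static int[] parseVersion(String version) {
        Matcher m = LEADING_FIELDS.matcher(version);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized version: " + version);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        return new int[] { major, minor };
    }

    /** True if the version is at least requiredMajor.requiredMinor. */
    static boolean isAtLeast(String version, int requiredMajor, int requiredMinor) {
        int[] v = parseVersion(version);
        return v[0] > requiredMajor || (v[0] == requiredMajor && v[1] >= requiredMinor);
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.version");
        if (isAtLeast(running, 1, 6)) {
            System.out.println("Java " + running + " meets the 1.6 requirement.");
        } else {
            System.out.println("Java " + running + " is too old; install 1.6 or later.");
        }
    }
}
```

Running "java -version" in the Command Prompt, as described above, remains the simplest check; the sketch is only useful if you want to script the test.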
Start VUDM
- Open a Command Prompt window and change to the installation directory by typing "cd C:\VUDM".
- Type "java -jar -Xmx512m VUDM_demo.jar" to invoke VUDM.
Figure 3: Invoke VUDM
The option "-Xmx512m" sets the maximum Java heap size to 512 MB for running VUDM. If you get an "out-of-memory" error while loading a data set, increase the number after "-Xmx". The number should not exceed the physical memory of your system.
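To confirm how much heap a given "-Xmx" setting actually provides, you can ask the JVM directly. This is a small sketch under stated assumptions: the class name HeapCheck is hypothetical and unrelated to VUDM, and it uses only the standard Runtime.maxMemory() call. The reported value may be somewhat lower than the number passed to "-Xmx", depending on the JVM.

```java
// Hypothetical helper, not part of VUDM: reports the maximum heap of the running JVM.
class HeapCheck {

    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() returns the most memory the JVM will attempt to use for the heap.
        long maxHeapMb = toMegabytes(Runtime.getRuntime().maxMemory());
        System.out.println("Maximum heap available to this JVM: " + maxHeapMb + " MB");
    }
}
```

After compiling, run it with the same option you plan to use for VUDM, for example "java -Xmx512m HeapCheck", and compare the result before and after raising the limit.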
Data Loading
- VUDM is a multiple-window application, so you can load multiple data sets into multiple windows at the same time. To load a data set, open the "File" menu, which offers two options: "Open" and "Open in new window". Select "Open" to load a new data set into the currently active window. If there is no active window, a new window is created and the data are loaded into it.
Figure 4: Open new data in current window.
Select "Open in new window" to create a new window and load the new data into it. This is useful when you want to keep the currently open windows and load another data set.

Figure 5: Open new data in a new window.
- Once you select either "Open" or "Open in new window", a file-selection dialog box appears. Browse to "C:\VUDM" to select a data folder. Select either "200601" or "200602", then click the "Choose folder with data" button. You must choose a FOLDER, not a single data file inside a folder, as shown in Figure 6.
Figure 6: Select a data folder.
- Once you click the "Choose folder with data" button, the data loading process begins. Loading progress is shown by printing "R" characters in the Command Prompt window, as Figure 7 shows.
Figure 7: During data loading, a series of "R" characters scrolls in the window. Each "R" indicates that one user data item is being loaded.
- After data loading finishes, a window with spirals and face icons appears, as Figure 8 shows.
Figure 8: User data of February 2006 is successfully loaded and visualized.
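The folder-selection rule in the steps above (choose a data set FOLDER such as "200601", not a file inside it) can be sketched as a small pre-check. This is an illustration only: the class DataSetFolder and its methods are hypothetical, and whether VUDM itself requires the Year+Month naming is an assumption drawn from the two demo data sets.

```java
import java.io.File;

// Hypothetical pre-check, not part of VUDM: is this path a data set folder named like "200602"?
class DataSetFolder {

    /** True if the name has the Year+Month form of the demo data sets, e.g. "200602". */
    static boolean looksLikeDataSetName(String name) {
        if (!name.matches("\\d{6}")) {
            return false;
        }
        int month = Integer.parseInt(name.substring(4));
        return month >= 1 && month <= 12;
    }

    /** Turns "200602" into "2006-02"; rejects names that do not follow the pattern. */
    static String describe(String name) {
        if (!looksLikeDataSetName(name)) {
            throw new IllegalArgumentException("Not a Year+Month data set name: " + name);
        }
        return name.substring(0, 4) + "-" + name.substring(4);
    }

    /** True only for an existing directory whose name follows the Year+Month pattern. */
    static boolean isSelectableDataSet(File path) {
        return path.isDirectory() && looksLikeDataSetName(path.getName());
    }

    public static void main(String[] args) {
        // The default path follows this tutorial's example installation folder.
        File candidate = new File(args.length > 0 ? args[0] : "C:\\VUDM\\200602");
        if (isSelectableDataSet(candidate)) {
            System.out.println("Data set folder for " + describe(candidate.getName()));
        } else {
            System.out.println("Choose a data set folder, not a file: " + candidate);
        }
    }
}
```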
Data Exploration
- Overview
Figure 9: VUDM overview
Once data loading succeeds, a new window is generated. It displays a set of spirals and face icons, representing user communities and users, in the area marked '1' in Figure 9. Throughout this tutorial, the term 'user' means a patron of a digital library. VUDM can visualize multiple data sets in separate windows at the same time to make them easy to compare (see mark '2'). Open two or more data sets, collected at different times, in separate windows; comparing them shows how the user communities changed over the period. The name of the data set is printed on each window frame. Mark '3' is a filtering control that adjusts the strictness of the user community finding algorithm, which is explained later in this tutorial.
Marks '4', '5', and '6' identify tables that show more detailed information. Table '4' shows details about a selected user community. To select a community, click its spiral or simply place the mouse cursor over it. This table contains the Group ID, the Group Size (the number of users in the community), the number of mentors, and the group labels. Table '5' lists the items (queries and noun phrases) referred to by the users in the selected community, sorted by frequency. Table '6' shows details about a selected user. Try moving the mouse cursor over user communities and user icons, and watch the contents of these three tables change.
Marks '7', '8', and '9' represent the mouse interactions for zooming in, zooming out, and panning, respectively. To zoom in or out, press the RIGHT mouse button and drag forward or backward. To pan, press the LEFT mouse button and drag in the desired direction. Spirals and labels of user communities are clickable and draggable, so you can uncover hidden objects in a congested area.
Figure 10: VUDM visualization strategy
Figure 10 briefly describes the visualization strategy of VUDM. The distance between two user communities represents how similar they are: communities drawn close together are similar, meaning that their users share many research interests or learning topics. Each user in a community is ranked according to the similarity between that user and the community. The closer a user is to the center of a spiral, the more similar the user is to the community and the more expert in the topics that represent it. This ranking is derived from the user's registration information and from the similarity between the items the user accessed and all the items of the community.
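The ranking rule described above (the most similar users sit nearest the center of the spiral) can be illustrated with a short sketch. Everything here is an assumption made for illustration: the class SpiralLayoutSketch, the Member type, the similarity scores, and the use of an Archimedean spiral (radius proportional to angle) are hypothetical, since this tutorial does not specify VUDM's actual layout formula.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not VUDM's actual layout code.
class SpiralLayoutSketch {

    /** A user together with a (hypothetical) similarity score to the community. */
    static final class Member {
        final String userId;
        final double similarity;

        Member(String userId, double similarity) {
            this.userId = userId;
            this.similarity = similarity;
        }
    }

    /** Returns a copy ordered from most to least similar; index 0 is placed at the center. */
    static List<Member> rank(List<Member> members) {
        List<Member> sorted = new ArrayList<Member>(members);
        sorted.sort((a, b) -> Double.compare(b.similarity, a.similarity));
        return sorted;
    }

    /**
     * Position {x, y} of the user with the given rank on an assumed Archimedean spiral,
     * where radius = growth * angle and angle = rank * angleStep (in radians).
     */
    static double[] position(int rank, double angleStep, double growth) {
        double angle = rank * angleStep;
        double radius = growth * angle;
        return new double[] { radius * Math.cos(angle), radius * Math.sin(angle) };
    }

    public static void main(String[] args) {
        List<Member> ranked = rank(Arrays.asList(
                new Member("userA", 0.35),   // hypothetical scores
                new Member("userB", 0.80),
                new Member("userC", 0.55)));
        for (int i = 0; i < ranked.size(); i++) {
            double[] p = position(i, 0.5, 10.0);
            System.out.printf("%s rank=%d x=%.2f y=%.2f%n", ranked.get(i).userId, i, p[0], p[1]);
        }
    }
}
```

Because the radius grows with rank, a lower similarity score always lands farther from the center, which is the property the figure conveys.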
Sometimes, when you load a data set or click the "Redraw" button, nothing is displayed in the window. This is because no user community was found with the current settings. Two factors affect the result of user community finding. One is the size of the data; the other is the strictness of the community finding algorithm, a value we call "theta" throughout this tutorial. Theta is the similarity threshold above which the relation between two users is considered meaningful for clustering. If the data set is too small, the probability that any user community is found is very low. If the theta value is too high, the probability of finding a user community is also very low. Therefore, if no user communities are discovered, add more data or select a more permissive theta value. An example of using the theta slider bar is provided in the "Filtering" section below.
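The role of theta as a similarity threshold can be made concrete with a toy example. The sketch below is not VUDM's algorithm: the class ThetaThresholdSketch is hypothetical, and Jaccard similarity over each user's set of accessed items is an assumed stand-in, because this tutorial does not state which similarity measure VUDM uses. It only shows why a higher theta leaves fewer user pairs eligible for clustering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; Jaccard similarity is an assumption, not VUDM's documented measure.
class ThetaThresholdSketch {

    /** Convenience constructor for a user's set of accessed items. */
    static Set<String> items(String... values) {
        return new HashSet<String>(Arrays.asList(values));
    }

    /** Jaccard similarity: size of the intersection divided by size of the union. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<String>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<String>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Counts user pairs whose similarity reaches theta, i.e. pairs eligible for clustering. */
    static int countRelatedPairs(List<Set<String>> users, double theta) {
        int count = 0;
        for (int i = 0; i < users.size(); i++) {
            for (int j = i + 1; j < users.size(); j++) {
                if (jaccard(users.get(i), users.get(j)) >= theta) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical users and items, invented for this example.
        List<Set<String>> users = new ArrayList<Set<String>>();
        users.add(items("digital library", "etd", "metadata"));
        users.add(items("etd", "metadata", "ontology"));
        users.add(items("bioinformatics"));
        for (double theta : new double[] { 0.0, 0.2, 0.6 }) {
            System.out.println("theta=" + theta + " related pairs=" + countRelatedPairs(users, theta));
        }
    }
}
```

With these three invented users, only the first two share items (similarity 0.5), so the count of related pairs drops from 3 at theta 0.0 to 1 at theta 0.2 and to 0 at theta 0.6.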
- Zoom
After you successfully load a data set, you can zoom in on any part of the display. Drag the map with the left mouse button so that the area you want to zoom in on is at the center of the window. Then press the right mouse button anywhere in the window and, while holding it, push or pull the mouse. Pushing the mouse zooms in on the center of the window, and pulling the mouse zooms out from it.
Figure 11: Zoomed image of a community.
In VUDM, a user can belong to multiple user communities at the same time. This is reasonable because a typical user has multiple interests and therefore joins multiple communities. Lines linking user icons across user communities indicate that the icons represent the same user. You can see these lines by placing the mouse cursor over a user icon.
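The idea of one user appearing in several communities can be sketched as a simple lookup. This is an illustration with invented names: the class SharedMembershipSketch, the community names, and the user IDs are all hypothetical, and the code says nothing about how VUDM stores its data. It only finds the users for whom linking lines would be drawn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only; not VUDM's data model.
class SharedMembershipSketch {

    /** Inverts a community -> members map into a user -> communities map. */
    static Map<String, Set<String>> communitiesByUser(Map<String, List<String>> membersByCommunity) {
        Map<String, Set<String>> result = new TreeMap<String, Set<String>>();
        for (Map.Entry<String, List<String>> entry : membersByCommunity.entrySet()) {
            for (String user : entry.getValue()) {
                result.computeIfAbsent(user, key -> new TreeSet<String>()).add(entry.getKey());
            }
        }
        return result;
    }

    /** Users that belong to two or more communities, in sorted order. */
    static List<String> usersInSeveralCommunities(Map<String, List<String>> membersByCommunity) {
        List<String> shared = new ArrayList<String>();
        for (Map.Entry<String, Set<String>> entry : communitiesByUser(membersByCommunity).entrySet()) {
            if (entry.getValue().size() > 1) {
                shared.add(entry.getKey());
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        // Hypothetical communities and users, invented for this example.
        Map<String, List<String>> communities = new TreeMap<String, List<String>>();
        communities.put("community-1", Arrays.asList("user1", "user2"));
        communities.put("community-2", Arrays.asList("user2", "user3"));
        System.out.println("Users shown with linking lines: " + usersInSeveralCommunities(communities));
    }
}
```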
- Filtering
VUDM provides a filtering feature: a slider bar for adjusting the theta value, which is the threshold that sets the strictness of the community finding algorithm. The higher the theta, the stricter the algorithm and the more precise the communities found. Conversely, the lower the theta, the less precise and the more numerous the user communities generated (see Figure 12).
Figure 12: Low theta values generate more but less precise user communities. High theta values generate fewer but more precise user communities.
In this tutorial, because the size of the demo data is fixed, the theta value is the only control over the number of user communities; adjust it by sliding the bar. Setting theta to 0 means there is no restriction on forming user communities. Because any combination of users is then allowed, the number of user communities found will be 2^n-1, where n is the number of users, which is meaningless. If theta is set to 1, no two users will be grouped into the same community and every user community will contain only one user, which is also meaningless. VUDM automatically removes user communities that are too small. Therefore, above a certain theta value, no user communities will be found. For the data provided with this demo, the meaningful theta values lie between 0 and about 0.2. Finding a meaningful theta is up to the VUDM user and depends on the properties of the user data.
For example, open the data set 200601 in a new window and try changing the theta value to 0.2, 0.18, and 0.15. Move the slider bar to 0.20, or enter 0.20 directly in the text input box next to the slider bar, and click the "Redraw" button; you will get the result shown in Figure 13.
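The count of 2^n-1 is the number of non-empty subsets of n users, and it grows very quickly. The sketch below evaluates it for a few values of n; the class SubsetCount is hypothetical and uses only the standard java.math.BigInteger class so that large n does not overflow.

```java
import java.math.BigInteger;

// Illustrative sketch only: evaluates 2^n - 1, the number of non-empty subsets of n users.
class SubsetCount {

    /** Returns 2^n - 1 for a non-negative n. */
    static BigInteger nonEmptySubsets(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        // Shifting 1 left by n bits gives 2^n.
        return BigInteger.ONE.shiftLeft(n).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        for (int n : new int[] { 3, 10, 20, 100 }) {
            System.out.println("n=" + n + " -> " + nonEmptySubsets(n));
        }
    }
}
```

Even 10 users already allow 1023 possible groupings, which is why a theta of 0 gives no useful picture.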
Figure 13: User community of January 2006 (Theta = 0.2)
A user community labeled "Digital Library, Electronic Theses and Dissertations" is found. This label is selected because "Digital Library" and "Electronic Theses and Dissertations" are the items most frequently used by the users in the community. You can check the full list of used items and their frequencies in the second information table, located at the bottom right of the VUDM display.
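Choosing a label from the most frequently used items can be sketched as a frequency count followed by a sort. This is an illustration under assumptions: the class CommunityLabelSketch is hypothetical, the item list is invented, and the alphabetical tie-break is a choice made here for determinism rather than documented VUDM behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not VUDM's labeling code.
class CommunityLabelSketch {

    /** Returns up to k distinct items, most frequent first (ties broken alphabetically). */
    static List<String> topItems(List<String> usedItems, int k) {
        final Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String item : usedItems) {
            counts.merge(item, 1, Integer::sum);
        }
        List<String> ranked = new ArrayList<String>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        int limit = Math.max(0, Math.min(k, ranked.size()));
        return new ArrayList<String>(ranked.subList(0, limit));
    }

    /** Joins the top k items into a label such as "Digital Library, Electronic Theses and Dissertations". */
    static String label(List<String> usedItems, int k) {
        return String.join(", ", topItems(usedItems, k));
    }

    public static void main(String[] args) {
        // Hypothetical item usage, invented for this example.
        List<String> used = Arrays.asList(
                "Digital Library", "Electronic Theses and Dissertations", "Digital Library",
                "Metadata", "Electronic Theses and Dissertations", "Digital Library");
        System.out.println(label(used, 2));
    }
}
```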
Next, enter 0.18 at the text input box and then click the "Redraw" button. This will present a result as shown in Figure 14.
Figure 14: User community of January 2006 (Theta = 0.18)
The theta value 0.18, which is lower than the previous 0.20, produces more user communities. A total of five user communities, including the one found when theta was 0.2, are found and visualized. VUDM also clusters user communities based on their similarities: similar user communities are placed close to each other, while dissimilar communities are isolated. In this result, three communities are clustered because they share some topics, such as "Digital Library" and "Electronic Theses and Dissertations". The shared topics may not all appear in the labels, so you may have to look at the detailed item tables. The other two communities are isolated because they are not similar to any other community.
Now, for the last example, let's try one more theta value. Enter 0.15 at the text input box and then click the "Redraw" button.
Figure 15: User community of January 2006 (Theta = 0.15)
As expected, more user communities are found because a lower theta value is used. The clustering of user communities is also affected by the lower theta value.
Let's compare this data with the data of the following month. Open the data set "200602" in a new window, set the theta to 0.15 (the same value used previously), and click "Redraw". You will now see how the users and user communities changed between the two months. The comparison also shows which topics or items attracted more users during the month. It gives an overview of the usage trends of the digital library, NDLTD, at the time the data were collected.
Figure 16: User community of February 2006 (Theta = 0.15)
This section described the filtering feature of VUDM. You can control the number of user communities by changing the theta value. To analyze a data set, select a theta value that suits the size and properties of the data.
- Detail on Demand
Figure 17: Details on demand.
Figure 17 shows how to get detailed information about users and user communities. Besides the three information tables described earlier, there are two buttons labeled "Groups Statistics" and "Users Statistics" at the bottom left of the VUDM main frame. The "Groups Statistics" button opens a pop-up window with a table of detailed information about all user communities in the opened data, including communities that are not visualized. The "Users Statistics" button opens a pop-up window with a table of detailed information about all individual users in the data.
Release Notes
1st release: Feb. 7th 2008, version 0.9
Bug Reports and Feedback
To submit a bug or request a feature, please send an email to shk@vt.edu
Web Pages
Related web pages.
- questionnaire
- Questionnaire for library experts survey on VUDM.
- http://www.dlib.vt.edu
- Virginia Tech Digital Library Research Laboratory
- http://www.ndltd.org
- Networked Digital Library of Theses and Dissertations
- http://www.cs.vt.edu
- Department of Computer Science at Virginia Tech