Data Mining in Education

Abstract

Technological improvements have enhanced our educational experience and introduced new concepts such as distance learning. However, today's education systems lack assessment tools that keep pace with technology-based education. Data Mining, one of the newest hot topics in Computer Science, offers candidate technologies to fulfill these needs. This project proposes an architecture that combines data mining and commodity technologies in an education environment.

Table of Contents

  1. Introduction
  2. Related NPAC projects
  3. Related Work Outside NPAC
  4. Thesis Goal
  5. Near Term Implementation Issues

1. Introduction

Recent technological innovations have tremendously affected education systems, and distance learning has become a common technique. These technologies have been adopted in many academic courses, such as those offered by Syracuse University to Jackson State students. In these courses, collaboration tools, mailing lists, and bulletin boards are used, and a rich set of online materials and reference links on the WWW is presented to students.

Both synchronous and asynchronous techniques and resources are continuously improving. The computational science education group at the Northeast Parallel Architectures Center (NPAC) has developed a large repository of online course material, including lectures, tutorials, and programming examples in various languages. To provide regular synchronous interaction among students, teachers, and other learners in addition to the asynchronous learning materials, the TANGO system was used to deliver CSC 499 over the Internet.

All these improvements in technology-based education bring new needs. Our experience shows that this new form of learning lacks assessment tools; in traditional systems, assessment was done through human interaction. Data Mining, a recent hot topic in Computer Science, has opened the way to building automated assessment tools for technology-based education.

The purpose of this project is to design an architecture that uses Data Mining tools and Commodity Technologies to assess distance education. The way students use online resources asynchronously reflects their learning abilities, yet current education systems cannot evaluate students' informal responses to the given materials. The proposed system's architecture is based on collecting access information for the web materials (web usage mining) and discovering users' access patterns with data mining tools. The results will be presented to teachers and students as needed, combined with online analytical processing (OLAP) tools. Other applications, such as the Student Records database, will also be included in the analysis.
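As a rough illustration of the collect-then-analyze flow, the sketch below turns access records into a student by material table of access counts, the kind of pivot an OLAP view could present to a teacher, and lists the materials each student never opened. It assumes each access has already been attributed to a student (for example through a course login); the `Access` record and the `access_profile` and `unvisited` functions are hypothetical names, not part of any existing NPAC system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    student_id: str  # hypothetical: assumes each request is tied to a student
    material: str    # e.g. a lecture, tutorial, or example page
    timestamp: int   # seconds since the epoch


def access_profile(accesses):
    """Build a student x material table of access counts (an OLAP-style pivot)."""
    table = defaultdict(lambda: defaultdict(int))
    for a in accesses:
        table[a.student_id][a.material] += 1
    return {student: dict(counts) for student, counts in table.items()}


def unvisited(profile, students, materials):
    """For each student, list course materials never accessed (a simple assessment signal)."""
    return {s: sorted(m for m in materials if m not in profile.get(s, {})) for s in students}


if __name__ == "__main__":
    log = [Access("s1", "lecture1", 0), Access("s1", "lecture1", 60), Access("s2", "lecture2", 120)]
    profile = access_profile(log)
    print(profile)
    print(unvisited(profile, ["s1", "s2", "s3"], ["lecture1", "lecture2"]))
```

In a fuller system the same table could be joined with student records (grades, enrollment) before analysis; that join is left out here because the record format is not specified.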

2. Related NPAC projects
2.1 Student Records Database

3. Related Work Outside NPAC

The popularity of the World Wide Web (WWW) on the Internet has exploded recently. Many organizations have invested a tremendous amount of capital to operate sites on the Web. These Web sites provide communications and services to their employees, customers, and suppliers. With so much money invested in these sites, there is a strong desire to understand the effectiveness of such investments and to find ways to realize the potential opportunities provided by the Internet. As a result, it has become important to understand user surfing behavior.[1]

World Wide Web usage mining and analysis tools have been developed to understand user surfing behavior by exploring Web server log files with data mining techniques. [1]
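The first step of such a tool is turning raw log lines into structured records. The sketch below assumes the server writes the NCSA Common Log Format (`host ident authuser [date] "request" status bytes`); the regular expression and the `parse_line` function are illustrative, and real logs may use extended formats with referrer and user-agent fields.

```python
import re
from datetime import datetime

# Assumed NCSA Common Log Format; extended formats add more fields at the end.
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<proto>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None for malformed entries."""
    m = CLF.match(line)
    if m is None:
        return None  # truncated or non-standard entry; real logs contain these
    return {
        "host": m.group("host"),  # client host, not necessarily a single user
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": m.group("method"),
        "path": m.group("path"),
        "status": int(m.group("status")),
        "size": 0 if m.group("size") == "-" else int(m.group("size")),
    }


if __name__ == "__main__":
    entry = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    print(parse_line(entry))
```

Note that the log identifies client hosts rather than people, which is one reason server logs are ambiguous for user-oriented analysis.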

However, it is difficult to perform user-oriented data mining and analysis directly on server log files because they tend to be ambiguous and incomplete. A number of projects use innovative algorithms that identify user sessions by reconstructing user traversal paths. These approaches do not require "cookies" or user registration for session identification, so user privacy is protected. Once user sessions are identified, data mining algorithms are applied to discover the most common traversal paths and the groups of pages frequently visited together. These frequent traversal paths and page groups reveal important user browsing patterns and help explain user surfing behavior. The tool described in [1] prepares three types of reports: user-based reports, path-based reports, and group-based reports. [1]

Several Web server log analysis tools have been implemented. Some of these tools are very simple and do not attempt to identify individual user sessions. These packages are simply mechanisms through which a Web master can view the raw Web server statistics, such as hit counts and distributions based on geographic regions. Examples of this type of tool include wwwstat (http://www.ics.uci.edu/pub/websoft/wwwstat) and Analog (http://www.statslab.cam.ac.uk/~sret1/analog).
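For contrast, the statistics these simpler tools report need no session identification at all. The sketch below counts successful hits per page and approximates a regional breakdown by the top-level domain of the client host name; it takes records as dicts with `host`, `path`, and `status` keys (an assumed shape), and the `hit_counts` and `hits_by_domain` names are hypothetical. Mapping a domain to a region is only an approximation, and numeric IP addresses are reported separately.

```python
from collections import Counter


def hit_counts(records):
    """Count successful (2xx) requests per page, with no attempt to identify sessions."""
    return Counter(r["path"] for r in records if 200 <= r["status"] < 300)


def hits_by_domain(records):
    """Approximate a regional breakdown by the top-level domain of the client host."""
    def tld(host):
        label = host.rsplit(".", 1)[-1].lower()
        return "numeric" if label.isdigit() else label  # raw IPs carry no domain
    return Counter(tld(r["host"]) for r in records)


if __name__ == "__main__":
    records = [
        {"host": "pc1.syr.edu", "path": "/lec1", "status": 200},
        {"host": "www.ox.ac.uk", "path": "/lec1", "status": 200},
        {"host": "128.230.1.2", "path": "/hw1", "status": 404},
    ]
    print(hit_counts(records).most_common())
    print(hits_by_domain(records))
```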

4. Thesis Goal

5. Near Term Implementation Issues

References
[1] SpeedTracer: A Web usage mining and analysis tool, http://www.almaden.ibm.com/journal/sj/371/wu.txt