Word Sense Disambiguation for Web 2.0 Data
Project Information
- Discipline
- Computer Science (401)
- Orientation
- Research
This project will build an architecture for developing and testing a variety of parallel similarity and parallel clustering algorithms on Web 2.0 data. These algorithms will be used to analyze emerging semantics and word senses within the data.
Intellectual Merit
User-generated data on the Web is one example of where researchers face the challenges of "big data": datasets that are generated and updated at scales where they become difficult to store, manage, and visualize, among other challenges. This project will allow students and researchers to investigate the challenges of big data from a computer science and engineering perspective. Its specific goal is to investigate a natural language processing problem, word sense disambiguation, producing results for that problem while also informing the broader big data context. The project is supported by two faculty members and a Ph.D. student in computer science. Insight gained from this project will benefit the following research communities: natural language processing, information modeling, and cloud and grid computing.
Broader Impacts
The broader impact of this project is to provide a Ph.D. student with a dissertation topic that can later be expanded into course material for students at Indiana University. The project fits the mission of Indiana University's School of Informatics and Computing: teaching and researching computing and information technology topics while integrating them with scientific and human issues. The results of this project will allow other institutions to use the methodologies and framework to perform the same experiments.
Project Contact
- Project Lead
- Jonathan Klinginsmith (jklingin)
- Project Manager
- Jonathan Klinginsmith (jklingin)
Resource Requirements
- Hardware Systems
- india (IBM iDataPlex at IU)
- xray (Cray XT5m at IU)
Investigate emerging semantics in natural-language Web 2.0 data.
Scale of Use
We will need around ten VMs to run experiments. We will use these VMs many times over the course of a couple of months to test a variety of algorithms.
Project Timeline
- Submitted
- 11/02/2010 - 10:40