Caltech Spring 1996 CS 138c: Projects


The suggested projects are supposed to be concrete enough that there is a ``place to start,'' yet open-ended enough that there are plenty of interesting possibilities for further exploration. These project ideas are intended to form the basis of projects rather than to try to give you an itenerary for the rest of the term. Note that most of these problems aren't entirely new, but that there should be evidence of originality in your project! Thus, you will have to expand on these ideas. You will be graded on your:
  1. Specification.
  2. Design.
  3. (where necessary) Proof.
  4. (where necessary) Implementation.
Projects will be term-long, and comprise 70% of your grade (the other 30% will be miscellaneous quizzes and programs).

o Eve

  1. Animating Algorithms. (example projects: algorithm comparison, visualization, performance) Choose a favorite algorithm or algorithms by browsing through 138 notes, Knuth, CLR's "Intro to Algorithms", Dijkstra's publications, a journal you've always wanted to read, etc. Animate the algorithm's behavior and compare and contrast performance under different conditions. Link your results into cs138 web pages on archetypes for posterity. Assess the benefits of a sequential, distributed, and/or parallel version of the algorithm(s).
  2. Coordination Algorithms. (example projects: multi-agent coordination and control, distributed agreement, consistency, synchronization) Design and implement one of the distributed algorithms that you might encounter some time in your life outside of cs138. In particular, model one of the coordination and consensus algorithms that help make the following systems function properly: airline or ticketmaster reservation systems, distributed multiplayer games, an on-line registrar's office, distributed voting in an election or at the UN, a network-based auction house, a distributed appointment calendar (as discussed in class), the stock market.
  3. Internet Protocols. (example projecs: algorithm analysis, proof of correctness, implementation, evolution) Examine the design of a seminal or emerging Internet algorithm; e.g., Transport Control Protocol/Internet Protocol (TCP/IP), Simple Mail Transfer Protocol (SMTP), Protocol Independent Multicast (PIM), Resource Reservation Protocol (RSVP), Mobile IP, or any number of routing algorithms that are in use. Create a prototype implementation, assess best/worst performance, prove correctness, explore the impact of scale and network unreliability on the algorithm. Discuss the algorithms' evolution and improvements over time.
  4. Global Snapshots. After having had to prove several termination detection algorithms for cs138b, you should all be experts on global snapshots. Implement a global snapshot algorithm among distributed entities.
  5. Network of Workstations. Design a system that would allow you to know about idle CPUs in a local-area network. As a result, you could use clusters of these idle machines for multiprocessor computations. Specify how machines are dynamically registered as available and unavailable. Prototype a centralized architecture for workstation resource management, then a distributed alternative. Speculate what it would entail to migrate something like this into the wide-area, or to track many thousands of machines. How might this system be refined to handle load balancing among processors during a computation ?
  6. Diffusing Computations. Design and implement a diffusing computation algorithm among distributed entities.
  7. Web Performance. Analyze HTTP, the main protocol used by the Web to transfer data between Web clients and Web servers. Assess HTTP's usage of the idea of protocol abstraction, i.e., its use of underlying protocols such as TCP an in turn IP for data transmission. How could it be optimized for better performance when ther are multiple transactions between the same client-server pair? How might the Web's approach to data caching be improved for better load balancing?
  8. Synchronous Game Archetype (distributed tic tac toe, checkers, othello, connect 4, chess).
  9. Workflow (office automata) - an acyclic graph with a token for each transaction, spawning new work processes where necessary.
  10. Implement "Sessions."

o Paul

  1. Comparison of Termination Detection Algorithms (discrete event simulation, math, research) Several algorithms for termination detection were described in class. The literature contains a multitude of others, including algorithms which involve acknowledging every message, but are message optimal. In addition, there are periodic snapshot algorithms which can be message supra optimal, but suffer a delay between the occurrence of termination and its detection. The problem is to analyze the conditions & parameters under which various algorithms outperform the others.
  2. Blocking Theorem for Matrices (maybe discrete math/graph theory?, research) Given a 2D (or nD) matrix, and the associated data dependencies between cell entries, what is the optimal subblocking of this matrix to minimize calculation time, given k processors? If a closed form solution is not possible in the general case, can we characterize the blocking structure for some of the simplest data dependencies (eg cells depend on neighbours only)? These cases could be solved by exhaustive analysis and simulation.
  3. Game of Spit (distributed systems, asynchronous interaction) Implement the game of spit across processes (or machines) using java & sockets. Consider the effects of network assymetries on the relative advantages of the players. Can the game be designed such that neither player has any advantage due to proximity effects? If time permits, consider session management issues such as player invitation, game construction, computer opponents, etc.
  4. Futures Market (distributed systems, asynchronous interaction) Implement a futures market server, patterned after the Iowa web site. Buyers and sellers trade futures over this market. Analyze issues of fairness with regards to network assymetries. If time permits, examine trading programs which can be linked to the market to buy and sell automatically to test various market models.
  5. Diplomacy (distributed systems, synchronous interaction, modes of interaction) Implement the game of Diplomacy. Players must be able to communicate with each other between moves to negotiate. Communication sessions can be 2 way or multiway! Moves occur in single synchronous act. With multiway sessions, it is important that every player see the same "history" of what was said. That way they can have a consistent discussion.

o Mika

  1. Network security on the Internet A network daemon provides access to clients that request some service, potentially from all over the Internet. The low-level protocols in use on the Internet do not include secure verification of the sender of those packets, nor do they include encryption to ensure that only the intended recipient of the data can understand it. A good CS138c project would be to explore the possibilities for ensuring security in such an environment. Passwords, public-key encryption, public key signatures would be possible ways to implement the secure layer of the protocol. The protocol could be applied to a specific application (e.g., a Java-based server of some kind), or it could be made more general. This project would probably involve studying literature, implementation of secure network protocols, and formal verification of those protocols.
  2. Fault tolerance in a specific distributed application Imagine implementing ``the'' calendar application as a distributed set of processes. We can imagine a solution with a centralized secretary and distributed professors, or a different solution where one of the professors plans a meeting. Now consider adding the very important real-world element of failures of processes and/or links. Even for a specific application such as this one, coming up with solutions that tolerate faults is a very open-ended problem. The project might involve literature searches, implementation, proofs, and simulation. This project has a lot of potential for deep formal arguments.
  3. Networks of workstations In the CS department, we have approximately a hundred workstations that are probably idle more than 90 % of the time. How would you write a runtime system that would allow you to use this network of computers as a single ``virtual multicomputer''? How would you find out which machines are idle and available for use by your application, and how would you manage the available resources? Implement a centralized architecture for workstation resource management, then a distributed alternative. (Use our systems, maybe you can do something useful with them!) Speculate what it would entail to migrate something like this into the wide-area, or to track many thousands of machines. How might this system be refined to handle load balancing among processors during a computation? This project is very open-ended and useful. You are not expected to implement a commercially viable system; we just want to see some insight and analysis of the issues involved. This is probably the most ``interesting'' and open-ended of these problems.
  4. The dating server At Caltech, it can be very difficult to find a date (depending on gender and other factors). In order to make most efficient use of the carce resources, we have decided to use our engineering skills to build a system to track dating opportunities for us. Here is a suggested mechanism for finding a date for the weekend:
    • Each user X enters his preferences in the form of an objective function to maximize over the set of possible alternative dates, and includes his email address in the request which is sent to the dating server. User X may include the day (e.g., Friday, Saturday, or don't care) he is available.
    • The requests are queued up on the server, which can respond immediately or later (perhaps by Wednesday evening) by either sending a time of the date and the identity of the matching party or by sending a message that no suitable matches were available this week. Note that the requests arrive asynchronously.
    There are endless variations on this problem. Explore distributed dating services, different types of objective functions, different types of mappings from personal to global objective functions (i.e., maximizing each person's happiness may not maximize the group's happiness.. how do you define the happiness of a group of people anyhow?), etc. An interesting idea is to make one of the parameters of the objective function the time at which the decision is made, such that a person's ``user agent'' will get less and less picky as the weekend approaches.
  5. Projects in VLSI(?) My own research interests are mainly in VLSI, so if you cannot find a good project from the lists of suggestions, I would be more than willing to suggest some very interesting distributed programming problems taken from that field. But I'm not limited to that... Here are a few examples:
    • Discrete-level circuit simulation. The differential equations governing the behavior of electronic circuits are well studied and well understood. However, the interdependence of the state variables of a given part of the circuit on every other part of the circuit make this problem very intractable as a parallel or distributed simulation. This need not be an insurmountable obstacle, especially in the case of digital circuits, since we know that in those highly nonlinear circuits, the state variables in one location are only affected by {\em large\/} swings of other nodes. By taking advantage of this, it is possible to decouple parts of the system from each other and achieve considerable parallelism. Can you implement a parallel discrete-level circuit simulator that works for digital circuits? There has been some research in this area, so a literature search is (as usual) the first place to start.
    • Layout generation. You are given to generate a particular circuit topology by drawing polygons connecting the various nodes. You are free to move things around the two-dimensional surface as you wish, and you may have multiple levels of interconnect. A very difficult problem, and since your solution is likely to be ``slow,'' you decide to investigate a distributed algorithm for layout generation. How do you partition the original circuit to achieve efficient end results? What heuristics can you apply? Does branch-and-bound work? This kind of project would require a great deal of planning to make it tractable yet interesting, but if you have a good idea for a solution, it may be possible to make progress by the end of the term.

o Adam

  1. Ticketmaster Reservation System. Coordinating multiple distributed asynchronous accesses to a finite resource, without priorities. There is a concert, and lots of people want to buy tickets from a number of distributed ticket dispatchers. The total number of tickets available for the concert is finite, and we assume that more people will want to buy tickets than are available. The dispatchers must coordinate with each other to service the ticket requests without overselling the number of tickets.
  2. On-Line Registrar's Office. Coordinating multiple distributed asynchronous accesses to a finite resource, with priorities. Students have distributed processes that asynchronously want to sign up for limited-enrollment classes; an undergraduate registrar process services undergraduate requests, and a graduate registrar process services graduate requests. Student processes make requests with certain priority, based on class standing and requirement of the class for graduating. The registrars must coordinate with each other to enroll students in classes without overenrolling them.
  3. Easter Eggs. Coordinating multiple distributed asynchronous accesses to a dynamic resource pool. Whenever a World Wide Web surfer first discovers a Web page with an "Easter Egg" on it, s/he is given two things: some electronic "money" for himself or herself, and some electronic "money" for him or her to give (as s/he sees fit) to any of the web pages s/he has found that contain Easter Eggs, based on his or her perceived quality of the visited page. A project can work on any one of several facets of this problem: maintaining a registery of Easter Eggs, maintaining a "bank" of player accounts and moneys collected, or coordinating the dispatch of new funds when a new Easter Egg is discovered.
  4. Distributed Auction. Coordinating multiple distributed asynchronous bids to control set of common resources, with bids being unbounded. There are 10 distinct items, and each player has a "values sheet" of how much single items or combinations of items are worth to that person. Each value sheet contains different values for items and combinations of items. When the auction begins, players bid on items they'd like to buy. Item prices can only go up. At a designated time, the auction closes, and people "buy" the items they want at the prices they bid, and people "collect" the total value of the items they bought, according to their value sheets. This can be played as a game with a specified number of rounds, but the challenge is in coordinating the asynchronous bids in a well-designed manner.
  5. Distributed Voting. Coordinating multiple distributed asynchronous bids to control set of common resources, with bids being bounded. Consider a slate with 5 ballots for 5 candidates. Each player has a "value sheet" of how much combinations of candidates winning the election is worth to him or her personally; each person's value sheet is different. When the election begins, each player casts a vote for the candidate s/he wants in each ballot; each player can then gauge as the election is going on whether to keep his or her votes, or change them. However, unlike the auction, each person can only place a single vote on each ballot. At a designated time, the polls close, and people "collect" the total value of the candidates electred, as per their value sheets. This can be played as a game with a specified number of rounds, but the challenge is in coordinating the asynchronous ballot castings and recastings in a well-designed manner.

o Unassigned so far

  1. Cellular Routing, as described by Jacob.
  2. Sealed Auction, like the auction described previously, but bids are sealed, and you have to make sure information is not leaked, as described by Jacob.

Last updated April 10, 1996 by cs138@cs.caltech.edu.

If you have any questions, contact the TAs Adam, Eve, Paul, or Mika.

Copyright © 1994-96, K. Mani Chandy. All rights reserved. Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in any copy. Disclaimer: this collection of notes is experimental, and does not serve as-is as a substitute for attendance in the actual class.