P2P Peer-to-Peer Networks

In this issue of Web Computing, we briefly discuss peer-to-peer (P2P) networks, an area that some consider to be the next "killer application" for the Internet. As we will show here, this area has several important technology challenges and applications that vary from the sublime to the ridiculous. Our goal is not to espouse any particular approach but rather to bring this area to the readers' attention and suggest some emerging research areas and opportunities.

Like most such overhyped concepts, P2P is rather loosely defined and covers a set of rather disparate ideas. Perhaps the only common theme is a client-oriented view of the world, and P2P can be thought of as Power to the People. Clients do most of the work, communicate with each other, and really are the most important computers; servers may be around and even essential but remain subservient to the clients. We will first discuss the most well-known and popular P2P system, Napster, and then try to be a little more systematic in covering the other players in this field.

Shawn Fanning developed the original Napster application and service in January 1999 while a freshman at Northeastern University. Napster allowed any client to advertise any MP3 files stored on its disk and to download MP3 files from other clients connected to the Napster server network. It is said that Shawn was taking a computer-programming course at Northeastern, but had to buy a programming book to build Napster. Like most good ideas, Napster was designed to solve a real need: in this case, to allow Shawn, a musician himself, to share his music with his friends on campus. The system has become staggeringly popular. According to a legal opinion from last summer, approximately 10,000 music files are shared per second using Napster, more than 100 users attempt to connect to the system every second, and there will be 75 million Napster users by the end of 2000.
[http://news.cnet.com/News/Pages/Special/Napster/napster_patel.html]

Napster has some other typical P2P services: Instant Messenger, chat rooms, "Buddy lists" and information about "hot music". The key feature, however, is the ability to share files between any consenting Internet-connected clients. This is roughly the Web version of NFS (Network File System) familiar from traditional computing environments.

There are some key features of this system. MP3 files are important as a popular digital encoding for audio files; it is straightforward to "rip" files off an audio CD and look up key meta-data (Artist, Title etc.) in a CDDB database on the Web (http://www.gracenote.com/). The audio and meta-data can be stored and accessed as a single unit. Although a server is used to establish the initial connection, the file transfer is done efficiently, directly from client to client: a P2P service. This is an improvement over most NFS systems, where use of distributed files is not easy (except possibly for the originator), as usually all you have is a filename. The added value of meta-data for files lies at the heart of the Semantic Web, a vision from the W3C (World Wide Web Consortium) related to P2P. [http://www.w3.org/2001/sw/]

Now Napster happens to be controversial, as the audio files are typically copyrighted, but the two concepts, Web-based NFS and meta-data-enhanced files, are fundamental and broadly applicable. The legal debate continues; at the Napster site [http://www.napster.com], we are exhorted: "Napster is under fire! The recording industry won't stop until they've shut down file sharing. We're not going to let them. You can make a difference. Join the Napster Action Network now! You have the power to keep file-sharing over the Internet alive. High-powered lobbyists and big campaign contributions should never win out over the will of the people."
However, the legal problems are a feature of the particular content; the technology is lasting and, in my opinion, uncontroversial, and it is one implementation of an essential P2P capability.

There are some 200 available Napster clones to support this area [http://www.ultimateresourcesite.com/napster/main.htm]. Currently the most popular is Imesh [http://www.imesh.com], which has some 2 million users and can share any type of file.

Some of the best-known file-sharing systems are MojoNation [http://www.mojonation.net], Freenet [http://freenet.sourceforge.net/], and Gnutella [http://gnutella.wego.com/]. These are not server-based like Napster but rather support waves of software agents, expressing resource availability and interest, that propagate among an informal, dynamic network of peers. There are many interesting ideas being explored, such as breaking shared files into many parts, both to increase bandwidth (parallel I/O) and to increase the security of content, as no one site can access files without cooperation from its peers. This type of technology is controversial as it makes censorship very hard. MojoNation has a load balancing and scheduling algorithm in the form of micro payments to reward those who contribute most to the community of peers. Gnutella, which is a family of related products, is usually described as a P2P search engine, as its interface is nearer that of a search engine than a Web file system.

General discussions on P2P technology can be found at two good web sites: http://www.openp2p.com from the O'Reilly group and http://www.peer-to-peerwg.org/, an industry working group originally initiated by Intel. There is a remarkable book, Peer-to-Peer: Harnessing the Power of Disruptive Technologies by Andrew Oram, Nelson Minar, Clay Shirky, Tim O'Reilly (March 15, 2001, O'Reilly & Associates; ISBN: 059600110X); I recommend it to anybody interested in this field.

Above we discussed some basic P2P services: file registration, access and search.
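The three basic services just named (registration, search and access) can be illustrated with a minimal Python sketch of a Napster-style arrangement. It assumes a central index that holds only titles and pointers to peers, while the file itself moves directly between clients. The names `Peer`, `IndexServer` and `download` are hypothetical and are not taken from Napster or any other system mentioned here; a real system would of course use network connections rather than in-process method calls.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Peer:
    """A client that keeps files locally and serves them directly to other peers."""
    name: str
    files: dict[str, bytes] = field(default_factory=dict)

    def serve(self, title: str) -> bytes:
        # Access: a direct peer-to-peer transfer; the index server is not involved.
        return self.files[title]


class IndexServer:
    """Hypothetical central index: stores titles and peer references, never contents."""

    def __init__(self) -> None:
        self._index: dict[str, list[Peer]] = {}

    def register(self, peer: Peer) -> None:
        # Registration: the peer advertises the titles it is willing to share.
        for title in peer.files:
            self._index.setdefault(title, []).append(peer)

    def search(self, query: str) -> list[tuple[str, Peer]]:
        # Search: case-insensitive substring match on the advertised titles.
        q = query.lower()
        return [
            (title, peer)
            for title, peers in self._index.items()
            if q in title.lower()
            for peer in peers
        ]


def download(server: IndexServer, query: str) -> bytes | None:
    """Ask the server who has a matching file, then fetch it from that peer."""
    hits = server.search(query)
    if not hits:
        return None
    title, peer = hits[0]
    return peer.serve(title)


# Illustrative usage with made-up data.
alice = Peer("alice", {"Campus Demo.mp3": b"fake-audio-bytes"})
server = IndexServer()
server.register(alice)
print(download(server, "demo"))
```

The design point is that the server only answers "who has it?"; the bandwidth-heavy transfer never passes through it.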
We can categorize other P2P systems in various ways; we will choose distributed computing, collaboration, and core technologies. Let us give an overview of these three areas.

The distributed computing P2P applications are very well illustrated by the CISE article Distributed Projects Tackle Protein Mystery by Keri Schreiner in the first issue of 2001. This discusses the use of millions of Internet clients to analyze data looking for extraterrestrial life (SETI@home http://setiathome.ssl.berkeley.edu/) and the newer project examining the folding of proteins (Folding@home http://www.stanford.edu/group/pandegroup/Cosm/). These are building distributed computing solutions for applications that can be divided into a huge number of essentially independent computations; a central server system doles out separate work chunks to each participating client. In the parallel computing community, these problems are called "pleasingly" or "embarrassingly" parallel. This approach is included in the P2P category because the computing is peer-based, even though it does not have the "peer-only communication" characteristic of all aspects of Gnutella and of Napster for information transfer. SETI@home and Folding@home are elegantly implemented as screen savers that you download.

Other projects of this type include United Devices (http://www.ud.com/home.htm based on SETI@home), AppliedMeta (http://www.appliedmeta.com based on the well-known Legion project from the University of Virginia), Parabon Computation (http://www.parabon.com), Condor (from Wisconsin http://www.cs.wisc.edu/condor/) and Entropia (http://www.entropia.com/). Other applications for this type of system include financial modeling, bio-informatics, web performance and the scheduling of different jobs to use idle time on a network of workstations.
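The "central server doles out work chunks" pattern described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: everything runs in one process, each result is a number that can be summed, and there is no redundancy or checking of returned results. The names `WorkServer` and `client_loop` are hypothetical and do not come from SETI@home, Folding@home or any of the other projects listed.

```python
from collections import deque


class WorkServer:
    """Hypothetical coordinator that splits a job into independent work units."""

    def __init__(self, data, chunk_size):
        # Each unit is (unit_id, chunk); units do not depend on one another.
        self._pending = deque(
            (start, data[start:start + chunk_size])
            for start in range(0, len(data), chunk_size)
        )
        self._total = len(self._pending)
        self._results = {}

    def next_unit(self):
        # Hand out the next unit, or None when nothing is left to assign.
        return self._pending.popleft() if self._pending else None

    def submit(self, unit_id, result):
        # Record the partial result returned by a client.
        self._results[unit_id] = result

    def done(self):
        return len(self._results) == self._total

    def combined(self):
        # Assumption of this sketch: partial results combine by simple addition.
        return sum(self._results.values())


def client_loop(server, analyze):
    """What one participating client does in its idle time; returns units processed."""
    processed = 0
    while (unit := server.next_unit()) is not None:
        unit_id, chunk = unit
        server.submit(unit_id, analyze(chunk))
        processed += 1
    return processed


# Illustrative usage: sum of squares over made-up data, split into units of 3.
server = WorkServer(list(range(10)), chunk_size=3)
client_loop(server, lambda chunk: sum(x * x for x in chunk))
print(server.done(), server.combined())
```

Because the units are independent, any number of clients could call `next_unit` and `submit` in any order without changing the combined result, which is what makes such problems "embarrassingly parallel".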
Ian Foster has given a more detailed review of these activities at http://www.nature.com/nature/webmatters/grid/grid.html and related them to computational grids (http://www.gridforum.org).

Collaborative systems form a rather different type of P2P network. We have a community of clients working together and sharing different Internet resources. Probably the Instant Messenger (IM) and the various forms of chat room are the most used capabilities in this arena. We have a set of clients exchanging messages with each other. Unlike the file-sharing case, one typically needs to multi-cast the same message to multiple clients at the same time, and the best architecture is still a subject of active research. In fact, it is an area where your fearless reporter works (see the Garnet system at http://aspen.csit.fsu.edu/collabtools/). Groove Networks (http://www.groove.net/), founded by the creator of Lotus Notes, is the best-known P2P collaboration project and uses relay servers to implement the P2P multi-cast.

Collaboration systems form an "illusion of P2P" using some static or dynamic suite of servers to optimally route messages. When the clients are scattered around the world, the relaying server(s) would perhaps be in the "middle of the Web"; when a group of clients is clustered together, their relay would be "on the edge" and perhaps dynamically created on a peer machine of this cluster. Typically one also needs some sort of server to establish the initial session and manage the permanent state. So this type of P2P application gives a rich mix of true peers and servers.

Collaboration systems offer the IM/Chat/email capabilities but also other shared resources like whiteboards, shared documents and audio-video conferencing. The HearMe system is a nice example of the P2P illusion (http://www.hearme.com). A central server manages digital audio conferences with a general mix of phone and pure Internet audio.
As HearMe's technology advances, it should move to the Groove and Garnet models with dynamic relay servers positioned throughout the Web.

All forms of collaboration are supported by some form of messaging, with the message (typically called an event) carrying a variety of content, including the text of an IM, pixel changes to record a changed shared display (frame buffer) or, as above, digital audio packets. XML is the natural way of encoding such messages, and the open source Instant Messenger Jabber (http://www.jabber.org) provides a clean framework of this kind. Several Napster-like systems have based their service on IM technology; Aimster (http://www.aimster.com/) is one of the best known. OpenCola (http://www.opencola.com/) has a general XML framework to support P2P systems.

Core technologies or services include P2P management, messaging, security, and client grouping, as well as the file or, more generally, object registration, discovery and access capabilities already discussed for Napster. These core capabilities are where we need to develop the community standards to enable different projects to interoperate. Sun Microsystems has two important technology projects. Jini (http://www.sun.com/jini/) deserves a column of its own; it has a beautifully simple model for dynamic, self-defining objects, which act like Napster peers and register with distributed servers, allowing other peers to discover and access them. JXTA (from juxtaposition; http://www.openp2p.com/pub/a/p2p/2001/02/15/joy_keynote.html) is a new project from Bill Joy aiming at core P2P capabilities including grouping (of the peers) and security. Above we already touched on the need for messaging services to implement collaborative P2P systems. There is also the Java Message Service, JMS (http://java.sun.com/products/jms/), providing the core publish/subscribe mechanism on which most P2P services are built.
JMS needs some upgrading to join the P2P revolution; Sun should add XML, a more dynamic matching paradigm (of collaborating peers) and support for relay servers. I expect research and commercial experience to identify more base services as we better understand the common needs of P2P systems.

Management of resources in such a network must be an important challenge; it is our Nirvana: the Web Operating System. Maybe society can live in a Gallimaufry of unstructured knowledge swept back and forth by armies of Gnutella agents. However, this will not do for what the Gartner Group (http://www.oreillynet.com/pub/d/547) and O'Reilly term Enterprise P2P, as needed by a Fortune 500 organization. Here we will need to manage structured information within a dynamic P2P grouping. We see the right approach as generalizing Napster and Jini: ensuring that all objects are tied to meta-data (possibly in a separate record) that defines the discovery, rendering, access and sharing characteristics. One homely example is family photos; usually these are indeed a Gallimaufry of folders haphazardly stuffed in shoeboxes, which gets even worse for a community event recorded in shoeboxes across the nation. With the proper meta-data and Enterprise P2P support, such photos could be nicely organized and presumably of greater value.

I would like to finish by pointing out another important characteristic of P2P networks. Namely, the clients can be quite heterogeneous, with the P2P session including basic desktops but also hand-held devices, cell phones and special interfaces for those with physical handicaps. This requires the copied files or shared objects to be rendered differently on each peer. This is quite possible with careful design of the XML meta-data for both clients and display devices.

P2P networks can and will unite us all. There are lots of good research topics and obviously lots of business opportunities.
P2P as part of the next wave of the Web is intellectually challenging to design and socially and intellectually rewarding to use.