Given by Geoffrey Fox at Beijing Web Tutorial on May 27-30 1997. Foils prepared 5 July 97
Summary of Material
We review some of the base material assumed in CPS616 using curricula material taken from CPS606 |
History and Structure/Size of Internet and Web |
Basic Internet and Web Services |
What is WebWindows and basic Web architecture |
Overview of Networking for Internet |
MIME HTTP |
but not HTML or CGI (see separate presentations) |
http://www.npac.syr.edu/users/gcf/cps616webreviewjune97 |
Material from CPS606 Assumed in CPS616 |
Used in Trip to China May 97 |
Geoffrey Fox |
Syracuse University NPAC |
111 College Place, Syracuse, NY 13244-4100 |
315-443-2163 |
For the World Wide Web or Internet Itself |
For use in Enterprise/Corporate Information Systems |
Use of Web Technology as base software Infrastructure |
The World Wide Web (WWW) (the Web) is a hyperlinked collection of documents and programs that reside on computers all over the world, linked by the Internet. |
This talk will show the underlying components and mechanisms that make the Web work. |
This works on a world-wide basis because these protocols are based on Open Standards, which have been implemented by many vendors on a variety of machines. The Web software structure is strictly non-proprietary, while allowing proprietary pieces to fit in where needed. |
The same architecture and software that make the Web work are also suitable for implementing distributed applications between heterogeneous machines and networks. This makes the architecture attractive for the corporate Intranet as well. |
Server: A program in charge of a resource or information. |
Client: Any program that makes a request for service from the server. |
Clients and servers send their messages over a network connection. |
All over the world, users can use browsers to access information stored in multimedia document collections of web server machines. Programs are also accessible through the Common Gateway Interface (CGI). |
All over the company, employees (and possibly affiliates and the public) can use browsers to access databases and use distributed applications stored on server machines, using web technology to interface to existing databases and applications. |
Browsers have SAME interface on ALL Computers |
CGI programs are typically written in PERL but can be essentially ANY UNIX process, and so can do simulation, database access (this is Oracle WoW), advanced document processing, etc. |
The Internet is a loose federation of networks. |
Cooperative organization - no central administration, no fees. Protocols and standards are developed through the IETF (Internet Engineering Task Force). |
Most national and international networks are members: NSFNET, ESNET, ARPANET, BITNET |
All these networks are packet switched systems based on TCP/IP. Together these protocols allow for communication over a wide variety of technologies. Machines called gateways connect the networks. |
Standard Domain Name System (DNS) - names are looked up by a name server to obtain the numeric IP address, which is then used for routing. |
1969 The first locations commissioned by DOD (ARPA) |
1971 # host computers = 23 |
1982 Standards for TCP and IP established. |
1983-4 Name server and domain name server developed. |
1984 #host computers > 1,000 |
1986 NSFNET backbone established, 56Kbps |
1987 #host computers > 10,000 |
1989 NSFNET backbone upgraded to T1 (1.544Mbps) |
1992 Internet Society is chartered, World Wide Web released by CERN |
1993 NSF experiments with 600 Megabit backbone |
From General Magic http://www.genmagic.com/Internet/Trends/ |
Each of three components (network connections, clients, servers) has capital value of order $10 to $100 Billion |
InfoVision is ultimate "client-server" application |
Democracy on the NII (Gore) |
Telnet basically allows you to log in to a system over a network just as though you were logging in from a terminal attached to the system or from a dial-up modem. |
You may use telnet from a command line such as: |
telnet machine-name |
where you give the internet name of the machine that you wish to connect to. The telnet service will proceed to ask you for a name and password just as if you were logging in. |
Or you may have a telnet program which prompts you for the same information. |
Between two Unix systems, you can use the rlogin command instead. |
Usually, you must already have an account on the machine to log in. There are a few publicly available telnet machines, such as the FAA Flight Service at duats.gtefsd.com, where student pilots can log in to get the latest weather data. |
FTP (File Transfer Protocol) is the way that people transfer files from one internet machine to another. |
You can use the ftp protocol directly from Unix machines using a command line such as: |
ftp machine-name |
where it will prompt you for an account login name and password. You will then be connected to the home directory of that account and can use commands to move around the directory structure (cd and ls) and commands get and put to copy a file to or from your original location. |
Other ftp interfaces may be provided by your telnet program, or by other software programs such as fetch. |
FTP will transfer files of all types and formats. Non-text files such as images must be transferred in binary mode (the default is ASCII), because ASCII mode may convert line endings and so corrupt them. |
Some machines may provide a special ftp account called "anonymous". You use your ftp program as usual, except that the login name is "anonymous". The password can be anything, but netiquette obliges you to give your email address. The directory that you are connected to is a public directory provided by the host machine. |
Usenet newsgroups provide discussion forums on a wide range of topics. You can read the forums from a news server installed at your site. |
The topics are organized into hierarchies. Some of the main categories are |
Subtopic names are always shown as part of the hierarchy |
People participate in newsgroups by contributing messages, called "postings", which everyone else reading the newsgroup can read. |
Some newsgroups are moderated, which means that posted messages are scanned by a human for appropriate content and style before being made public. |
Many software packages are news readers, including Netscape web browsers - just ask your systems administrator what news server to use. |
Other discussion forums on interesting topics are provided through mail lists. The discussion is delivered through your regular email. |
In this case, the discussion is again provided through messages. But instead of posting the message through special software (as is the case with news readers), the message is sent to an email address, and then forwarded to everyone in the group. |
Mail list addresses |
Mail lists may also be moderated. |
The World Wide Web is a collection of documents located all over the world, which can have links to images, motion videos and audio files. |
Links use Web addresses called URLs (Uniform Resource Locators), which have the form service://machine:port/file.file-extension |
File types follow the MIME (Multipurpose Internet Mail Extensions) standard, originally developed to include multimedia and multi-part content in electronic mail messages. |
File extensions on the server tell which MIME format the file is in. |
The browser is configured to have a set of helper applications or "plug-ins" to appropriately display or play files in various MIME formats. |
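As an illustration of the extension-to-type lookup described above, here is a minimal Python sketch. The table is a hypothetical subset modeled on the extensions mentioned in this talk (a real server reads a much larger configuration file), and the fallback type application/octet-stream is an assumption.

```python
import os

# Illustrative extension -> MIME type table (an assumed subset, not a real server's file).
MIME_TYPES = {
    "html": "text/html",
    "txt": "text/plain",
    "gif": "image/gif",
    "jpeg": "image/jpeg",
    "jpg": "image/jpeg",
    "au": "audio/basic",
    "mpeg": "video/mpeg",
    "ps": "application/postscript",
}

# Assumed fallback for files whose extension is not in the table.
DEFAULT_TYPE = "application/octet-stream"


def content_type_for(path: str) -> str:
    """Return the Content-type a server would send for the file at `path`."""
    _, ext = os.path.splitext(path)  # ".GIF" for "logo.GIF", "" if there is no extension
    return MIME_TYPES.get(ext.lstrip(".").lower(), DEFAULT_TYPE)


print(content_type_for("/docs/index.html"))   # text/html
print(content_type_for("/images/logo.GIF"))   # image/gif
print(content_type_for("/data/results.dat"))  # application/octet-stream
```

The browser side performs the reverse step: it looks up the received Content-type in its own table of helper applications or plug-ins.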
Web servers provide what is called HTTP service (for HyperText Transfer Protocol), but links can also direct connections to other Internet services. |
For other services, the browser connects directly to the appropriate server using that service's protocol. |
Image types: |
Audio types: |
Video types: |
Forms are used to allow the user to send information from the browser back to the server. |
The server must provide a program, called a CGI script, that will process the user information and provide an appropriate response. |
The CGI program parses the input from the server and performs any number of computing and data access functions: |
When the CGI program terminates, the server closes the connection. |
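A minimal sketch of a CGI program, written in Python rather than PERL purely for illustration. It assumes the usual CGI conventions: GET form data arrives in the QUERY_STRING environment variable, POST data arrives on standard input with its size in CONTENT_LENGTH, and the program's output starts with header lines followed by an empty line. The function run_cgi and the form field "name" are hypothetical.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def run_cgi(environ: dict, stdin: bytes) -> str:
    """Return the complete CGI output (header, empty line, body) for one request."""
    if environ.get("REQUEST_METHOD", "GET") == "POST":
        # POST form data arrives on standard input; CONTENT_LENGTH gives its size.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        query = stdin[:length].decode("ascii")
    else:
        # GET form data arrives in the QUERY_STRING environment variable.
        query = environ.get("QUERY_STRING", "")

    fields = parse_qs(query)                    # e.g. {"name": ["Fox"]}
    name = fields.get("name", ["stranger"])[0]  # "name" is a hypothetical form field

    body = "<HTML><BODY>Hello, %s</BODY></HTML>" % html.escape(name)
    # The CGI header lines are separated from the document by an empty line.
    return "Content-type: text/html\n\n" + body


if __name__ == "__main__":
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    data = sys.stdin.buffer.read(length) if length else b""
    sys.stdout.write(run_cgi(dict(os.environ), data))
```

Keeping the logic in a function that takes the environment and input explicitly makes the script easy to test without a Web server.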
Search Engines enable users to look up text documents stored on the Web, usually by one or more keywords appearing in the document. |
Information gathering and filtering |
Indexing: the information gathered by the robots is organized into an indexing database at the search server. |
Searching: the indexing database allows (keyword) searches by the user. |
User Interface |
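The indexing and searching steps can be sketched as an inverted index that maps each keyword to the pages containing it. This is a minimal illustration under simple assumptions: a hard-coded set of hypothetical pages stands in for what the robots would gather, and a search returns the pages containing all keywords. Real search engines add ranking, stemming and much more.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Indexing: map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Searching: return the URLs that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages standing in for what the robots would have gathered.
pages = {
    "http://a.example/java.html": "Java applets embedded in HTML pages",
    "http://b.example/vrml.html": "VRML scenes with links to HTML pages",
    "http://c.example/cgi.html": "CGI scripts written in PERL",
}
index = build_index(pages)
print(sorted(search(index, "HTML pages")))  # the java.html and vrml.html URLs
print(search(index, "applets html"))        # only the java.html URL
```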
Challenge of WWW search: |
There are evolving/confusing/overlapping capabilities ... |
Many useful Web applications provide a web page interface to a commercial product database of information. |
This is currently done through CGI scripting. |
The database must have a programmable interface (in addition to an interactive interface). For relational databases, this has been standardized in the query language SQL. |
Web queries to the database are taken from an HTML form, the information is passed to the CGI script, which makes appropriate SQL queries to the database. The results of the database query can be formatted and returned to the web page. |
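A minimal sketch of the form-to-SQL path, assuming a hypothetical products table and a form field named max_price. Python's built-in sqlite3 module stands in for a commercial relational database such as Oracle, and the HTML formatting of the result is deliberately bare.

```python
import html
import sqlite3
from urllib.parse import parse_qs


def query_products(conn: sqlite3.Connection, form_data: str) -> str:
    """Turn urlencoded form input into an SQL query and format the rows as HTML."""
    fields = parse_qs(form_data)
    max_price = float(fields.get("max_price", ["0"])[0])
    # A parameterized query keeps the user's input out of the SQL text itself.
    rows = conn.execute(
        "SELECT name, price FROM products WHERE price <= ? ORDER BY price",
        (max_price,),
    ).fetchall()
    items = "".join("<LI>%s: $%.2f" % (html.escape(name), price) for name, price in rows)
    return "<UL>%s</UL>" % items


# Hypothetical product table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("modem", 199.0), ("workstation", 9500.0), ("mouse", 25.0)],
)
print(query_products(conn, "max_price=200"))
# <UL><LI>mouse: $25.00<LI>modem: $199.00</UL>
```

Passing the user's value as a query parameter, rather than pasting it into the SQL string, prevents a form entry from changing the meaning of the query.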
JavaScript, developed by Netscape from its HTML scripting language LiveScript and including some features of Java, allows HTML authors to have more control over the behavior of the browser. |
JavaScript is text embedded in an HTML document using <SCRIPT> tags, which a JavaScript-enabled browser will interpret (and other browsers ignore). |
JavaScript can perform animations, respond to buttons and other forms of user input, and allow the author more control over the appearance of the Web Page. |
JavaScript can also provide an object-oriented view of other browser plug-in programs. |
Reference: JavaScript Authoring Guide at http://home.netscape.com/eng/mozilla/Gold/handbook/javascript/ |
Java is a general-purpose object-oriented language developed by Sun with the capability of providing distributed computing through the Web (http://www.javasoft.com). |
Browsers supporting Java (HotJava, Netscape 2.0/3.0, ...) allow arbitrarily sophisticated dynamic multimedia applications, called applets and written in Java, to be embedded in regular HTML pages and activated each time a given page is displayed. |
VRML is a computer graphics language for describing 3-Dimensional scenes. It was developed as a standard for the WWW from OpenInventor of SGI. |
VRML includes language elements for creating simple shapes, various lighting effects, applying textures to shapes, and various points of view (referred to as cameras). |
A VRML-enabled browser will recognize VRML files of the form file.wrl, and create an interface where the user has controls to fly through space and examine objects. |
Objects within a VRML scene may be configured as URL links to other Web pages of any document type. |
VRML documents are huge; slow download time is currently the most serious drawback to using VRML more widely on the Web. |
New versions of VRML include motion in the scenes. |
In future one will NOT write software for either |
Rather one will write software for WebWindows defined as the operating environment for World Wide Web |
WebWindows builds on top of Web Servers and Web Client open interfaces as in |
Applications written for WebWindows will be portable to all computers running Web Servers or Clients which hide hardware and native O/S specifics |
WebWindows Interface |
Further, WebWindows software will be modular and allow plug-and-play insertion of capabilities developed around the Web World -- not a bunch of isolated stovepipe solutions |
As an example, some current Netscape software and last year's(!) NPAC WebTools implement UNIX shell/PC file manager capabilities in terms of CGI scripts, allowing universal access to these capabilities, including powerful Web-based (MH) mail |
NPAC's WebFoil is an open replacement for PowerPoint/Persuasion that runs under HotJava or Netscape 1, 2 and 3 |
Particular Application areas (Business, Healthcare, Education) will be built on top of generic NII services so that for instance |
Persuasion and PowerPoint are rather similar monolithic packages which, for instance, can only be clumsily ported to UNIX, as one cannot access the internal data structures defining the foils |
WebFoil (NPAC prototype WebWindows presentation package) has |
Extended open HTML source manipulated by powerful PERL5 scripts allowing global changes and linkages of foils from many sources |
Backend Oracle database illustrating modular WebWindows approach |
Using appropriate templates, WebFoil uses HotJava or Netscape 1, 2 or 3 to display HTML with full Web power, including applets to enable multimedia and dynamic presentations |
Initial webfoil 0.1 release Halloween 1995 |
Rome Laboratory Collaborative and Interactive Visualization Jan 31,96 |
The WebTop Productivity environment will be built in a more modular fashion than the current PC Windows or Macintosh arena |
Java or equivalent future technology is key to understanding how WebWindows application/service software will look, as it allows balanced client-server applications to be built |
Note that this requires open display software, so that one can produce appropriately customized interfaces for browsing, presenting, word processing, etc. |
Application Specific NII Specific Services for |
We have a set of Services hosted by Web Servers and accessed by clients |
Groups of clients (electronic societies) are linked by collaboration systems such as TANGO |
Figure (labels only): Access Resources; Store Multimedia Information; TANGO Server; File Systems and/or Database; Object Broker; Database; Simulation Computer; Person1; Person2; General User; Shared WhiteBoard; Shared Client Appl |
The first section of this talk covers basic networking terminology, the OSI networking layers, the TCP/IP protocol, and routing. |
A computer network is a communication system for connecting end-systems usually called hosts. |
A local area network, LAN, connects computer systems within a few kilometers, usually within a single building. A common technology is Ethernet, which operates at 10Mbps (million bits per second). Computers or workstations connect to the LAN via an interface card. |
A wide area network, WAN, connects computers in different cities or countries. A common technology is leased telephone lines operating between 9600 bps and 1.544 Mbps. |
Computers in a network use a set of protocols to communicate. |
Network communication protocols are usually described via a set of layering conventions from the International Organization for Standardization (ISO) known as the Open Systems Interconnection (OSI) Model. |
We simplify the model to four layers (process, transport, network and link): user applications use the process layer, and the remaining three are usually implemented in the operating system, such as Unix, whose protocol stack passes each message through the layers. |
TCP - Transmission Control Protocol. A connection-oriented protocol used by most Internet applications to provide a reliable, full-duplex, byte stream for a user process. |
UDP - User Datagram Protocol. A connectionless protocol for user processes. Unlike TCP, it is not reliable: datagrams may be lost, duplicated or delivered out of order. |
ICMP - Internet Control Message Protocol. Handles error and control information between gateways and hosts. |
IP - Internet Protocol. Provides the packet delivery service for the upper layers. |
ARP - Address Resolution Protocol. Maps an Internet address into a hardware address. |
RARP - Reverse Address Resolution Protocol. Maps a hardware address into an Internet address. |
Each layer adds control information to the message - this process is called encapsulation. |
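To make encapsulation concrete, the sketch below shows what the transport layer adds for UDP: an 8-byte header in front of the application data. It is an illustration only; the port numbers are arbitrary, the checksum is left at 0 (which UDP over IPv4 treats as "no checksum"), and the IP and network-interface headers that the lower layers would prepend in turn are omitted.

```python
import struct

# UDP header (RFC 768): source port, destination port, length, checksum,
# each a 16-bit unsigned integer in network (big-endian) byte order.
UDP_HEADER = struct.Struct("!HHHH")


def udp_encapsulate(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Transport layer: prepend an 8-byte UDP header to the application data."""
    length = UDP_HEADER.size + len(payload)  # the length field counts header + data
    checksum = 0                             # 0 means "no checksum" for UDP over IPv4
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def udp_decapsulate(datagram: bytes) -> tuple[int, int, bytes]:
    """Receiving side: strip the UDP header and hand the data up to the process."""
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, datagram[UDP_HEADER.size:length]


datagram = udp_encapsulate(5000, 6000, b"hello")
print(len(datagram))              # 13: 8 header bytes + 5 data bytes
print(udp_decapsulate(datagram))  # (5000, 6000, b'hello')
```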
The Internet is a packet-switched network. Each message (or document) is broken up into a number of packets, and each packet carries an address. A computer called a router sits on the local network and decides where to send each packet first on its way to its final address. Each computer along the network connection examines the packets that come in and either keeps them (if they are addressed to it) or forwards them along their way. The message is reassembled at the other end. |
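The foil does not say how a router decides where to send a packet. One common rule, swapped in here for illustration, is longest-prefix matching: among the table entries whose network contains the destination address, the most specific one wins. The routing table below is hypothetical.

```python
import ipaddress

# Hypothetical routing table for one router: destination network -> next hop.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "default gateway"),
    (ipaddress.ip_network("10.0.0.0/8"), "regional network"),
    (ipaddress.ip_network("10.1.2.0/24"), "local LAN"),
]


def next_hop(destination: str) -> str:
    """Choose the most specific (longest-prefix) route that contains the address."""
    addr = ipaddress.ip_address(destination)
    matches = [(net.prefixlen, hop) for net, hop in ROUTES if addr in net]
    return max(matches)[1]  # the 0.0.0.0/0 default route always matches


print(next_hop("10.1.2.7"))   # local LAN
print(next_hop("10.9.9.9"))   # regional network
print(next_hop("192.0.2.1"))  # default gateway
```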
Multiplexing - Different protocols can be used to send different messages through the same network. |
Fragmentation and reassembly - Most networks have a maximum packet size. In the TCP/IP protocols, the IP layer breaks up packets that are too long into a sequence of shorter fragments, which are reassembled on the other side. |
Sequencing is the property that data is received by the receiver in the same order as it was transmitted by the sender; this is not automatic in a packet-switched network, where packets can arrive out of order. |
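A minimal sketch of fragmentation, reassembly and re-sequencing. Each fragment is modeled as a hypothetical (offset, more_fragments, data) tuple; real IP headers measure offsets in 8-byte units and also carry an identification field, which are simplified away here.

```python
def fragment(payload: bytes, max_data: int) -> list[tuple[int, bool, bytes]]:
    """Split payload into (offset, more_fragments, data) pieces.

    As in IP, each fragment's data size is a multiple of 8 bytes (except the last).
    """
    size = max_data // 8 * 8
    if size == 0:
        raise ValueError("max_data must be at least 8 bytes")
    pieces = []
    for offset in range(0, max(len(payload), 1), size):
        chunk = payload[offset:offset + size]
        more = offset + size < len(payload)  # the "more fragments" flag
        pieces.append((offset, more, chunk))
    return pieces


def reassemble(pieces: list[tuple[int, bool, bytes]]) -> bytes:
    """Rebuild the payload from fragments that may have arrived in any order."""
    if not pieces:
        raise ValueError("no fragments")
    ordered = sorted(pieces, key=lambda piece: piece[0])  # restore sequence by offset
    expected = 0
    for offset, _, chunk in ordered:
        if offset != expected:
            raise ValueError("missing fragment at offset %d" % expected)
        expected += len(chunk)
    if ordered[-1][1]:
        raise ValueError("last fragment missing")
    return b"".join(chunk for _, _, chunk in ordered)


data = bytes(range(100))
pieces = fragment(data, 30)         # 24-byte fragments: offsets 0, 24, 48, 72, 96
arrived = pieces[::-1]              # simulate out-of-order arrival
print(reassemble(arrived) == data)  # True
```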
Error control guarantees that error-free data is received by the application programs. Data can either get corrupted by the transmission medium or get lost. Checksums are added to the data and received data is acknowledged. If there is any problem, retransmission occurs. |
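Checksums of the kind mentioned above can be illustrated with the 16-bit one's-complement checksum used by IP, UDP and TCP (described in RFC 1071). The sketch computes it for the byte sequence from RFC 1071's numerical example and shows the receiver-side check.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum of the kind IP, UDP and TCP use (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit big-endian word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in (end-around carry)
    return ~total & 0xFFFF


message = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(message)
print(hex(checksum))  # 0x220d
# The receiver re-computes over data + checksum; a result of 0 means no error detected.
print(internet_checksum(message + checksum.to_bytes(2, "big")))  # 0
```

A checksum only detects damage; recovering from it is left to the acknowledgment and retransmission scheme described above.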
Flow control assures that the sender doesn't overwhelm the receiver by sending data at a faster rate than it can process. |
Error and flow control are handled on an end-to-end basis by TCP. IP itself works hop by hop (a hop goes to only one intermediate machine on the network route): it checks only its own header at each hop and discards damaged packets, leaving retransmission to TCP. |
Performance of network delivery depends on the size of the message, the capacity of the various pieces of network that the message may travel along and the congestion of the network. |
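A rough, best-case estimate of delivery time is latency plus message size divided by link capacity; congestion is ignored. The sketch applies it to a hypothetical 1,000,000-byte document over the link speeds mentioned in this talk.

```python
def transfer_time(size_bytes: int, bandwidth_bps: float, latency_s: float = 0.0) -> float:
    """Best-case delivery time in seconds: fixed latency plus time to send all the bits."""
    return latency_s + size_bytes * 8 / bandwidth_bps


# Link speeds quoted earlier in this talk, in bits per second.
LINKS = {
    "56 Kbps NSFNET backbone": 56_000,
    "T1 (1.544 Mbps)": 1_544_000,
    "Ethernet (10 Mbps)": 10_000_000,
}

for name, bps in LINKS.items():
    # A hypothetical 1,000,000-byte document, ignoring latency and congestion.
    print("%-24s %8.2f s" % (name, transfer_time(1_000_000, bps)))
```

This prints roughly 142.86 s, 5.18 s and 0.80 s; on a congested network the real times would be longer.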
All the network protocols just discussed are agreed on by various standards committees. The principal standards organization of the Internet is the Internet Engineering Task Force (IETF). The principal standards organization of the WWW is the World Wide Web Consortium (W3C). |
Some material presented here comes from Internet documents. Here is a summary of various document formats you may find. |
Internet Drafts (ID) |
Internet Memos, known as Requests for Comments (RFC) |
Internet Standards (STD) |
Here are a few sample Internet documents relevant for Internet and WWW message-passing. |
RFC-822: Crocker, D., "Standard for the Format of ARPA Internet Text Messages", STD 11, RFC 822, UDEL, 1982. |
RFC-1036: Horton, M. and Adams, R., "Standard for Interchange of USENET Messages", RFC 1036, AT&T, December 1987. |
RFC-1521: Borenstein, N. and Freed, N., "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies", RFC 1521, Bellcore, September 1993. |
RFC-1524: Borenstein, N., "A User Agent Configuration Mechanism for Multimedia Mail Format Information", RFC 1524, Bellcore, September 1993. |
Internet Draft: Berners-Lee, T., "Basic HTTP", CERN, 1992/3. |
We all know and use electronic mail, but here is a formal specification of its message format (RFC 822). |
Each message is a stream of 7-bit ASCII chars which contains a header and an optional body, separated from the header by an empty line. |
Header consists of a set of entries with one entry per line given by a colon separated key:value pair. |
Key contains no spaces or tabs and cannot exceed 63 chars. |
Body is a fully unstructured sequence of ASCII chars. |
There is a finite set of standard keys and an extension mechanism via the "X"-prefix. The standard set (as used by MH) is: |
Date Bcc Resent-Date Resent-Fcc |
From Fcc Resent-From resent- |
Sender Message-ID Resent-To Message-Id |
To Subject Resent-cc Forwarded |
cc In-Reply-To Resent-Bcc Replied |
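The header/body rules above can be sketched as a small parser. It is a simplified illustration, not a full RFC 822 implementation: folded header lines and repeated keys are not handled, lines are assumed to end in a bare newline rather than CR LF, and the sample message and addresses are hypothetical.

```python
def parse_message(text: str) -> tuple[dict[str, str], str]:
    """Split an RFC 822-style message into (header dict, body).

    Simplified sketch: folded (continued) header lines and repeated keys are not handled.
    """
    head, _, body = text.partition("\n\n")  # the first empty line ends the header
    headers = {}
    for line in head.splitlines():
        key, colon, value = line.partition(":")  # split at the first colon only
        if not colon or not key or " " in key or "\t" in key:
            raise ValueError("bad header line: %r" % line)
        headers[key.lower()] = value.strip()  # keys are compared case-insensitively
    return headers, body


msg = ("From: fox@example.edu\n"
       "To: students@example.edu\n"
       "Date: Wed, 02 Jul 97 10:00:00 EST\n"
       "Subject: CPS616 review\n"
       "X-Course: CPS616\n"
       "\n"
       "Please read the foils.\n")
headers, body = parse_message(msg)
print(headers["date"])  # Wed, 02 Jul 97 10:00:00 EST
print(headers["x-course"])  # CPS616 (an "X-" extension key)
```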
Goals |
Retain RFC-822 header+body format |
Add new header fields |
Allow for multipart multimedia bodies |
Include media type and encoding information in new header fields such as: Content-Type, Content-Description, Content-Transfer-Encoding, Content-ID |
Retain 7-bit ASCII for all valid encoding schemes |
Implement multi-component bodies via a special 'magic type' Content-Type: multipart |
Two level hierarchical typing scheme adopted of the form: basetype/subtype |
Seven base media types are defined; this minimal set is enforced, i.e. any new base type must pass through the whole ID->RFC->STD process. |
Allow for less restrictive subtyping of the base types, for example: |
Some standard subtypes are specified and many more are expected. New subtypes must be registered with the IANA (Internet Assigned Numbers Authority). |
Private experimental subtypes prefixed with "X-" may be used freely and without registration. |
Seven base types are: text, image, audio, video, multipart, message, application. |
text |
image |
audio |
video |
multipart |
message |
application |
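A minimal sketch of the two-level basetype/subtype scheme: it splits a Content-Type value, checks the base type against the seven listed above, and accepts any subtype (including private "x-" subtypes). Parameter parsing is simplified and does not handle a ";" inside quoted values.

```python
BASE_TYPES = {"text", "image", "audio", "video", "multipart", "message", "application"}


def parse_content_type(value: str) -> tuple[str, str, dict[str, str]]:
    """Split a Content-Type value into (base type, subtype, parameters).

    Simplified sketch: a ";" inside a quoted parameter value is not supported.
    """
    media, *params = [part.strip() for part in value.split(";")]
    base, slash, subtype = media.lower().partition("/")
    if not slash or not subtype:
        raise ValueError("expected basetype/subtype, got %r" % media)
    if base not in BASE_TYPES:
        raise ValueError("unknown base type %r" % base)
    parameters = {}
    for param in params:
        name, _, val = param.partition("=")
        parameters[name.strip().lower()] = val.strip().strip('"')
    return base, subtype, parameters


print(parse_content_type("text/html"))    # ('text', 'html', {})
print(parse_content_type("image/x-rgb"))  # ('image', 'x-rgb', {})
print(parse_content_type('multipart/mixed; boundary="simple boundary"'))
# ('multipart', 'mixed', {'boundary': 'simple boundary'})
```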
Server: A program in charge of a resource or information. |
Client: Any program that makes a request for service from the server. |
Web servers provide access to a collection of files containing hyperlinked information |
Browsers provide an easy graphical interface for users to request information. The client machine also provides viewers for a standard set of image and video formats. |
The interface is kept very simple to run on all networks and most machines. |
HTTP provides an upper level to the Internet, that is, it is built on top of a back-bone network with all the packets flowing from client to server and vice versa using the standard TCP/IP protocol. |
It uses MIME formats and concepts, but does not fully conform to MIME as the WWW is not a mail system. |
HTTP protocol is compatible with other network services such as FTP (File Transfer Protocol) and NNTP (Network News Transfer Protocol). |
The HTTP service is assigned by convention to port 80. Each HTTP connection is much shorter-lived than a connection to the other services: it carries one request and one reply. |
The HTTP daemon is the server which responds to the Internet service requests on standard port 80 (or on another custom port). The server program is available from NCSA and is easily installed by editing a set of configuration files which give directory locations for documents, cgi scripts, error messages and icons, and which allows for options regarding path names, domain access, and so on. |
A URL has the standard form service://machine:port/file.file-extension |
HTML hyperlinks typically use the service http for linking to other documents and media files. Some other Internet services can also be used, such as ftp and news. |
In this way, other Internet services can be reached through the browser interface. |
The machine is an Internet address and can be either a symbolic name resolved by the Domain Name System (DNS) or a numeric IP address. |
If the port is not specified, it defaults to 80. |
The file.file-extension is given by any Unix path name starting from the directory known to the server as "document root". Which path names are valid is one of the options of the server - whether "public_html" is automatically put into the path name and whether paths starting with "~username" are allowed. |
In the http service, the file-extension is used to tell the browser what helper application to use to view the file. Typical file extensions are html, gif, jpeg, mpeg, au, ram, etc. |
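The URL components described above can be pulled apart with Python's standard urllib.parse module; the sketch also fills in the default port when none is given. The default-port table is an assumption covering only http (80) and ftp (21).

```python
from typing import Optional
from urllib.parse import urlsplit

# Well-known default ports for two services (an assumed, minimal table).
DEFAULT_PORTS = {"http": 80, "ftp": 21}


def url_parts(url: str) -> tuple[str, str, Optional[int], str]:
    """Return (service, machine, port, path), filling in the default port if omitted."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path or "/"


print(url_parts("http://www.npac.syr.edu/users/gcf/cps616webreviewjune97"))
# ('http', 'www.npac.syr.edu', 80, '/users/gcf/cps616webreviewjune97')
print(url_parts("http://www.npac.syr.edu:8080/index.html"))
# ('http', 'www.npac.syr.edu', 8080, '/index.html')
```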
For other services, the browser connects directly to the appropriate server using that service's protocol. |
On each hyperlink click, the browser (client) initiates a connection with the server at the "machine" (e.g. using the BSD UNIX connect call on the default port 80, or on a custom user-defined port) |
A request is sent to the server, formatted as a MIME-like message. |
The server replies with another MIME-like message which is received by the browser and either formatted in the browser window or viewed with a helper application. |
The connection is closed on both sides. (The exception to this is the "server push" connection.) |
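The four steps above (connect, request, reply, close) can be sketched with ordinary BSD-style sockets. To keep the example self-contained, a tiny hypothetical server runs in a thread on the loopback address, on a port chosen by the operating system instead of port 80. Note that on the wire HTTP header lines end with CR LF, and an empty line ends the header.

```python
import socket
import threading

received = []  # requests seen by the server, kept for inspection


def serve_once(listener: socket.socket) -> None:
    """Hypothetical server: accept one connection, read the request, reply, close."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # an empty line ends the request header
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        received.append(request)
        body = b"<HTML><HEAD><TITLE>Document Title</TITLE></HEAD></HTML>"
        conn.sendall(b"HTTP/1.0 200 OK\r\n"
                     b"Content-type: text/html\r\n"
                     b"Content-length: " + str(len(body)).encode() + b"\r\n"
                     b"\r\n" + body)
    # Leaving the with-block closes the connection, as an HTTP/1.0 server does.


def fetch(host: str, port: int, path: str) -> bytes:
    """Client: connect, send a GET request, and read until the server closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b"GET " + path.encode() + b" HTTP/1.0\r\n"
                     b"Accept: text/html\r\n"
                     b"\r\n")
        reply = b""
        while True:
            chunk = sock.recv(1024)
            if not chunk:  # empty read: the server closed the connection
                return reply
            reply += chunk


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
server = threading.Thread(target=serve_once, args=(listener,))
server.start()
reply = fetch("127.0.0.1", listener.getsockname()[1], "/document.html")
server.join()
listener.close()
print(reply.split(b"\r\n")[0].decode())  # HTTP/1.0 200 OK
```

Because the server closes the connection after one reply, the client simply reads until end-of-stream; this is the HTTP/1.0 behavior the foil describes.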
GET /document.html HTTP/1.0 |
Accept: www/source |
Accept: text/html |
Accept: image/gif |
User-Agent: Lynx/2.2 libwww/2.14 |
From: mnotulli@ukonaix.cc.ukans.edu
|
First line syntax is always: METHOD URL ProtocolVersion |
The following lines form a header of an (extended) MIME message |
"User-Agent" specifies the browser type |
"Accept" specifies MIME types recognized by the browser |
The server is expected to provide the requested data in one of these acceptable formats. |
HTTP/1.0 200 OK |
Date: Wednesday, 02-Feb-95 23:04:12 GMT |
Server: NCSA/1.1 |
MIME-version: 1.0 |
Last-modified: Monday, 15-Nov-94 23:33:16 GMT |
Content-type: text/html |
Content-length: 2345 --
|
<HTML><HEAD> |
<TITLE> Document Title </TITLE> |
. . . |
This message contains both header and body |
Some replies contain only header (e.g. error reports, such as HTTP/1.0 404 Not Found) |
The GET request above contained only a header, whereas a POST request (see next example) contains both header and body |
POST /cgi-bin/post-query HTTP/1.0 |
Accept: www/source |
Accept: text/html |
Accept: video/mpeg |
Accept: image/x-rgb |
Accept: application/postscript |
User-Agent: Lynx/2.2 libwww/2.14 |
From: grobe@unanaix.cc.ukans.edu |
Content-type: application/x-www-form-urlencoded |
Content-length: 150
|
org=Academic%20Computing%20Services |
&users=10000 |
&browser=lynx |
&contact=Michael%20Grobe%20grobe@kuhbuh.cc.ukans.edu |
Both header and body present in POST requests - the body is typically used to pass a form contents to the server. |
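Decoding such a body takes one call to Python's urllib.parse.parse_qs. The sketch assumes the line breaks in the example above are for display only, so the body is rejoined into a single line.

```python
from urllib.parse import parse_qs

# Body of the POST example above, rejoined into a single line.
body = ("org=Academic%20Computing%20Services"
        "&users=10000"
        "&browser=lynx"
        "&contact=Michael%20Grobe%20grobe@kuhbuh.cc.ukans.edu")

# parse_qs returns a list of values per field; each field appears once here.
form = {name: values[0] for name, values in parse_qs(body).items()}
print(form["org"])      # Academic Computing Services
print(form["contact"])  # Michael Grobe grobe@kuhbuh.cc.ukans.edu
```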