The Internet is a loose federation of networks.
|
Cooperative organization - no central administration, no fees. Protocols and standards evolve through the IETF (Internet Engineering Task Force).
|
Most national and international networks are members: NSFNET, ESNET, ARPANET, BITNET
|
All these networks are packet switched systems based on TCP/IP. Together these protocols allow for communication over a wide variety of technologies. Machines called gateways connect the networks.
|
Standard domain name system - symbolic names are looked up by a name server to obtain the numeric internet address used for routing.
-
symbolic names: npac.syr.edu
-
internet addresses: 128.230.7.2
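|
A dotted address such as 128.230.7.2 is a readable spelling of a 32-bit number. The sketch below uses Python's standard ipaddress module to show the conversion. The helper names address_to_int and resolve are assumptions made for this illustration; resolve needs a reachable name server, so it is defined but not called here.

```python
import ipaddress
import socket

def address_to_int(dotted: str) -> int:
    """Convert a dotted-decimal IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dotted))

def resolve(name: str) -> str:
    """Ask the system resolver for the address of a symbolic name (needs network access)."""
    return socket.gethostbyname(name)

value = address_to_int("128.230.7.2")
# Each of the four dotted numbers is one byte of the 32-bit address.
octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
print(value, octets)
```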
|
1969 The first locations commissioned by DOD (ARPA)
|
1971 #host computers = 23
|
1982 Standards for TCP and IP established.
|
1983-4 Name server and domain name server developed.
|
1984 #host computers > 1,000
|
1986 NSFNET backbone established, 56Kbps
|
1987 #host computers > 10,000
|
1989 NSFNET backbone upgraded to T1 (1.544Mbps)
-
#host computers > 100,000
|
1992 Internet Society is chartered, World Wide Web released by CERN
-
NSFNET backbone upgraded to T3 (44.736Mbps)
-
#host computers > 1,000,000
|
1993 NSF experiments with 600 Megabit backbone
-
#host computers > 2,000,000
|
Telnet basically allows you to log in to a system over a network just as though you were logging in from a terminal attached to the system or from a dial-up modem.
|
You may use telnet from a command line such as:
-
> telnet nova.npac.syr.edu
|
where you give the internet name of the machine that you wish to connect to. The telnet service will proceed to ask you for a name and password just as if you were logging in.
|
Or you may have a telnet program which prompts you for the same information.
|
Between two Unix systems, you can use the rlogin command instead.
|
In most cases you must already have an account on the machine to log in. There are a few publicly available telnet machines, such as the FAA Flight Service at duats.gtefsd.com, where student pilots can log in to get the latest weather data.
|
FTP (File Transfer Protocol) is the way that people transfer files from one internet machine to another.
|
You can use the ftp protocol directly from Unix machines using a command line:
-
> ftp internethostmachinename
|
where it will prompt you for an account login name and password. You will then be connected to the home directory of that account and can use commands to move around the directory structure (cd and ls) and commands get and put to copy a file to or from your original location.
|
Other ftp interfaces may be provided by your telnet program, or by other software programs such as fetch.
|
FTP will transfer files of all types and formats. If the files are not plain text, such as images, you need to transfer them in binary mode (the default is usually ascii), because ascii mode translates line endings and can corrupt other data.
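|
Why the transfer mode matters can be seen in a small simulation. This is not real FTP: the two functions below are hypothetical stand-ins that only imitate the line-ending translation an ascii-mode transfer may apply, and the sample bytes are made up.

```python
def ascii_mode_transfer(data: bytes) -> bytes:
    """Imitate an ascii-mode transfer: every line ending is rewritten as CR LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def binary_mode_transfer(data: bytes) -> bytes:
    """Binary mode copies the bytes unchanged."""
    return data

text = b"line one\nline two\n"
# Made-up binary data that happens to contain 0x0A (line feed) bytes.
image = bytes([0x47, 0x49, 0x46, 0x0A, 0x00, 0x0A, 0xFF])

print(ascii_mode_transfer(text))
print(ascii_mode_transfer(image) == image)   # the binary data was altered
print(binary_mode_transfer(image) == image)  # the binary data is intact
```

Text survives the translation because only its line endings change; the binary data does not, since its 0x0A bytes were never line endings.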
|
Some machines may provide a special ftp account called "anonymous". You use your ftp program as usual, except that the login name is "anonymous". The password can be anything, but netiquette obliges you to give your email address. The directory that you are connected to is a public directory provided by the host machine.
|
Usenet newsgroups provide discussion forums on a wide range of topics. You can read the forums from a news server installed at your site.
|
The topics are organized into hierarchies. Some of the main categories are
-
alt - alternative topics
-
comp - computers and computing
-
misc - miscellaneous newsgroups
-
rec - recreational topics
-
sci - science-related topics
-
soc - social and cultural topics
|
Subtopic names are always shown as part of the hierarchy, for example
-
sci.chem.electrochem and comp.parallel
|
People participate in newsgroups by contributing messages, called "posting", which everyone else who reads the newsgroup can see.
|
Some newsgroups are moderated, which means that posted messages are scanned by a human for appropriate content and style before being made public.
|
Many software packages include news readers, among them the Netscape web browsers - just ask your systems administrator which news server to use.
|
The World Wide Web is a collection of documents located all over the world, which can contain links to images, motion videos and audio files.
|
Links use Web addresses called URLs (Uniform Resource Locators), which have the form
-
http://www.place.org:8888/mydirectory/mydoc.html
|
where
-
http is the service, here the hypertext transfer protocol of the Web
-
www.place.org is the internet name of the web server
-
8888 is the optional port number
-
/mydirectory/ is the directory or folder path to the document within the web server document space
-
mydoc.html is the document to be retrieved (with an html file extension)
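|
The parts listed above can be pulled out of the example URL with Python's standard urllib.parse module. This is a minimal sketch; the variable names are chosen for illustration only.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://www.place.org:8888/mydirectory/mydoc.html"
parts = urlsplit(url)

# Separate the directory path from the document name and its extension.
directory, document = posixpath.split(parts.path)
extension = posixpath.splitext(document)[1]

print(parts.scheme)    # the service
print(parts.hostname)  # internet name of the web server
print(parts.port)      # optional port number (None when the URL omits it)
print(directory, document, extension)
```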
|
File types follow the MIME (Multipurpose Internet Mail Extensions) standard, originally developed to include multimedia and multi-part content in electronic mail messages.
|
File extensions on the server tell which MIME format the file is in.
|
The browser is configured to have a set of helper applications or "plug-ins" to appropriately display or play files in various MIME formats.
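|
The mapping from file extension to MIME format can be tried out with Python's standard mimetypes module, which carries a built-in extension table. The file names below are invented for the example, and the table on a particular server or browser may differ.

```python
import mimetypes

for filename in ("mydoc.html", "picture.gif", "photo.jpeg"):
    # guess_type returns a (MIME type, encoding) pair based on the extension.
    mime_type, _encoding = mimetypes.guess_type(filename)
    print(filename, "->", mime_type)
```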
|
Indexing: the information gathered by the robots is organized into an indexing database at the search server.
-
Keyword indexing is primarily used at present - full text searching is offered only by some single-site search engines.
-
The key issue is the size of the resulting database.
|
Searching: the indexing database allows (keyword) searches by the user.
-
The user forms a query, and some number of the most highly ranked results are returned.
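|
Keyword indexing and ranked searching can be sketched with a small inverted index. The pages, the example.* URLs and the scoring rule (a plain count of keyword occurrences) are assumptions made for illustration; they do not describe how any particular search engine works.

```python
from collections import defaultdict

# Hypothetical pages, standing in for what a robot would have gathered.
documents = {
    "http://a.example/": "parallel computing on the web",
    "http://b.example/": "web search engines index the web",
    "http://c.example/": "electrochemistry newsgroup archive",
}

def build_index(docs):
    """Map each keyword to {url: number of occurrences}."""
    index = defaultdict(dict)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query, limit=10):
    """Rank pages by total keyword occurrences and return the top `limit` URLs."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    ranked = sorted(scores, key=lambda url: (-scores[url], url))
    return ranked[:limit]

index = build_index(documents)
print(search(index, "web search"))
```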
|
User Interface
-
uniform interface for HTTP, FTP, GOPHER, WAIS, Harvest, Lycos
|
Challenge of WWW search:
-
estimated total size is 30 Gigabytes, 5 million documents (many search engines now take months to crawl the web to update index files.)
-
diversity - huge distributed database, unstructured, non-relational, hierarchical information with many formats.
|
JavaScript was developed by Netscape from its HTML scripting language LiveScript and includes some features of Java. It allows HTML authors to have more control over the behavior of the browser.
|
JavaScript is text embedded in an HTML document using the <SCRIPT> tags, which a JavaScript browser will interpret (and other browsers ignore).
|
JavaScript can perform animations, respond to buttons and other forms of user input, and allow the author more control over the appearance of the Web Page.
|
JavaScript can also provide an object-oriented view of other browser plug-in programs.
|
Reference: JavaScript Authoring Guide at http://home.netscape.com/eng/mozilla/Gold/handbook/javascript/
|
VRML is a computer graphics language for describing 3-Dimensional scenes. It was developed as a standard for the WWW from OpenInventor of SGI.
|
VRML includes language elements for creating simple shapes, various lighting effects, applying textures to shapes, and various points of view (referred to as cameras).
|
A VRML-enabled browser will recognize VRML files of the form file.wrl, and create an interface where the user has controls to fly through space and examine objects.
|
Objects within a VRML scene may be configured as URL links to other Web pages of any document type.
|
VRML documents are huge - the most serious current drawback to using VRML more widely on the Web is the slow download time.
|
New versions of VRML include motion in the scenes.
|
Multiplexing - Different protocols can be used to send different messages through the same network.
|
Fragmentation and reassembly - Most networks have a maximum packet size. In the TCP/IP protocols, the IP layer breaks up packets that are too long into a sequence of shorter fragments, which are reassembled on the other side.
|
Sequencing is the property that data is received by the receiver in the same order as transmitted by the sender, which is not guaranteed in a packet-switched network.
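|
Fragmentation, reassembly and sequencing can be sketched together. In this simplified model each fragment carries only a byte offset, which is enough to restore the order; real IP headers carry more fields. The function names and the 16-byte maximum size are assumptions for the example.

```python
import random

def fragment(payload: bytes, max_size: int):
    """Split a payload into (offset, chunk) fragments of at most max_size bytes."""
    return [(offset, payload[offset:offset + max_size])
            for offset in range(0, len(payload), max_size)]

def reassemble(fragments):
    """Restore the original order using the offsets, then join the chunks."""
    return b"".join(chunk for _offset, chunk in sorted(fragments))

message = b"All these networks are packet switched systems based on TCP/IP."
pieces = fragment(message, max_size=16)
random.shuffle(pieces)  # a packet-switched network may deliver out of order
print(reassemble(pieces) == message)
```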
|
Error control guarantees that error-free data is received by the application programs. Data can either get corrupted by the transmission medium or get lost. Checksums are added to the data and received data is acknowledged. If there is any problem, retransmission occurs.
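|
The checksum step can be sketched as follows. The function computes the 16-bit one's-complement checksum used by the Internet protocols; the helper names attach and is_intact are assumptions for this example, and acknowledgement and retransmission are not shown.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over the data (padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def attach(payload: bytes) -> bytes:
    """Sender side: pad to an even length and append the checksum."""
    if len(payload) % 2:
        payload += b"\x00"
    return payload + internet_checksum(payload).to_bytes(2, "big")

def is_intact(packet: bytes) -> bool:
    """Receiver side: the checksum over data plus checksum is zero when nothing changed."""
    return internet_checksum(packet) == 0

packet = attach(b"error control")
damaged = bytes([packet[0] ^ 0x01]) + packet[1:]  # flip one bit in transit
print(is_intact(packet), is_intact(damaged))
```

A receiver that finds a damaged packet would withhold its acknowledgement, which is what triggers the retransmission.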
|
Flow control assures that the sender doesn't overwhelm the receiver by sending data at a faster rate than it can process.
|
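Error and flow control can be handled on an end-to-end basis or on a hop-by-hop basis. In TCP/IP they are handled end-to-end by TCP; IP checks only its own header and does no flow control, so any hop-by-hop control is left to the individual links. (A hop goes to only one intermediate machine on the network route.)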
|
Here are a few sample Internet documents relevant for Internet and WWW message-passing.
|
RFC-822: Crocker, D., "Standard for the Format of ARPA Internet Text Messages", STD 11, RFC 822, UDEL, 1982.
|
RFC-1036: Horton, M. and Adams, R., "Standard for Interchange of USENET Messages", RFC 1036, AT&T, December 1987.
|
RFC-1521: Borenstein, N. and Freed, N., "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies", RFC 1521, Bellcore, September 1993.
|
RFC-1524: Borenstein, N. "A User Agent Configuration Mechanism for Multimedia Mail Format Information", RFC 1524, Bellcore, September 1993.
|
Internet Draft: Tim Berners-Lee, "Basic HTTP", CERN, 1992/3.
|
We all know and use electronic mail, but here is the formal specification of its message format (RFC-822).
|
Each message is a stream of 7-bit ASCII chars which contains a header and an optional body, separated from the header by a blank line.
|
Header consists of a set of entries with one entry per line given by a colon separated key:value pair.
|
Key contains no spaces or tabs and cannot exceed 63 chars.
|
Body is a fully unstructured sequence of ASCII chars.
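|
The header-plus-body layout can be examined with Python's standard email package. The message below is invented for the example: the addresses are placeholders and X-Course-Note is a hypothetical extension key.

```python
import email

raw = (
    "From: sender@example.org\n"
    "To: reader@example.org\n"
    "Subject: Greetings\n"
    "X-Course-Note: extension keys start with X\n"
    "\n"
    "The body is free-form text.\n"
)

message = email.message_from_string(raw)

# One header entry per line, each a key:value pair.
for key, value in message.items():
    print(f"{key}: {value}")

# Everything after the blank line is the unstructured body.
print(repr(message.get_payload()))
```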
|
There is a finite set of standard keys and an extension mechanism via the "X"-prefix. The standard set (as used by MH) is:
|
Date Bcc Resent-Date Resent-Fcc
|
From Fcc Resent-From Resent-Message-Id
|
Sender Message-ID Resent-To
|
To Subject Resent-cc Forwarded
|
cc In-Reply-To Resent-Bcc Replied
|
Goals
-
Multimedia, multi-language, multi-component extension of RFC-822
-
Full backward compatibility with RFC-822
-
Open design to incorporate multiple well-known formats
-
Easy extension to new types and formats
|
Retain RFC-822 header+body format
|
Add new header fields
|
Allow for multipart multimedia bodies
|
Include media type and encoding information in new header fields such as: Content-Type, Content-Description, Content-Transfer-Encoding, Content-ID
|
Retain 7-bit ASCII for all valid encoding schemes
|
Implement multi-component bodies via a special 'magic type' Content-Type: multipart
|
Two level hierarchical typing scheme adopted of the form: basetype/subtype
|
Seven base media types are defined, and this minimal set is enforced, i.e. all extensions must pass the whole ID->RFC->STD process.
|
Allow for less restrictive subtyping of the base types, for example:
-
Content-Type: text/plain
-
Content-Type: text/richtext
|
Some standard subtypes are specified and many more are expected. New subtypes must be registered with the IANA (Internet Assigned Numbers Authority).
|
Private experimental subtypes prefixed with "X-" may be used freely and without registration.
|
Seven base types are: text, image, audio, video, multipart, message, application.
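|
The two-level typing scheme can be sketched as a small validator. The function name split_content_type is an assumption for this example, and it ignores any parameters that may follow the subtype in a real Content-Type field.

```python
BASE_TYPES = {"text", "image", "audio", "video", "multipart", "message", "application"}

def split_content_type(value: str):
    """Split 'basetype/subtype' and check the base type against the seven defined ones."""
    base, _, subtype = value.strip().lower().partition("/")
    if base not in BASE_TYPES or not subtype:
        raise ValueError(f"not a valid content type: {value!r}")
    # Subtypes prefixed with "X-" are private and need no registration.
    experimental = subtype.startswith("x-")
    return base, subtype, experimental

print(split_content_type("text/plain"))
print(split_content_type("image/x-rgb"))
```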
|
multipart
-
Specifies a MIME message composed of several parts with possibly different Content-Type fields.
-
Parts are separated by a boundary string, specified in the multipart header entry
-
Subtypes: mixed (serial combination of media), parallel (for parallel presentation if possible), alternative (multiple representations of the same data) and digest (all parts are messages)
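|
A multipart message can be assembled with Python's standard email.mime classes. This is a sketch: the boundary string "frontier", the subject and the tiny PostScript fragment are all chosen for the example.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A "mixed" message: a serial combination of two parts of different types.
message = MIMEMultipart("mixed", boundary="frontier")
message["Subject"] = "Two parts"
message.attach(MIMEText("A plain-text note.", "plain"))
message.attach(MIMEApplication(b"%!PS\nshowpage\n", "postscript"))

print(message.get_content_type())
for part in message.get_payload():
    print(part.get_content_type(), part["Content-Transfer-Encoding"])

# The serialized form shows the boundary string between the parts.
print(message.as_string())
```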
|
message
-
Subtypes: rfc822 (standard ARPA e-mail format), partial (a single chunk of a larger message, chopped into pieces for transmission and then reassembled), external-body (pointer to remote data - similar to a hyperlink/URL but with a different representation)
|
application
-
Current subtypes: postscript, ODA
-
Placeholder for "anything else" - several interactive/custom/creative extensions expected here
-
Already registered: Andrew-inset, ATOMICMAIL (Bellcore)
|
HTTP provides an upper level to the Internet, that is, it is built on top of a back-bone network with all the packets flowing from client to server and vice versa using the standard TCP/IP protocol.
|
It uses MIME formats and concepts, but does not fully conform to MIME as the WWW is not a mail system.
|
The HTTP protocol is compatible with other network services such as FTP (File Transfer Protocol) and NNTP (Network News Transfer Protocol).
-
On a UNIX-based machine, the basic services are enumerated in the file /etc/services. Each service corresponds to a standard port. For example, telnet is mapped to port 23, and FTP is mapped to port 21. All ports below 1024 are privileged - only the system administrator can determine port use.
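|
The layout of /etc/services can be sketched with a small parser. The SAMPLE text is an illustrative excerpt written for this example, not the contents of any particular machine's file, and parse_services is a hypothetical helper name.

```python
SAMPLE = """\
# service   port/protocol   aliases
ftp         21/tcp
smtp        25/tcp          mail
http        80/tcp          www
"""

def parse_services(text: str):
    """Return {(service, protocol): port} from /etc/services-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        name, port_proto, *aliases = line.split()
        port, protocol = port_proto.split("/")
        for service in (name, *aliases):
            table[(service, protocol)] = int(port)
    return table

services = parse_services(SAMPLE)
print(services[("ftp", "tcp")], services[("http", "tcp")])
# Services on privileged ports (below 1024).
print(sorted(name for (name, _proto), port in services.items() if port < 1024))
```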
|
The HTTP service is normally assigned to port 80 - it provides a much shorter-lived service connection than the other services.
|
A URL has the standard form
-
service://machine:port/file.file-extension
|
HTML hyperlinks typically use the service http for linking to other documents and media files. Some other internet services can also be used such as
-
ftp://machine/file.file-extension.
|
In this way, a Web server can provide other Internet services through the browser interface.
|
The machine is an Internet address and can be either a symbolic name resolved by the Domain Name System (DNS) or a numeric IP address.
|
If the port is not specified, it defaults to the standard port of the service (80 for http).
|
The file.file-extension is given by any Unix path name starting from the directory known to the server as "document root". Which path names are valid is one of the options of the server - whether "public_html" is automatically put into the path name and whether paths starting with "~username" are allowed.
|
In the http service, the file-extension is used to tell the browser what helper application to use to view the file. Typical file extensions are html, gif, jpeg, mpeg, au, ram, etc.
|
GET /document.html HTTP/1.0
|
Accept: www/source
|
Accept: text/html
|
Accept: image/gif
|
User-Agent: Lynx/2.2 libwww/2.14
|
From: mnotulli@ukonaix.cc.ukans.edu
-
-- blank-line-terminating-the-request --
|
First line syntax is always: METHOD URL ProtocolVersion
|
The following lines form a header of an (extended) MIME message
|
"User-Agent" specifies the browser type
|
"Accept" specifies MIME types recognized by the browser
|
The server is expected to provide the requested data in one of these acceptable formats.
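|
The structure of such a request can be sketched with a small parser. parse_request is a hypothetical helper written for this example, and it assumes lines end with CR LF as they do on the wire.

```python
def parse_request(raw: str):
    """Split an HTTP/1.0 request into its request line, header entries and body."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, url, version = request_line.split(" ")
    # Keep headers as a list of pairs because keys such as Accept may repeat.
    headers = [tuple(part.strip() for part in line.split(":", 1))
               for line in header_lines]
    return method, url, version, headers, body

raw = (
    "GET /document.html HTTP/1.0\r\n"
    "Accept: text/html\r\n"
    "Accept: image/gif\r\n"
    "User-Agent: Lynx/2.2 libwww/2.14\r\n"
    "\r\n"
)
method, url, version, headers, body = parse_request(raw)
accepted = [value for key, value in headers if key.lower() == "accept"]
print(method, url, version, accepted)
```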
|
HTTP/1.0 200 OK
|
Date: Wednesday, 02-Feb-95 23:04:12 GMT
|
Server: NCSA/1.1
|
MIME-version: 1.0
|
Last-modified: Monday, 15-Nov-94 23:33:16 GMT
|
Content-type: text/html
|
Content-length: 2345
-
-- blank-line-separating-header-and-body--
|
<HTML><HEAD>
|
<TITLE> Document Title </TITLE>
|
. . .
|
This message contains both header and body
|
Some replies contain only header (e.g. error reports, such as HTTP/1.0 404 Not Found)
|
A GET request also contains a header only, whereas a POST request (see next example) contains both header and body
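|
The difference between a reply with a body and a header-only reply can be sketched as follows. parse_response is a hypothetical helper, and the two replies are shortened versions made up for the example.

```python
def parse_response(raw: bytes):
    """Split an HTTP/1.0 reply into status code, reason, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return int(code), reason, headers, body

ok = parse_response(
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 26\r\n"
    b"\r\n"
    b"<HTML><HEAD></HEAD></HTML>"
)
missing = parse_response(b"HTTP/1.0 404 Not Found\r\n\r\n")

print(ok[0], ok[1], len(ok[3]))
print(missing[0], missing[1], len(missing[3]))
```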
|
POST /cgi-bin/post-query HTTP/1.0
|
Accept: www/source
|
Accept: text/html
|
Accept: video/mpeg
|
Accept: image/x-rgb
|
Accept: application/postscript
|
User-Agent: Lynx/2.2 libwww/2.14
|
From: grobe@unanaix.cc.ukans.edu
|
Content-type: application/x-www-form-urlencoded
|
Content-length: 150
-
--blank-line-separating-header-and-body---
|
org=Academic%20Computing%20Services
|
&users=10000
|
&browser=lynx
|
&contact=Michael%20Grobe%20grobe@kuhbuh.cc.ukans.edu
|
Both header and body are present in POST requests - the body is typically used to pass the contents of a form to the server.
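|
A body of type application/x-www-form-urlencoded can be decoded and produced with Python's standard urllib.parse module. The sketch below uses three of the fields from the example above.

```python
from urllib.parse import parse_qs, quote, urlencode

body = "org=Academic%20Computing%20Services&users=10000&browser=lynx"

# Decode the body into form fields; each key maps to a list of values.
fields = parse_qs(body)
print(fields)

# Going the other way: encode a form, using %20 for spaces as in the example.
encoded = urlencode({"org": "Academic Computing Services", "users": 10000},
                    quote_via=quote)
print(encoded)
```

The Content-length header of the request must then match the number of bytes in the encoded body.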
|