Stuart Moulthrop
& Nancy Kaplan
University of Baltimore
School of Information Arts and Technologies ·
Revised 2008
Part 1: Starting Points
1.1.1 · Hypertext Transfer Protocol (HTTP)
Various protocols define types of Internet service; for instance, file transfer (FTP), USENET news (NNTP), e-mail (POP).
Hypertext Transfer Protocol is a set of rules governing the transmission of words, images, and other forms of information that make up pages on the World Wide Web.
CLIENT: some computing device (computer, personal digital assistant, cellphone, game console, etc.) connected to the Internet and running a client or browser program through which it issues requests for information to other computers (servers).
Popular Web Client Programs: Today most users of the Web rely on Microsoft Internet Explorer. Alternatives include Mozilla's Firefox (an open-source descendant of Mozilla/Netscape Navigator, one of the original full-featured browsers), Opera, and Apple's Safari. Browsers differ much less in basic performance now than they did in the 1990s.
Popular Web Server Programs: Microsoft provides a suite of Internet services including HTTP server support as part of its .Net framework. Likewise Sun Microsystems and other major software companies sell proprietary HTTP server programs. Alternatives to these corporate products are open-source servers such as Apache, which now runs on Windows, MacOS, Linux, and most flavors of UNIX.
- HTML 1 (1991-93) --
In the initial scheme of things, Web pages looked like typewritten
documents with graphics awkwardly stuck in.
- HTML 2.0 (1993-94) -- Never officially released, this first
revision of the language concentrated on interactive forms and did little to improve
layout or graphics.
- HTML 3 (1994-95) -- Netscape greatly expanded the range of HTML
commands with Navigator 1.0 (1994) and 1.1 (1995), adding centering, background images,
page color, tables, and dynamic documents. At first these features were considered
suspect and non-standard. The HTML 3.0 standard was never formally approved.
- HTML 3.2 (1996) -- When Microsoft entered the Web field midway
through 1995,
Netscape's "enhancements" showed up on the Internet Explorer feature list as well; this,
along with
the huge popularity of Netscape's innovations, prompted the W3C to issue a
standard including all major additions except
the advanced feature called frames.
- HTML 4.0 (1997-99) -- HTML 4.0,
intended as the last major revision of HTML,
supports an important new control system for typography and layout
called Cascading Stylesheets (CSS).
Along with Stylesheets, two new tags were added, DIV and
SPAN, to make styling elements more flexible.
HTML 4 also incorporates the Document Object Model,
a powerful method for combining scripting languages like JavaScript
and VBScript with elements of standard HTML.
- XHTML 1.0 (2000-) -- XHTML 1.0 provides a transition between the relatively simple environment of HTML and the much more complicated realm of Extensible Markup Language (XML). Though this course concentrates on ordinary HTML, we'll approach the language in a way that is consistent with XHTML. Many Web designers combine HTML 4 and XHTML into a notional "(X)HTML" (Elizabeth Castro). This is not a formal standard, but a way of flexibly mixing aspects of the two most recent, non-XML standards. We'll be following this approach.
Content and layout of Web pages are controlled by markup documents written in Hypertext Markup Language (HTML).
To see what Web markup looks like, use the View Document Source feature of your Web browser to examine the markup for this page.
Standards for HTML are maintained by the World Wide Web Consortium ("W3C," or more recently, "W3"), a committee of academic and industry officials, including Tim Berners-Lee, the computer scientist who invented HTML. However, the Consortium has no formal authority and software companies have extended the standard language considerably.
HTML is a set of instructions that tell browser programs how to display information. These instructions are similar to the so-called invisible commands used in older word processing programs like WordStar and WordPerfect.
HTML is much simpler than any programming language. You can learn the basics in a few hours; the hard part is knowing what to do with them.
In the early Web days it took some effort to keep current with the HTML standard. In their attempts to dominate Internet software, Netscape and Microsoft added new browser features that relied on extensions to HTML. Design practices tended to change radically as new versions of Web browsers came into the market. That competition has died down now (actually it's moved on to the far more complex realm of Extensible Markup Language or XML, which is beyond the purview of this course). The basic outlines of HTML, embodied in the HTML 4.0 standard published by W3C, are generally accepted throughout the Web world.
Nonetheless it's useful to review the history of HTML so far, as outlined in the list of versions above.
Every Web page you write should begin with a DOCTYPE element. Here's the one we recommend:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
This tag tells any browser loading your page that you are using the "Transitional" flavor of HTML 4, a mode that allows considerable deviation from the tighter rules that define the XHTML standard. Even though you'll be learning fairly strict (X)HTML here, we recommend Transitional.
We also insist on a DOCTYPE element. If a browser loads a page without this element, it will almost certainly go into Quirks mode, which is a more or less formally defined set of functions intended to deal with older, idiosyncratic Web pages using things like "Netscape enhancements" and HTML 3.2. Quirky pages are often unreliable, as one of us recently discovered while resurrecting a project originally written in 1996. Always include a DOCTYPE element.
In addition to inoculating you against the Quirks, DOCTYPE also allows your page to be validated by an automated system, such as the one maintained by the World Wide Web Consortium, the international standards organization that regulates HTML, XHTML, and XML. You should validate your Web pages using the W3 Validator. Your design clients -- and your instructor in this class -- may require it.
So DOCTYPE is essential; but the syntax of the DOCTYPE element is wooly, to say the least. No one can expect you to memorize it, and to reproduce it from memory without errors. Therefore you should start a Template file that includes the DOCTYPE element, along with the basic document structure elements (containers) defined in section 1.3.1, below.
Use this template as the starting point for all your Web pages; just remember to RENAME the template file before you begin entering additional page contents.
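A minimal sketch of such a template might look like the following. It assumes the Transitional DOCTYPE in the full form published by W3C (with the address of the definition file) and the document structure containers described in section 1.3.1. The title text and the comments are placeholders, not required wording.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<!-- Placeholder title: change this for every new page -->
<TITLE>Page Title Goes Here</TITLE>
</HEAD>
<BODY>
<!-- Visible page content goes here -->
</BODY>
</HTML>
```

Save a copy under a new name before adding content, so the template itself stays empty.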
1.1.5 · Deprecated Elements
Plug-ins and browser-specific elements add to the range of function in HTML, but every flow implies an ebb: deprecation is the process by which tags and practices are removed from the HTML universe.
Adding to HTML is easy. In theory, anyone who programs a Web browser can propose a new tag (though you still have to convince other people to use it). Subtracting is another matter. Tags can be formally deprecated only by the World Wide Web Consortium, and then only after a period of public comment. Once a tag is deprecated, it is assumed the tag will not be developed further, and that future versions of browsing software need not recognize it.
Since W3 standards are only recommendations, however, software makers are not really obliged to drop support for obsolete but popular elements: indeed, they might lose market share by doing so. Like old soldiers, deprecated tags never die... but instead of fading away, they hang around as hazards to design, if not navigation.
In most cases an element becomes deprecated only when a new construction can do the same thing more efficiently or powerfully. Removing deprecated elements therefore should not impair the function of your pages.
The Cascading Stylesheet Standard (CSS) provides valid substitutes for most if not all deprecated structures. See Part 7 for detailed coverage of stylesheets. Since major browsers now support CSS quite reliably, you should rework deprecated elements in any existing pages you have published.
If we discuss deprecated elements in this course, we'll note them as such. Here are some you may encounter if you look at the markup for older pages:
| <CENTER></CENTER> | A Netscape "enhancement" to HTML. Since you can set ALIGN="center" attributes for headings, horizontal rules, paragraphs, and tables, there is virtually no need for this container anymore. |
| <FONT></FONT> | This container was once the only way to set text and link colors locally. It's superseded by stylesheet techniques. We do discuss this element in Part 3, but do not recommend continued use. The suggested alternative, in-line styling, is discussed in Part 7. |
| <LAYER></LAYER> | Netscape introduced layers with version 3.0 of Navigator as a way to add a third dimension (among other things) to Web layout; but these innovations were effectively trumped by the Document Object Model and DHTML, and did not become part of the HTML 4.0 specification. Netscape unilaterally deprecated the layer construction and its related parts in 1998, so we don't cover this material. |
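As a rough illustration of reworking deprecated elements, the sketch below sets the same line of text twice: once with CENTER and FONT, and once with an in-line STYLE attribute on a paragraph. The text and the color value are invented for the example, and the two versions should look roughly, not exactly, alike, since browsers add their own spacing around paragraphs.

```html
<!-- Deprecated: centering and color set with CENTER and FONT -->
<CENTER><FONT COLOR="#990000">Closing at noon today</FONT></CENTER>

<!-- Stylesheet equivalent: an in-line STYLE attribute on the paragraph -->
<P STYLE="text-align: center; color: #990000;">Closing at noon today</P>
```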
- "Page"
- A page is a collection of information (words, images, other media types)
distinguished as a single entity. Pages are the basic units of information in Web
publishing.
Though Web pages often do look something like
pages in a desktop publishing program, they can contain much more information
than any printed page. Remember, the Web differs significantly from print.
- "Site"
- A Web site is a collection of pages connected by coordinated hypertext links.
Typically a site serves a single purpose or expresses a unified concept -- corporate identity,
information services, publication, etc.
- Uniform Resource Locator (URL)
- Every page on the Web has a distinct electronic address that may be
written as a Uniform Resource Locator. The URL for one of our home pages, for instance, is:
http://iat.ubalt.edu/moulthrop/index.htm
Here's what the elements of this string mean:
- http:// -- signals that this document uses
Hypertext Transfer Protocol (HTTP); in other words, it's a Web page;
- iat.ubalt.edu -- names the server (iat) and the sub- and super- domain (ubalt.edu) where this document is located;
- /moulthrop/ -- specifies the path, that is the nested directory, in which the
document is stored -- there's much more on this subject in
Part 2;
- index.htm -- gives the name of the page; note the .htm extension which is one of the two file extensions that may be used for Web pages (the other is .html).
1.2.1 · Elements
Elements (sometimes informally called tags) are the basic commands or verbs of HTML.
Elements are identified by the special symbols < and >, which are called angle brackets (or, more familiarly, "less than" and "greater than" signs). The first word after the opening bracket is the element's identifier, that is, the name of the HTML element (e.g., STRONG, BR, IMG). Elements also have formal names (e.g., "strong emphasis tag"), though most designers use short forms or nicknames.
| Markup Showing the "Emphasis" Element |
|
| Output From This Markup |
|
Before connecting the main relay to the fusion power reactor, make very certain that the primary switch is open. |
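Markup along the following lines would produce output like the sentence above. This is only a sketch: it assumes the emphasis falls on the words "very certain," which is a guess about the intended example.

```html
<P>
Before connecting the main relay to the fusion power reactor,
make <EM>very certain</EM> that the primary switch is open.
</P>
```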
In the early days of HTML, some elements came singly and others came in pairs called containers. A container is a compound element consisting of a pair of tags in which the second begins with the "/" character (slash or stroke). For instance, <P> marks the beginning of a paragraph block and </P> marks the end. Together these two tags constitute a paragraph container.
Because XML requires every markup element to be explicitly closed, the old singleton elements have become a bit embarrassing. We can still use them as single elements, but XHTML alters them slightly: each now ends with " /" (a space and a forward slash) just before the closing angle bracket.
Thus we used to write the image tag simply as <IMG>, but now we write it as <IMG />. That final " /" tells XML-savvy browsers that they have reached the end of an instruction.
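The change can be sketched as follows. The file name logo.gif and the ALT text are hypothetical, chosen only to make the IMG element complete.

```html
<!-- Older HTML style: singleton elements with no closing mark -->
<BR>
<HR>

<!-- (X)HTML style: the same elements ending in a space and a slash -->
<BR />
<HR />
<IMG SRC="logo.gif" ALT="Company logo" />
```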
If elements are verbs, attributes are adverbs -- they modify the function of the element.
Elements can carry a great deal of internal information in the form of attributes. Attributes follow the identifier and have the general form:
ATTRIBUTE="ARGUMENT"
Though most browser programs let you omit the quotation marks most of the time, in some cases they are absolutely required (in the IMG element, for instance, which we'll discuss in Part 3): so get in the habit of putting all your attribute values inside quotation marks.
| Elements With And Without Attributes |
| <HR /> |
| <HR SIZE="8" WIDTH="50%" ALIGN="right" /> |
| The first element inserts a horizontal rule (shadowed line) using default settings; the second inserts a rule 8 pixels high, half the current window width, aligned on the right of the window. |
1.3.1 · Document Definition Containers
- <HTML>
- In almost all Web pages, the initial tag of the HTML container, <HTML>,
occurs in the second position in the document, on a line following the DOCTYPE element.
Most pages similarly end with the closing counterpart, </HTML>.
Everything within this container is identified as markup in Hypertext Markup Language.
- <HEAD>
- The HEAD container usually begins immediately inside the HTML container.
- The head of an HTML document contains a number of special containers for document definition:
- <TITLE> The contents of this container appear in the title bar of most browser programs (look at the top of your screen) and are also recorded in the history, or recently-visited list, accessible through the browser.
- <ADDRESS> This container conventionally holds postal and e-mail addresses for the author of the page. (Strictly speaking, HTML 4 defines ADDRESS as part of the BODY rather than the HEAD.)
- <META /> This tag can be used for several purposes. The most common is to hold keywords that summarize or identify the content of the document. These keywords may be used by search engines like Google to help people find your pages. Used in this way, the <META> tag takes this form:
<META NAME="keywords" CONTENT="dreams, beasts, sex" />
Your content may vary.
- <BASE /> This tag gives a base URL for the current document. We'll return to this concept in Part 2.
- <BODY>
- On conventional pages (i.e., those that do not use frames),
the BODY container holds the majority of page content -- virtually all the written
text and elements.
- Attributes of the initial BODY tag can set background image and the color of key layout divisions; we'll discuss these in Part 3.
Schematic View of Document Structure Containers
<HTML>
<HEAD>
<TITLE> ... </TITLE>
</HEAD>
<BODY...>
[Visible Content of Page Here]
</BODY>
</HTML>
These containers form the 'skeleton' of the Web page; note that HEAD and BODY containers are separate, parallel divisions within the HTML container.
- <BR />
- The BR element introduces a line break.
- Note that, with a few exceptions, line breaks typed into the markup are ignored when the page is presented to the viewer; line breaks must be encoded with specific tags.
- <BR /> is one of the original HTML singletons, so it now gets a terminal " /".
The Line Break Problem Exemplified
MARKUP:
This line was broken
by typing Return.
OUTPUT:
This line was broken by typing Return.
MARKUP:
This line is broken<BR /> with a BR tag.
OUTPUT:
This line is broken
with a BR tag.
- <P>
- The P element signals the beginning of a paragraph by inserting a blank line.
In the early days of Web design, before stylesheets arrived, <P> was used
as a solitary tag even though it ought always to be treated as a container--every paragraph has both a beginning and an end.
You may still see some increasingly moldy markup in which the <P> tag is used simply
to insert a blank line.
However, stylesheets and other advances in HTML require that the <P> container be closed with
</P>.
Always close paragraph structures.
Note that repeating <P> does not create two blank lines--browsers register only the first in any sequence of P tags. The safest way around this problem is a series of BR tags, since <BR /> is treated cumulatively:
<BR /><BR /> = one skipped line
<BR /><BR /><BR /><BR /> = two skipped lines, and so forth.
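The points about paragraphs and breaks can be sketched as follows; the sentences are placeholders.

```html
<!-- Each paragraph is a closed container -->
<P>This is the first paragraph. It is opened and closed explicitly.</P>
<P>This is the second paragraph. The browser separates it from the first.</P>

<!-- Two BR elements in a row: the first ends the line, the second skips one -->
<P>Line one.<BR /><BR />Line two, after one skipped line.</P>
```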
- <H1>, <H2>, etc.
- "H" stands for heading: the H container creates a heading of a given size from 1 (maximum) to 6 (minimum), thus:
- Notice the skipped lines between the examples above; these are not caused by <P> or <BR /> elements.
Because it is a block-level element (i.e., a discrete division of the document), the H
container forces a skipped line both before and after the heading. As a result, it is
often preferable to use styling effects for larger font size and weight when you want larger type not separated by breaks (see Part 7 for more about this procedure).
- Note also that the smallest headings (H5 and H6) are often rendered smaller than ordinary body type, and are thus of little value. As with most elements, the visible effect of the H container can be controlled by Cascading Stylesheets, which we cover in Part 7.
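For reference, markup for the six heading levels might be written like this. The heading text is a placeholder; the size each level actually gets depends on the browser and any stylesheets in effect.

```html
<H1>A heading in H1</H1>
<H2>A heading in H2</H2>
<H3>A heading in H3</H3>
<H4>A heading in H4</H4>
<H5>A heading in H5</H5>
<H6>A heading in H6</H6>
```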
- Simple List Containers: <UL> and <OL>
- The containers UL (unordered list) and OL (ordered list) create simple lists of the following types:
- lions
- tigers
- beers
- liars
- talkers
- bores
- Note that the list item element was originally also
a singleton. For some reason, however, it was decided that list items should
be fully containerized, so we now use </LI> to close.
- The primary difference is that items in the ordered list are numbered, while items in the unordered list are marked with dingbat characters commonly called bullets.
- By adding a TYPE attribute to the initial UL tag, you can set the bullet shape to "SQUARE," "CIRCLE," or "DISC" (the default); likewise TYPE can be set in the OL tag to "I" (Roman numerals, uppercase), "i" (Roman numerals, lowercase), "A" (uppercase letters), "a" (lowercase letters), or "1" (Arabic numerals, the default).
- The TYPE attribute may also be added to the leading tag of the <LI> container, which raises the possibility of a heterogeneous list, in which some items are marked with letters, some with numbers, and so forth ("number B," as Click and Clack always say). This is probably not a desirable practice.
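Here is a short sketch of the TYPE attribute at work, reusing the list items shown above. The particular TYPE values are chosen only for illustration.

```html
<!-- Unordered list with square bullets instead of the default discs -->
<UL TYPE="square">
<LI>lions</LI>
<LI>tigers</LI>
<LI>beers</LI>
</UL>

<!-- Ordered list labeled with uppercase letters instead of numbers -->
<OL TYPE="A">
<LI>liars</LI>
<LI>talkers</LI>
<LI>bores</LI>
</OL>
```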
- Descriptive List: <DL>
- The DL container creates a more complex list that is useful for glossaries, commentaries, bibliographies, and other two-decked structures:
- Non-combatant:
- A dead Quaker
- In the descriptive list (called a definition list in the HTML specification), two containers take the place of <LI>:
<DT> (the term being defined) and <DD> (the description of that term).
- You can nest one type of list within another. The descriptive data following a term in a descriptive list could contain an ordered list, or an unordered list could be inserted within an ordered list, and so forth. These course notes contain several examples of this technique.
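Nesting can be sketched like this, with an unordered list placed inside the first item of an ordered list. The item text is invented for the example. Note that the inner list is closed before the list item that contains it.

```html
<OL>
<LI>Plan the site
    <UL>
    <LI>Sketch the pages</LI>
    <LI>Map the links</LI>
    </UL>
</LI>
<LI>Write the markup</LI>
</OL>
```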
- <HR />
- This singleton tag inserts a horizontal rule to mark
a division of the page.
- You may also add SIZE (height), and WIDTH attributes, whose values may be given either as percentage of available space or as pixels. Browsers may interpret these variations somewhat differently, but the general effect is fairly constant.
- The NOSHADE attribute for HR converts the default shaded line into a solid bar. In (X)HTML style, every attribute needs a value, so we write this one as NOSHADE="noshade".
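As a brief sketch, the two rules below contrast the default settings with the SIZE, WIDTH, and NOSHADE attributes. The values are arbitrary.

```html
<!-- Default rule: shaded line across the full available width -->
<HR />

<!-- Solid bar, 4 pixels high, three-quarters of the available width -->
<HR SIZE="4" WIDTH="75%" NOSHADE="noshade" />
```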
- <PRE>
- The PRE or preformatted text container is the most common exception to the general rule about line breaks in markup (see above): with PRE, breaks typed in the markup ARE carried through.
- Some beginners see the PRE container as a formatting shortcut; but of course there is a catch.
Everything within the PRE tag appears in a
mono-spaced or teletype font.
- Seasoned Web users will thus see that you have used the PRE tag and will be apt to regard your content as merely dumped from older formats into your Web page. They will regard your work as unprofessional. Use the PRE container only when there are no reasonable alternatives. We tend to use it, for instance, only when transcribing code examples with multiple levels of indentation.
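A case where PRE is reasonable might look like the sketch below, which preserves the indentation of a short code fragment. The fragment itself is made up for illustration and does not come from any real program.

```html
<PRE>
if (switchIsOpen) {
    connectRelay();
} else {
    wait();
}
</PRE>
```

Inside the container, the line breaks and leading spaces appear exactly as typed.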
A heading in <H1>
A heading in <H2>
A heading in <H3>
A heading in <H4>
A heading in <H5>
A heading in <H6>
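| Simple Lists | |
| Markup | Output |
| Unordered <UL> | |
| <UL> <LI>lions</LI> <LI>tigers</LI> <LI>beers</LI> </UL> | lions, tigers, beers (bulleted) |
| Ordered <OL> | |
| <OL> <LI>liars</LI> <LI>talkers</LI> <LI>bores</LI> </OL> | 1. liars 2. talkers 3. bores |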
| Descriptive List | |
| Markup | Output |
| <DL> <DT>Non-combatant:</DT> <DD>A dead Quaker</DD> </DL> | Non-combatant: A dead Quaker |
| Preformatted Text Container | |
| Markup | Output |
<PRE> |
Here at ZipDotCom we are ready to serve YOU, the savvy Internet consumer!!! |
- <TT>
- Of course, you might wish to use a teletype font on occasion, perhaps to
mark a change of tone or a discursive shift. The TT container does this, but doesn't
pass along any line breaks from the markup.
- Related to TT is the <CODE> container, which is used to set examples of computer code for documentation or discussion; it is preferable to TT for specialized uses because it signals content type (or document structure) as well as defining appearance.
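The difference in use can be sketched as follows; both sentences are placeholders. TT marks a change of voice, while CODE marks its contents as computer code.

```html
<!-- TT: a teletype font used for tone -->
<P>The operator typed <TT>hello</TT> and waited for a reply.</P>

<!-- CODE: the contents are identified as computer code -->
<P>The statement <CODE>x = x + 1</CODE> adds one to the counter.</P>
```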
- <EM> and <STRONG>
- These containers set emphasis and strong emphasis; ordinarily they are
interpreted as italics and bold, but substitutions
may be made on systems that do not support these options.
- The EM and STRONG containers are preferable to the more precise <I> and <B> containers, partly because they do not lock in only one style (bold or italic) and partly because they convey information about content as well as appearance.
- EM and STRONG may be used together to produce doubly strong emphasis. For example:
This is <STRONG><EM>really</EM></STRONG> important!
- Menu-driven editors
- These simple programs (e.g., Adobe's now defunct HomeSite)
let you select HTML elements from
menus and other visual arrays. They can be useful but also confusing, because
there are so many elements and variations to choose from. Since in many cases
you'll have to add attributes by hand, why not just learn the tags?
- WYSIWYG HTML "assistants"
- Programs like Microsoft's FrontPage and Adobe's Dreamweaver
promise Web authoring without the bother of coding. They can be
valuable if you have to convert large amounts of regularly
structured text or if you find yourself working with people who
have not learned HTML.
- On the other hand, most serious Web designers avoid these tools, largely because they generate highly complicated and sometimes non-standard code. Some of these programs exasperatingly re-generate code structures after the user edits them out.
- Simple text editors
- All you really need to write Web markup is a text editor such as WordPad or (for Macintosh) BBEdit. WordPad is a Windows system utility; a "lite" version of BBEdit circulates free on the Internet. Another good choice is TextPad, available online for a reasonable price.
- Case sensitivity
- With two important exceptions, HTML makes no distinction between
upper and lowercase letters -- so <STRONG>,
<strong>, and <StRoNg> are
all valid ways to write the same element.
- The exceptions are NAME anchors (see notes for Part 2) and URLs for external pages stored on UNIX systems (also discussed in the next session).
- Begin tags on separate lines
- This helps separate tags from content, which helps when you're trying
to edit one or the other.
- Skip lines and indent for clarity
- Browsers disregard skips and indentations just as they
ignore line breaks, except of course inside a PRE container.
- Comment your markup!
- Comments in HTML are enclosed in a special container, thus:
<!-- insert comment here -->
- Since HTML markup can be bewilderingly complex, you should write comments to help yourself and others understand what's going on. Comments are invisible to the casual reader and do not add significantly to load time.
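In practice, commented markup might look something like this. The headings and notes are invented for the example.

```html
<!-- ===== Page heading ===== -->
<H1>Course Notes</H1>

<!-- Introductory paragraph; revise at the start of each term -->
<P>Welcome to the course.</P>
```

Neither comment appears in the browser window, though anyone who views the document source can read them.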
- Keep embedded structures in order
- Containers are frequently nested one inside another, as in this example:
<FONT SIZE="5">
<STRONG>
Big and Bold
</STRONG>
</FONT>
- In this example, containers are closed on a last-opened, first-closed basis. Always close containers according to this logical sequence.
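The rule is easiest to see when a correct sequence is set beside a faulty one. This sketch uses STRONG and EM rather than FONT.

```html
<!-- Correct: EM was opened last, so it is closed first -->
<STRONG><EM>Big and Bold</EM></STRONG>

<!-- Incorrect: the closing tags cross, so the containers overlap -->
<STRONG><EM>Big and Bold</STRONG></EM>
```

Browsers often display the second version without complaint, but it is not valid markup and a validator should reject it.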
Course and Materials ©1997 - 2008 by Stuart Moulthrop
and Nancy Kaplan