Stuart Moulthrop & Nancy Kaplan · University of Baltimore School of Communications Design · 2000

Week 2: Links and Document Systems
2.1 | CONCEPTUAL BASIS OF HYPERTEXT LINKS

A link is an automatic connection between (at least) two bodies of information. Links are what let Web users surf from one document to another. Links weave the fabric of connections that defines the space of the Web. At its simplest, a link transfers the user's attention between two points: a departure and an arrival. We use metaphors of travel or "navigation" to describe this process, but the motion is figurative; the only thing that really moves is the information, which travels from a server program to a client program.
Some information displayed at each end helps create context for the link. A word or phrase may suggest what will come if the link is activated, or may re-orient the reader once the link has been followed; an image can serve the same purpose. These visible indicators associated with a link are often called "hot spots." We prefer a more useful term borrowed directly from the theater: we'll call the indicator of a link its cue. In a stage script, a cue has a double meaning: it is both a line of dialogue and a private signal to the actors to take some action. Link cues in hypertext work the same way. For instance, consider an ordinary sentence whose last words, "Week 1," link back to the previous notes.
If you're not interested in the link to the previous notes, then this is just a sentence, and you'll probably overlook the underlining and color effect on its last words. If you're looking for ways off this page, though, you'll probably see "Week 1" as an exit sign. A cue is both a statement and a call to action.

Since cues can occur at both the origin and the endpoint of links, we'll be talking about departure cues and arrival cues. Links also come in various kinds and styles, so we'll need to throw a few more terms at you as well.

2.2.1 · <A> or "Anchor" Tag

Virtually all link transactions in HTML are controlled by a single container whose opening element is the <A> or "anchor" tag. The opening <A> tag may take two attributes, HREF (for "hypertext reference") and NAME. HREF indicates the destination of a link, while NAME marks a spot on the page that can serve as an arrival point; both are explained further in the text and examples below. The <A> tag may carry both attributes at once, allowing a single construction to serve as both an arrival and a departure point. For more information about this, see Section 2.7.6 below.

In anchor tags using HREF, the value assigned to this attribute is a reference string or Uniform Resource Locator (URL) that gives the Internet address of the document to which the link refers. Here are some examples:
<a href="http://www.conspiracies.gov"> interesting stuff</a>
<a href="../../lostFiles/forgetThis.htm"> absolutely crucial</a>

We'll explain a little later how these URLs work and why they look so different. For the moment, just learn to recognize them as designations for other documents or information sources.

The cue is an item of information (words or an image) that offers a visible point of departure or arrival for a hypertext link. As we'll see, departure cues differ importantly from arrival cues, but their general function is the same. In terms of markup, the cue is the material enclosed within the <A> container. Though cues can also be graphics (we're coming to that), we'll concentrate on words here:
<a href="http://www.conspiracies.gov"> interesting stuff</a>
<a href="../../lostFiles/forgetThis.htm"> absolutely crucial</a>

Unlike the HREF and other attributes of the anchor tag, the cue appears within the body of the document when it is viewed in a browser. In these examples the cues are "interesting stuff" and "absolutely crucial."

The </A> tag closes the anchor container, whether it serves on the departure or the arrival end of the link. Though it's small, </A> is essential: if you forget it, your link anchor will run on through your page until another </A> is encountered.

The opening <A> tag of an anchor container may have one more attribute in addition to HREF or NAME. This is TARGET, a feature that was added to HTML after the introduction of frames in Netscape Navigator 2.0. When used on a page contained within a frameset (to which we'll come eventually), TARGET is usually assigned the name of one of the frames as a value. Activating the link then causes the arrival point to be displayed in the designated frame. TARGET can also be used without frames, however. Several special values are reserved for TARGET, each one beginning with the underscore character "_". These reserved values control the behavior of windows. For instance:
A link whose TARGET is set to "_blank" opens a new browser window over the present window and displays the arrival content there. The original window remains available behind the new one. We'll discuss the other reserved values of TARGET in Week 4.
In most cases hypertext links on the Web simply shuttle users from one page, or one location within a page, to another: they operate within the general confines of HTTP. However, HTML anchor containers can also be used for other purposes, for instance sending e-mail. By using the prefix MAILTO: and a valid e-mail address as the value assigned to the HREF attribute, you can cause the browser to activate its e-mail feature or launch a separate e-mail application. The user will see a blank e-mail message with the address filled in. Here's an example of an e-mail link construction:
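Because MAILTO: sits inside the HREF value, a browser treats it as the scheme of a URL, the same slot that "http" fills in a Web address. A minimal Python sketch with the standard urllib.parse module shows this; the helper mailto_href and the address someone@example.com are hypothetical illustrations, and the optional subject field is one of the extra fields that mailto URLs accept.

```python
from urllib.parse import quote, urlparse

def mailto_href(address: str, subject: str = "") -> str:
    """Build an HREF value for an e-mail link (hypothetical helper for illustration)."""
    href = "mailto:" + address
    if subject:
        # Spaces and other special characters must be percent-encoded inside a URL.
        href += "?subject=" + quote(subject)
    return href

href = mailto_href("someone@example.com", "Week 2 question")
print(href)                       # mailto:someone@example.com?subject=Week%202%20question

parts = urlparse(href)
print(parts.scheme)               # mailto  (the prefix is parsed as the URL's scheme)
print(parts.path)                 # someone@example.com

# URL schemes are case-insensitive, so the upper-case MAILTO: form parses the same way.
print(urlparse("MAILTO:someone@example.com").scheme)  # mailto
```

The parser reports "mailto" exactly where it would report "http" for a Web address, which is why the prefix belongs inside the attribute value rather than acting as an attribute of its own.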
Note that MAILTO: occurs as a prefix within the value of the attribute. It is not an attribute itself, even though it modifies the action of the link the way an attribute would. This is one of the many charming inconsistencies built into HTML. In time you'll learn to love them all.

So much for theory; the best way to understand HTML links is to examine them in practice, so here are two examples of actual, working links (try them out):

2.3.1 · Link To Another Web Site (External)
In this example, the value assigned to HREF is an absolute URL, containing the complete Internet address for the document in question. But what exactly is that document? The address belongs to the Eastgate Web site, not to any particular page (which would end in a file name with an extension such as ".htm"). This link actually leads to a page called "index.htm," the default destination for any link pointing to the Eastgate site. There is usually an index page or default starting point not just for the top level of a Web site but for every subdirectory within that site. The default page does not need to be named in the link because Eastgate's Web server automatically looks for an index file when no other page name is given. Virtually all HTTP servers behave this way, and these days most will let you use a variety of names and file extensions for your index page. You might still run across an older server that requires index files to have one particular name, such as home.html or default.htm; the people who run the server can give you this information. Regarding markup style, note that the anchor tag in this example begins on a new line; this makes the markup easier to read.
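The server's fallback to an index page amounts to a simple lookup. The Python sketch below models it under stated assumptions: the list of default names, the function resolve_request, and the sample site contents are all hypothetical, and real servers let their administrators configure which names to try.

```python
from typing import Optional, Set

# Hypothetical server configuration: default-page names to try, in this order.
DEFAULT_NAMES = ["index.htm", "index.html", "default.htm", "home.html"]

def resolve_request(path: str, files: Set[str]) -> Optional[str]:
    """Map a requested path to a stored file, falling back to a directory's default page."""
    if path in files:
        return path  # an explicit page name was requested
    # Simplification: treat any other path as a directory (real servers usually
    # redirect "/features" to "/features/" first).
    directory = path if path.endswith("/") else path + "/"
    for name in DEFAULT_NAMES:
        candidate = directory + name
        if candidate in files:
            return candidate
    return None  # nothing to serve; the server would report an error

site = {"/index.htm", "/features/index.htm", "/features/newFeatures.htm"}
print(resolve_request("/", site))                          # /index.htm
print(resolve_request("/features/", site))                 # /features/index.htm
print(resolve_request("/features/newFeatures.htm", site))  # /features/newFeatures.htm
```

Changing the order or contents of DEFAULT_NAMES is the model's equivalent of asking the server's administrators which index names they accept.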
Here the value of HREF is simply the file name of a page. Though we could have written an absolute URL for this link, it isn't necessary to do so, because the destination page is part of the current Web site.
More precisely, the destination page resides within the same
directory (or virtual "folder") as the page you are reading now.
This relationship is explained more fully in the sections that follow.
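Resolving a bare file name against the address of the page that contains it can be tried out with Python's urllib.parse.urljoin, which follows the standard rules for resolving relative references (RFC 3986). In this sketch the two page addresses are hypothetical; nak_home.htm is the file name used in Section 2.4. The same relative reference works whether the page sits on a server or on a local disk.

```python
from urllib.parse import urljoin

# Hypothetical addresses: the same page stored on a Web server and on a local disk.
on_server = "http://raven.ubalt.edu/staff/kaplan/index.htm"
on_disk = "file:///home/student/site/staff/kaplan/index.htm"

# A bare file name is resolved against the directory of the page containing the link.
print(urljoin(on_server, "nak_home.htm"))  # http://raven.ubalt.edu/staff/kaplan/nak_home.htm
print(urljoin(on_disk, "nak_home.htm"))    # file:///home/student/site/staff/kaplan/nak_home.htm

# An absolute URL ignores the page's own location entirely.
print(urljoin(on_disk, "http://www.example.com/"))  # http://www.example.com/
```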
2.4 | ABSOLUTE AND RELATIVE REFERENCES

2.4.1 · How They Differ

Absolute URLs are like postal addresses: they contain all the information necessary to transfer information from one point to another. Consider this example:

http://raven.ubalt.edu/staff/kaplan/nak_home.htm
In the case of the Internet, "all the information necessary" means the protocol type (HTTP), the name and address of the HTTP server (raven.ubalt.edu), the nested directories on the server in which the file is located ("kaplan" nested within "staff"), and the name of the file itself (nak_home.htm). By contrast, a local link from another page within the "kaplan" directory could use this much simpler reference:

nak_home.htm
All the other information can be left implicit, since for two documents within the same directory the larger elements of context (server name, surrounding directories) remain the same. You should use simplified or relative references for all local links. Because local links use less information, they are more general and thus more adaptable. They don't mention the name of the Web server on which the page is located, for example, and this is crucial: because the server is not named, relative links continue to work properly even when the document that contains them is transferred to another server. This aspect of relative linking lets you develop your Web site in one context, for example on your local hard drive, and then upload it successfully to a remote server.

The BASE tag, which is placed within the HEAD container at the beginning of a Web page, specifies a default pathway for any relative links in the BODY section of that page; BASE lets you streamline link constructions by holding part of the URL constant. Suppose for instance that most of the links on this page run to other pages stored on a remote Web server called "swissbank.account.com"; without BASE, all my links would need to point to full addresses something like this:

http://swissbank.account.com/rvesco/skeletons/fidel.htm
Now suppose I add the following BASE tag to the HEAD part of my page:

<base href="http://swissbank.account.com/rvesco/skeletons/">
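The effect of BASE can be modelled with the same resolution rules: a browser resolves each relative HREF against the BASE address instead of against the page's own address. In this Python sketch the page address and the helper resolve are hypothetical; the BASE value is the swissbank example from the text.

```python
from urllib.parse import urljoin

# Hypothetical address of the page being edited, plus the BASE value from the example.
page = "http://raven.ubalt.edu/staff/kaplan/nak_home.htm"
base = "http://swissbank.account.com/rvesco/skeletons/"

def resolve(href: str, base_href: str = "") -> str:
    """Resolve a link against BASE if one is set, otherwise against the page itself."""
    return urljoin(base_href or page, href)

# Without BASE, a bare name stays in the page's own directory.
print(resolve("fidel.htm"))          # http://raven.ubalt.edu/staff/kaplan/fidel.htm
# With BASE, the same bare name goes to the remote directory.
print(resolve("fidel.htm", base))    # http://swissbank.account.com/rvesco/skeletons/fidel.htm
# A link meant for a neighbouring page is now redirected too, unless fully qualified.
print(resolve("nak_home.htm", base)) # http://swissbank.account.com/rvesco/skeletons/nak_home.htm
print(resolve(page, base))           # http://raven.ubalt.edu/staff/kaplan/nak_home.htm
```

The last two lines show why, once BASE is set, even a link to a page in the same directory must be written as a full address.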
Links to the "swissbank" location can now refer simply to the pages in question (such as "fidel.htm"). An important qualification, though: any link to an Internet location other than "swissbank.account.com/rvesco/skeletons/" now needs to be fully qualified in order to override the BASE instruction. That means even a link to a page in the same directory as the current page would have to start with "http://" and include a full address. BASE is thus useful primarily in cases where nearly all links go to the same remote location.

2.5.1 · GUI vs. Command Line

Client/server technology was supposed to make the Internet friendly to all operating systems, including those like Windows and MacOS that use graphics to represent files and their relationships; but the Web was created largely by UNIX programmers, who generally prefer command-line interfaces. Managing links in HTML requires at least a basic understanding of hierarchical file structures and how to express them in lines of text. People who have used command-line UNIX interfaces, or older PC hands who got started on systems like DOS, probably won't have much trouble with links and pathways. Those who've been more dependent on visual metaphors may have to re-orient themselves. There's nothing terribly difficult here, however. The rules governing URLs fall into two handy groups:

Using these rules and principles you can navigate your way through any file structure, but this is perhaps best demonstrated by example.

2.6.1 · A Simple File Structure

Below you'll see a partial file schematic for raven.ubalt.edu, an HTTP server; after some remarks about the diagram we'll consider some linking problems based upon it:
In this diagram directories (folders, in a GUI) appear in larger, green type. The chart shows five subdirectories within the main directory of the Raven server: features, projects, departments, staff, and guests. Some of the directories here have no contents listed; others contain further subdirectories (green) or individual Web pages. Pages are shown in small, red type and may be identified by the ".htm" extension at the end of their names: newFeatures.htm and abstracts.htm are both HTML files.

Note the position within the hierarchy of the two pages: newFeatures.htm resides in a folder called features on the server. The second page, abstracts.htm, lies inside a folder called seminar, which is inside a folder called comDesign, which is inside a folder called departments. This last, outermost folder is on the same level as features.

The examples below assume that we have the ability to make changes to files on the Web server, or at least to files within the directories departments and features. Suppose I want to make a link from the phrase "abstract archive" on the page newFeatures.htm to the top of the page abstracts.htm. The markup looks like this:

<a href="../departments/comDesign/seminar/abstracts.htm">abstract archive</a>
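The HREF string describes a path with five stages:

1. ../ climbs out of features into the main Raven directory
2. departments/ descends into departments
3. comDesign/ descends into comDesign
4. seminar/ descends into seminar
5. abstracts.htm names the destination page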
Now suppose I want to write a reciprocating link from the phrase "and other features" on the page abstracts.htm to the top of the page newFeatures.htm. If you understood the Rules of URLs above, you'll see how this goes. Here's the markup, along with an explanation:

<a href="../../../features/newFeatures.htm">and other features</a>
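Remember, this time we're starting out where the last link left us, two directories down inside of departments, so most of this trip consists of climbing back up to the main Raven directory, after which we dive down again into features. This transition also has five stages, this time in reverse:

1. ../ climbs out of seminar into comDesign
2. ../ climbs out of comDesign into departments
3. ../ climbs out of departments into the main Raven directory
4. features/ descends into features
5. newFeatures.htm names the destination page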
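Both trips can be checked mechanically by resolving the reference one path segment at a time, which is what the five-stage descriptions do by hand. This Python sketch uses urllib.parse.urljoin; the helper trace is a hypothetical illustration, and the addresses are built from the raven.ubalt.edu schematic described above.

```python
from typing import List
from urllib.parse import urljoin

# Addresses built from the file schematic of raven.ubalt.edu.
RAVEN = "http://raven.ubalt.edu/"
new_features = urljoin(RAVEN, "features/newFeatures.htm")
abstracts = urljoin(RAVEN, "departments/comDesign/seminar/abstracts.htm")

def trace(start: str, href: str) -> List[str]:
    """Follow a relative reference one path segment at a time, recording each stop."""
    stops = []
    location = start
    segments = href.split("/")
    for i, segment in enumerate(segments):
        # Every segment except the last names a directory, so keep its trailing slash.
        step = segment if i == len(segments) - 1 else segment + "/"
        location = urljoin(location, step)
        stops.append(location)
    return stops

# From newFeatures.htm down to abstracts.htm, then back up again.
for stop in trace(new_features, "../departments/comDesign/seminar/abstracts.htm"):
    print(stop)
for stop in trace(abstracts, "../../../features/newFeatures.htm"):
    print(stop)
```

The first loop prints one climb to the main Raven directory followed by four descents; the second prints three climbs followed by the descent into features and the page itself. In both cases the final stop is the same address that resolving the whole HREF at once would produce.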
This third example differs significantly from the first two; in this case we want to direct the link not to another page in general, but to a specific point within that page. Say I want to link from the phrase "electronic commerce" within the page abstracts.htm to the phrase "future of publishing," on the page newFeatures.htm. To do this I need two anchor tags, one at the departure point and the other at the arrival point. Let's begin with the anchor container at the departure point:
<a href="../../../features/newFeatures.htm#future">electronic commerce</a>
The string #future is the crucial addition that makes this anchor container the first half of a two-part link. Notice that #future comes after the ".htm" extension in the name of the page file; it may seem bad form to add an extension to an extension, but this is how the HTML specifications require anchored links to be written. The additional string at the end of the departure anchor is all you need to point your link toward a specific destination, though you still need to create the appropriate arrival construction on the destination page, newFeatures.htm. The code you need looks like this:
<A NAME="future">the future of Web publishing</A>
So what is the effect of this coordinated pair of anchor containers, or, as we might also call it, a doubly-anchored link? The answer is in the next section.

2.7.1 · How Departure-Only Links Differ from Departure-Arrival Links

When readers follow a simple link (that is, a link written without a particular arrival point specified), the current reading point shifts from the location of the departure cue to the beginning or top of the page to which the link refers. In the absence of arrival anchors, links always point to the top of a page. However, when readers follow a link with a specified arrival point (hereafter known as a doubly-anchored link), the reading point shifts from the departure location to a specified point within the arrival page. The arrival point is first designated in the additional string following the "#" symbol in the departure anchor; in our example it was the word "future," but it could have been anything at all. There is only one restriction on the arrival anchor: it must be written exactly the same way each time it is used. That is, the value assigned to the NAME attribute in the arrival anchor container (on the destination page) must precisely match, including capitalization, the value used in the departure anchor (the material following the # in the outgoing link code) on the departure page.
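The matching rule can be sketched in Python using only the standard library: urldefrag splits off the string after "#", and a small html.parser subclass collects the NAME values on the destination page. The class NameCollector and the function arrival_exists are hypothetical names for this illustration, not a description of how any browser is built; the comparison is exact, so "future" and "Future" count as different anchors.

```python
from html.parser import HTMLParser
from typing import Set
from urllib.parse import urldefrag

class NameCollector(HTMLParser):
    """Collect the NAME values of every <A> tag on a page (its possible arrival points)."""

    def __init__(self) -> None:
        super().__init__()
        self.names: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names but leaves values untouched.
        if tag == "a":
            for key, value in attrs:
                if key == "name" and value is not None:
                    self.names.add(value)

def arrival_exists(href: str, destination_html: str) -> bool:
    """True if the text after '#' in href exactly matches a NAME on the destination page."""
    _, fragment = urldefrag(href)
    if not fragment:
        return True  # a simple link has no arrival anchor; it goes to the top of the page
    collector = NameCollector()
    collector.feed(destination_html)
    collector.close()
    return fragment in collector.names

page = '<P><A NAME="future">the future of Web publishing</A></P>'
print(arrival_exists("../../../features/newFeatures.htm#future", page))  # True
print(arrival_exists("../../../features/newFeatures.htm#Future", page))  # False
```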
While the comments above may help you understand arrival links in general, we haven't yet fully explained the function of arrival cues: the textual or graphic material placed inside the arrival anchor container, such as "the future of Web publishing" in the example above.
Earlier browsers (e.g., the original Mosaic) visually highlighted the arrival cue when a doubly-anchored link was followed; for some reason, though, this feature does not appear in current versions of Navigator and Explorer. Current major browsers treat arrival cues this way: they redraw the display so that the cue is as close as possible to the top of the current window; if the destination page is scrollable, they reset the scroll point to bring the arrival cue onto the first line of the window. So in about 95% of the world's Web browsers, arrival cues determine scroll position; if there is too little content to scroll, the arrival cue is irrelevant.

Why worry about arrival cues if they're so obscure? If you're creating long, scrolling documents with multiple divisions, rather like the page you're reading now, then you might want to include links to allow readers to jump to other sections of the document (such as this link to Section 2.7.3). To be most useful, arrival cues sometimes must be set a few lines above the material or section to which they refer, so that readers can orient themselves to the context. For instance, if a paragraph is preceded by a heading, it is better to set the arrival cue on the heading rather than on the first line of the paragraph, since in the latter case the heading would end up above the scroll point, out of view.
So far we have discussed specific links only within a single Web page (this one) and between two pages on the same Web server (in Section 2.6.4); but what about pages on other servers? Here we run into a major limitation of the World Wide Web as a hypertext system: readers cannot contribute any information, including arrival anchors, to external pages. Unless you have write privileges on the server in question, all your external links must be of the simple variety, pointing to the top of an external page. Are there ways around this problem? The Web as we know it offers three:

2.7.6 · Combined Departure/Arrival Structures

Finally, a small but elegant technical design feature: you may include both the HREF attribute and the NAME attribute in a single anchor container, allowing the cue within that container to function both as a departure and an arrival point. The link cue in the title of this section ("Departure/Arrival") is such a double operator; here's what it looks like:
<a name="bothways" href="#bothwaysback">Departure/Arrival</a>
View the markup for this page if you'd like a closer look.

END OF NOTES FOR WEEK 2
Course and Materials ©1997 - 2000 by Stuart Moulthrop and Nancy Kaplan