
Stuart Moulthrop & Nancy Kaplan
University of Baltimore
School of Information Arts and Technologies · Revised 2008


Part 2: Links and Document Systems


2.1 | CONCEPTUAL BASIS OF HYPERTEXT LINKS


A link is an automatic connection between (at least) two bodies of information. Links are what let Web users surf from one document to another. Links weave the fabric of connections that defines the space of the Web.

At its simplest, a link transfers the user's attention between two points--a departure and an arrival. (These terms were introduced by the hypertext theorist George Landow, who wrote the first rhetoric of hypertext.) We use metaphors of travel or "navigation" to describe this process, but the motion is figurative: the only thing that really moves is the information, which travels from a server to a client.

[Figure: schematic illustration of a hypertext link]

Some information displayed at each end helps create context for the link. A word or phrase may suggest what will come if the link is activated, or may re-orient the reader once the link has been followed. It's also possible to use an image in this way. In the early days, these visible indicators associated with a link were often called hot words or hot spots. Lately, people have come to call them simply links -- but however useful, this reference confuses some important distinctions.

We'll call the visible indicator of a link its cue, using a direct analogy to theater. In a stage script, a cue is a term with double meaning: it is both a line to be spoken and a private signal to the actors about an upcoming action. That is, when the actor playing Horatio hears a certain phrase in a speech by the actor playing Hamlet, he knows it's nearly time for his entrance.

Link cues in hypertext function in the same way. For instance:

This line contains a link to Part 1.

If you're not interested in the link to the previous notes, then this is just a sentence. You'll probably overlook the underlining and color effect on the last words. If you're interested in information beyond this page, though, you'll see "Part 1" as an exit sign. A cue is both a statement and a call to action.

Since cues can occur at both the origin and endpoint of links, we'll be talking about departure cues and arrival cues. Links also come in various kinds and styles, so we'll need to throw a few more terms at you as well, as we go along.


2.2 | WRITING LINKS


2.2.1 · <a> or Anchor Tag

Most link transactions in HTML are controlled by a container whose first part is the <a> or anchor tag.

2.2.2 · Attributes of <a>: href and name

The initial <a> tag may take two attributes, href (for "hypertext reference") and name. Values stored in these attributes indicate an intended destination, or define an arrival point for a link; they are explained further in the text and examples below.

The <a> tag may have both attributes at once, allowing a single construction to serve as both an arrival and departure point. For more information about this, see section 2.7.6 below.

2.2.3 · Reference String or URL

In anchor tags using href -- outgoing or departure anchors -- the value assigned to this attribute is a reference string or Uniform Resource Locator (URL) that gives the Internet address of the document to which the link refers. Here are some examples:

<a href="nextPage.htm"> the next page</a>
<a href="http://www.vastBatWingConspiracy.gov"> interesting stuff</a>
<a href="../../lostFiles/forgetThis.htm"> absolutely crucial</a>

We'll explain a little later how these URLs work and why they look so different from one another. For the moment, just learn to recognize them as designations for other documents or information sources.

2.2.4 · Cue

The cue is an item of information (words or an image) that offers a point of departure or arrival for a hypertext link. As we'll see, departure cues differ importantly from arrival cues, but they share one key property: they mark one endpoint of the link transition.

In terms of markup, the cue is the material enclosed within the <a> container. Although cues can be graphics as well (as explained in Part 3), we'll concentrate on words here:

<a href="nextPage.htm"> the next page</a>
<a href="http://www.vastBatWingConspiracy.gov"> interesting stuff</a>
<a href="../../lostFiles/forgetThis.htm"> absolutely crucial</a>

Look closely at the structure of the examples above: the href attribute and its URL value fall inside the angle brackets of the initial anchor tag; they are therefore part of the invisible infrastructure of the markup. The cue material, on the other hand, occurs between the two containing tags and therefore has the status of visible text. Though it will appear with a color treatment and underlining, it will nonetheless fall into line with the text that precedes and follows it. Remember, a cue has double value: it both carries on the current discourse and holds the key to a special, disjunctive action.

2.2.5 · </a> or "End Anchor" Tag

This tag closes the anchor container, no matter whether it serves on the departure or the arrival end of the link. Though it's small, </a> is essential -- if you forget it, your link anchor will run on through your page until another </a> is encountered--a fairly common mistake in markup. Here's an example:

First </a> Tag Missing

Markup:
Some people go
<a href="oneWay.htm">one way,
and some go
<a href="anotherWay.htm">the other</a>.

Output:
Some people go one way, and some go the other.
(Link styling starts at "one way" and runs on through "the other".)

First </a> Tag In Place

Markup:
Some people go
<a href="oneWay.htm">one way</a>,
and some go
<a href="anotherWay.htm">the other</a>.

Output:
Some people go one way, and some go the other.
(Only "one way" and "the other" appear as links.)

2.2.6 · The target attribute

The initial <a> tag of an anchor container may have another attribute in addition to href or name. This is target, a feature that was added to HTML with the introduction of frames in Netscape Navigator 2.0, but which has survived into general practice, even as frames have faded away.

When used on a page contained within a frameset--a concept covered in Part 5--target is usually assigned the name of one of the frames as a value. Activating the link then causes the arrival point to be displayed in the designated frame.
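
Here is a minimal sketch of how this might look, assuming a frameset with two frames. The frame names (menu, content) and all the file names are invented for illustration; framesets themselves are explained in Part 5.

```html
<!-- The frameset page: two frames, each given a name (hypothetical names) -->
<frameset cols="25%,75%">
  <frame src="menu.htm" name="menu">
  <frame src="intro.htm" name="content">
</frameset>

<!-- A link inside menu.htm: the arrival page loads into the frame named "content" -->
<a href="chapter2.htm" target="content">
Chapter 2</a>
```

If no frame carries the name given in target, most browsers open the arrival page in a new window instead.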

Several special values are reserved for target, each one beginning with the underscore character "_". These reserved values control the behavior of windows. For instance:

<a href="somePage.htm" target="_blank">a fresh start</a>

This link opens a new browser window over the present window and displays the arrival content there. The original window remains available behind the new one. Yes, this is one way to generate those infamous pop-up windows so many people detest. (The worst pop-up offenders tend to use other, more insidious methods.)

Other reserved values of target are discussed in Part 5.

2.2.7 · MAILTO:

In most cases hypertext links on the Web simply shuttle users from one page or page location to another: they operate within the general confines of HTTP. However, HTML anchor containers can also be used for other purposes, for instance, sending e-mail.

By using the prefix MAILTO: and a valid e-mail address as the value assigned to the href attribute, you can cause the browser to activate its e-mail feature or launch a separate e-mail application. The user will see a blank e-mail document with the address filled in. Here's an example of an e-mail link construction:

<a href="mailto:nobody@nowhere.com">
Send all complaints to Mr. Noah Bodé</a>

Note that MAILTO: occurs as a prefix within the value of the attribute. It is not an attribute itself, even though it modifies the action of the link in the same way as an attribute. This is one of the many charming inconsistencies built into HTML. In time you'll learn to love them as much as we do.

Links using MAILTO have become considerably less common these days, and it's not hard to see why, since they publish someone's e-mail address to the world, allowing automated page-analyzers (bots or spiders) to feed that information to spammers. Believe it or not, the Web was once a fairly innocent place. Use the MAILTO link with caution. You may want to restrict it to institutional, rather than personal mail accounts.
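
As a sketch of the cautious approach, here is a MAILTO link aimed at an institutional mailbox rather than a person. The address is made up for illustration. The optional ?subject= part pre-fills the subject line of the message; spaces inside it are written as %20.

```html
<!-- Institutional address (hypothetical), with a pre-filled subject line -->
<a href="mailto:webmaster@example.org?subject=Broken%20link%20report">
Report a broken link</a>
```

How the message window actually looks depends on the user's e-mail program, so treat the subject line as a convenience, not a guarantee.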


2.3 | TWO SIMPLE LINKS


So much for theory; the best way to understand HTML links is to examine them in practice, so here are two examples of actual, working links (click away):

2.3.1 · Link To Another Web Site (External)

A Simple External Link

Markup:
Follow this link to
<a href="http://www.eastgate.com">
Eastgate Systems</a>, home of Serious Hypertext.

Output:
Follow this link to Eastgate Systems, home of Serious Hypertext.


In this example, the value assigned to href is an absolute URL, containing the complete Internet address for the document in question.

But what exactly is that document? The address belongs to the Eastgate Web site, not to any particular page, which would have a file extension such as ".htm".

This link actually leads to a page called "index.htm," the default destination for any links pointing to the Eastgate site. There is usually an index page or default starting point not just for the top level of a Web site but for every subdivision or directory within that site.

The default page does not need to be fully named because Eastgate's Web server automatically looks for an index file when no other page name is given. Virtually all HTTP servers behave this way, and most allow a variety of names and file extensions for the index page.
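
A small illustration, taking it as given (as stated above) that Eastgate's default page is named index.htm. The last link uses an invented server and directory name, purely as an assumption, to show the same behavior one level down.

```html
<!-- These two links should arrive at the same page, if the default page is index.htm -->
<a href="http://www.eastgate.com/">
Eastgate Systems</a>
<a href="http://www.eastgate.com/index.htm">
Eastgate Systems</a>

<!-- A link to a directory (hypothetical name) asks the server
     for that directory's own default page -->
<a href="http://www.example.org/catalog/">
the catalog</a>
```

The shorter form is usually the safer one, since it keeps working even if the site's owners rename the default page.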

Regarding markup style, note that the anchor tag in our example above begins on a new line; this makes the markup easier to read. You do not have to write links in this way, but we strongly suggest it.

2.3.2 · Link To Another Page of the Present Site (Local)

A Local Link

Markup:
Follow this link to
<a href="loc_dmo.htm">
a nearby page</a>

Output:
Follow this link to a nearby page

Here the value of href is nothing more than the file name of a page. Though we could have written an absolute URL for this link, it isn't necessary, because the destination page is part of the current Web site. More precisely, the destination page resides within the same directory (or virtual folder) as the page you are reading now. This relationship is explained more fully in the sections that follow.

For the moment, it's enough to recognize that links to pages nearby in virtual space do not require the elaborate mechanism of "http://", and so on.


2.4 | ABSOLUTE AND RELATIVE REFERENCES


2.4.1 · How They Differ

Absolute URLs are like postal addresses--they contain all the information necessary to transfer information from one point to another. Consider this example:

http://student-iat.ubalt.edu/tester/testPage.htm

In the case of the Internet, "all the information necessary" means the protocol type (HTTP), the name and address of the HTTP server (student-iat.ubalt.edu), the directory on the server in which the file is located (tester), and the name of the file itself (testPage.htm).

By contrast, a local link from another page within the tester directory could use this much simpler reference:

testPage.htm

All the other information can be left implicit, since in the case of two documents within the same directory, the larger elements of context (server name and relevant directories) remain the same.
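
Written out as anchors, the two forms might look like this. The cue text is our own invention; the addresses are the ones used above.

```html
<!-- Absolute reference: works from any page on any server -->
<a href="http://student-iat.ubalt.edu/tester/testPage.htm">
the test page</a>

<!-- Relative reference: works only from a page that also lives in tester -->
<a href="testPage.htm">
the test page</a>
```

From a page inside the tester directory, both links arrive at the same document.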

2.4.2 · Why Relative References Are Better

You should use simplified or relative references for all local links.

Because local links use less information, they are more general and thus more adaptable. They don't mention the name of the Web server on which the page is located, for example, and this is crucial. Because the server is not named specifically, relative links continue to work properly even when the document that contains them is transferred to another server. This aspect of relative linking lets you develop your Web site in one context -- for example, your local hard drive -- then upload it successfully to a remote server.

Failure to observe this practice -- using absolute instead of relative links -- may be the most common mistake of inexperienced Web coders. We have often heard first-timers complain that links on their pages no longer work when uploaded. On inspection, the links look something like this:

file:///C:/Documents%20and%20Settings/Joe_User/MYFIRSTWEBSITE/page_2.htm

The reference here has been written to a specific directory on the writer's personal computer. It therefore means nothing on the Web: no server holds that path, and a visitor's browser would look for the file on the visitor's own hard drive.

Since most people wouldn't bother to write such elaborate addresses by hand, cases like this one usually involve some code-generating program, such as Dreamweaver, which if not properly configured can write useless links. This is another reason we insist on hand-coding, without software assistance.

2.4.3 · <BASE>

The BASE tag, which if used should be installed within the HEAD container at the beginning of a Web page, specifies a default pathway for any relative links in the BODY section of that page; BASE lets you streamline link constructions by holding part of the URL constant. Unlike the anchor, BASE is not a container: it is a single tag with no closing partner, and the pathway goes in its href attribute.

Suppose for instance that most of the links on this page run to other pages stored on a remote Web server called "swissbank.account.com"; without BASE, all my links would need to look something like this:

http://swissbank.account.com/scam_4/skeletons/payoffs.htm

Now suppose I add the following BASE tag to the HEAD part of my page:

<BASE href="http://swissbank.account.com/scam_4/skeletons/">

Links to the "swissbank" location can now refer simply to the pages in question (such as payoffs.htm).

The <BASE> technique is very powerful and very widely used. Some situations require it, for instance if you are using virtual addresses or aliases under Apache. But remember that <BASE> can affect all the links on your page, so be careful. Under certain circumstances, the BASE tag can wreck all the links on your page, if part of your site is changed, or if the page is moved.
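
To see both the convenience and the danger, here is a sketch of a complete page. In HTML 4, BASE is written as a single tag with an href attribute. The page title and the file contact.htm are hypothetical; the swissbank address is the one used above.

```html
<html>
<head>
<title>Follow the money</title>
<!-- BASE is a single tag; its href value is the default pathway -->
<base href="http://swissbank.account.com/scam_4/skeletons/">
</head>
<body>
<!-- Resolves to http://swissbank.account.com/scam_4/skeletons/payoffs.htm -->
<a href="payoffs.htm">
payoffs</a>

<!-- Careful: this also resolves against BASE, so it no longer points to a
     page in our own directory (contact.htm is a hypothetical local page) -->
<a href="contact.htm">
contact us</a>
</body>
</html>
```

Once BASE is in place, truly local pages have to be reached with absolute URLs, which is one reason to think twice before using it.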

We'll deal with other powerful, highly generalized structures when we come to Stylesheets. Get used to the concept.


2.5 | WORKING WITH URLs


2.5.1 · GUI vs. Command Line

Client/server technology was supposed to make the Internet friendly to all operating systems, including those like Windows and MacOS that use graphics to represent files and their relationships; but the Web was created largely by UNIX programmers, who generally prefer command line interfaces. Managing links in HTML requires at least a basic understanding of hierarchical file structures and how to express them in lines of text.

People who have used command line UNIX systems, or older PC hands who got started on DOS, probably won't have much trouble with links and pathways.

When we first wrote these notes, about a decade ago, there were still plenty of first-time Web users who remembered command lines and DOS; now, not so much. Most of you GUI natives will have to re-orient yourselves. Happily, it's not too difficult.

2.5.2 · Rules of URLs

The rules governing URLs fall into two handy groups:

  1. Rules of hierarchy

    Directories are organized hierarchically; folders have three general relationships:

    • A IS NEXT TO B (A and B are parallel)
    • A IS WITHIN B (A is lower in the hierarchy than B)
    • B IS WITHIN A (A is higher in the hierarchy than B)

    A file hierarchy is an inverted tree with its source or root at the highest point and its smallest branches (individual files) spread out below.

  2. Rules of reference

    Stroke or slash ( / ) indicates a change of directory.

    Two dots ( .. ) mean a move upward in the hierarchy.

Using these rules and principles you can navigate your way through any file structure; but this is perhaps best demonstrated by example.
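
Before the case studies, here is a quick sketch of the two rules of reference at work. All the folder and file names are invented; assume the page containing these links sits in a folder called essays, which holds a subfolder called drafts and sits next to a folder called images.

```html
<!-- Same directory: no slash, no dots -->
<a href="notes.htm">
my notes</a>

<!-- A IS WITHIN B: a slash moves down into the drafts folder -->
<a href="drafts/old.htm">
an old draft</a>

<!-- Two dots move up one level, to the folder that contains essays -->
<a href="../index.htm">
home page</a>

<!-- A IS NEXT TO B: up one level, then down into the parallel images folder -->
<a href="../images/map.gif">
site map</a>
```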


2.6 | CASE STUDIES OF LOCAL LINKS


2.6.1 · A Simple File Structure

Below you'll see a partial file schematic for www.someplace.net, an HTTP server; after some remarks about the diagram we'll consider some linking problems based upon it:

  • someplace/
      features/
        newFeatures.htm
      projects/
      departments/
        planetX/
          documents/
            scam_4.htm
      staff/
      guests/

In this diagram, directories (folders, in a GUI) are marked with a trailing slash, and indentation shows what lies inside what. The chart shows five subdirectories within the main directory of the someplace server: features, projects, departments, staff, and guests.

Some of the directories here have no contents listed; others contain further subdirectories or individual Web pages. Pages may be identified by the ".htm" extension at the end of their names: newFeatures.htm and scam_4.htm are both HTML files.

Note the position within the hierarchy of the two pages: newFeatures.htm resides in a folder called features on the server. The second page, scam_4.htm, lies inside a folder called documents, which is inside a folder called planetX, which is inside a folder called departments. This last, outermost folder is on the same level as features.

The examples below assume that we have the ability to make changes to files on the Web server, or at least to files within the directories departments and features.

2.6.2 · Local Link from Higher to Lower

Suppose I want to make a link from the phrase, "asset transfers" on the page newFeatures.htm to the top of the page scam_4.htm. The markup looks like this:

<a href="../departments/planetX/documents/scam_4.htm">
asset transfers</a>

The href string describes a path with five stages:

Stage 1:  ..
    move up one level from inside features to the main someplace directory
Stage 2:  /departments
    move over to departments
Stage 3:  /planetX
    move within departments to planetX
Stage 4:  /documents
    move within planetX to documents
Stage 5:  /scam_4.htm
    open the page scam_4.htm within documents

2.6.3 · Local Link from Lower to Higher

Now suppose I want to write a reciprocating link from the phrase "and other features" on the page scam_4.htm to the top of the page newFeatures.htm. If you understood the Rules of URLs above, you'll see how this goes. Here's the markup, along with an explanation:

<a href="../../../features/newFeatures.htm">
and other features</a>

Remember, this time we're starting out where the last link left us -- two directories down inside of departments -- so most of this trip consists of climbing back up to the main someplace directory, after which we dive down again into features. This transition also has five stages, this time in reverse:

Stage 1:  ..
    move up from documents to planetX
Stage 2:  /..
    move up from planetX to departments
Stage 3:  /..
    move up from departments to the root level
Stage 4:  /features
    move over to features
Stage 5:  /newFeatures.htm
    open the page newFeatures.htm within features

2.6.4 · Local Link with Anchored Arrival

This third example differs significantly from the first two. In this case we want to direct the link not to another page in general, but to a specific point within that page.

Say I want to link from the phrase "electronic commerce" within the page scam_4.htm to the phrase "future of publishing," on the page newFeatures.htm. To do this I need two anchor tags, one at the departure point and the other at the arrival point. Let's begin with the anchor container at the departure point:

<a href="../../../features/newFeatures.htm#future">
electronic commerce</a>

Note the string #future at the end of the href value; this crucial addition makes the anchor container into the first half of a two-part link.

Notice that #future comes after the ".htm" extension in the name of the page file. It may seem bad form to add an extension to an extension, but this is how the HTML specifications require anchored links to be written.

The additional string at the end of the departure anchor is all you need to point your link toward a specific destination, though you still need to create the appropriate arrival construction on the destination page newFeatures.htm. The code you need looks like this:

<a name="future">the future of Web publishing</a>

So what is the effect of this coordinated pair of anchor containers -- or as we might also call it, a doubly-anchored link? Read on.


2.7 | DOUBLY-ANCHORED LINKS


2.7.1 · How Departure-Only Links Differ from Departure-Arrival Links

When readers follow a simple link -- that is, a link written without a particular arrival point specified -- the current reading point shifts from the location of the departure cue to the beginning or top of the page to which the link refers. In the absence of arrival anchors, links always point to the top of a page.

However, when readers follow a link with a specified arrival point (hereafter known as a doubly-anchored link), the reading point shifts from the departure location to a specified point within the arrival page.

2.7.2 · name Attribute in the Arrival Anchor

The arrival point is first designated in the additional string following the "#" symbol in the departure anchor -- in our example it was the word "future," but it could have been anything at all.

There is only one restriction on the arrival anchor: It must be written exactly the same way each time it is used. That is, the value assigned to the name attribute in the arrival anchor container (on the destination page) must precisely match the value used in the departure anchor (the material following the # in the outgoing link code) on the departure page.

Note that this is one of the few instances where upper and lowercase letters are treated differently -- "myVacation" does not match "myvacation".
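
A short sketch of the matching rule, using the "myVacation" example. The file name trip.htm and the cue text are hypothetical.

```html
<!-- Departure anchor, on the departure page -->
<a href="trip.htm#myVacation">
vacation photos</a>

<!-- Arrival anchor on trip.htm: the name matches exactly, so the link finds it -->
<a name="myVacation">My vacation</a>

<!-- If trip.htm contained this instead, the link would NOT find it:
     lowercase "v" does not match uppercase "V" -->
<a name="myvacation">My vacation</a>
```

When the match fails, browsers typically just display the top of the arrival page, so the error is easy to miss.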

2.7.3 · Treatment of the Arrival Cue

While the comments above may help you understand arrival links in general, we haven't yet fully explained the function of arrival cues -- the textual or graphic material that is placed inside the arrival anchor container:

<a name="someString">an arrival cue</a>

Very early browsers (e.g., the near-legendary Mosaic) highlighted the arrival cue when a doubly-anchored link was followed. For some reason, though, this feature does not appear in most contemporary browsers.

Current major browsers treat arrival cues in this way: they redraw the display so that the cue is as close as possible to the top of the current window; if the destination page is scrollable, they reset the scroll point to bring the arrival cue onto the first line of the window.

So in virtually all Web browsers, arrival cues determine scroll position. Obviously, if there is too little on the page to scroll, the arrival cue has no visible effect.

2.7.4 · Effective Arrival Cues

Why worry about arrival cues if they're so obscure?

If you're creating long, scrolling documents with multiple divisions, like the page you're reading now, then you might want to include links to allow readers to jump to other sections of the document (such as this link to Section 2.7.3).

To be most useful, arrival cues sometimes must be set a few lines above the material or section to which they refer, so that readers can orient themselves to the context. For instance, if a paragraph is preceded by a heading it is better to set the arrival cue on the heading rather than the first line of the paragraph, since in the latter case the heading would end up above the scroll point.
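
Here is a sketch of the heading advice, with an invented section name and text. The arrival anchor wraps the heading, so the heading itself lands at the top of the window.

```html
<!-- Preferred: arrival cue set on the heading -->
<h3><a name="rates">Our Rates</a></h3>
<p>Rates have not changed since 1997.</p>

<!-- Elsewhere on the same page: a departure anchor pointing at the heading -->
<a href="#rates">
see our rates</a>
```

Had the name been attached to the first words of the paragraph instead, the reader would arrive with the heading scrolled just out of sight.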

View the markup of this document and note the position of the arrival cues in this and the previous section (2.7.3).

2.7.5 · Arrival Links on External Pages

So far we have discussed specific links only within a single Web page (this one) and between two pages on the same Web server (in Section 2.6.4); but what about pages on other servers?

Here we run into a major limitation of the World Wide Web as a hypertext system -- readers cannot contribute any information, including arrival anchors, to external pages. Unless you have write privileges on the server in question, all your external links must be of the simple variety, pointing to the top of an external page.

Again, it's time for a bit of historical correction. When we wrote the paragraph above, there were no such things as Wikis, and Web 2.0 was merely a dream. Wikis are Web-based document systems that do allow users to modify the contents of pages, if not their markup structure. Many developments loosely grouped under the concept of Web 2.0 also envision much more powerful user control of information. What we say here remains true as far as HTML 4.0 is concerned: if you don't have write privileges to the server, you can't change the code; but as Internet technologies develop, ways will increasingly be found around this limitation.

Even stodgy old Web 1.0 offers three possible solutions:

Existing anchors
The author of the page may have written an appropriate arrival anchor for one of her own links; this isn't always the case but it's worth checking the page source to find out.

Social engineering
Write to the author of the page and ask for a new anchor.

Proxy and link servers
The best solution lies with systems that maintain databases of virtual links through which readers can create their own links to be added to third-party representations of Web documents. Once upon a time, a company called ThirdVoice created a service allowing users to create annotations for Web sites which were stored on ThirdVoice's server and then visually incorporated with the page when the user (or a collaborator) called it up. Though this idea did not yield the financial returns expected of it (remember those DotCom days?), the concept of proxy serving survives.
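
The first of these options, using an existing anchor, might look like the sketch below. The server, page, and anchor name are all hypothetical; in practice you would find the real name by viewing the source of the external page.

```html
<!-- Suppose the external page's source already contains:
     <a name="summary">Summary of Findings</a>            -->

<!-- Then a link from our own page can arrive at that exact point -->
<a href="http://www.example.org/report.htm#summary">
their summary of findings</a>
```

Keep in mind that the other author can rename or remove the anchor at any time, in which case the link quietly falls back to the top of the page.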

2.7.6 · Combined Departure/Arrival Structures

Finally, a small but elegant technical feature: you may include both the href and name attributes in a single anchor container, allowing the cue within that container to function both as a departure and an arrival point.

The link cue in the title of this section ("Departure/Arrival") is such a double operator; here's what it looks like:

<a name="bothways" href="#bothwaysback">
Departure/Arrival</a>

View the markup for this page if you'd like a closer look.

END OF PART 2




Course and Materials ©1997 - 2008 by Stuart Moulthrop and Nancy Kaplan