HyperText Markup Language (HTML)
HTML is a document-layout and hyperlink-specification language. It defines the syntax and placement of special, embedded directions that aren’t displayed by the browser (unless you reveal the code by choosing Edit>View Source), but tell it how to display the contents of the document, including text, images, and other support media. The language also tells how to make a document interactive through special hypertext links, which connect documents with other documents on either your computer or someone else’s. The basic syntax and semantics of HTML are defined in the HTML standards, set by the World Wide Web Consortium (w3.org).
New updates are usually held back by the pace of browser manufacturers and web developers. Browser manufacturers must first implement the new standards, and browsers often interpret the changes differently or not at all. For web developers, each update brings a new learning curve, and it takes time to bring the new technology into their workflow. As the web grows and continues to evolve, HTML alone has been found to be either limiting or too vague for specific and demanding needs.
Extensible HyperText Markup Language (XHTML)
“XHTML” is a fusion of the terms “XML” (another basic language but with strict guidelines) and “HTML”. Overall, XHTML is the same as HTML but with stricter rules on how to format the code.
HTML Version 4.01 is not XML-compliant. Hence, the W3C offers XHTML, a reformulation of HTML to be compliant under XML. XHTML attempts to support every feature of HTML 4.01 using the more rigid rules of XML. It generally succeeds but has enough differences to make life difficult for the standards-conscious HTML author.
Confused? Don’t be. Learning XHTML is basically learning HTML but with a stricter set of rules. You will be able to recognize HTML if you know XHTML.
What about HTML 5?
HTML5 is just around the corner, and in some cases it is already here. Developers are handling it in different ways: some use it even though it doesn’t work in all browsers yet; some add extra code for browsers that don’t accept HTML5; some use only the safe parts of HTML5 that work in all browsers; and others are waiting until it simply works everywhere.
HTML5 is bringing Flash-like abilities to HTML.
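As a rough illustration of "adding additional code for browsers that don’t accept HTML5", the sketch below uses the HTML5 video tag with fallback content inside it. The file name intro.mp4 is a made-up placeholder, and this is only one possible approach, not a complete cross-browser solution.

```html
<!-- Hypothetical file name; a sketch of HTML5 video with a fallback. -->
<video src="intro.mp4" controls width="480" height="270">
  <!-- Browsers that do not understand <video> ignore the tag itself
       and display this enclosed content instead. -->
  <p>Your browser does not support HTML5 video.
     <a href="intro.mp4">Download the video</a> instead.</p>
</video>
```

Browsers that do support the tag play the video and never show the paragraph inside it.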
What HTML and XHTML Are Not
With all their multimedia-enabling, new page layout features, and the hot technologies that give life to HTML/XHTML over the Internet, it is also important to understand their limitations.
They are not:
» word processing tools,
» desktop publishing solutions, or even
» programming languages in the strictest sense.
That’s because their fundamental purpose is to define the structure and basic appearance of documents so that they may be delivered quickly and easily to a user over a network for rendering on a variety of display devices. Jack of all trades, but master of none, so to speak.
Content versus Appearance
Before you can fully appreciate the power of a language and begin creating effective documents, you must yield to one fundamental rule. These markup languages are designed to structure documents and make their content more accessible, not to format documents for display purposes, i.e., fix their appearance in the way a book or magazine would.
With HTML and XHTML, content is paramount. Appearance (font specifications, line breaks, and multicolumn text) is secondary, since the code has no control over exactly how a browser displays the content, given the variety of browser graphics and text-formatting capabilities. Cascading Style Sheets (CSS) is the language used to apply styles to content. HTML and XHTML control only the way a document is structured, by way of section headers, structured lists (ordered or unordered), paragraphs, rules, titles, links, and embedded images. Importantly, attempts to subvert the supplied structuring elements to achieve specific formatting tricks seldom work across all browsers. Therefore, don’t waste your time trying to force HTML and XHTML to do things they were never designed to do. Instead, use them in the manner for which they were designed: indicating the structure of a document so that the browser can then render its content appropriately.
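The split between content and appearance might look something like the sketch below. The text and the style rules are invented for illustration; the point is only that the tags describe what each piece of content is, while a separate style sheet describes how it should look.

```html
<!-- Structure only: the tags say what each piece of content is. -->
<h1>Weekend Trip</h1>
<p>Things to pack:</p>
<ul>
  <li>Camera</li>
  <li>Walking shoes</li>
</ul>

<!-- Appearance is described separately with CSS.
     This style block would normally sit in the head of the document. -->
<style type="text/css">
  h1 { font-family: Georgia, serif; color: #333333; }
  ul { list-style-type: square; }
</style>
```

A browser that ignores the style rules still shows a heading, a paragraph, and a list, which is the behaviour the fundamental rule is after.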
HTML & XHTML Standards
To make things easier on web browsers and devices, web standards have been established. In theory, if every web page were created using the same set of rules (that is, followed web standards), every page would display properly without error. In reality, not everyone follows the standards (software included), and problems arise. Consider that if everyone in the room wrote an article about Toronto, every article would be different. Getting all web developers to do the same thing is pretty much impossible.
For information on web standards, refer to the World Wide Web Consortium, or W3C. There are also “extensions” to the standard: changes to the HTML and XHTML coding capabilities that may or may not be readily accepted and used by browser manufacturers and developers/designers. An extension becomes “standard” only once it is adopted into the official specification.
NOTE: at the top of every XHTML/HTML page there should be a DOCTYPE declaration that states which version of the code the page is using. It usually includes the address of the corresponding set of standards.
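For reference, a few commonly seen DOCTYPE declarations are sketched below. A real page uses exactly one of them, on the very first line; the comments are labels added here for illustration only.

```html
<!-- HTML 4.01 Strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<!-- XHTML 1.0 Strict -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- HTML5 -->
<!DOCTYPE html>
```

The two older forms name both the version and the address of its rule set, while the HTML5 form names neither.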
Extensions: Pro and Con
Vendors seek to make their products different and better than the competition’s. Netscape’s and Internet Explorer’s extensions to standard HTML were perfect examples of these market pressures. Many document authors feel safe using these extended browsers’ nonstandard extensions because of their combined and commanding share of users. For better or worse, extensions to HTML made by the folks at Netscape or Microsoft instantly become part of the street version of the language.
Fortunately, with HTML Version 4.0, the W3C standards caught up with the browser manufacturers. In fact, the tables turned somewhat. Many features that originally appeared as extensions in Netscape Navigator and Internet Explorer are now part of the HTML 4 and XHTML 1.0 standards, and there are other parts of the new standard that are not yet features of the popular browsers.
Nonstandard Extensions
A lot of people want you to use the latest and greatest whizbang features in order to stand out. However, even if the latest HTML “gimmick” is a useful extension, as long as not all browsers support it, and/or it is not part of the standard, you would be taking chances (on the accessibility of the site) using that extension.
Beyond Extensions: Exploiting Bugs
It is one thing to take advantage of an extension, and quite another to exploit known bugs in a particular version of a browser in order to achieve some eye-catching effect. Bugs are parts of software that do not quite do what they are meant to do, and they can appear in programming languages as well. Coders may stumble upon a bug and decide that its result is a desired effect. However, because it is a bug, it probably will not work the same way in all environments. Bugs are also often fixed, which means any effect that depends on one has a limited life span.
In that light, we can unequivocally offer this advice: never exploit a bug in a browser to achieve a particular effect in your documents – no matter how tempting it may be.
Document Structure: Tag Hierarchy
HTML and XHTML documents consist of text, which provides the content of the document, and tags, which define the structure and appearance of the document.
The “bare bones” HTML structure is exemplified below:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Bare bones HTML Document</title>
</head>
<body>
<h1>Page Heading</h1>
<p>This illustrates, in a very simple way, the basic structure of an HTML document. </p>
</body>
</html>
The DOCTYPE declaration simply states which version of HTML/XHTML is being used. It is usually generated by the web editor you are using (such as BBEdit or Dreamweaver).
An outer <html> tag encloses the document head and body, which are in turn delimited by the <head> and <body> tags. The head is where you give your document a title (the nested <title> tag) and where you indicate other parameters the browser may use when displaying the document. This information is not part of the actual contents, which go between the opening and closing body tags.
The body accommodates the text for display and document control markers, i.e., tags that advise the browser how to display the text. Tags within the body also reference special-effects files, including graphics and sound, and indicate the hot spots (hyperlinks and anchors) that link your document to other documents.
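A body that references a graphics file and links to other documents might look roughly like the sketch below. The file names images/photo.jpg and page2.html are placeholders invented for this example.

```html
<body>
  <h1>Page Heading</h1>
  <!-- Embedded image: the tag references a separate graphics file. -->
  <p><img src="images/photo.jpg" alt="A photo" /></p>
  <!-- Hyperlink to another document on the same site. -->
  <p><a href="page2.html">Go to the next page</a></p>
  <!-- Hyperlink to a document on someone else's computer. -->
  <p><a href="http://www.w3.org/">Visit the W3C</a></p>
</body>
```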
Tags and Their Attributes
For the most part, tags (the markup elements of HTML and XHTML) are simple to understand and use, since they are made up of common words, abbreviations, and notations. For instance, the <p>… </p> pair tells the browser where to start and end a paragraph. The HTML and XHTML standards and their various extensions define how and where you place tags within a document, as well as the set and behavior of the attributes each tag is entitled to.
The Syntax of a Tag
Every tag consists of a tag name, followed by an optional list of tag attributes, all placed between opening and closing brackets, “<” and “>”. The attributes appear in the opening tag only but not in its closing counterpart. The simplest tag is nothing more than a name appropriately enclosed in brackets, such as <head> and <title>. More complicated tags contain one or more attributes, which specify/modify their behavior. According to the HTML standard, tag and attribute names are not case-sensitive. There’s no difference in effect between <head>, <Head>, <HEAD>, or even <HeaD>—they are all equivalent.
With XHTML, case is important: all current standard tag and attribute names are in lowercase. For both HTML and XHTML, the values that you assign to a particular attribute may be case-sensitive, depending on your browser and server. In particular, file location and name references, or uniform resource locators (URLs), are often case-sensitive, because many web servers distinguish between uppercase and lowercase letters in file names.
Tag attributes, if any, belong after the tag name, each separated by one (or more) tab, space, or return characters. Their order of appearance is not important. A tag attribute’s value, if any, follows an equal sign “=” after the attribute name. You may include spaces around the equal sign, so that width=6, width = 6, width =6, and width= 6 all mean the same. For readability, however, we prefer not to include spaces. That way, it’s easier to pick out an attribute/value pair from a crowd of pairs in a lengthy tag. With HTML, if an attribute’s value is a single word or number (no spaces), you may simply add it after the equal sign. All other values should be enclosed in single or double quotation marks, especially those values that contain several words separated by spaces. With XHTML, all attribute values must be enclosed in quotation marks; double quotes are the usual convention. The length of the value is limited to 1024 characters.
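The attribute rules above can be sketched with a single image tag written three ways. The file name logo.gif and the attribute values are made up for illustration.

```html
<!-- HTML: single-word values may go unquoted,
     and spaces around "=" are optional. -->
<img src=logo.gif alt=Logo width=60 height = 40>

<!-- HTML: a value containing spaces needs quotes (single or double). -->
<img src="logo.gif" alt='Company logo'>

<!-- XHTML: lowercase names, every value quoted, and the tag closed. -->
<img src="logo.gif" alt="Company logo" width="60" height="40" />
```

Writing every tag in the third style is a reasonable habit, since it is acceptable to browsers in both languages.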
Most browsers are tolerant of how tags are punctuated and broken across lines. Nonetheless, avoid breaking tags across lines in your source document whenever possible. This rule promotes readability and reduces potential errors.
Starting and Ending (a.k.a. “Closing” or “End”) Tags
We alluded earlier to the fact that most tags have a beginning and an end and affect the portion of content between them. That enclosed segment may be large or small, from a single text character, syllable, or word to the entire document. The starting component of any tag is the tag name and its attributes, if any. The corresponding ending tag is the tag name alone, preceded by a slash. Ending tags have no attributes.
NOTE: XHTML rules state that all tags must close!
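A minimal sketch of starting and ending tags follows; page2.html is a placeholder file name. Note that the attribute appears only in the starting tag.

```html
<!-- Starting tag with an attribute, the enclosed content, then the ending tag. -->
<a href="page2.html">next page</a>

<!-- The enclosed segment can be as small as one word... -->
<p>This is <em>very</em> important.</p>

<!-- ...or as large as a whole section of the page. -->
<div>
  <h2>Section Heading</h2>
  <p>All of this content sits between the starting and ending div tags.</p>
</div>
```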
Nesting Tags: The Mirror Principle
According to the HTML and XHTML standards, you must end nested tags starting with the most recent one and work your way back out. For instance, in the example above, we end the title tag (</title>) before ending the head tag (</head>) since we started in the reverse order: <head> tag first, then <title> tag. It’s a good idea to follow that standard, even though most browsers don’t absolutely insist you do so. You may get away with violating the nesting rule for one browser, sometimes even with all current browsers. But eventually, a new browser version won’t allow the violation and you’ll be hard pressed to straighten out your source HTML document. Be aware that the XHTML standard explicitly forbids improper nesting.
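The mirror principle can be illustrated with two versions of the same line. The text is invented for the example.

```html
<!-- Correct: the most recently opened tag is closed first. -->
<p>This text is <strong><em>very important</em></strong>.</p>

<!-- Incorrect: <strong> and <em> overlap instead of nesting. -->
<p>This text is <strong><em>very important</strong></em>.</p>
```

Many browsers display both lines the same way, which is exactly why the second form is easy to overlook.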
Tags without Ends
According to the HTML standard, a few tags (sometimes referred to as “standalone” tags) do not have an ending tag. In fact, the HTML standard forbids use of an end tag for these special ones, although most browsers are lenient and ignore the errant end tag.
For example, the <br> tag causes a line break; it has otherwise no effect on the subsequent portion of the document, and hence does not need an ending tag.
Example: <br /> is the XHTML form; the older HTML form is <br>.
XHTML always requires end tags. To make a “standalone” tag compliant with XHTML, you add a forward slash “/” in front of the closing bracket “>”, so that you end up with a single tag that combines start and end notation.
You often see documents in which the author seemingly has forgotten to include an ending tag, in apparent violation of HTML standards. You may even see a missing <body> tag. But your browser doesn’t complain, and the document displays just fine.
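A few standalone tags are sketched below in both notations. The file name photo.jpg is a placeholder, and the list is not complete.

```html
<!-- HTML: standalone tags have no end tag. -->
<p>First line<br>Second line</p>
<hr>
<img src="photo.jpg" alt="A photo">

<!-- XHTML: the same tags, with a slash before the closing bracket. -->
<p>First line<br />Second line</p>
<hr />
<img src="photo.jpg" alt="A photo" />
```

The space before the slash is a common convention intended to keep older browsers from misreading the tag.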
What gives? The HTML standard lets you omit certain tags or their endings for clarity and ease of preparation. Its writers didn’t intend the language to be tedious.
For example, the <p> tag that defines the start of a paragraph has a corresponding end tag </p>, but the </p> ending tag rarely gets used. In fact, many HTML authors don’t even know it exists!
The HTML standard lets you omit a starting tag or an ending tag whenever it can be unambiguously inferred from the surrounding context. Many browsers also make good guesses when confronted with tags that are missing in error, which can lead you to the incorrect conclusion that the code is correct. We recommend that you consistently add the ending tag. It’ll make life easier for you as you transition to XHTML, as well as for the browser and for anyone who might need to modify your document in the future.
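The difference between omitted and explicit ending tags might look like the sketch below, using invented text. The first version is acceptable HTML 4; only the second is acceptable XHTML.

```html
<!-- HTML: the ending tags </p> and </li> are omitted and inferred. -->
<p>First paragraph.
<p>Second paragraph.
<ul>
  <li>First item
  <li>Second item
</ul>

<!-- Recommended: every ending tag written out. -->
<p>First paragraph.</p>
<p>Second paragraph.</p>
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
```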
Ignored or Redundant Tags
HTML browsers sometimes ignore tags. This usually happens with redundant tags whose effects merely cancel or substitute for themselves. The best example is a series of <p> tags, one after the other, with no intervening content. Unlike the similar series of repeating return characters in a text-processing document, most browsers skip to a new line only once. The extra <p> tags are redundant and usually ignored by the browser. In addition, most HTML browsers ignore any tag that they don’t understand or that was incorrectly specified by the document author.
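A quick way to see redundant tags being ignored is a test page along the lines of the sketch below. The exact result can vary by browser, so treat it as an experiment rather than a guarantee.

```html
<p>First paragraph.</p>
<!-- Three empty paragraphs in a row, with no intervening content. -->
<p></p>
<p></p>
<p></p>
<p>Second paragraph.</p>
```

In most browsers the two paragraphs appear with ordinary spacing between them, not with three extra blank lines.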
Browsers habitually forge ahead and make some sense of a document, no matter how badly formed and error-ridden it may be. This isn’t just a tactic to overcome errors; it’s also an important strategy for extensibility. Imagine how much harder it would be to add new features to the language if the existing base of browsers choked on them. The thing to watch out for with nonstandard tags that aren’t supported by most browsers is their enclosed contents, if any. Browsers that recognize the new tag may process those contents differently than those that don’t support the new tag. For example, Internet Explorer and Netscape Navigator now both support the <style> tag, whose contents serve to set the variety of display characteristics of your document. However, previous versions of the popular browsers, many of which are still in use by many people today, don’t support styles. Hence, older browsers ignore the <style> tag and render its contents on the user’s screen, effectively defeating the tag’s purpose in addition to ruining the document’s appearance.
Deprecated tags
As newer versions of the code are introduced, so are newer methods of doing things. As a result, some tags may be deprecated, which means you should stop using them. They will still work if you do use them, but you should move on to the newer methods. Another result is that a LOT of web sites created with older code still use it, and software still allows older code to be used. In practice, this means we will see extra buttons and functions within BBEdit and Dreamweaver that we no longer use.
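As an illustration, the font and center tags were deprecated in HTML 4.01 in favor of style sheets. The sketch below shows an older line and one possible newer replacement; the text and style values are invented, and the two lines are not guaranteed to look identical.

```html
<!-- Older method: deprecated presentational tags. -->
<center><font color="red" size="5">Welcome</font></center>

<!-- Newer method: a structural tag plus CSS. -->
<h1 style="color: red; text-align: center;">Welcome</h1>
```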
Resources
Exercises
- Read about XHTML at W3Schools.com
- Create a basic HTML page, save it as page1.html, and upload it to the server. Try viewing the file in your browser.
- Prepare your resume in a text document to be used later with HTML. If you find you have the time, you can create the page using basic HTML to structure the content.
Extra
A funny commentary on how bad Internet Explorer really is:
http://designmess.com/article/internet-explorer-will-be-downfall-man