HTML

In computing, HyperText Markup Language (HTML) is a markup language designed for the creation of web pages and other information viewable in a browser. HTML is used to structure information, denoting certain text as headings, paragraphs, lists and so on, and can be used to define the semantics of a document.

Originally defined by Tim Berners-Lee and further developed by the IETF with a simplified SGML syntax, HTML is now an international standard (ISO/IEC 15445:2000). The HTML specification is maintained by the World Wide Web Consortium (W3C).

In terms of file extensions, HTML documents are frequently named ".HTM", a shortened version implemented in order to get the documents to display properly on DOS/Windows 3.1 systems. This variant conforms with the 8.3 limit on file naming which was a result of the File Allocation Table file system. While unnecessary for modern versions of Windows, the shortened form remains common by convention.

Early versions of HTML were defined with looser syntactic rules, which helped its adoption by those unfamiliar with web publishing. Web browsers commonly made assumptions about the author's intent and proceeded with rendering the page. Over time, the trend in the official standards has been toward an increasingly strict language syntax; however, browsers continue to render pages that are far from valid HTML. HTML 4.01 is the current version of the HTML specification, although the W3C is moving toward replacing it with XHTML, which applies the stricter rules of XML to HTML.
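
The tolerance described above can be illustrated with a minimal sketch using Python's standard html.parser module, which, like a browser, reports what it finds without rejecting invalid markup. This is an illustration only, not a model of how any particular browser recovers from errors; the class name TagLogger and the sample string are assumptions made for the example.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Record start and end tags in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

# Sloppy markup: the <p> and <b> elements are never closed.
sloppy = "<h1>Title</h1><p>First paragraph<p>Second with <b>bold text"

logger = TagLogger()
logger.feed(sloppy)
logger.close()
events = logger.events  # no exception is raised for the missing end tags
```

The parser simply reports the tags it sees and leaves it to the caller to decide how to treat the unclosed elements, which is roughly the position a lenient user agent is in.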

Introduction
HTML is a form of markup oriented toward single-page text documents that are rendered by specialized software called HTML user agents, the most common example of which is a web browser. HTML provides a means by which the document's content can be annotated with various kinds of metadata and rendering hints. The rendering hints may range from minor text decorations, such as specifying that a certain word be underlined or that an image be inserted, to sophisticated imagemaps and form definitions. The metadata may include information about the document's title and author; structural information such as headings, paragraphs and lists; and information that allows the document to be linked to other documents to form a hypertext web.

HTML is a text-based format designed to be both readable and editable by humans using a text editor. However, writing and updating a large number of pages by hand is time-consuming, requires a good knowledge of HTML and can make consistency difficult to maintain. Visual HTML editors such as Macromedia Dreamweaver, Adobe GoLive or Microsoft FrontPage allow web pages to be created much like word processor documents. However, the code generated by these programs is frequently of poor quality.

HTML can be generated on the fly using a server-side scripting system such as PHP, JSP or ASP. Many web applications like content management systems, wikis and web forums generate HTML pages.
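
As a minimal sketch of generating HTML on the fly, the following uses Python in place of the server-side systems named above (PHP, JSP, ASP); the principle is the same. The function name render_page and the sample data are hypothetical. Text taken from data is escaped so that characters such as "<" and "&" appear as content and are not interpreted as markup.

```python
from html import escape

def render_page(title, items):
    """Build a small HTML 4.01 page from plain data, escaping all text."""
    rows = "\n".join(f"    <li>{escape(item)}</li>" for item in items)
    return (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">\n'
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        "  <ul>\n"
        f"{rows}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )

# Hypothetical data, such as a web forum might hold in its database.
page = render_page("Forum topics", ["Tables & frames", "Using <b> or CSS?"])
```

A real content management system or forum would typically use a template engine, but the output is the same kind of text: an HTML document assembled from stored data at request time.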

HTML is also used in email. Many email clients include a GUI HTML editor for composing emails and a rendering engine for displaying them once received. Use of HTML in email is controversial for several reasons. The most obvious issue is size: an email with lots of formatting will be much larger than the plain text equivalent. This issue is intensified by the fact that, for compatibility, most clients send a plain text version as well. Other issues are overuse of formatting (there was at one stage a craze for making letterheads using HTML and sending them as part of every email) and the potential security issues of rendering a complex format like HTML. For these reasons many mailing lists deliberately block HTML email, either stripping out the HTML part to leave just the plain text part or rejecting the entire message.
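
The practice of sending a plain text version alongside the HTML version can be sketched with Python's standard email package. The addresses and message text are invented for the example. The message becomes a multipart/alternative container holding both parts, which is why such a message is larger than its plain text equivalent.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Meeting notes"
msg["From"] = "alice@example.org"   # hypothetical addresses
msg["To"] = "bob@example.org"

# Plain text part first: the fallback for clients that do not render HTML.
msg.set_content("Meeting moved to 3 pm.\n")

# Adding an HTML alternative turns the message into multipart/alternative.
msg.add_alternative(
    "<html><body><p>Meeting moved to <b>3 pm</b>.</p></body></html>\n",
    subtype="html",
)

part_types = [part.get_content_type() for part in msg.iter_parts()]
plain_size = len(msg.get_body(preferencelist=("plain",)).get_content())
html_size = len(msg.get_body(preferencelist=("html",)).get_content())
```

A mailing list that strips HTML would, in effect, keep only the text/plain part of such a message.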

Version history of the standard

 * HTML 2.0: published November 1995 as IETF RFC 1866, and declared obsolete/historic by RFC 2854 in June 2000
 * HTML 3.2: published January 14, 1997 as a W3C Recommendation
 * HTML 4.0: published December 18, 1997 as a W3C Recommendation
 * HTML 4.01 (minor fixes): published December 24, 1999 as a W3C Recommendation
 * ISO/IEC 15445:2000 ("ISO HTML", based on HTML 4.01 Strict): published May 15, 2000 as an ISO/IEC international standard

There is no official standard HTML 1.0 specification because there were multiple informal HTML standards at the time. However, some people consider the initial edition provided by Tim Berners-Lee to be the definitive HTML 1.0. That version did not include an IMG tag. Work on a successor for HTML, then called 'HTML+', began in late 1993, designed originally to be "A superset of HTML ... which will allow a gradual rollover from the previous format of HTML". The first formal specification was therefore given the version number 2.0 in order to distinguish it from these unofficial "standards". Work on HTML+ continued, but this never became a standard.

The HTML 3.0 standard was proposed by the newly formed W3C in March 1995, and provided many new capabilities such as support for tables, text flow around figures and the display of complex math elements. Even though it was designed to be compatible with HTML 2.0, it was too complex at the time to be implemented, and when the draft expired in September 1995 work in this direction was discontinued due to lack of browser support. HTML 3.1 was never officially proposed, and the next standard proposal was HTML 3.2 (code-named 'Wilbur'), which dropped the majority of the new features in HTML 3.0 and instead adopted many browser-specific elements and attributes which had been created for the Netscape and Mosaic web browsers. Support for math as proposed by HTML 3.0 finally came about years later with a different standard, MathML.

HTML 4.0 likewise adopted many browser-specific elements and attributes, but at the same time began to try to 'clean up' the standard by marking some of them as deprecated, and suggesting they not be used.

Minor editorial revisions to the HTML 4.0 specification were published as HTML 4.01. With the advent of XHTML, no further versions of HTML are planned. The most common file extension for HTML documents is '.html'. Older operating systems limited file extensions to three letters, so '.htm' was also used; it is less common now, but it is interpreted the same way and works with most browsers.

Markup elements
Below are the kinds of markup elements in HTML.
 * Structural markup. Describes the purpose of text. For example, &lt;h2&gt;Golf&lt;/h2&gt; directs the browser to render "Golf" as a second-level heading, similar to "Markup elements" at the start of this section. Structural markup does not denote any specific rendering, but most web browsers have standardised on how elements should be formatted. For example, by default headings like these will appear in large, bold text. Further styling should be done with Cascading Style Sheets (CSS).
 * Presentational markup. Describes the appearance of the text, regardless of its function. For example, &lt;b&gt;boldface&lt;/b&gt; will render "boldface" in bold text. In the majority of cases, using presentational markup is inappropriate, and presentation should be controlled by using CSS.
 * Hypertext markup. Links parts of the document to other documents. For example, an anchor element of the form &lt;a href="URL"&gt;Wikipedia&lt;/a&gt; will render the word Wikipedia as a hyperlink to the URL given in its href attribute.
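
The three kinds of markup listed above can be told apart mechanically. The sketch below, using Python's standard html.parser, sorts the elements of a small sample into the three groups. The element sets are illustrative and deliberately incomplete, and the class name, sample text and example.org URL are assumptions made for the example.

```python
from html.parser import HTMLParser

# Illustrative, deliberately incomplete groupings of HTML 4 element names.
STRUCTURAL = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol", "li", "blockquote"}
PRESENTATIONAL = {"b", "i", "u", "font", "center"}

class MarkupClassifier(HTMLParser):
    """Sort start tags into structural, presentational and hypertext markup."""

    def __init__(self):
        super().__init__()
        self.found = {"structural": [], "presentational": [], "hypertext": []}

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL:
            self.found["structural"].append(tag)
        elif tag in PRESENTATIONAL:
            self.found["presentational"].append(tag)
        elif tag == "a" and dict(attrs).get("href"):
            # For links, record the target URL, not the tag name.
            self.found["hypertext"].append(dict(attrs)["href"])

sample = (
    "<h2>Golf</h2>"
    "<p>Some <b>boldface</b> text and a link to "
    '<a href="http://www.example.org/">an example site</a>.</p>'
)

classifier = MarkupClassifier()
classifier.feed(sample)
classifier.close()
found = classifier.found
```

A checker built along these lines could be used to flag presentational elements that ought to be replaced by CSS rules.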

The Document Type Definition
In order to specify which version of the HTML standard they conform to, all HTML documents should start with a Document Type Declaration (informally, a "DOCTYPE"), which makes reference to a Document Type Definition (DTD). For example:

 &lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"&gt;

This declaration asserts that the document conforms to the Strict DTD of HTML 4.01, which is purely structural, leaving formatting to Cascading Style Sheets. In some cases, the presence or absence of an appropriate DOCTYPE may influence how a web browser will display the page.
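
How software might inspect the declaration can be sketched as follows, again with Python's standard html.parser. The public identifiers in the table are those of the three HTML 4.01 DTDs; the helper name dtd_variant and the sample page are hypothetical. This only shows reading the declaration and does not reproduce the rules any particular browser uses to choose a rendering mode.

```python
from html.parser import HTMLParser

# Public identifiers of the three HTML 4.01 DTDs.
HTML401_DTDS = {
    "-//W3C//DTD HTML 4.01//EN": "Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "Transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "Frameset",
}

class DoctypeReader(HTMLParser):
    """Capture the text of the first DOCTYPE declaration, if any."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl

def dtd_variant(document):
    """Return 'Strict', 'Transitional', 'Frameset', or None (hypothetical helper)."""
    reader = DoctypeReader()
    reader.feed(document)
    reader.close()
    if reader.doctype is None:
        return None
    for public_id, name in HTML401_DTDS.items():
        # Match the quoted identifier so "4.01" does not match "4.01 Transitional".
        if f'"{public_id}"' in reader.doctype:
            return name
    return None

strict_page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html><head><title>Example</title></head><body><p>Hello</p></body></html>"
)
variant = dtd_variant(strict_page)
```

A document with no declaration yields None, which is the case in which a browser has to guess how the page was meant to be displayed.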

In addition to the Strict DTD, HTML 4.01 provides Transitional and Frameset DTDs. The Transitional DTD was intended to gradually phase in the changes made in the Strict DTD, while the Frameset DTD was intended for those documents which contained frames.

Separation of style and content
Efforts of the web development community have led to new thinking about how a web document should be written; XHTML epitomizes this effort. Standards stress using markup that conveys the structure of the document, like headings, paragraphs, block quoted text, and tables, instead of markup that is written for visual purposes only, like &lt;font&gt;, &lt;b&gt; (bold), and &lt;i&gt; (italics). Some of these elements are not permitted in certain varieties of HTML, like HTML 4.01 Strict. CSS provides a way to separate the HTML structure from the content's presentation, by keeping all code dealing with presentation in a CSS file. See separation of style and content.
