2 Introduction to HTML 4.0

2.1 What is the World Wide Web?

The World Wide Web (Web) is a network of information resources. The Web relies on three mechanisms to make these resources readily available to the widest possible audience:

  1. A uniform naming scheme for locating resources on the Web (for example, URIs).
  2. Protocols for accessing named resources over the Web (for example, HTTP).
  3. Hypertext, for easy navigation among resources (for example, HTML).

The ties between these three mechanisms become apparent throughout this specification.

2.1.1 Introduction to the URI

Every resource available on the Web (an HTML document, an image, a video clip, a program, etc.) has an address that can be encoded by a Universal Resource Identifier (URI).

URIs typically consist of three parts:

  1. The naming scheme of the mechanism used to access the resource.
  2. The name of the machine hosting the resource.
  3. The name of the resource itself, given as a path.

Consider the URI of this HTML specification on the W3C server:

  http://www.w3.org/TR/PR-html4/cover.shtml

This URI can be read as follows: the document is available via the HTTP protocol (see [RFC2068]), it resides on the machine www.w3.org, and its path on that machine is "/TR/PR-html4/cover.shtml". Other schemes you may see in HTML documents include "mailto" for e-mail and "ftp" for FTP.

Here is another URI, this one referring to a user's mailbox:

  ...this is text... For all comments, please send email to <A href="mailto:[email protected]">Joe Coulou</A>.
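The three-part reading of a URI described above can be reproduced with Python's standard urllib.parse module. This is only an illustrative sketch: urlsplit implements the generic URI syntax rather than anything specific to HTML, and the mailbox address below is a hypothetical stand-in.

```python
from urllib.parse import urlsplit

# Split the URI of the specification into the three parts described above.
spec = urlsplit("http://www.w3.org/TR/PR-html4/cover.shtml")
print(spec.scheme)  # naming scheme: http
print(spec.netloc)  # machine hosting the resource: www.w3.org
print(spec.path)    # path to the resource: /TR/PR-html4/cover.shtml

# A "mailto" URI has a scheme but no machine name; the address is the path.
# joe@example.com is a hypothetical address used only for illustration.
mail = urlsplit("mailto:joe@example.com")
print(mail.scheme, repr(mail.netloc), mail.path)  # mailto '' joe@example.com
```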

Note. Most readers may be familiar with the term "URL" but not with the term "URI". URLs form a subset of the more general URI naming scheme.

2.1.2 Fragment identifiers

Some URIs refer to a location within a resource. This kind of URI ends with "#" followed by an anchor identifier (called the fragment identifier). For example, the following URI points to an anchor named section_2:

  http://somesite.com/html/top.shtml#section_2
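Roughly speaking, a user agent resolves such a URI in two steps: it retrieves the document named by the part before "#", then looks inside that document for the anchor named by the fragment identifier. The Python sketch below illustrates both steps with the standard urllib.parse and html.parser modules. The AnchorFinder class and the page content are hypothetical, and matching on either the name or the id attribute is an assumption about how the anchor is marked up.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorFinder(HTMLParser):
    """Remembers the first start tag whose name or id equals the fragment."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.match = None  # tag name of the matching element, if any

    def handle_starttag(self, tag, attrs):
        if self.match is not None:
            return
        for attr, value in attrs:
            if attr in ("name", "id") and value == self.fragment:
                self.match = tag  # HTMLParser reports tag names in lowercase
                return


# Step 1: separate the document URI from the fragment identifier.
doc_uri, fragment = urldefrag("http://somesite.com/html/top.shtml#section_2")
print(doc_uri)   # http://somesite.com/html/top.shtml
print(fragment)  # section_2

# Step 2: search the retrieved document for the anchor.
# Hypothetical content of top.shtml, used only for illustration.
page = '<H1>Top</H1><H2><A name="section_2">Section 2</A></H2>'
finder = AnchorFinder(fragment)
finder.feed(page)
print(finder.match)  # a
```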

2.1.3 Relative URIs

A relative URI does not contain any naming scheme information. Its path generally refers to a resource on the same machine as the current document. Relative URIs may contain relative path components (for example, ".." means one level up in the hierarchy defined by the path) and may contain fragment identifiers.

Relative URIs are resolved to full URIs using a base URI. As an example of relative URI resolution, assume we have the base URI "http://www.acme.com/support/intro.shtml". The relative URI in the following markup for a hypertext link:

  <A href="suppliers.shtml"> Suppliers </A>

would expand to the full URI "http://www.acme.com/support/suppliers.shtml", while the relative URI in the following markup for an image:

  <IMG src="../icons/logo.gif" alt="logo">

would expand to the full URI "http://www.acme.com/icons/logo.gif".
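The resolution of the two relative URIs above can be checked with urljoin from Python's standard library. This is a sketch for illustration: urljoin follows the resolution rules of the later RFC 3986, which give the same results as the text for simple cases like these. The third, fragment-only reference ("#contact") is a hypothetical addition showing that a bare fragment keeps the base document.

```python
from urllib.parse import urljoin

# Base URI from the example in the text.
base = "http://www.acme.com/support/intro.shtml"

# Same directory: the last path segment of the base is replaced.
print(urljoin(base, "suppliers.shtml"))
# ".." climbs one level in the path hierarchy before appending.
print(urljoin(base, "../icons/logo.gif"))
# A fragment-only reference points into the base document itself.
print(urljoin(base, "#contact"))
```

Expected results: "http://www.acme.com/support/suppliers.shtml", "http://www.acme.com/icons/logo.gif" and "http://www.acme.com/support/intro.shtml#contact".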

In HTML, URIs are used to:

  • Link to another document or resource (see the A and LINK elements).
  • Link to an external style sheet or script (see the LINK and SCRIPT elements).
  • Include an image, object, or applet in a page (see the IMG, OBJECT, APPLET, and INPUT elements).
  • Create an image map (see the MAP and AREA elements).
  • Submit a form (see FORM).
  • Create a frame document (see the FRAME and IFRAME elements).
  • Cite an external reference (see the Q, BLOCKQUOTE, INS, and DEL elements).
  • Refer to metadata conventions describing a document (see the HEAD element).

For more information about URIs, see the section on URI types.

2.2 What is HTML?

To publish information for global distribution, one needs a universal language, one that all computers can potentially understand. The publishing language used by the World Wide Web is HTML (HyperText Markup Language).

HTML gives authors the means to:

  • Publish online documents with headings, text, tables, lists, photos, etc.
  • Retrieve online information via hypertext links, at the click of a button.
  • Design forms for conducting transactions with remote services, for use in searching for information, making reservations, ordering products, etc.
  • Include spreadsheets, video clips, sound clips, and other applications directly in their documents.

2.2.1 A Brief History of HTML

HTML was originally developed by Tim Berners-Lee while at CERN, and popularized by the Mosaic browser developed at NCSA. During the 1990s it flourished with the rapid growth of the Web, and in that time it was extended in a number of ways. The Web depends on Web page authors and vendors sharing the same HTML conventions, which motivated joint work on specifications for HTML.

HTML 2.0 (November 1995, see [RFC1866]) was developed under the auspices of the Internet Engineering Task Force (IETF) to codify common practice as of late 1994. HTML+ (1993) and HTML 3.0 (1995, see [HTML30]) proposed much richer versions of HTML. Although consensus on them was never reached in standards discussions, these drafts led to the adoption of a range of new features. The efforts of the World Wide Web Consortium's HTML Working Group to codify common practice in 1996 resulted in HTML 3.2 (January 1997, see [HTML32]). Changes from HTML 3.2 are listed in Appendix A.

Most people agree that HTML documents should work well across different browsers and platforms. Achieving interoperability lowers costs for authors, since they need to develop only one version of a document. Without this effort, there is a much greater risk that the Web will devolve into a collection of proprietary, incompatible formats, which would ultimately reduce the Web's commercial potential for all participants.

Each version of HTML has attempted to reflect greater consensus among industry players and users, so that the investment made by authors is not wasted and their documents do not become unreadable in a short period of time.

HTML has been developed with the vision that all kinds of devices should be able to use information on the Web: personal computers with graphical displays of varying resolution and color depth, cellular telephones, hand-held devices, devices for speech output and input, computers with high or low bandwidth, and so on.

2.3 HTML 4.0

HTML 4.0 extends HTML with mechanisms for style sheets, scripting, frames, and embedding objects, improved support for right-to-left and mixed-direction text, richer tables, and enhancements to forms, offering improved accessibility for people with disabilities.

2.3.1 Internationalization

This version of HTML has been designed with the help of experts in the field of internationalization, so that documents can be written in any language and easily transported around the world. This has been accomplished by incorporating [RFC2070], which deals with the internationalization of HTML.

An important step was the adoption of the ISO/IEC 10646 standard (see [ISO10646]) as the document character set for HTML. It is the world's most inclusive standard dealing with the representation of international characters, text direction, punctuation, and other issues of the world's languages.

HTML now offers greater support for multiple human languages within a single document. This allows for more effective indexing of documents by search engines, higher-quality typography, better text-to-speech conversion, better hyphenation, etc.

2.3.2 Accessibility

As the Web community grows and its members diversify in their abilities and skills, it is crucial that the underlying technologies meet their specific needs. HTML has been designed to make Web pages more accessible to users with physical disabilities. HTML 4.0 includes the following developments inspired by concerns for accessibility:

  • A clearer separation between document structure and presentation, encouraging the use of style sheets instead of HTML presentation elements and attributes.
  • Better forms, including access keys, the ability to group form controls semantically, the ability to group SELECT options semantically, and active labels.
  • The ability to mark up a text description of an included object (with the OBJECT element).
  • A new client-side image map mechanism (the MAP element) that allows authors to integrate image and text links.
  • The requirement that images included with the IMG element have alternate text.
  • Support for the title and lang attributes on all elements.
  • Support for the ABBR and ACRONYM elements.
  • A wider range of target media (teletype, braille, etc.) for use with style sheets.
  • Better tables, including captions, column groups, and mechanisms that facilitate non-visual rendering.
  • Long descriptions of tables, images, frames, etc.
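Because alternate text is now required on IMG, an authoring tool or validator can flag images that omit it. The following is a minimal sketch using Python's standard html.parser module; the MissingAltChecker class and the sample markup are hypothetical, and the check only tests whether an alt attribute is present (an empty alt="" counts as present).

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects the src of every IMG element that lacks an alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so "IMG" arrives as "img".
        if tag == "img":
            names = {name for name, _ in attrs}
            if "alt" not in names:
                self.missing.append(dict(attrs).get("src"))


# Hypothetical page fragment: the second image has no alternate text.
page = """
<IMG src="logo.gif" alt="ACME logo">
<IMG src="chart.gif">
"""
checker = MissingAltChecker()
checker.feed(page)
print(checker.missing)  # ['chart.gif']
```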

Authors who design pages with accessibility in mind gain more than accessibility alone: well-designed HTML documents that separate structure from presentation also adapt more easily to new technologies.

Note. For more information about designing accessible HTML documents, see [WAIGUIDE].

2.3.3 Tables

The new table model in HTML is based on [RFC1942]. Authors now have greater control over the structure and layout of tables (for example, through column groups). The ability of designers to recommend column widths allows user agents to display table data incrementally (as it arrives) rather than waiting for the entire table before rendering.

Note. At the time of writing, some HTML authoring tools relied extensively on tables to format pages, which caused compatibility problems.

2.3.4 Compound documents

HTML now offers a standard mechanism for embedding objects and applications in HTML documents. The OBJECT element (together with its more specific predecessors, IMG and APPLET) provides a mechanism for including images, video, sound, mathematics, specialized applications, and other objects in a document. It also allows authors to specify a hierarchy of alternate renderings for user agents that do not support a specific rendering.

2.3.5 Style Sheets

Style sheets simplify HTML markup and largely relieve HTML of responsibility for presentation. They give both authors and users control over the presentation of documents: fonts, alignment, colors, etc.

Style information can be specified for individual elements or groups of elements, either within an HTML document or in external style sheets.

The mechanisms for associating style sheets with documents are independent of the style sheet language.

Before the advent of style sheets, authors had limited control over rendering. HTML 3.2 included a number of attributes and elements offering control over alignment, font size, and text color. Authors also used tables and images to lay out pages. Because users take a long time to upgrade their browsers, these features will continue to be used for some time. However, since style sheets offer more powerful presentation mechanisms, the World Wide Web Consortium will eventually phase out many of HTML's presentation elements and attributes. Throughout this specification, elements and attributes at risk of removal are marked as "deprecated". They are accompanied by examples of how to achieve the same effect with other elements or with style sheets.

2.3.6 Scripts

Through scripts, authors can create dynamic Web pages (for example, "smart forms" that react as users fill them out) and use HTML as a means of building networked applications.

The mechanisms for including scripts in HTML documents are independent of the scripting language.

2.3.7 Printing

Sometimes authors want to make it easy for users to print more than just the current document. When a document is part of a larger work, the relationships between its parts can be described with the HTML LINK element or with W3C's Resource Description Framework (RDF) (see [RDF]).

2.4 Creating HTML 4.0 documents

Authors and implementors working with HTML 4.0 are encouraged to observe the following general principles.

2.4.1 Separation of structure and presentation

HTML has its roots in SGML, which has always been a language for specifying structural markup. As HTML matures, more and more of its presentational elements and attributes are being replaced by other mechanisms, in particular style sheets. Experience has shown that separating the structure of a document from its presentation reduces the cost of serving a wide range of platforms, media, etc., and makes documents easier to revise.

2.4.2 Universal access to the Web

To make the Web accessible to all users, especially users with disabilities, authors should consider how their documents may be rendered on a variety of platforms: speech-based browsers, braille readers, etc. We do not recommend that authors limit their creativity, only that they provide alternate ways of delivering information. HTML offers a number of mechanisms for this purpose (for example, the alt attribute, the accesskey attribute, etc.).

Authors should also keep in mind that their documents may be read on computers with very different configurations. For documents to be interpreted correctly, authors should include information about the natural language and direction of the text, the character encoding of the document, and other similar information.

2.4.3 Helping user agents with incremental rendering

By carefully designing their tables and using the new table features of HTML 4.0, authors can help user agents render documents more quickly. Authors can learn how to design tables for incremental rendering in the description of the TABLE element. Implementors can find information about incremental rendering algorithms in the notes on tables in the appendix.