6 Basic HTML data types

This section of the specification describes the basic data types that may appear as element content or as attribute values.

For introductory information about reading the HTML DTD, see the SGML tutorial.

6.1 Case information

Each attribute definition includes information about the case-sensitivity of its values. This case information is given by the following keys:

CS
The value is case-sensitive (that is, user agents interpret "a" and "A" differently).
CI
The value is case-insensitive (that is, user agents interpret "a" and "A" as the same).
CN
The value is not subject to case changes, for example because it is a number or a character from the document character set.
CA
The element or attribute definition itself gives case information.
CT
Consult the type definition for details about case-sensitivity.

If the value of the attribute is a list, the keys are applied to each value in the list, unless otherwise specified.
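
As a rough illustration of how the keys might drive a comparison, the sketch below compares two attribute values under a given key. The function name values_match is hypothetical, and using simple lowercasing for CI values is an assumption of this sketch, not something this section prescribes.

```python
def values_match(a: str, b: str, key: str) -> bool:
    """Compare two attribute values under a case key (CS, CI, CN)."""
    if key in ("CS", "CN"):
        # CS: case matters. CN: the value has no case to change,
        # so it is compared exactly as written.
        return a == b
    if key == "CI":
        # Assumption: plain lowercasing is enough for this sketch.
        return a.lower() == b.lower()
    # CA and CT defer to the element, attribute, or type definition,
    # so they cannot be decided here.
    raise ValueError(f"case key {key!r} must be resolved from its definition")
```

For example, values_match("Alternate", "alternate", "CI") is true, while the same pair compared under "CS" is not.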

6.2 SGML basic types

The document type definition specifies the syntax of HTML element content and attribute values using SGML tokens (for example, PCDATA, CDATA, NAME, ID, etc.). For complete definitions, see [ISO8879]. Here is a summary of the key information:

  • CDATA is a sequence of characters from the document character set and may include character entities. User agents should interpret attribute values as follows:
    • Replace character entities with characters,
    • Ignore line feeds,
    • Replace each carriage return or tab with a single space.

    User agents may ignore leading and trailing white space in CDATA attribute values (for example, "   myval   " may be interpreted as "myval"). Authors should not declare attribute values with leading or trailing white space.

    For some HTML 4.0 attributes with CDATA attribute values, the specification imposes additional restrictions on the set of valid attribute values that are not expressed in the DTD.

    Although the STYLE and SCRIPT elements use CDATA for their data model, user agents must handle CDATA differently for these elements. Markup and entities should be treated as raw text and passed to the application as is. The first occurrence of the character sequence "</" (the end-tag open delimiter) is treated as the end of the element's content. In valid documents, this will be the end tag of the element.

  • ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
  • IDREF and IDREFS are references to ID tokens defined by other attributes. IDREF is a single token, and IDREFS is a space-separated list of tokens.
  • NUMBER tokens must contain at least one digit ([0-9]).
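
The rules in this list can be sketched in a few lines of Python. This is a minimal illustration, not a conforming implementation: the function names are hypothetical, html.unescape follows newer entity-handling rules than HTML 4 and is used only as an approximation, and handling literal white space before decoding references (so that decoded characters are not stripped) is an assumption of this sketch.

```python
import html
import re

# ID and NAME: a letter, then letters, digits, "-", "_", ":" or ".".
NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")
# NUMBER: at least one digit.
NUMBER_TOKEN = re.compile(r"[0-9]+")


def normalize_cdata(raw: str) -> str:
    """Approximate the CDATA attribute value interpretation."""
    text = raw.replace("\n", "")         # ignore line feeds
    text = re.sub(r"[\r\t]", " ", text)  # each CR or tab becomes one space
    text = html.unescape(text)           # replace character entities
    return text.strip(" ")               # optional: drop outer spaces


def is_name_token(value: str) -> bool:
    return NAME_TOKEN.fullmatch(value) is not None


def is_number_token(value: str) -> bool:
    return NUMBER_TOKEN.fullmatch(value) is not None


def parse_idrefs(value: str) -> list[str]:
    """Split an IDREFS value into its individual tokens."""
    return value.split()
```

With these definitions, normalize_cdata("   myval   ") gives "myval", and is_name_token("2nd") is false because the token does not begin with a letter.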

6.3 Text strings

A number of attributes (%Text; in the DTD) take text that is meant to be read by people. For introductory information about attributes, see the tutorial discussion of attributes.

6.4 URIs

This specification uses the term URI as defined in [URI] (see also [RFC1630]).

Note that URIs include URLs (as defined in [RFC1738] and [RFC1808]).

Relative URIs are resolved to full URIs using a base URI. [RFC1808], section 3, defines the normative algorithm for this process. For more information about base URIs, see the section on base URIs in the chapter on links.
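
To make the idea of resolution concrete, the following sketch uses urljoin from Python's standard library. Note that urljoin follows the later generic URI syntax rather than [RFC1808] itself; for simple references such as these the results are expected to agree, but that equivalence is an assumption of the example. The base URI shown is hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical base URI of the document containing the links.
BASE = "http://www.example.org/docs/chapter6/types.html"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a (possibly relative) URI reference against a base URI."""
    return urljoin(base, reference)


for ref in ("colors.html", "../index.html", "/images/logo.png"):
    print(ref, "->", resolve(ref))
```

A reference that is already a full URI is returned unchanged, since it does not depend on the base.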

URIs are represented in the DTD by the parameter entity %URI;.

URIs are generally case-sensitive. There may be URIs, or parts of URIs, in which case does not matter (for example, machine names), but identifying them may not be easy. To be on the safe side, users should always assume that URIs are case-sensitive.

For information about non-ASCII characters in the URI attribute values, see the appendix.

6.5 Colors

The attribute value type "color" (%Color;) refers to color definitions as specified in [SRGB]. A color value can be either a hexadecimal number (prefixed by a hash mark, "#") or one of the following sixteen color names. The color names are case-insensitive.

Color names and RGB values
Black   = "#000000"    Green  = "#008000"
Silver  = "#C0C0C0"    Lime   = "#00FF00"
Gray    = "#808080"    Olive  = "#808000"
White   = "#FFFFFF"    Yellow = "#FFFF00"
Maroon  = "#800000"    Navy   = "#000080"
Red     = "#FF0000"    Blue   = "#0000FF"
Purple  = "#800080"    Teal   = "#008080"
Fuchsia = "#FF00FF"    Aqua   = "#00FFFF"

That is, the values "#800080" and "Purple" both refer to the color purple.
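
A small sketch of how a color value might be turned into red, green, and blue components follows. The function name parse_color is hypothetical, and the sketch assumes that color names are matched without regard to case and that hexadecimal values always have exactly six digits.

```python
import re

# The sixteen color names, stored in lowercase for matching.
COLOR_NAMES = {
    "black": "#000000", "green": "#008000",
    "silver": "#C0C0C0", "lime": "#00FF00",
    "gray": "#808080", "olive": "#808000",
    "white": "#FFFFFF", "yellow": "#FFFF00",
    "maroon": "#800000", "navy": "#000080",
    "red": "#FF0000", "blue": "#0000FF",
    "purple": "#800080", "teal": "#008080",
    "fuchsia": "#FF00FF", "aqua": "#00FFFF",
}

# Assumption: a hash mark followed by exactly six hexadecimal digits.
HEX_COLOR = re.compile(r"#([0-9A-Fa-f]{6})")


def parse_color(value: str) -> tuple[int, int, int]:
    """Return (red, green, blue), each in the range 0-255."""
    hex_value = COLOR_NAMES.get(value.lower(), value)
    match = HEX_COLOR.fullmatch(hex_value)
    if match is None:
        raise ValueError(f"not a recognized color value: {value!r}")
    digits = match.group(1)
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))
```

Both parse_color("Purple") and parse_color("#800080") yield (128, 0, 128).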

6.5.1 Notes on the use of colors

Although colors can add a significant amount of information to a document and make it more readable, keep the following guidelines in mind when using color:

  • The use of HTML elements and attributes for specifying color is deprecated. Use style sheets instead.
  • Do not use color combinations that cause problems for users, such as people with color blindness.
  • If you use an image as a background or set a background color, be sure to set the text colors as well.
  • Colors specified with the BODY and FONT elements and with bgcolor on tables look different on different platforms (for example, workstations, Macs, Windows, and LCD panels versus CRTs), so do not rely on a specific effect. In the future, support for the [SRGB] color model, together with ICC color profiles, should mitigate these problems.
  • Where practical, adopt common conventions to minimize user confusion.

6.6 Lengths

HTML defines three types of length values for attributes:

  1. Pixels: The value (%Pixels; in the DTD) is an integer that represents the number of pixels of the canvas (screen, paper). Thus, the value "50" means fifty pixels. For normative information about the definition of a pixel, see [CSS1].
  2. Length: The value (%Length; in the DTD) can be either a %Pixels; value or a percentage of the available horizontal or vertical space. Thus, the value "50%" means half of the available space.
  3. MultiLength: The value (%MultiLength; in the DTD) can be a %Length; value or a relative length. A relative length has the form "i*", where "i" is an integer. When allocating space among elements competing for that space, user agents first allocate space for the lengths defined in pixels and percentages, and then divide the remaining space among the relative lengths. Each relative length receives a portion of the available space proportional to the integer preceding the "*". The value "*" is equivalent to "1*". Thus, if 60 pixels of space remain after the user agent allocates space for the lengths defined in pixels and percentages, and the competing relative lengths are 1*, 2*, and 3*, then 1* receives 10 pixels, 2* receives 20 pixels, and 3* receives 30 pixels.

The lengths are case-insensitive.
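
The allocation rule for relative lengths can be illustrated with a short sketch. The function name allocate is hypothetical, and the handling of rounding and of fixed lengths that exceed the total (the remainder is simply clamped to zero) are assumptions of this example rather than anything the text above specifies.

```python
def allocate(total: int, specs: list[str]) -> list[float]:
    """Divide `total` pixels among values such as "100", "25%" or "2*"."""
    sizes = [0.0] * len(specs)
    weights: dict[int, int] = {}
    for i, spec in enumerate(specs):
        spec = spec.strip()
        if spec.endswith("*"):
            # Relative length; a bare "*" is equivalent to "1*".
            weights[i] = int(spec[:-1] or "1")
        elif spec.endswith("%"):
            sizes[i] = total * int(spec[:-1]) / 100
        else:
            sizes[i] = float(int(spec))
    # Pixel and percentage lengths are satisfied first.
    remaining = max(total - sum(sizes), 0.0)
    total_weight = sum(weights.values())
    # The remainder is shared in proportion to each relative weight.
    for i, weight in weights.items():
        sizes[i] = remaining * weight / total_weight if total_weight else 0.0
    return sizes
```

With a total of 160 pixels and the values "100", "1*", "2*", and "3*", 60 pixels remain after the fixed length, and the relative lengths receive 10, 20, and 30 pixels, matching the example in the text.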

6.7 Content Types (MIME Types)

Note. A "media type" (defined in [RFC2045] and [RFC2046]) specifies the nature of a linked resource. This specification uses the term "content type" rather than "media type", in accordance with current usage. Furthermore, in this specification, "media type" may refer to the medium on which a user agent renders a document.

This type is represented in the DTD by %ContentType;.

Content types are case-insensitive.

Examples of content types include "text/html", "image/png", "image/gif", "video/mpeg", "audio/basic", "text/tcl", "text/javascript", and "text/vbscript". For the current list of registered MIME types, see [MIMETYPES].

Note. The "text/css" content type, although it is not registered with IANA, should be used when the linked resource is a [CSS1] style sheet.

6.8 Language Codes

The value of an attribute whose type is a language code (%LanguageCode in the DTD) refers to a language code as specified in [RFC1766], section 2. For information on specifying language codes in HTML, see the section on language codes. White space is not allowed within a language code.

Language codes are case-insensitive.

6.9 Character encodings

The "charset" attributes (%Charset in the DTD) refer to a character encoding, as described in the section on character encodings. The values must be strings (for example, "euc-jp") from the IANA registry (see [CHARSETS] for a complete list).

Character encoding names are case-insensitive.

To determine the character encoding of an external resource, user agents must follow the steps described in the section on specifying character encodings.

6.10 Single characters

Certain attributes call for a single character from the document character set. These attributes are of type %Character in the DTD.

Single characters can be specified using character references (for example, "&amp;").

6.11 Date and time

[ISO8601] allows many options and variations in the representation of dates and times. The current specification uses one of the formats described in the [DATETIME] profile to define valid date/time strings (%Datetime in the DTD).

The format is:

  YYYY-MM-DDThh:mm:ssTZD
where:
  YYYY = four-digit year
  MM   = two-digit month (01 = January, etc.)
  DD   = two-digit day of month (01 through 31)
  hh   = two digits of hour (00 through 23) (am/pm NOT allowed)
  mm   = two digits of minute (00 through 59)
  ss   = two digits of second (00 through 59)
  TZD  = time zone designator

The time zone designator is one of:

Z
Indicates UTC (Coordinated Universal Time). The "Z" must be uppercase.
+hh:mm
Indicates that the time is a local time that is hh hours and mm minutes ahead of UTC.
-hh:mm
Indicates that the time is a local time that is hh hours and mm minutes behind UTC.

Exactly the components shown here must be present, with exactly this punctuation. Note that the letter "T" appears literally in the string (it must be uppercase) to indicate the beginning of the time element, as specified in [ISO8601].

If the generating application does not know the time to the second, it may use the value "00" for the seconds (and for the minutes and hours, if necessary).

Note. [DATETIME] does not address the issue of leap seconds.
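
The format above is strict enough to check with a regular expression. The following sketch validates a string and converts it to a timezone-aware datetime object; the function name parse_datetime is hypothetical, and rejecting out-of-range fields through the datetime constructor is a choice of this example.

```python
import re
from datetime import datetime, timedelta, timezone

# YYYY-MM-DDThh:mm:ssTZD, where TZD is "Z", "+hh:mm" or "-hh:mm".
DATETIME = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2})"
    r"(Z|[+-][0-9]{2}:[0-9]{2})"
)


def parse_datetime(value: str) -> datetime:
    """Parse a date/time string in the format shown above."""
    match = DATETIME.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid date/time string: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    tzd = match.group(7)
    if tzd == "Z":
        tz = timezone.utc
    else:
        sign = 1 if tzd[0] == "+" else -1
        offset = timedelta(hours=int(tzd[1:3]), minutes=int(tzd[4:6]))
        tz = timezone(sign * offset)
    # datetime() raises ValueError for out-of-range fields, such as
    # month 13 or second 60 (leap seconds are not handled).
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)
```

For instance, parse_datetime("1997-07-16T19:20:30+01:00") yields a time one hour ahead of UTC, while a lowercase "t" or a missing time zone designator is rejected.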

6.12 Link types

Authors can use the following recognized link types, listed here with their conventional interpretations. In the DTD, %LinkTypes refers to a space-separated list of link types. White space characters are not permitted within link types.

These link types are case-insensitive; that is, "Alternate" has the same meaning as "alternate".

User agents, search engines, etc. can interpret these link types in a variety of ways. For example, user agents can provide access to linked documents through a navigation bar.
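
As a minimal sketch of how a list of link types might be processed, the code below splits a value on white space and lowercases each entry so that matching ignores case. The function names are hypothetical, and the set contains the link types described in this section.

```python
# The link types described in this section, in lowercase.
RECOGNIZED_LINK_TYPES = {
    "alternate", "stylesheet", "start", "next", "prev", "contents",
    "index", "glossary", "copyright", "chapter", "section",
    "subsection", "appendix", "help", "bookmark",
}


def parse_link_types(value: str) -> list[str]:
    """Split a list of link types and fold each one to lowercase."""
    # str.split() with no argument splits on any run of white space.
    return [token.lower() for token in value.split()]


def recognized_link_types(value: str) -> list[str]:
    """Keep only the link types described in this section."""
    return [t for t in parse_link_types(value) if t in RECOGNIZED_LINK_TYPES]
```

Under these assumptions, "Alternate StyleSheet" is read as the two link types "alternate" and "stylesheet".
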

Alternate
Designates alternative versions of the document in which the link occurs. Used together with the lang attribute, it implies a translated version of the document. Used together with the media attribute, it implies a version designed for a different medium (or media).
Stylesheet
Refers to an external style sheet. For more information, see the section on external style sheets. It is used together with the "Alternate" link type for user-selectable alternate style sheets.
Start
Refers to the first document in a collection of documents. This link type tells search engines which document the author considers to be the starting point of the collection.
Next
Refers to the next document in a linear sequence of documents. User agents can preload the "next" document to reduce the perceived load time.
Prev
Refers to the previous document in an ordered series of documents. Some user agents also support the synonym "Previous".
Contents
Refers to a document serving as a table of contents. Some user agents also support the synonym ToC (from "Table of Contents").
Index
Refers to a document providing an index for the current document.
Glossary
Refers to a document providing a glossary of terms that pertain to the current document.
Copyright
Refers to a copyright statement for the current document.
Chapter
Refers to a document serving as a chapter in a collection of documents.
Section
Refers to a document serving as a section in a collection of documents.
Subsection
Refers to a document serving as a subsection in a collection of documents.
Appendix
Refers to a document serving as an appendix in a collection of documents.
Help
Refers to a document offering help (more detailed information, links to other sources of information, etc.).
Bookmark
Refers to a bookmark. A bookmark is a link to a key entry point within an extended document. The title attribute can be used, for example, to label the bookmark. Note that several bookmarks can be defined in each document.

Authors can define additional link types that are not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types. For more information, see the profile attribute of the HEAD element.

For further discussion of link types, see the section on links in HTML documents.

6.13 Media descriptors

The following is a list of recognized media descriptors (%MediaDesc in the DTD).

screen
Intended for non-paged computer screens.
tty
Intended for media using a fixed-pitch character grid, such as teletypes, terminals, or portable devices with limited display capabilities.
tv
Intended for television-type devices (low resolution, color, limited scrolling capabilities).
projection
Intended for projectors.
handheld
Intended for handheld devices (small screen, monochrome, bitmapped graphics, limited bandwidth).
print
Intended for paged, opaque material and for documents viewed on screen in print preview mode.
braille
Intended for braille tactile feedback devices.
aural
Intended for speech synthesizers.
all
Suitable for all devices.

In future versions of HTML, new values may be introduced and parameterized values may be allowed. To ease the introduction of these extensions, conforming user agents should be able to parse the value of the media attribute as follows:

  1. The value is a comma-separated list of entries. For example,
     media="screen, 3d-glasses, print and resolution > 90dpi"

     is mapped to:

     "screen"
     "3d-glasses"
     "print and resolution > 90dpi"

  2. Each entry is truncated just before the first character that is not a US ASCII letter [a-zA-Z] (Unicode decimal codes 65-90, 97-122), a digit [0-9] (Unicode hexadecimal codes 30-39), or a hyphen (Unicode decimal code 45). In this example, this gives:
     "screen"
     "3d-glasses"
     "print"

  3. A case-sensitive match is then made against the set of media descriptors defined above. User agents can ignore entries that do not match. In this example, only the screen and print entries remain.
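
The three parsing steps above can be sketched as follows. The function name parse_media is hypothetical, and trimming the white space around each entry before truncation is an assumption of this sketch, inferred from the example values.

```python
import re

# Media descriptors defined in this section (matched case-sensitively).
MEDIA_DESCRIPTORS = {
    "screen", "tty", "tv", "projection", "handheld",
    "print", "braille", "aural", "all",
}

# Longest leading run of US ASCII letters, digits and hyphens.
LEADING_NAME = re.compile(r"[A-Za-z0-9-]*")


def parse_media(value: str) -> list[str]:
    """Return the recognized media descriptors in a media attribute value."""
    recognized = []
    for entry in value.split(","):          # step 1: comma-separated entries
        entry = entry.strip()               # assumption: trim outer white space
        name = LEADING_NAME.match(entry).group(0)  # step 2: truncate
        if name in MEDIA_DESCRIPTORS:       # step 3: case-sensitive match
            recognized.append(name)
    return recognized
```

Applied to the example value "screen, 3d-glasses, print and resolution > 90dpi", this returns the screen and print entries and ignores "3d-glasses".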

Note. Style sheets can include media-dependent variations within them (for example, the CSS @media construct). In such cases it may be appropriate to use "media=all".

6.14 Script data

Script data (%Script; in the DTD) can be the content of the SCRIPT element and the value of intrinsic event attributes. User agents should not evaluate script data as HTML markup, but must instead pass it on as data to a script engine.

The case-sensitivity of script data depends on the scripting language.

Note that script data that is element content may not contain character references, but script data that is the value of an attribute may contain them. The appendix provides further information about specifying non-HTML data.

6.15 Style sheet data

Style sheet data (%StyleSheet; in the DTD) can be the content of the STYLE element and the value of the style attribute. User agents should not evaluate style data as HTML markup.

The case-sensitivity of style data depends on the style sheet language.

Note that style sheet data that is element content may not contain character references, but style sheet data that is the value of an attribute may contain them. The appendix provides further information about specifying non-HTML data.

6.16 Frame target names

Except for the reserved names listed below, frame target names (%FrameTarget; in the DTD) must begin with an alphabetic character (a-zA-Z). User agents should ignore all other target names.

The following target names are reserved and have special meanings.

_blank
User agents must load the document in a new, unnamed window.
_self
User agents must load the document in the same frame as the element that refers to this target.
_parent
User agents must load the document into the immediate FRAMESET parent of the current frame. This value is equivalent to _self if the current frame has no parent.
_top
User agents should load the document into the full, original window (thus canceling all other frames). This value is equivalent to _self if the current frame has no parent.
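
The naming rule for frame targets can be expressed as a short check. The function name is_valid_target is hypothetical, and treating the reserved names as exact, case-sensitive strings is an assumption of this sketch.

```python
import re

RESERVED_TARGETS = {"_blank", "_self", "_parent", "_top"}


def is_valid_target(name: str) -> bool:
    """True for a reserved name or a name starting with a-z or A-Z."""
    if name in RESERVED_TARGETS:
        return True
    # Any other name must begin with an alphabetic character.
    return re.match(r"[A-Za-z]", name) is not None
```

Under this check, "_top" and "content" are accepted, while "_custom" and "1frame" would be ignored by a user agent.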