Move a few obsolete RFCs to the Attic

2026-02-12 07:13:41 -05:00 · 2000-02-07 05:48:17 +00:00 · eeefab745c
commit eeefab745c
parent bc51bd5180
3 changed files with 0 additions and 1129 deletions
--- a/doc/rfc/rfc1488.txt
+++ b/doc/rfc/rfc1488.txt
@@ -1,619 +0,0 @@
-
-
-
-
-
-
-Network Working Group                                         T. Howes
-Request for Comments: 1488                      University of Michigan
-                                                              S. Kille
-                                                      ISODE Consortium
-                                                              W. Yeong
-                                     Performance Systems International
-                                                            C. Robbins
-                                                            NeXor Ltd.
-                                                             July 1993
-
-
-     The X.500 String Representation of Standard Attribute Syntaxes
-
-Status of this Memo
-
-   This RFC specifies an IAB standards track protocol for the Internet
-   community, and requests discussion and suggestions for improvements.
-   Please refer to the current edition of the "IAB Official Protocol
-   Standards" for the standardization state and status of this protocol.
-   Distribution of this memo is unlimited.
-
-Abstract
-
-   The Lightweight Directory Access Protocol (LDAP) [9] requires that
-   the contents of AttributeValue fields in protocol elements be octet
-   strings.  This document defines the requirements that must be
-   satisfied by encoding rules used to render Directory attribute
-   syntaxes into a form suitable for use in the LDAP, then goes on to
-   define the encoding rules for the standard set of attribute syntaxes
-   defined in [1,2] and [3].
-
-1.  Attribute Syntax Encoding Requirements
-
-   This section defines general requirements for lightweight directory
-   protocol attribute syntax encodings. All documents defining attribute
-   syntax encodings for use by the lightweight directory protocols are
-   expected to conform to these requirements.
-
-   The encoding rules defined for a given attribute syntax must produce
-   octet strings.  To the greatest extent possible, encoded octet
-   strings should be usable in their native encoded form for display
-   purposes. In particular, encoding rules for attribute syntaxes
-   defining non-binary values should produce strings that can be
-   displayed with little or no translation by clients implementing the
-   lightweight directory protocols.
-
-
-
-
-
-
-Howes, Kille, Yeong & Robbins                                   [Page 1]
-
-RFC 1488                 X.500 Syntax Encoding                 July 1993
-
-
-2.  Standard Attribute Syntax Encodings
-
-   For the purposes of defining the encoding rules for the standard
-   attribute syntaxes, the following auxiliary BNF definitions will be
-   used:
-
-     <a> ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' |
-             'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' |
-             's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | 'A' |
-             'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' |
-             'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' |
-             'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z'
-
-     <d> ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
-
-     <hex-digit> ::= <d> | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' |
-                      'A' | 'B' | 'C' | 'D' | 'E' | 'F'
-
-     <k> ::= <a> | <d> | '-'
-
-     <p> ::= <a> | <d> | ''' | '(' | ')' | '+' | ',' | '-' | '.' |
-             '/' | ':' | '?' | ' '
-
-     <CRLF> ::= The ASCII newline character with hexadecimal value 0x0A
-
-     <letterstring> ::= <a> | <a> <letterstring>
-
-     <numericstring> ::= <d> | <d> <numericstring>
-
-     <keystring> ::= <a> | <a> <anhstring>
-
-     <anhstring> ::= <k> | <k> <anhstring>
-
-     <printablestring> ::= <p> | <p> <printablestring>
-
-     <space> ::= ' ' | ' ' <space>
-
-2.1.  Undefined
-
-   Values of type Undefined are encoded as if they were values of type
-   Octet String.
-
-2.2.  Case Ignore String
-
-   A string of type caseIgnoreStringSyntax is encoded as the string
-   value itself.
-
-
-
-
-
-Howes, Kille, Yeong & Robbins                                   [Page 2]
-
-RFC 1488                 X.500 Syntax Encoding                 July 1993
-
-
-2.3.  Case Exact String
-
-   The encoding of a string of type caseExactStringSyntax is the string
-   value itself.
-
-2.4.  Printable String
-
-   The encoding of a string of type printableStringSyntax is the string
-   value itself.
-
-2.5.  Numeric String
-
-   The encoding of a string of type numericStringSyntax is the string
-   value itself.
-
-2.6.  Octet String
-
-   The encoding of a string of type octetStringSyntax is the string
-   value itself.
-
-2.7.  Case Ignore IA5 String
-
-   The encoding of a string of type caseIgnoreIA5String is the string
-   value itself.
-
-2.8.  IA5 String
-
-   The encoding of a string of type iA5StringSyntax is the string value
-   itself.
-
-2.9.  T61 String
-
-   The encoding of a string of type t61StringSyntax is the string value
-   itself.
-
-2.10.  Case Ignore List
-
-   Values of type caseIgnoreListSyntax are encoded according to the
-   following BNF:
-
-     <caseignorelist> ::= <caseignorestring> |
-                          <caseignorestring> '$' <caseignorelist>
-
-     <caseignorestring> ::= a string encoded according to the rules
-                             for Case Ignore String as above.
-
-
-
-
-
-
-Howes, Kille, Yeong & Robbins                                   [Page 3]
-
-RFC 1488                 X.500 Syntax Encoding                 July 1993
-
-
-2.11.  Case Exact List
-
-   Values of type caseExactListSyntax are encoded according to the
-   following BNF:
-
-     <caseexactlist> ::= <caseexactstring> |
-                          <caseexactstring> '$' <caseexactlist>
-
-     <caseexactstring> ::= a string encoded according to the rules for
-                            Case Exact String as above.
-
-2.12.  Distinguished Name
-
-   Values of type distinguishedNameSyntax are encoded to have the
-   representation defined in [5].
-
-2.13.  Boolean
-
-   Values of type booleanSyntax are encoded according to the following
-   BNF:
-
-     <boolean> ::= "TRUE" | "FALSE"
-
-   Boolean values have an encoding of "TRUE" if they are logically true,
-   and have an encoding of "FALSE" otherwise.
-
-2.14.  Integer
-
-   Values of type integerSyntax are encoded as the decimal
-   representation of their values, with each decimal digit represented
-   by its character equivalent. So the digit 1 is represented by the
-   character '1'.
-
-2.15.  Object Identifier
-
-   Values of type objectIdentifierSyntax are encoded according to the
-   following BNF:
-
-     <oid> ::= <descr> | <descr> '.' <numericoid> | <numericoid>
-
-     <descr> ::= <keystring>
-
-     <numericoid> ::= <numericstring> | <numericstring> '.' <numericoid>
-
-   In the above BNF, <descr> is the syntactic representation of an
-   object descriptor. When encoding values of type
-   objectIdentifierSyntax, the first encoding option should be used in
-   preference to the second, which should be used in preference to the
-
-
-
-Howes, Kille, Yeong & Robbins                                   [Page 4]
-
-RFC 1488                 X.500 Syntax Encoding                 July 1993
-
-
-   third wherever possible. That is, in encoding object identifiers,
-   object descriptors (where assigned and known by the implementation)
-   should be used in preference to numeric oids to the greatest extent
-   possible. For example, in encoding the object identifier representing
-   an organizationName, the descriptor "organizationName" is preferable
-   to "ds.4.10", which is in turn preferable to the string "2.5.4.10".
-
-2.16.  Telephone Number
-
-   Values of type telephoneNumberSyntax are encoded as if they were
-   Printable String types.
-
-2.17.  Telex Number
-
-   Values of type telexNumberSyntax are encoded according to the
-   following BNF:
-
-     <telex-number> ::= <actual-number> '$' <country> '$' <answerback>
-
-     <actual-number> ::= <printablestring>
-
-     <country> ::= <printablestring>
-
-     <answerback> ::= <printablestring>
-
-   In the above, <actual-number> is the syntactic representation of the
-   number portion of the TELEX number being encoded, <country> is the
-   TELEX country code, and <answerback> is the answerback code of a
-   TELEX terminal.
-
-2.18.  Teletex Terminal Identifier
-
-   Values of type teletexTerminalIdentifier are encoded according to the
-   following BNF:
-
-     <teletex-id> ::= <printablestring> 0*( '$' <printablestring>)
-
-   In the above, the first <printablestring> is the encoding of the
-   first portion of the teletex terminal identifier to be encoded, and
-   the subsequent 0 or more <printablestrings> are subsequent portions
-   of the teletex terminal identifier.
-
-2.19.  Facsimile Telephone Number
-
-   Values of type FacsimileTelephoneNumber are encoded according to the
-   following BNF:
-
- <fax-number> ::= <printablestring> [ '$' <faxparameters> ]
-
-
-
-Howes, Kille, Yeong & Robbins                                   [Page 5]
-
-RFC 1488                 X.500 Syntax Encoding                 July 1993
-
-
- <faxparameters> ::= <faxparm> | <faxparm> '$' <faxparameters>
-
- <faxparm> ::= 'twoDimensional' | 'fineResolution' | 'unlimitedLength' |
-               'b4Length' | 'a3Width' | 'b4Width' | 'uncompressed'
-
-   In the above, the first <printablestring> is the actual fax number,
-   and the <faxparm> tokens represent fax parameters.
-
-2.20.  Presentation Address
-
-   Values of type PresentationAddress are encoded to have the
-   representation described in [6].
-
-2.21.  UTC Time
-
-   Values of type uTCTimeSyntax are encoded as if they were Printable
-   Strings with the strings containing a UTCTime value.
-
-2.22.  Guide (search guide)
-
-   Values of type Guide, such as values of the searchGuide attribute,
-   are encoded according to the following BNF:
-
-     <guide-value> ::= [ <object-class> '#' ] <criteria>
-
-     <object-class> ::= an encoded value of type objectIdentifierSyntax
-
-     <criteria> ::= <criteria-item> | <criteria-set> | '!' <criteria>
-
-     <criteria-set> ::= [ '(' ] <criteria> '&' <criteria-set> [ ')' ] |
-                        [ '(' ] <criteria> '|' <criteria-set> [ ')' ]
-
-     <criteria-item> ::= [ '(' ] <attributetype> '$' <match-type> [ ')' ]
-
-     <match-type> ::= "EQ" | "SUBSTR" | "GE" | "LE" | "APPROX"
-
-2.23.  Postal Address
-
-Values of type PostalAddress are encoded according to the following BNF:
-
-     <postal-address> ::= <t61string> | <t61string> '$' <postal-address>
-
-   In the above, each <t61string> component of a postal address value is
-   encoded as a value of type t61StringSyntax.
-
-
-
-
-
-
-
-Howes, Kille, Yeong & Robbins                                   [Page 6]
-
-RFC 1488                 X.500 Syntax Encoding                 July 1993
-
-
-2.24.  User Password
-
-   Values of type userPasswordSyntax are encoded as if they were of type
-   octetStringSyntax.
-
-2.25.  User Certificate
-
-   Values of type userCertificate are encoded according to the following
-   BNF:
-
- <certificate> ::= <signature> '#' <issuer> '#' <validity> '#' <subject>
-                   '#' <public-key-info>
-
- <signature> ::= <algorithm-id>
-
- <issuer> ::= an encoded Distinguished Name
-
- <validity> ::= <not-before-time> '#' <not-after-time>
-
- <not-before-time> ::= <utc-time>
-
- <not-after-time> ::= <utc-time>
-
- <algorithm-parameters> ::=  <null> | <integervalue> |
-                             '{ASN}' <hex-string>
-
- <subject> ::= an encoded Distinguished Name
-
- <public-key-info> ::= <algorithm-id> '#' <encrypted-value>
-
- <encrypted-value> ::= <hex-string> | <hex-string> '-' <d>
-
- <algorithm-id> ::= <oid> '#' <algorithm-parameters>
-
- <utc-time> ::= an encoded UTCTime value
-
- <hex-string> ::= <hex-digit> | <hex-digit> <hex-string>
-
-2.26.  CA Certificate
-
-   Values of type cACertificate are encoded as if the values were of
-   type userCertificate.
-
-2.27.  Authority Revocation List
-
-   Values of type authorityRevocationList are encoded according to the
-   following BNF:
-
-
-
-
-Howes, Kille, Yeong & Robbins                                   [Page 7]
-
-RFC 1488                 X.500 Syntax Encoding                 July 1993
-
-
-     <certificate-list> ::= <signature> '#' <issuer> '#'
-                            <utc-time> [ '#' <revoked-certificates> ]
-
-     <revoked-certificates> ::= <algorithm> '#' <encrypted-value>
-                                [ '#' 0*(<revoked-certificate>) '#']
-
-     <revoked-certificate> ::= <subject> '#' <algorithm> '#'
-                               <serial> '#' <utc-time>
-
-   The syntactic components <algorithm>, <issuer>, <encrypted-value>,
-   <utc-time>, <subject> and <serial> have the same definitions as in
-   the BNF for the userCertificate attribute syntax.
-
-2.28.  Certificate Revocation List
-
-   Values of type certificateRevocationList are encoded as if the values
-   were of type authorityRevocationList.
-
-2.29.  Cross Certificate Pair
-
-   Values of type crossCertificatePair are encoded according to the
-   following BNF:
-
-     <certificate-pair> ::= <certificate> '|' <certificate>
-
-   The syntactic component <certificate> has the same definition as in
-   the BNF for the userCertificate attribute syntax.
-
-2.30.  Delivery Method
-
-   Values of type deliveryMethod are encoded according to the following
-   BNF:
-
-     <delivery-value> ::= <pdm> | <pdm> '$' <delivery-value>
-
-     <pdm> ::= 'any' | 'mhs' | 'physical' | 'telex' | 'teletex' |
-               'g3fax' | 'g4fax' | 'ia5' | 'videotex' | 'telephone'
-
-2.31.  Other Mailbox
-
-   Values of the type otherMailboxSyntax are encoded according to the
-   following BNF:
-
-     <otherMailbox> ::= <mailbox-type> '$' <mailbox>
-
-     <mailbox-type> ::= an encoded Printable String
-
-     <mailbox> ::= an encoded IA5 String
-
-
-
-Howes, Kille, Yeong & Robbins                                   [Page 8]
-
-RFC 1488                 X.500 Syntax Encoding                 July 1993
-
-
-   In the above, <mailbox-type> represents the type of mail system in
-   which the mailbox resides, for example "Internet" or "MCIMail"; and
-   <mailbox> is the actual mailbox in the mail system defined by
-   <mailbox-type>.
-
-2.32.  Mail Preference
-
-   Values of type mailPreferenceOption are encoded according to the
-   following BNF:
-
- <mail-preference> ::= "NO-LISTS" | "ANY-LIST" | "PROFESSIONAL-LISTS"
-
-2.33.  MHS OR Address
-
-   Values of type MHS OR Address are encoded as strings, according to
-   the format defined in [10].
-
-2.34.  Photo
-
-   Values of type Photo are encoded as if they were octet strings
-   containing JPEG images in the JPEG File Interchange Format (JFIF), as
-   described in [8].
-
-2.35.  Fax
-
-   Values of type Fax are encoded as if they were octet strings
-   containing Group 3 Fax images as defined in [7].
-
-3.  Acknowledgements
-
-   Many of the attribute syntax encodings defined in this document are
-   adapted from those used in the QUIPU X.500 implementation. The
-   contributions of the authors of the QUIPU implementation in the
-   specification of the QUIPU syntaxes [4] are gratefully acknowledged.
-
-4.  Bibliography
-
-   [1] The Directory: Selected Attribute Syntaxes.  CCITT,
-       Recommendation X.520.
-
-   [2] Information Processing Systems -- Open Systems Interconnection --
-       The Directory: Selected Attribute Syntaxes.
-
-   [3] Barker, P., and S. Kille, "The COSINE and Internet X.500 Schema",
-       RFC 1274, University College London, November 1991.
-
-   [4] The ISO Development Environment: User's Manual -- Volume 5:
-       QUIPU.  Colin Robbins, Stephen E. Kille.
-
-
-
-Howes, Kille, Yeong & Robbins                                   [Page 9]
-
-RFC 1488                 X.500 Syntax Encoding                 July 1993
-
-
-   [5] Kille, S., "A String Representation of Distinguished Names", RFC
-       1485, July 1993.
-
-   [6] Kille, S., "A String Representation for Presentation Addresses",
-       RFC 1278, University College London, November 1991.
-
-   [7] Terminal Equipment and Protocols for Telematic Services -
-       Standardization of Group 3 facsimile apparatus for document
-       transmission.  CCITT, Recommendation T.4.
-
-   [8] JPEG File Interchange Format (Version 1.02).  Eric Hamilton, C-
-       Cube Microsystems, Milpitas, CA, September 1, 1992.
-
-   [9] Yeong, W., Howes, T., and S. Kille, "Lightweight Directory Access
-       Protocol", RFC 1487, Performance Systems International,
-       University of Michigan, ISODE Consortium, July 1993.
-
-  [10] Kille, S., "Mapping between X.400(1988)/ISO 10021 and RFC 822",
-       RFC 1327, University College London, May 1992.
-
-5.  Security Considerations
-
-   Security issues are not discussed in this memo.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Howes, Kille, Yeong & Robbins                                  [Page 10]
-
-RFC 1488                 X.500 Syntax Encoding                 July 1993
-
-
-6.  Authors' Addresses
-
-   Tim Howes
-   University of Michigan
-   ITD Research Systems
-   535 W William St.
-   Ann Arbor, MI 48103-4943
-   USA
-
-   Phone: +1 313 747-4454
-   EMail: tim@umich.edu
-
-
-   Steve Kille
-   ISODE Consortium
-   PO Box 505
-   London
-   SW11 1DX
-   UK
-
-   Phone: +44-71-223-4062
-   EMail: S.Kille@isode.com
-
-
-   Wengyik Yeong
-   PSI, Inc.
-   510 Huntmar Park Drive
-   Herndon, VA 22070
-   USA
-
-   Phone: +1 703-450-8001
-   EMail: yeongw@psilink.com
-
-
-   Colin Robbins
-   NeXor Ltd
-   University Park
-   Nottingham
-   NG7 2RD
-   UK
-
-
-
-
-
-
-
-
-
-
-
-Howes, Kille, Yeong & Robbins                                  [Page 11]
-
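Note on the removed rfc1488.txt: the simpler encodings it defines (Boolean in 2.13, Integer in 2.14, Telex Number in 2.17 and Facsimile Telephone Number in 2.19) can be sketched roughly as below. This is a minimal illustration in Python and not code from this repository. Every function and constant name is hypothetical, and rejecting components that fall outside the `<p>` character set is an assumption about how an implementation might enforce the BNF.

```python
import string

# Characters allowed by the <p> production of RFC 1488 (printable string).
PRINTABLE = set(string.ascii_letters + string.digits + "'()+,-./:? ")

# The <faxparm> tokens listed in section 2.19.
FAX_PARAMETERS = {
    "twoDimensional", "fineResolution", "unlimitedLength",
    "b4Length", "a3Width", "b4Width", "uncompressed",
}


def check_printable(value: str) -> str:
    """Return value unchanged if it matches <printablestring>, else raise."""
    if not value or any(ch not in PRINTABLE for ch in value):
        raise ValueError(f"not a printable string: {value!r}")
    return value


def encode_boolean(value: bool) -> str:
    """Section 2.13: "TRUE" if logically true, "FALSE" otherwise."""
    return "TRUE" if value else "FALSE"


def encode_integer(value: int) -> str:
    """Section 2.14: decimal digits, each as its character equivalent."""
    return str(value)


def encode_telex_number(number: str, country: str, answerback: str) -> str:
    """Section 2.17: <actual-number> '$' <country> '$' <answerback>."""
    parts = [check_printable(p) for p in (number, country, answerback)]
    return "$".join(parts)


def encode_fax_number(number: str, parameters=()) -> str:
    """Section 2.19: <printablestring> [ '$' <faxparameters> ]."""
    for parameter in parameters:
        if parameter not in FAX_PARAMETERS:
            raise ValueError(f"unknown fax parameter: {parameter!r}")
    return "$".join([check_printable(number), *parameters])
```

Because '$' is not part of the `<p>` character set, a printable-string component can never contain the separator, which is what lets a reader split these values on '$' without any escaping.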
--- a/doc/rfc/rfc1558.txt
+++ b/doc/rfc/rfc1558.txt
@@ -1,171 +0,0 @@
-
-
-
-
-
-
-Network Working Group                                           T. Howes
-Request for Comments: 1558                        University of Michigan
-Category: Informational                                    December 1993
-
-
-             A String Representation of LDAP Search Filters
-
-Status of this Memo
-
-   This memo provides information for the Internet community.  This memo
-   does not specify an Internet standard of any kind.  Distribution of
-   this memo is unlimited.
-
-Abstract
-
-   The Lightweight Directory Access Protocol (LDAP) [1] defines a
-   network representation of a search filter transmitted to an LDAP
-   server.  Some applications may find it useful to have a common way of
-   representing these search filters in a human-readable form.  This
-   document defines a human-readable string format for representing LDAP
-   search filters.
-
-1.  LDAP Search Filter Definition
-
-   An LDAP search filter is defined in [1] as follows:
-
-     Filter ::= CHOICE {
-             and                [0] SET OF Filter,
-             or                 [1] SET OF Filter,
-             not                [2] Filter,
-             equalityMatch      [3] AttributeValueAssertion,
-             substrings         [4] SubstringFilter,
-             greaterOrEqual     [5] AttributeValueAssertion,
-             lessOrEqual        [6] AttributeValueAssertion,
-             present            [7] AttributeType,
-             approxMatch        [8] AttributeValueAssertion
-     }
-
-     SubstringFilter ::= SEQUENCE {
-             type    AttributeType,
-             SEQUENCE OF CHOICE {
-                     initial        [0] LDAPString,
-                     any            [1] LDAPString,
-                     final          [2] LDAPString
-             }
-     }
-
-
-
-
-
-Howes                                                           [Page 1]
-
-RFC 1558             Representation of LDAP Filters        December 1993
-
-
-     AttributeValueAssertion ::= SEQUENCE {
-             attributeType   AttributeType,
-             attributeValue  AttributeValue
-     }
-
-     AttributeType ::= LDAPString
-
-     AttributeValue ::= OCTET STRING
-
-     LDAPString ::= OCTET STRING
-
-   where the LDAPString above is limited to the IA5 character set.  The
-   AttributeType is a string representation of the attribute object
-   identifier in dotted OID format (e.g., "2.5.4.10"), or the shorter
-   string name of the attribute (e.g., "organizationName", or "o").  The
-   AttributeValue OCTET STRING has the form defined in [2].  The Filter
-   is encoded for transmission over a network using the Basic Encoding
-   Rules defined in [3], with simplifications described in [1].
-
-2.  String Search Filter Definition
-
-   The string representation of an LDAP search filter is defined by the
-   following BNF.  It uses a prefix format.
-
-     <filter> ::= '(' <filtercomp> ')'
-     <filtercomp> ::= <and> | <or> | <not> | <item>
-     <and> ::= '&' <filterlist>
-     <or> ::= '|' <filterlist>
-     <not> ::= '!' <filter>
-     <filterlist> ::= <filter> | <filter> <filterlist>
-     <item> ::= <simple> | <present> | <substring>
-     <simple> ::= <attr> <filtertype> <value>
-     <filtertype> ::= <equal> | <approx> | <greater> | <less>
-     <equal> ::= '='
-     <approx> ::= '~='
-     <greater> ::= '>='
-     <less> ::= '<='
-     <present> ::= <attr> '=*'
-     <substring> ::= <attr> '=' <initial> <any> <final>
-     <initial> ::= NULL | <value>
-     <any> ::= '*' <starval>
-     <starval> ::= NULL | <value> '*' <starval>
-     <final> ::= NULL | <value>
-
-   <attr> is a string representing an AttributeType, and has the format
-   defined in [1].  <value> is a string representing an AttributeValue,
-   or part of one, and has the form defined in [2].  If a <value> must
-   contain one of the characters '*' or '(' or ')', these characters
-
-
-
-Howes                                                           [Page 2]
-
-RFC 1558             Representation of LDAP Filters        December 1993
-
-
-   should be escaped by preceding them with the backslash '\' character.
-
-3.  Examples
-
-   This section gives a few examples of search filters written using
-   this notation.
-
-     (cn=Babs Jensen)
-     (!(cn=Tim Howes))
-     (&(objectClass=Person)(|(sn=Jensen)(cn=Babs J*)))
-     (o=univ*of*mich*)
-
-4.  Security Considerations
-
-   Security issues are not discussed in this memo.
-
-5.  References
-
-   [1] Yeong, W., Howes, T., and S. Kille, "Lightweight Directory Access
-       Protocol", RFC 1487, Performance Systems International,
-       University of Michigan, ISODE Consortium, July 1993.
-
-   [2] Howes, T., Kille, S., Yeong, W., and C. Robbins, "The String
-       Representation of Standard Attribute Syntaxes", RFC 1488,
-       University of Michigan, ISODE Consortium, Performance Systems
-       International, NeXor Ltd., July 1993.
-
-   [3] "Specification of Basic Encoding Rules for Abstract Syntax
-       Notation One (ASN.1)", CCITT Recommendation X.209, 1988.
-
-6.  Author's Address
-
-       Tim Howes
-       University of Michigan
-       ITD Research Systems
-       535 W William St.
-       Ann Arbor, MI 48103-4943
-       USA
-
-       Phone: +1 313 747-4454
-       EMail: tim@umich.edu
-
-
-
-
-
-
-
-
-
-
-Howes                                                           [Page 3]
-
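Note on the removed rfc1558.txt: the prefix filter grammar in its section 2, together with the backslash escaping rule for '*', '(' and ')', can be sketched roughly as below. This is an illustrative Python sketch under stated assumptions, not code from this repository: all function names are hypothetical, and only the three characters named in RFC 1558 are escaped. Later LDAP filter specifications use a different escaping scheme, so this should not be taken as a current filter encoder.

```python
def escape_value(value: str) -> str:
    """Precede '*', '(' and ')' with a backslash, as RFC 1558 describes."""
    out = []
    for ch in value:
        if ch in "*()":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def equal(attr: str, value: str) -> str:
    """<simple> with the <equal> filter type: (attr=value)."""
    return f"({attr}={escape_value(value)})"


def present(attr: str) -> str:
    """<present>: (attr=*)."""
    return f"({attr}=*)"


def substring(attr: str, initial: str = "", any_parts=(), final: str = "") -> str:
    """<substring>: attr '=' <initial> <any> <final>.

    <any> is a '*' followed by zero or more "<value> '*'" groups.
    """
    middle = "*" + "".join(escape_value(part) + "*" for part in any_parts)
    return f"({attr}={escape_value(initial)}{middle}{escape_value(final)})"


def _combine(operator: str, filters) -> str:
    # <filterlist> requires at least one <filter>.
    if not filters:
        raise ValueError("a filter list needs at least one filter")
    return "(" + operator + "".join(filters) + ")"


def and_(*filters: str) -> str:
    return _combine("&", filters)


def or_(*filters: str) -> str:
    return _combine("|", filters)


def not_(inner: str) -> str:
    return "(!" + inner + ")"
```

With these helpers, `and_(equal("objectClass", "Person"), or_(equal("sn", "Jensen"), substring("cn", initial="Babs J")))` builds the third example from section 3 of the RFC, and `substring("o", initial="univ", any_parts=["of", "mich"])` builds the fourth.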
--- a/doc/rfc/rfc2044.txt
+++ b/doc/rfc/rfc2044.txt
@@ -1,339 +0,0 @@
-
-
-
-
-
-
-Network Working Group                                       F. Yergeau
-Request for Comments: 2044                           Alis Technologies
-Category: Informational                                   October 1996
-
-
-        UTF-8, a transformation format of Unicode and ISO 10646
-
-Status of this Memo
-
-   This memo provides information for the Internet community.  This memo
-   does not specify an Internet standard of any kind.  Distribution of
-   this memo is unlimited.
-
-Abstract
-
-   The Unicode Standard, version 1.1, and ISO/IEC 10646-1:1993 jointly
-   define a 16 bit character set which encompasses most of the world's
-   writing systems. 16-bit characters, however, are not compatible with
-   many current applications and protocols, and this has led to the
-   development of a few so-called UCS transformation formats (UTF), each
-   with different characteristics.  UTF-8, the object of this memo, has
-   the characteristic of preserving the full US-ASCII range: US-ASCII
-   characters are encoded in one octet having the usual US-ASCII value,
-   and any octet with such a value can only be a US-ASCII character.
-   This provides compatibility with file systems, parsers and other
-   software that rely on US-ASCII values but are transparent to other
-   values.
-
-1.  Introduction
-
-   The Unicode Standard, version 1.1 [UNICODE], and ISO/IEC 10646-1:1993
-   [ISO-10646] jointly define a 16 bit character set, UCS-2, which
-   encompasses most of the world's writing systems.  ISO 10646 further
-   defines a 31-bit character set, UCS-4, with currently no assignments
-   outside of the region corresponding to UCS-2 (the Basic Multilingual
-   Plane, BMP).  The UCS-2 and UCS-4 encodings, however, are hard to use
-   in many current applications and protocols that assume 8 or even 7
-   bit characters.  Even newer systems able to deal with 16 bit
-   characters cannot process UCS-4 data. This situation has led to the
-   development of so-called UCS transformation formats (UTF), each with
-   different characteristics.
-
-   UTF-1 has only historical interest, having been removed from ISO
-   10646.  UTF-7 has the quality of encoding the full Unicode repertoire
-   using only octets with the high-order bit clear (7 bit US-ASCII
-   values, [US-ASCII]), and is thus deemed a mail-safe encoding
-   ([RFC1642]).  UTF-8, the object of this memo, uses all bits of an
-   octet, but has the quality of preserving the full US-ASCII range:
-
-
-
-Yergeau                      Informational                      [Page 1]
-
-RFC 2044                         UTF-8                      October 1996
-
-
-   US-ASCII characters are encoded in one octet having the normal US-
-   ASCII value, and any octet with such a value can only stand for a
-   US-ASCII character, and nothing else.
-
-   UTF-16 is a scheme for transforming a subset of the UCS-4 repertoire
-   into a pair of UCS-2 values from a reserved range.  UTF-16 impacts
-   UTF-8 in that UCS-2 values from the reserved range must be treated
-   specially in the UTF-8 transformation.
-
-   UTF-8 encodes UCS-2 or UCS-4 characters as a varying number of
-   octets, where the number of octets, and the value of each, depend on
-   the integer value assigned to the character in ISO 10646.  This
-   transformation format has the following characteristics (all values
-   are in hexadecimal):
-
-   -  Character values from 0000 0000 to 0000 007F (US-ASCII repertoire)
-      correspond to octets 00 to 7F (7 bit US-ASCII values).
-
-   -  US-ASCII values do not appear otherwise in a UTF-8 encoded charac-
-      ter stream.  This provides compatibility with file systems or
-      other software (e.g. the printf() function in C libraries) that
-      parse based on US-ASCII values but are transparent to other val-
-      ues.
-
-   -  Round-trip conversion is easy between UTF-8 and either of UCS-4,
-      UCS-2 or Unicode.
-
-   -  The first octet of a multi-octet sequence indicates the number of
-      octets in the sequence.
-
-   -  Character boundaries are easily found from anywhere in an octet
-      stream.
-
-   -  The lexicographic sorting order of UCS-4 strings is preserved.  Of
-      course this is of limited interest since the sort order is not
-      culturally valid in either case.
-
-   -  The octet values FE and FF never appear.
-
-   UTF-8 was originally a project of the X/Open Joint
-   Internationalization Group XOJIG with the objective to specify a File
-   System Safe UCS Transformation Format [FSS_UTF] that is compatible
-   with UNIX systems, supporting multilingual text in a single encoding.
-   The original authors were Gary Miller, Greger Leijonhufvud and John
-   Entenmann.  Later, Ken Thompson and Rob Pike did significant work for
-   the formal UTF-8.
-
-
-
-
-
-Yergeau                      Informational                      [Page 2]
-
-RFC 2044                         UTF-8                      October 1996
-
-
-   A description can also be found in Unicode Technical Report #4 [UNI-
-   CODE].  The definitive reference, including provisions for UTF-16
-   data within UTF-8, is Annex R of ISO/IEC 10646-1 [ISO-10646].
-
-2.  UTF-8 definition
-
-   In UTF-8, characters are encoded using sequences of 1 to 6 octets.
-   The only octet of a "sequence" of one has the higher-order bit set to
-   0, the remaining 7 bits being used to encode the character value. In
-   a sequence of n octets, n>1, the initial octet has the n higher-order
-   bits set to 1, followed by a bit set to 0.  The remaining bit(s) of
-   that octet contain bits from the value of the character to be
-   encoded.  The following octet(s) all have the higher-order bit set to
-   1 and the following bit set to 0, leaving 6 bits in each to contain
-   bits from the character to be encoded.
-
-   The table below summarizes the format of these different octet types.
-   The letter x indicates bits available for encoding bits of the UCS-4
-   character value.
-
-   UCS-4 range (hex.)           UTF-8 octet sequence (binary)
-   0000 0000-0000 007F   0xxxxxxx
-   0000 0080-0000 07FF   110xxxxx 10xxxxxx
-   0000 0800-0000 FFFF   1110xxxx 10xxxxxx 10xxxxxx
-
-   0001 0000-001F FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
-   0020 0000-03FF FFFF   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
-   0400 0000-7FFF FFFF   1111110x 10xxxxxx ... 10xxxxxx
-
-   Encoding from UCS-4 to UTF-8 proceeds as follows:
-
-   1) Determine the number of octets required from the character value
-      and the first column of the table above.
-
-   2) Prepare the high-order bits of the octets as per the second column
-      of the table.
-
-   3) Fill in the bits marked x from the bits of the character value,
-      starting from the lower-order bits of the character value and
-      putting them first in the last octet of the sequence, then the
-      next to last, etc. until all x bits are filled in.
-
-
-
-
-
-
-
-
-
-
-Yergeau                      Informational                      [Page 3]
-
-RFC 2044                         UTF-8                      October 1996
-
-
-      The algorithm for encoding UCS-2 (or Unicode) to UTF-8 can be
-      obtained from the above, in principle, by simply extending each
-      UCS-2 character with two zero-valued octets.  However, UCS-2 val-
-      ues between D800 and DFFF, being actually UCS-4 characters trans-
-      formed through UTF-16, need special treatment: the UTF-16 trans-
-      formation must be undone, yielding a UCS-4 character that is then
-      transformed as above.
-
-      Decoding from UTF-8 to UCS-4 proceeds as follows:
-
-   1) Initialize the 4 octets of the UCS-4 character with all bits set
-      to 0.
-
-   2) Determine which bits encode the character value from the number of
-      octets in the sequence and the second column of the table above
-      (the bits marked x).
-
-   3) Distribute the bits from the sequence to the UCS-4 character,
-      first the lower-order bits from the last octet of the sequence and
-      proceeding to the left until no x bits are left.
-
-      If the UTF-8 sequence is no more than three octets long, decoding
-      can proceed directly to UCS-2 (or equivalently Unicode).
-
-      A more detailed algorithm and formulae can be found in [FSS_UTF],
-      [UNICODE] or Annex R to [ISO-10646].
-
-3.  Examples
-
-   The Unicode sequence "A<NOT IDENTICAL TO><ALPHA>." (0041, 2262, 0391,
-   002E) may be encoded as follows:
-
-      41 E2 89 A2 CE 91 2E
-
-   The Unicode sequence "Hi Mom <WHITE SMILING FACE>!" (0048, 0069,
-   0020, 004D, 006F, 006D, 0020, 263A, 0021) may be encoded as follows:
-
-      48 69 20 4D 6F 6D 20 E2 98 BA 21
-
-   The Unicode sequence representing the Han characters for the Japanese
-   word "nihongo" (65E5, 672C, 8A9E) may be encoded as follows:
-
-      E6 97 A5 E6 9C AC E8 AA 9E
-
-
-
-
-
-
-
-
-Yergeau                      Informational                      [Page 4]
-
-RFC 2044                         UTF-8                      October 1996
-
-
-MIME registrations
-
-   This memo is meant to serve as the basis for registration of a MIME
-   character encoding (charset) as per [RFC1521].  The proposed charset
-   parameter value is "UTF-8".  This string would label media types
-   containing text consisting of characters from the repertoire of ISO
-   10646-1 encoded to a sequence of octets using the encoding scheme
-   outlined above.
-
-Security Considerations
-
-   Security issues are not discussed in this memo.
-
-Acknowledgments
-
-   The following have participated in the drafting and discussion of
-   this memo:
-
-      James E. Agenbroad   Andries Brouwer
-      Martin J. Dürst      David Goldsmith
-      Edwin F. Hart        Kent Karlsson
-      Markus Kuhn          Michael Kung
-      Alain LaBonte        Murray Sargent
-      Keld Simonsen        Arnold Winkler
-
-Bibliography
-
-   [FSS_UTF]      X/Open CAE Specification C501 ISBN 1-85912-082-2 28cm.
-                  22p. pbk. 172g.  4/95, X/Open Company Ltd., "File Sys-
-                  tem Safe UCS Transformation Format (FSS_UTF)", X/Open
-                  Preliminary Specification, Document Number P316.  Also
-                  published in Unicode Technical Report #4.
-
-   [ISO-10646]    ISO/IEC 10646-1:1993. International Standard -- Infor-
-                  mation technology -- Universal Multiple-Octet Coded
-                  Character Set (UCS) -- Part 1: Architecture and Basic
-                  Multilingual Plane.  UTF-8 is described in Annex R,
-                  adopted but not yet published.  UTF-16 is described in
-                  Annex Q, adopted but not yet published.
-
-   [RFC1521]      Borenstein, N., and N. Freed, "MIME (Multipurpose
-                  Internet Mail Extensions) Part One: Mechanisms for
-                  Specifying and Describing the Format of Internet Mes-
-                  sage Bodies", RFC 1521, Bellcore, Innosoft, September
-                  1993.
-
-   [RFC1641]      Goldsmith, D., and M. Davis, "Using Unicode with
-                  MIME", RFC 1641, Taligent inc., July 1994.
-
-
-
-Yergeau                      Informational                      [Page 5]
-
-RFC 2044                         UTF-8                      October 1996
-
-
-   [RFC1642]      Goldsmith, D., and M. Davis, "UTF-7: A Mail-safe
-                  Transformation Format of Unicode", RFC 1642,
-                  Taligent, Inc., July 1994.
-
-   [UNICODE]      The Unicode Consortium, "The Unicode Standard --
-                  Worldwide Character Encoding -- Version 1.0", Addison-
-                  Wesley, Volume 1, 1991, Volume 2, 1992.  UTF-8 is
-                  described in Unicode Technical Report #4.
-
-   [US-ASCII]     Coded Character Set--7-bit American Standard Code for
-                  Information Interchange, ANSI X3.4-1986.
-
-Author's Address
-
-      Francois Yergeau
-      Alis Technologies
-      100, boul. Alexis-Nihon
-      Suite 600
-      Montreal  QC  H4M 2P2
-      Canada
-
-      Tel: +1 (514) 747-2547
-      Fax: +1 (514) 747-2561
-      EMail: fyergeau@alis.com
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Yergeau                      Informational                      [Page 6]
-
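Note on the removed rfc2044.txt: the table-driven encoding and decoding steps in its section 2 can be sketched roughly as below, using the 1 to 6 octet forms the RFC defines. This is an illustrative Python sketch, not code from this repository; the names `_FORMS`, `encode_ucs4` and `decode_ucs4` are hypothetical. As simplifying assumptions, it does not undo UTF-16 for UCS-2 values in the D800 to DFFF range and does not reject sequences longer than necessary for the value they carry.

```python
# One row per line of the RFC 2044 table:
# (upper bound of UCS-4 range, octet count, marker bits, marker mask)
_FORMS = [
    (0x0000007F, 1, 0x00, 0x80),  # 0xxxxxxx
    (0x000007FF, 2, 0xC0, 0xE0),  # 110xxxxx
    (0x0000FFFF, 3, 0xE0, 0xF0),  # 1110xxxx
    (0x001FFFFF, 4, 0xF0, 0xF8),  # 11110xxx
    (0x03FFFFFF, 5, 0xF8, 0xFC),  # 111110xx
    (0x7FFFFFFF, 6, 0xFC, 0xFE),  # 1111110x
]


def encode_ucs4(value: int) -> bytes:
    """Encode one UCS-4 character value as an RFC 2044 UTF-8 sequence."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit UCS-4 range")
    # Step 1: find the number of octets from the first column of the table.
    for upper, count, marker, _mask in _FORMS:
        if value <= upper:
            break
    # Steps 2 and 3: fill the x bits, lowest-order bits into the last octet.
    octets = [0] * count
    for i in range(count - 1, 0, -1):
        octets[i] = 0x80 | (value & 0x3F)
        value >>= 6
    octets[0] = marker | value
    return bytes(octets)


def decode_ucs4(octets: bytes) -> int:
    """Decode exactly one UTF-8 sequence back to its UCS-4 value."""
    if not octets:
        raise ValueError("empty sequence")
    first = octets[0]
    # The initial octet tells how many octets the sequence has.
    for _upper, count, marker, mask in _FORMS:
        if first & mask == marker:
            break
    else:
        raise ValueError("invalid initial octet")
    if len(octets) != count:
        raise ValueError("wrong number of octets for this initial octet")
    value = first & ~mask & 0xFF
    for octet in octets[1:]:
        if octet & 0xC0 != 0x80:
            raise ValueError("continuation octet must look like 10xxxxxx")
        value = (value << 6) | (octet & 0x3F)
    return value
```

Applied to the values 0041, 2262, 0391 and 002E from section 3 of the RFC, `encode_ucs4` yields the octets 41, E2 89 A2, CE 91 and 2E listed there. The octet values FE and FF match no row of the table, so `decode_ucs4` rejects them as initial octets.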