mirror of
https://git.openldap.org/openldap/openldap.git
synced 2025-12-28 18:49:34 -05:00
Move a few obsolete RFCs to the Attic
parent
bc51bd5180
commit
eeefab745c
3 changed files with 0 additions and 1129 deletions
|
|
@ -1,619 +0,0 @@
|
|||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
Network Working Group T. Howes
|
||||
Request for Comments: 1488 University of Michigan
|
||||
S. Kille
|
||||
ISODE Consortium
|
||||
W. Yeong
|
||||
Performance Systems International
|
||||
C. Robbins
|
||||
NeXor Ltd.
|
||||
July 1993
|
||||
|
||||
|
||||
The X.500 String Representation of Standard Attribute Syntaxes
|
||||
|
||||
Status of this Memo
|
||||
|
||||
This RFC specifies an IAB standards track protocol for the Internet
|
||||
community, and requests discussion and suggestions for improvements.
|
||||
Please refer to the current edition of the "IAB Official Protocol
|
||||
Standards" for the standardization state and status of this protocol.
|
||||
Distribution of this memo is unlimited.
|
||||
|
||||
Abstract
|
||||
|
||||
The Lightweight Directory Access Protocol (LDAP) [9] requires that
|
||||
the contents of AttributeValue fields in protocol elements be octet
|
||||
strings. This document defines the requirements that must be
|
||||
satisfied by encoding rules used to render Directory attribute
|
||||
syntaxes into a form suitable for use in the LDAP, then goes on to
|
||||
define the encoding rules for the standard set of attribute syntaxes
|
||||
defined in [1,2] and [3].
|
||||
|
||||
1. Attribute Syntax Encoding Requirements
|
||||
|
||||
This section defines general requirements for lightweight directory
|
||||
protocol attribute syntax encodings. All documents defining attribute
|
||||
syntax encodings for use by the lightweight directory protocols are
|
||||
expected to conform to these requirements.
|
||||
|
||||
The encoding rules defined for a given attribute syntax must produce
|
||||
octet strings. To the greatest extent possible, encoded octet
|
||||
strings should be usable in their native encoded form for display
|
||||
purposes. In particular, encoding rules for attribute syntaxes
|
||||
defining non-binary values should produce strings that can be
|
||||
displayed with little or no translation by clients implementing the
|
||||
lightweight directory protocols.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
Howes, Kille, Yeong & Robbins [Page 1]
|
||||
|
||||
RFC 1488 X.500 Syntax Encoding July 1993
|
||||
|
||||
|
||||
2. Standard Attribute Syntax Encodings
|
||||
|
||||
For the purposes of defining the encoding rules for the standard
|
||||
attribute syntaxes, the following auxiliary BNF definitions will be
|
||||
used:
|
||||
|
||||
<a> ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' |
|
||||
'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' |
|
||||
's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | 'A' |
|
||||
'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' |
|
||||
'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' |
|
||||
'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z'
|
||||
|
||||
<d> ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
|
||||
|
||||
<hex-digit> ::= <d> | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' |
|
||||
'A' | 'B' | 'C' | 'D' | 'E' | 'F'
|
||||
|
||||
<k> ::= <a> | <d> | '-'
|
||||
|
||||
<p> ::= <a> | <d> | ''' | '(' | ')' | '+' | ',' | '-' | '.' |
|
||||
'/' | ':' | '?' | ' '
|
||||
|
||||
<CRLF> ::= The ASCII newline character with hexadecimal value 0x0A
|
||||
|
||||
<letterstring> ::= <a> | <a> <letterstring>
|
||||
|
||||
<numericstring> ::= <d> | <d> <numericstring>
|
||||
|
||||
<keystring> ::= <a> | <a> <anhstring>
|
||||
|
||||
<anhstring> ::= <k> | <k> <anhstring>
|
||||
|
||||
<printablestring> ::= <p> | <p> <printablestring>
|
||||
|
||||
<space> ::= ' ' | ' ' <space>
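The auxiliary rules above translate fairly directly into regular expressions. The following is a minimal sketch, not part of the memo; the names PATTERNS and matches are illustrative assumptions.

```python
import re

# Character classes taken from the auxiliary BNF above.
A = "A-Za-z"                     # <a>
D = "0-9"                        # <d>
K = A + D + r"\-"                # <k> ::= <a> | <d> | '-'
P = A + D + r"'()+,\-./:? "      # <p>

# Hypothetical lookup table: rule name -> compiled pattern.
PATTERNS = {
    "letterstring": re.compile(f"[{A}]+"),
    "numericstring": re.compile(f"[{D}]+"),
    "keystring": re.compile(f"[{A}][{K}]*"),   # <a> followed by an optional <anhstring>
    "printablestring": re.compile(f"[{P}]+"),
}


def matches(rule: str, text: str) -> bool:
    """Return True if the whole of text matches the named BNF rule."""
    return PATTERNS[rule].fullmatch(text) is not None
```

Note that '$' is not a member of <p>, which is why several of the encodings below can use it as a separator between printable strings.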
|
||||
|
||||
2.1. Undefined
|
||||
|
||||
Values of type Undefined are encoded as if they were values of type
|
||||
Octet String.
|
||||
|
||||
2.2. Case Ignore String
|
||||
|
||||
A string of type caseIgnoreStringSyntax is encoded as the string
|
||||
value itself.
|
||||
|
||||
2.3. Case Exact String
|
||||
|
||||
The encoding of a string of type caseExactStringSyntax is the string
|
||||
value itself.
|
||||
|
||||
2.4. Printable String
|
||||
|
||||
The encoding of a string of type printableStringSyntax is the string
|
||||
value itself.
|
||||
|
||||
2.5. Numeric String
|
||||
|
||||
The encoding of a string of type numericStringSyntax is the string
|
||||
value itself.
|
||||
|
||||
2.6. Octet String
|
||||
|
||||
The encoding of a string of type octetStringSyntax is the string
|
||||
value itself.
|
||||
|
||||
2.7. Case Ignore IA5 String
|
||||
|
||||
The encoding of a string of type caseIgnoreIA5String is the string
|
||||
value itself.
|
||||
|
||||
2.8. IA5 String
|
||||
|
||||
The encoding of a string of type iA5StringSyntax is the string value
|
||||
itself.
|
||||
|
||||
2.9. T61 String
|
||||
|
||||
The encoding of a string of type t61StringSyntax is the string value
|
||||
itself.
|
||||
|
||||
2.10. Case Ignore List
|
||||
|
||||
Values of type caseIgnoreListSyntax are encoded according to the
|
||||
following BNF:
|
||||
|
||||
<caseignorelist> ::= <caseignorestring> |
|
||||
<caseignorestring> '$' <caseignorelist>
|
||||
|
||||
<caseignorestring> ::= a string encoded according to the rules
|
||||
for Case Ignore String as above.
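A '$'-separated list of this kind can be produced and consumed with ordinary string joins and splits. This is a sketch under one assumption that the memo does not address: since the BNF defines no escape for '$', an item containing '$' is rejected rather than encoded. The function names are illustrative.

```python
def encode_list(items):
    """Join already-encoded strings with '$', following <caseignorelist>."""
    if not items:
        raise ValueError("the BNF requires at least one string")
    for item in items:
        if "$" in item:
            # Assumption: no escape mechanism is defined, so this would be ambiguous.
            raise ValueError("'$' inside an item cannot be represented")
    return "$".join(items)


def decode_list(encoded):
    """Split an encoded list back into its component strings."""
    return encoded.split("$")
```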
|
||||
|
||||
2.11. Case Exact List
|
||||
|
||||
Values of type caseExactListSyntax are encoded according to the
|
||||
following BNF:
|
||||
|
||||
<caseexactlist> ::= <caseexactstring> |
|
||||
<caseexactstring> '$' <caseexactlist>
|
||||
|
||||
<caseexactstring> ::= a string encoded according to the rules for
|
||||
Case Exact String as above.
|
||||
|
||||
2.12. Distinguished Name
|
||||
|
||||
Values of type distinguishedNameSyntax are encoded to have the
|
||||
representation defined in [5].
|
||||
|
||||
2.13. Boolean
|
||||
|
||||
Values of type booleanSyntax are encoded according to the following
|
||||
BNF:
|
||||
|
||||
<boolean> ::= "TRUE" | "FALSE"
|
||||
|
||||
Boolean values have an encoding of "TRUE" if they are logically true,
|
||||
and have an encoding of "FALSE" otherwise.
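As a small illustration, the Boolean rule could be implemented as follows. The function names are assumptions, and the decoder is deliberately strict: it accepts only the two upper-case tokens that the BNF lists.

```python
def encode_boolean(value: bool) -> str:
    """Encode a logical value as the string TRUE or FALSE."""
    return "TRUE" if value else "FALSE"


def decode_boolean(text: str) -> bool:
    """Decode TRUE or FALSE; any other string is rejected."""
    if text == "TRUE":
        return True
    if text == "FALSE":
        return False
    raise ValueError(f"not a <boolean>: {text!r}")
```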
|
||||
|
||||
2.14. Integer
|
||||
|
||||
Values of type integerSyntax are encoded as the decimal
|
||||
representation of their values, with each decimal digit represented
|
||||
by its character equivalent. So the digit 1 is represented by the
character '1'.
|
||||
|
||||
2.15. Object Identifier
|
||||
|
||||
Values of type objectIdentifierSyntax are encoded according to the
|
||||
following BNF:
|
||||
|
||||
<oid> ::= <descr> | <descr> '.' <numericoid> | <numericoid>
|
||||
|
||||
<descr> ::= <keystring>
|
||||
|
||||
<numericoid> ::= <numericstring> | <numericstring> '.' <numericoid>
|
||||
|
||||
In the above BNF, <descr> is the syntactic representation of an
|
||||
object descriptor. When encoding values of type
|
||||
objectIdentifierSyntax, the first encoding option should be used in
|
||||
preference to the second, which should be used in preference to the
|
||||
third wherever possible. That is, in encoding object identifiers,
|
||||
object descriptors (where assigned and known by the implementation)
|
||||
should be used in preference to numeric oids to the greatest extent
|
||||
possible. For example, in encoding the object identifier representing
|
||||
an organizationName, the descriptor "organizationName" is preferable
|
||||
to "ds.4.10", which is in turn preferable to the string "2.5.4.10".
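The preference order described above (descriptor, then descriptor plus numeric suffix, then fully numeric) might be sketched as follows. The two lookup tables are hypothetical and hold only a few entries; a real implementation would draw them from whatever schema it knows.

```python
import re

# Hypothetical tables of descriptors known to this implementation.
DESCRIPTORS = {"2.5.4.10": "organizationName", "2.5.4.3": "commonName"}
PREFIXES = {"2.5": "ds"}   # used for the <descr> '.' <numericoid> form

NUMERIC_OID = re.compile(r"[0-9]+(\.[0-9]+)*")


def encode_oid(numeric: str) -> str:
    """Pick the most preferred string form available for a numeric OID."""
    if not NUMERIC_OID.fullmatch(numeric):
        raise ValueError(f"not a <numericoid>: {numeric!r}")
    # First choice: a descriptor for the complete identifier.
    if numeric in DESCRIPTORS:
        return DESCRIPTORS[numeric]
    # Second choice: the longest known prefix, followed by the remaining arcs.
    arcs = numeric.split(".")
    for cut in range(len(arcs) - 1, 0, -1):
        prefix = ".".join(arcs[:cut])
        if prefix in PREFIXES:
            return PREFIXES[prefix] + "." + ".".join(arcs[cut:])
    # Last choice: the numeric form itself.
    return numeric
```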
|
||||
|
||||
2.16. Telephone Number
|
||||
|
||||
Values of type telephoneNumberSyntax are encoded as if they were
|
||||
Printable String types.
|
||||
|
||||
2.17. Telex Number
|
||||
|
||||
Values of type telexNumberSyntax are encoded according to the
|
||||
following BNF:
|
||||
|
||||
<telex-number> ::= <actual-number> '$' <country> '$' <answerback>
|
||||
|
||||
<actual-number> ::= <printablestring>
|
||||
|
||||
<country> ::= <printablestring>
|
||||
|
||||
<answerback> ::= <printablestring>
|
||||
|
||||
In the above, <actual-number> is the syntactic representation of the
|
||||
number portion of the TELEX number being encoded, <country> is the
|
||||
TELEX country code, and <answerback> is the answerback code of a
|
||||
TELEX terminal.
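Because '$' is not a printable-string character, a TELEX number can be split into its three fields without ambiguity. The sketch below assumes nothing beyond the BNF; the TelexNumber type and function names are illustrative, and the sample values used with it are made up.

```python
from typing import NamedTuple


class TelexNumber(NamedTuple):
    actual_number: str
    country: str
    answerback: str


def encode_telex(telex: TelexNumber) -> str:
    """Join the three fields with '$', following <telex-number>."""
    return "$".join(telex)


def decode_telex(encoded: str) -> TelexNumber:
    """Split an encoded value into exactly three non-empty fields."""
    parts = encoded.split("$")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a <telex-number>: {encoded!r}")
    return TelexNumber(*parts)
```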
|
||||
|
||||
2.18. Teletex Terminal Identifier
|
||||
|
||||
Values of type teletexTerminalIdentifier are encoded according to the
|
||||
following BNF:
|
||||
|
||||
<teletex-id> ::= <printablestring> 0*( '$' <printablestring>)
|
||||
|
||||
In the above, the first <printablestring> is the encoding of the
|
||||
first portion of the teletex terminal identifier to be encoded, and
|
||||
the subsequent 0 or more <printablestrings> are subsequent portions
|
||||
of the teletex terminal identifier.
|
||||
|
||||
2.19. Facsimile Telephone Number
|
||||
|
||||
Values of type FacsimileTelephoneNumber are encoded according to the
|
||||
following BNF:
|
||||
|
||||
<fax-number> ::= <printablestring> [ '$' <faxparameters> ]
|
||||
|
||||
<faxparameters> ::= <faxparm> | <faxparm> '$' <faxparameters>
|
||||
|
||||
<faxparm> ::= 'twoDimensional' | 'fineResolution' | 'unlimitedLength' |
|
||||
'b4Length' | 'a3Width' | 'b4Width' | 'uncompressed'
|
||||
|
||||
In the above, the first <printablestring> is the actual fax number,
|
||||
and the <faxparm> tokens represent fax parameters.
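A sketch of the fax number encoding follows. The parameter names come from the <faxparm> rule; the function names and the sample number used with them are illustrative assumptions.

```python
# The seven tokens permitted by <faxparm>.
FAX_PARAMETERS = frozenset({
    "twoDimensional", "fineResolution", "unlimitedLength",
    "b4Length", "a3Width", "b4Width", "uncompressed",
})


def _check(parameters):
    unknown = [p for p in parameters if p not in FAX_PARAMETERS]
    if unknown:
        raise ValueError(f"unknown fax parameter(s): {unknown}")


def encode_fax_number(number, parameters=()):
    """Append zero or more fax parameters to the number, separated by '$'."""
    _check(parameters)
    return "$".join([number, *parameters])


def decode_fax_number(encoded):
    """Return (number, list_of_parameters) from an encoded value."""
    number, *parameters = encoded.split("$")
    _check(parameters)
    return number, parameters
```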
|
||||
|
||||
2.20. Presentation Address
|
||||
|
||||
Values of type PresentationAddress are encoded to have the
|
||||
representation described in [6].
|
||||
|
||||
2.21. UTC Time
|
||||
|
||||
Values of type uTCTimeSyntax are encoded as if they were Printable
|
||||
Strings with the strings containing a UTCTime value.
|
||||
|
||||
2.22. Guide (search guide)
|
||||
|
||||
Values of type Guide, such as values of the searchGuide attribute,
|
||||
are encoded according to the following BNF:
|
||||
|
||||
<guide-value> ::= [ <object-class> '#' ] <criteria>
|
||||
|
||||
<object-class> ::= an encoded value of type objectIdentifierSyntax
|
||||
|
||||
<criteria> ::= <criteria-item> | <criteria-set> | '!' <criteria>
|
||||
|
||||
<criteria-set> ::= [ '(' ] <criteria> '&' <criteria-set> [ ')' ] |
|
||||
[ '(' ] <criteria> '|' <criteria-set> [ ')' ]
|
||||
|
||||
<criteria-item> ::= [ '(' ] <attributetype> '$' <match-type> [ ')' ]
|
||||
|
||||
<match-type> ::= "EQ" | "SUBSTR" | "GE" | "LE" | "APPROX"
|
||||
|
||||
2.23. Postal Address
|
||||
|
||||
Values of type PostalAddress are encoded according to the following BNF:
|
||||
|
||||
<postal-address> ::= <t61string> | <t61string> '$' <postal-address>
|
||||
|
||||
In the above, each <t61string> component of a postal address value is
|
||||
encoded as a value of type t61StringSyntax.
|
||||
|
||||
2.24. User Password
|
||||
|
||||
Values of type userPasswordSyntax are encoded as if they were of type
|
||||
octetStringSyntax.
|
||||
|
||||
2.25. User Certificate
|
||||
|
||||
Values of type userCertificate are encoded according to the following
|
||||
BNF:
|
||||
|
||||
<certificate> ::= <signature> '#' <issuer> '#' <validity> '#' <subject>
|
||||
'#' <public-key-info>
|
||||
|
||||
<signature> ::= <algorithm-id>
|
||||
|
||||
<issuer> ::= an encoded Distinguished Name
|
||||
|
||||
<validity> ::= <not-before-time> '#' <not-after-time>
|
||||
|
||||
<not-before-time> ::= <utc-time>
|
||||
|
||||
<not-after-time> ::= <utc-time>
|
||||
|
||||
<algorithm-parameters> ::= <null> | <integervalue> |
|
||||
'{ASN}' <hex-string>
|
||||
|
||||
<subject> ::= an encoded Distinguished Name
|
||||
|
||||
<public-key-info> ::= <algorithm-id> '#' <encrypted-value>
|
||||
|
||||
<encrypted-value> ::= <hex-string> | <hex-string> '-' <d>
|
||||
|
||||
<algorithm-id> ::= <oid> '#' <algorithm-parameters>
|
||||
|
||||
<utc-time> ::= an encoded UTCTime value
|
||||
|
||||
<hex-string> ::= <hex-digit> | <hex-digit> <hex-string>
|
||||
|
||||
2.26. CA Certificate
|
||||
|
||||
Values of type cACertificate are encoded as if the values were of
|
||||
type userCertificate.
|
||||
|
||||
2.27. Authority Revocation List
|
||||
|
||||
Values of type authorityRevocationList are encoded according to the
|
||||
following BNF:
|
||||
|
||||
<certificate-list> ::= <signature> '#' <issuer> '#'
|
||||
<utc-time> [ '#' <revoked-certificates> ]
|
||||
|
||||
<revoked-certificates> ::= <algorithm> '#' <encrypted-value>
|
||||
[ '#' 0*(<revoked-certificate>) '#']
|
||||
|
||||
<revoked-certificate> ::= <subject> '#' <algorithm> '#'
|
||||
<serial> '#' <utc-time>
|
||||
|
||||
The syntactic components <algorithm>, <issuer>, <encrypted-value>,
|
||||
<utc-time>, <subject> and <serial> have the same definitions as in
|
||||
the BNF for the userCertificate attribute syntax.
|
||||
|
||||
2.28. Certificate Revocation List
|
||||
|
||||
Values of type certificateRevocationList are encoded as if the values
|
||||
were of type authorityRevocationList.
|
||||
|
||||
2.29. Cross Certificate Pair
|
||||
|
||||
Values of type crossCertificatePair are encoded according to the
|
||||
following BNF:
|
||||
|
||||
<certificate-pair> ::= <certificate> '|' <certificate>
|
||||
|
||||
The syntactic component <certificate> has the same definition as in
|
||||
the BNF for the userCertificate attribute syntax.
|
||||
|
||||
2.30. Delivery Method
|
||||
|
||||
Values of type deliveryMethod are encoded according to the following
|
||||
BNF:
|
||||
|
||||
<delivery-value> ::= <pdm> | <pdm> '$' <delivery-value>
|
||||
|
||||
<pdm> ::= 'any' | 'mhs' | 'physical' | 'telex' | 'teletex' |
|
||||
'g3fax' | 'g4fax' | 'ia5' | 'videotex' | 'telephone'
|
||||
|
||||
2.31. Other Mailbox
|
||||
|
||||
Values of the type otherMailboxSyntax are encoded according to the
|
||||
following BNF:
|
||||
|
||||
<otherMailbox> ::= <mailbox-type> '$' <mailbox>
|
||||
|
||||
<mailbox-type> ::= an encoded Printable String
|
||||
|
||||
<mailbox> ::= an encoded IA5 String
|
||||
|
||||
In the above, <mailbox-type> represents the type of mail system in
|
||||
which the mailbox resides, for example "Internet" or "MCIMail"; and
|
||||
<mailbox> is the actual mailbox in the mail system defined by
|
||||
<mailbox-type>.
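One detail worth illustrating: the mailbox type is a Printable String and so cannot contain '$', but the mailbox itself is an IA5 String and may. A decoder can therefore split on the first '$' only. This is a sketch with illustrative function names and sample values.

```python
def encode_other_mailbox(mailbox_type: str, mailbox: str) -> str:
    """Join the mailbox type and the mailbox with a single '$'."""
    if "$" in mailbox_type:
        raise ValueError("'$' is not a Printable String character")
    return f"{mailbox_type}${mailbox}"


def decode_other_mailbox(encoded: str):
    """Split on the first '$' only; later '$' characters belong to the mailbox."""
    mailbox_type, separator, mailbox = encoded.partition("$")
    if not separator:
        raise ValueError(f"not an <otherMailbox>: {encoded!r}")
    return mailbox_type, mailbox
```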
|
||||
|
||||
2.32. Mail Preference
|
||||
|
||||
Values of type mailPreferenceOption are encoded according to the
|
||||
following BNF:
|
||||
|
||||
<mail-preference> ::= "NO-LISTS" | "ANY-LIST" | "PROFESSIONAL-LISTS"
|
||||
|
||||
2.33. MHS OR Address
|
||||
|
||||
Values of type MHS OR Address are encoded as strings, according to
|
||||
the format defined in [10].
|
||||
|
||||
2.34. Photo
|
||||
|
||||
Values of type Photo are encoded as if they were octet strings
|
||||
containing JPEG images in the JPEG File Interchange Format (JFIF), as
|
||||
described in [8].
|
||||
|
||||
2.35. Fax
|
||||
|
||||
Values of type Fax are encoded as if they were octet strings
|
||||
containing Group 3 Fax images as defined in [7].
|
||||
|
||||
3. Acknowledgements
|
||||
|
||||
Many of the attribute syntax encodings defined in this document are
|
||||
adapted from those used in the QUIPU X.500 implementation. The
|
||||
contributions of the authors of the QUIPU implementation in the
|
||||
specification of the QUIPU syntaxes [4] are gratefully acknowledged.
|
||||
|
||||
4. Bibliography
|
||||
|
||||
[1] The Directory: Selected Attribute Syntaxes. CCITT,
|
||||
Recommendation X.520.
|
||||
|
||||
[2] Information Processing Systems -- Open Systems Interconnection --
|
||||
The Directory: Selected Attribute Syntaxes.
|
||||
|
||||
[3] Barker, P., and S. Kille, "The COSINE and Internet X.500 Schema",
|
||||
RFC 1274, University College London, November 1991.
|
||||
|
||||
[4] The ISO Development Environment: User's Manual -- Volume 5:
|
||||
QUIPU. Colin Robbins, Stephen E. Kille.
|
||||
|
||||
[5] Kille, S., "A String Representation of Distinguished Names", RFC
|
||||
1485, July 1993.
|
||||
|
||||
[6] Kille, S., "A String Representation for Presentation Addresses",
|
||||
RFC 1278, University College London, November 1991.
|
||||
|
||||
[7] Terminal Equipment and Protocols for Telematic Services -
|
||||
Standardization of Group 3 facsimile apparatus for document
|
||||
transmission. CCITT, Recommendation T.4.
|
||||
|
||||
[8] JPEG File Interchange Format (Version 1.02). Eric Hamilton, C-
|
||||
Cube Microsystems, Milpitas, CA, September 1, 1992.
|
||||
|
||||
[9] Yeong, W., Howes, T., and S. Kille, "Lightweight Directory Access
|
||||
Protocol", RFC 1487, Performance Systems International,
|
||||
University of Michigan, ISODE Consortium, July 1993.
|
||||
|
||||
[10] Kille, S., "Mapping between X.400(1988)/ISO 10021 and RFC 822",
|
||||
RFC 1327, University College London, May 1992.
|
||||
|
||||
5. Security Considerations
|
||||
|
||||
Security issues are not discussed in this memo.
|
||||
|
||||
6. Authors' Addresses
|
||||
|
||||
Tim Howes
|
||||
University of Michigan
|
||||
ITD Research Systems
|
||||
535 W William St.
|
||||
Ann Arbor, MI 48103-4943
|
||||
USA
|
||||
|
||||
Phone: +1 313 747-4454
|
||||
EMail: tim@umich.edu
|
||||
|
||||
|
||||
Steve Kille
|
||||
ISODE Consortium
|
||||
PO Box 505
|
||||
London
|
||||
SW11 1DX
|
||||
UK
|
||||
|
||||
Phone: +44-71-223-4062
|
||||
EMail: S.Kille@isode.com
|
||||
|
||||
|
||||
Wengyik Yeong
|
||||
PSI, Inc.
|
||||
510 Huntmar Park Drive
|
||||
Herndon, VA 22070
|
||||
USA
|
||||
|
||||
Phone: +1 703-450-8001
|
||||
EMail: yeongw@psilink.com
|
||||
|
||||
|
||||
Colin Robbins
|
||||
NeXor Ltd
|
||||
University Park
|
||||
Nottingham
|
||||
NG7 2RD
|
||||
UK
|
|
@ -1,171 +0,0 @@
|
|||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
Network Working Group T. Howes
|
||||
Request for Comments: 1558 University of Michigan
|
||||
Category: Informational December 1993
|
||||
|
||||
|
||||
A String Representation of LDAP Search Filters
|
||||
|
||||
Status of this Memo
|
||||
|
||||
This memo provides information for the Internet community. This memo
|
||||
does not specify an Internet standard of any kind. Distribution of
|
||||
this memo is unlimited.
|
||||
|
||||
Abstract
|
||||
|
||||
The Lightweight Directory Access Protocol (LDAP) [1] defines a
|
||||
network representation of a search filter transmitted to an LDAP
|
||||
server. Some applications may find it useful to have a common way of
|
||||
representing these search filters in a human-readable form. This
|
||||
document defines a human-readable string format for representing LDAP
|
||||
search filters.
|
||||
|
||||
1. LDAP Search Filter Definition
|
||||
|
||||
An LDAP search filter is defined in [1] as follows:
|
||||
|
||||
Filter ::= CHOICE {
|
||||
and [0] SET OF Filter,
|
||||
or [1] SET OF Filter,
|
||||
not [2] Filter,
|
||||
equalityMatch [3] AttributeValueAssertion,
|
||||
substrings [4] SubstringFilter,
|
||||
greaterOrEqual [5] AttributeValueAssertion,
|
||||
lessOrEqual [6] AttributeValueAssertion,
|
||||
present [7] AttributeType,
|
||||
approxMatch [8] AttributeValueAssertion
|
||||
}
|
||||
|
||||
SubstringFilter ::= SEQUENCE {
|
||||
type AttributeType,
|
||||
SEQUENCE OF CHOICE {
|
||||
initial [0] LDAPString,
|
||||
any [1] LDAPString,
|
||||
final [2] LDAPString
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
Howes [Page 1]
|
||||
|
||||
RFC 1558 Representation of LDAP Filters December 1993
|
||||
|
||||
|
||||
AttributeValueAssertion ::= SEQUENCE {
|
||||
attributeType AttributeType,
|
||||
attributeValue AttributeValue
|
||||
}
|
||||
|
||||
AttributeType ::= LDAPString
|
||||
|
||||
AttributeValue ::= OCTET STRING
|
||||
|
||||
LDAPString ::= OCTET STRING
|
||||
|
||||
where the LDAPString above is limited to the IA5 character set. The
|
||||
AttributeType is a string representation of the attribute object
|
||||
identifier in dotted OID format (e.g., "2.5.4.10"), or the shorter
|
||||
string name of the attribute (e.g., "organizationName", or "o"). The
|
||||
AttributeValue OCTET STRING has the form defined in [2]. The Filter
|
||||
is encoded for transmission over a network using the Basic Encoding
|
||||
Rules defined in [3], with simplifications described in [1].
|
||||
|
||||
2. String Search Filter Definition
|
||||
|
||||
The string representation of an LDAP search filter is defined by the
|
||||
following BNF. It uses a prefix format.
|
||||
|
||||
<filter> ::= '(' <filtercomp> ')'
|
||||
<filtercomp> ::= <and> | <or> | <not> | <item>
|
||||
<and> ::= '&' <filterlist>
|
||||
<or> ::= '|' <filterlist>
|
||||
<not> ::= '!' <filter>
|
||||
<filterlist> ::= <filter> | <filter> <filterlist>
|
||||
<item> ::= <simple> | <present> | <substring>
|
||||
<simple> ::= <attr> <filtertype> <value>
|
||||
<filtertype> ::= <equal> | <approx> | <greater> | <less>
|
||||
<equal> ::= '='
|
||||
<approx> ::= '~='
|
||||
<greater> ::= '>='
|
||||
<less> ::= '<='
|
||||
<present> ::= <attr> '=*'
|
||||
<substring> ::= <attr> '=' <initial> <any> <final>
|
||||
<initial> ::= NULL | <value>
|
||||
<any> ::= '*' <starval>
|
||||
<starval> ::= NULL | <value> '*' <starval>
|
||||
<final> ::= NULL | <value>
|
||||
|
||||
<attr> is a string representing an AttributeType, and has the format
|
||||
defined in [1]. <value> is a string representing an AttributeValue,
|
||||
or part of one, and has the form defined in [2]. If a <value> must
|
||||
contain one of the characters '*' or '(' or ')', these characters
|
||||
should be escaped by preceding them with the backslash '\' character.
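The escaping rule can be sketched in a few lines. The function name is an assumption, and the sketch escapes only the three characters the memo names; the memo does not say how a literal backslash in a value should be treated, so none is given special handling here.

```python
# The three characters that the memo says must be escaped inside a <value>.
SPECIAL = "*()"


def escape_filter_value(value: str) -> str:
    """Precede each '*', '(' and ')' in value with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)
```

For example, the value Babs (Barbara) J* would be written as Babs \(Barbara\) J\* inside a filter.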
|
||||
|
||||
3. Examples
|
||||
|
||||
This section gives a few examples of search filters written using
|
||||
this notation.
|
||||
|
||||
(cn=Babs Jensen)
|
||||
(!(cn=Tim Howes))
|
||||
(&(objectClass=Person)(|(sn=Jensen)(cn=Babs J*)))
|
||||
(o=univ*of*mich*)
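To make the grammar concrete, here is a small recursive-descent parser for the BNF of section 2. It is a sketch, not a reference implementation: the tuple shapes and function names are assumptions, substring values are returned unsplit, and escaped characters are left in place.

```python
import re


def parse_filter(text):
    """Parse a filter string into nested tuples; raise ValueError on bad input."""
    node, pos = _filter(text, 0)
    if pos != len(text):
        raise ValueError(f"trailing characters at offset {pos}")
    return node


def _filter(s, i):
    # <filter> ::= '(' <filtercomp> ')'
    if i >= len(s) or s[i] != "(":
        raise ValueError(f"expected '(' at offset {i}")
    i += 1
    if i < len(s) and s[i] in "&|":
        # <and> and <or> take a non-empty <filterlist>.
        op = "and" if s[i] == "&" else "or"
        i += 1
        children = []
        while i < len(s) and s[i] == "(":
            child, i = _filter(s, i)
            children.append(child)
        if not children:
            raise ValueError("empty filter list")
        node = (op, children)
    elif i < len(s) and s[i] == "!":
        child, i = _filter(s, i + 1)
        node = ("not", child)
    else:
        node, i = _item(s, i)
    if i >= len(s) or s[i] != ")":
        raise ValueError(f"expected ')' at offset {i}")
    return node, i + 1


def _item(s, i):
    # Scan to the ')' that closes this item, skipping backslash-escaped characters.
    j = i
    while j < len(s) and s[j] != ")":
        j += 2 if s[j] == "\\" else 1
    body = s[i:j]
    eq = body.find("=")
    if eq <= 0:
        raise ValueError(f"malformed item {body!r}")
    kinds = {"~": "approx", ">": "ge", "<": "le"}
    if body[eq - 1] in kinds:
        attr, kind = body[:eq - 1], kinds[body[eq - 1]]
    else:
        attr, kind = body[:eq], "eq"
    if not attr:
        raise ValueError(f"missing attribute in {body!r}")
    value = body[eq + 1:]
    if kind == "eq" and value == "*":
        return ("present", attr), j
    if kind == "eq" and re.search(r"(?<!\\)\*", value):
        return ("substring", attr, value), j
    return (kind, attr, value), j
```

Applied to the third example above, parse_filter yields an "and" node whose children are an equality test on objectClass and an "or" node.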
|
||||
|
||||
4. Security Considerations
|
||||
|
||||
Security issues are not discussed in this memo.
|
||||
|
||||
5. References
|
||||
|
||||
[1] Yeong, W., Howes, T., and S. Kille, "Lightweight Directory Access
|
||||
Protocol", RFC 1487, Performance Systems International,
|
||||
University of Michigan, ISODE Consortium, July 1993.
|
||||
|
||||
[2] Howes, T., Kille, S., Yeong, W., and C. Robbins, "The String
|
||||
Representation of Standard Attribute Syntaxes", RFC 1488,
|
||||
University of Michigan, ISODE Consortium, Performance Systems
|
||||
International, NeXor Ltd., July 1993.
|
||||
|
||||
[3] "Specification of Basic Encoding Rules for Abstract Syntax
|
||||
Notation One (ASN.1)", CCITT Recommendation X.209, 1988.
|
||||
|
||||
6. Author's Address
|
||||
|
||||
Tim Howes
|
||||
University of Michigan
|
||||
ITD Research Systems
|
||||
535 W William St.
|
||||
Ann Arbor, MI 48103-4943
|
||||
USA
|
||||
|
||||
Phone: +1 313 747-4454
|
||||
EMail: tim@umich.edu
|
|
@ -1,339 +0,0 @@
|
|||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
Network Working Group F. Yergeau
|
||||
Request for Comments: 2044 Alis Technologies
|
||||
Category: Informational October 1996
|
||||
|
||||
|
||||
UTF-8, a transformation format of Unicode and ISO 10646
|
||||
|
||||
Status of this Memo
|
||||
|
||||
This memo provides information for the Internet community. This memo
|
||||
does not specify an Internet standard of any kind. Distribution of
|
||||
this memo is unlimited.
|
||||
|
||||
Abstract
|
||||
|
||||
The Unicode Standard, version 1.1, and ISO/IEC 10646-1:1993 jointly
|
||||
define a 16 bit character set which encompasses most of the world's
|
||||
writing systems. 16-bit characters, however, are not compatible with
|
||||
many current applications and protocols, and this has led to the
|
||||
development of a few so-called UCS transformation formats (UTF), each
|
||||
with different characteristics. UTF-8, the object of this memo, has
|
||||
the characteristic of preserving the full US-ASCII range: US-ASCII
|
||||
characters are encoded in one octet having the usual US-ASCII value,
|
||||
and any octet with such a value can only be a US-ASCII character.
|
||||
This provides compatibility with file systems, parsers and other
|
||||
software that rely on US-ASCII values but are transparent to other
|
||||
values.
|
||||
|
||||
1. Introduction
|
||||
|
||||
The Unicode Standard, version 1.1 [UNICODE], and ISO/IEC 10646-1:1993
|
||||
[ISO-10646] jointly define a 16 bit character set, UCS-2, which
|
||||
encompasses most of the world's writing systems. ISO 10646 further
|
||||
defines a 31-bit character set, UCS-4, with currently no assignments
|
||||
outside of the region corresponding to UCS-2 (the Basic Multilingual
|
||||
Plane, BMP). The UCS-2 and UCS-4 encodings, however, are hard to use
|
||||
in many current applications and protocols that assume 8 or even 7
|
||||
bit characters. Even newer systems able to deal with 16 bit
|
||||
characters cannot process UCS-4 data. This situation has led to the
|
||||
development of so-called UCS transformation formats (UTF), each with
|
||||
different characteristics.
|
||||
|
||||
UTF-1 has only historical interest, having been removed from ISO
|
||||
10646. UTF-7 has the quality of encoding the full Unicode repertoire
|
||||
using only octets with the high-order bit clear (7 bit US-ASCII
|
||||
values, [US-ASCII]), and is thus deemed a mail-safe encoding
|
||||
([RFC1642]). UTF-8, the object of this memo, uses all bits of an
|
||||
octet, but has the quality of preserving the full US-ASCII range:
|
||||
|
||||
|
||||
|
||||
Yergeau Informational [Page 1]
|
||||
|
||||
RFC 2044 UTF-8 October 1996
|
||||
|
||||
|
||||
US-ASCII characters are encoded in one octet having the normal US-
|
||||
ASCII value, and any octet with such a value can only stand for an
|
||||
US-ASCII character, and nothing else.
|
||||
|
||||
UTF-16 is a scheme for transforming a subset of the UCS-4 repertoire
|
||||
into a pair of UCS-2 values from a reserved range. UTF-16 impacts
|
||||
UTF-8 in that UCS-2 values from the reserved range must be treated
|
||||
specially in the UTF-8 transformation.
|
||||
|
||||
UTF-8 encodes UCS-2 or UCS-4 characters as a varying number of
|
||||
octets, where the number of octets, and the value of each, depend on
|
||||
the integer value assigned to the character in ISO 10646. This
|
||||
transformation format has the following characteristics (all values
|
||||
are in hexadecimal):
|
||||
|
||||
- Character values from 0000 0000 to 0000 007F (US-ASCII repertoire)
|
||||
correspond to octets 00 to 7F (7 bit US-ASCII values).
|
||||
|
||||
- US-ASCII values do not appear otherwise in a UTF-8 encoded character
stream. This provides compatibility with file systems or other
software (e.g. the printf() function in C libraries) that parse
based on US-ASCII values but are transparent to other values.
|
||||
|
||||
- Round-trip conversion is easy between UTF-8 and either of UCS-4,
|
||||
UCS-2 or Unicode.
|
||||
|
||||
- The first octet of a multi-octet sequence indicates the number of
|
||||
octets in the sequence.
|
||||
|
||||
- Character boundaries are easily found from anywhere in an octet
|
||||
stream.
|
||||
|
||||
- The lexicographic sorting order of UCS-4 strings is preserved. Of
|
||||
course this is of limited interest since the sort order is not
|
||||
culturally valid in either case.
|
||||
|
||||
- The octet values FE and FF never appear.
|
||||
|
||||
UTF-8 was originally a project of the X/Open Joint
|
||||
Internationalization Group XOJIG with the objective to specify a File
|
||||
System Safe UCS Transformation Format [FSS-UTF] that is compatible
|
||||
with UNIX systems, supporting multilingual text in a single encoding.
|
||||
The original authors were Gary Miller, Greger Leijonhufvud and John
|
||||
Entenmann. Later, Ken Thompson and Rob Pike did significant work for
|
||||
the formal UTF-8.
|
||||
|
||||
A description can also be found in Unicode Technical Report #4
[UNICODE]. The definitive reference, including provisions for UTF-16
|
||||
data within UTF-8, is Annex R of ISO/IEC 10646-1 [ISO-10646].
|
||||
|
||||
2. UTF-8 definition
|
||||
|
||||
In UTF-8, characters are encoded using sequences of 1 to 6 octets.
|
||||
The only octet of a "sequence" of one has the higher-order bit set to
|
||||
0, the remaining 7 bits being used to encode the character value. In
|
||||
a sequence of n octets, n>1, the initial octet has the n higher-order
|
||||
bits set to 1, followed by a bit set to 0. The remaining bit(s) of
|
||||
that octet contain bits from the value of the character to be
|
||||
encoded. The following octet(s) all have the higher-order bit set to
|
||||
1 and the following bit set to 0, leaving 6 bits in each to contain
|
||||
bits from the character to be encoded.
|
||||
|
||||
The table below summarizes the format of these different octet types.
|
||||
The letter x indicates bits available for encoding bits of the UCS-4
|
||||
character value.
|
||||
|
||||
UCS-4 range (hex.) UTF-8 octet sequence (binary)
|
||||
0000 0000-0000 007F 0xxxxxxx
|
||||
0000 0080-0000 07FF 110xxxxx 10xxxxxx
|
||||
0000 0800-0000 FFFF 1110xxxx 10xxxxxx 10xxxxxx
|
||||
|
||||
0001 0000-001F FFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
|
||||
0020 0000-03FF FFFF 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
|
||||
0400 0000-7FFF FFFF 1111110x 10xxxxxx ... 10xxxxxx
|
||||
|
||||
Encoding from UCS-4 to UTF-8 proceeds as follows:
|
||||
|
||||
1) Determine the number of octets required from the character value
|
||||
and the first column of the table above.
|
||||
|
||||
2) Prepare the high-order bits of the octets as per the second column
|
||||
of the table.
|
||||
|
||||
3) Fill in the bits marked x from the bits of the character value,
|
||||
starting from the lower-order bits of the character value and
|
||||
putting them first in the last octet of the sequence, then the
|
||||
next to last, etc. until all x bits are filled in.
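The three encoding steps can be sketched as follows for the full 1 to 6 octet range of the table. The function name is an assumption, and the sketch treats its input as a UCS-4 value: it does not apply the special handling of D800-DFFF that UCS-2 input requires.

```python
def utf8_encode(code: int) -> bytes:
    """Encode one UCS-4 value (0 to 7FFFFFFF hex) per the table above."""
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("UCS-4 values are limited to 31 bits")
    if code <= 0x7F:
        return bytes([code])            # single octet: 0xxxxxxx
    # Step 1: choose the octet count from the first column of the table.
    limits = (0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)
    n = next(k for k, top in enumerate(limits, start=2) if code <= top)
    # Step 3, working from the last octet backwards: each one is 10xxxxxx.
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (code & 0x3F))
        code >>= 6
    # Step 2 for the initial octet: n one-bits, a zero bit, then the bits left over.
    lead = ((0xFF << (8 - n)) & 0xFF) | code
    return bytes([lead] + tail[::-1])
```

For instance, utf8_encode(0x2262) gives the octets E2 89 A2, and utf8_encode(0x0391) gives CE 91.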
|
||||
|
||||
The algorithm for encoding UCS-2 (or Unicode) to UTF-8 can be
obtained from the above, in principle, by simply extending each
UCS-2 character with two zero-valued octets. However, UCS-2 values
between D800 and DFFF, being actually UCS-4 characters transformed
through UTF-16, need special treatment: the UTF-16 transformation
must be undone, yielding a UCS-4 character that is then
transformed as above.
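
A minimal Python sketch of this special treatment follows. The
function names are assumptions made for illustration. Undoing the
UTF-16 transformation uses the standard pairing arithmetic (a value
in D800-DBFF followed by one in DC00-DFFF), and the final step relies
on Python's built-in UTF-8 encoder rather than a hand-written one.

```python
def utf16_pair_to_ucs4(high: int, low: int) -> int:
    """Undo the UTF-16 transformation for one pair of 16-bit values."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a well-formed UTF-16 pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def ucs2_units_to_utf8(units: list[int]) -> bytes:
    """Encode a list of 16-bit values to UTF-8, recombining UTF-16 pairs."""
    out = bytearray()
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDFFF:
            # Values in D800-DFFF are half of a pair: recover the UCS-4 value.
            if i + 1 >= len(units):
                raise ValueError("truncated UTF-16 pair")
            value = utf16_pair_to_ucs4(unit, units[i + 1])
            i += 2
        else:
            # Ordinary UCS-2 value: conceptually zero-extended to UCS-4.
            value = unit
            i += 1
        out += chr(value).encode("utf-8")
    return bytes(out)
```

With this sketch, the pair D83D DE00 is recombined into the UCS-4
value 0001 F600 before being encoded as four octets.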
|
||||
|
||||
Decoding from UTF-8 to UCS-4 proceeds as follows:
|
||||
|
||||
1) Initialize the 4 octets of the UCS-4 character with all bits set
|
||||
to 0.
|
||||
|
||||
2) Determine which bits encode the character value from the number of
|
||||
octets in the sequence and the second column of the table above
|
||||
(the bits marked x).
|
||||
|
||||
3) Distribute the bits from the sequence to the UCS-4 character,
|
||||
first the lower-order bits from the last octet of the sequence and
|
||||
proceeding to the left until no x bits are left.
|
||||
|
||||
If the UTF-8 sequence is no more than three octets long, decoding
|
||||
can proceed directly to UCS-2 (or equivalently Unicode).
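
The decoding steps can be sketched in Python as below. The function
name utf8_to_ucs4 is an assumption, and the sketch expects exactly one
complete sequence as input. It accumulates the x bits from left to
right, which gives the same result as distributing them from the last
octet backwards. It checks only the octet patterns from the table and
performs no further validation.

```python
def utf8_to_ucs4(octets: bytes) -> int:
    """Decode one complete UTF-8 sequence (1 to 6 octets) to a UCS-4 value."""
    if not octets:
        raise ValueError("empty sequence")

    first = octets[0]
    if first < 0x80:
        # Single octet of the form 0xxxxxxx: all 7 low bits are value bits.
        n, value = 1, first
    else:
        # Count the leading 1 bits of the initial octet to get the length n.
        n = 0
        mask = 0x80
        while first & mask:
            n += 1
            mask >>= 1
        if n < 2 or n > 6:
            raise ValueError("not a valid initial octet")
        # The bits below the terminating 0 bit are the x bits.
        value = first & (mask - 1)

    if len(octets) != n:
        raise ValueError("sequence length does not match the initial octet")

    # Each following octet must look like 10xxxxxx and contributes 6 bits.
    for octet in octets[1:]:
        if octet & 0xC0 != 0x80:
            raise ValueError("expected an octet of the form 10xxxxxx")
        value = (value << 6) | (octet & 0x3F)
    return value
```

Sequences of at most three octets always produce a value that fits in
16 bits, which is why they can be decoded directly to UCS-2.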
|
||||
|
||||
A more detailed algorithm and formulae can be found in [FSS_UTF],
|
||||
[UNICODE] or Annex R to [ISO-10646].
|
||||
|
||||
3. Examples
|
||||
|
||||
The Unicode sequence "A<NOT IDENTICAL TO><ALPHA>." (0041, 2262, 0391,
|
||||
002E) may be encoded as follows:
|
||||
|
||||
41 E2 89 A2 CE 91 2E
|
||||
|
||||
The Unicode sequence "Hi Mom <WHITE SMILING FACE>!" (0048, 0069,
|
||||
0020, 004D, 006F, 006D, 0020, 263A, 0021) may be encoded as follows:
|
||||
|
||||
48 69 20 4D 6F 6D 20 E2 98 BA 21
|
||||
|
||||
The Unicode sequence representing the Han characters for the Japanese
|
||||
word "nihongo" (65E5, 672C, 8A9E) may be encoded as follows:
|
||||
|
||||
E6 97 A5 E6 9C AC E8 AA 9E
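
The three examples can be reproduced with Python's built-in UTF-8
encoder. This is only a convenience check; the helper name hex_octets
is an assumption introduced for illustration.

```python
def hex_octets(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex octets."""
    return " ".join(f"{octet:02X}" for octet in text.encode("utf-8"))


# The three sequences from this section, written with \u escapes.
print(hex_octets("A\u2262\u0391."))      # 41 E2 89 A2 CE 91 2E
print(hex_octets("Hi Mom \u263A!"))      # 48 69 20 4D 6F 6D 20 E2 98 BA 21
print(hex_octets("\u65E5\u672C\u8A9E"))  # E6 97 A5 E6 9C AC E8 AA 9E
```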
|
||||
|
||||
MIME registrations
|
||||
|
||||
This memo is meant to serve as the basis for registration of a MIME
|
||||
character encoding (charset) as per [RFC1521]. The proposed charset
|
||||
parameter value is "UTF-8". This string would label media types
|
||||
containing text consisting of characters from the repertoire of ISO
|
||||
10646-1 encoded to a sequence of octets using the encoding scheme
|
||||
outlined above.
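
As an illustration of how such a label might be applied, the sketch
below builds a text part with Python's standard email package and sets
its charset parameter. The helper name utf8_text_part is an
assumption, and the library writes the value in lowercase; MIME
charset names are compared without regard to case.

```python
from email.message import EmailMessage


def utf8_text_part(text: str) -> EmailMessage:
    """Build a text/plain MIME part labelled with the UTF-8 charset."""
    part = EmailMessage()
    # set_content adds the Content-Type header, including the charset
    # parameter, and picks a suitable transfer encoding for the octets.
    part.set_content(text, charset="utf-8")
    return part


part = utf8_text_part("Hi Mom \u263A!")
print(part["Content-Type"])
```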
|
||||
|
||||
Security Considerations
|
||||
|
||||
Security issues are not discussed in this memo.
|
||||
|
||||
Acknowledgments
|
||||
|
||||
The following have participated in the drafting and discussion of
|
||||
this memo:
|
||||
|
||||
James E. Agenbroad Andries Brouwer
|
||||
Martin J. Dürst David Goldsmith
|
||||
Edwin F. Hart Kent Karlsson
|
||||
Markus Kuhn Michael Kung
|
||||
Alain LaBonte Murray Sargent
|
||||
Keld Simonsen Arnold Winkler
|
||||
|
||||
Bibliography
|
||||
|
||||
[FSS_UTF] X/Open CAE Specification C501 ISBN 1-85912-082-2 28cm.
22p. pbk. 172g. 4/95, X/Open Company Ltd., "File System
Safe UCS Transformation Format (FSS_UTF)", X/Open
Preliminary Specification, Document Number P316. Also
published in Unicode Technical Report #4.
|
||||
|
||||
[ISO-10646] ISO/IEC 10646-1:1993. International Standard --
Information technology -- Universal Multiple-Octet Coded
Character Set (UCS) -- Part 1: Architecture and Basic
Multilingual Plane. UTF-8 is described in Annex R,
adopted but not yet published. UTF-16 is described in
Annex Q, adopted but not yet published.
|
||||
|
||||
[RFC1521] Borenstein, N., and N. Freed, "MIME (Multipurpose
Internet Mail Extensions) Part One: Mechanisms for
Specifying and Describing the Format of Internet
Message Bodies", RFC 1521, Bellcore, Innosoft,
September 1993.
|
||||
|
||||
[RFC1641] Goldsmith, D., and M. Davis, "Using Unicode with
MIME", RFC 1641, Taligent, Inc., July 1994.
|
||||
|
||||
[RFC1642] Goldsmith, D., and M. Davis, "UTF-7: A Mail-safe
|
||||
Transformation Format of Unicode", RFC 1642,
|
||||
Taligent, Inc., July 1994.
|
||||
|
||||
[UNICODE] The Unicode Consortium, "The Unicode Standard --
Worldwide Character Encoding -- Version 1.0",
Addison-Wesley, Volume 1, 1991, Volume 2, 1992. UTF-8 is
described in Unicode Technical Report #4.
|
||||
|
||||
[US-ASCII] Coded Character Set--7-bit American Standard Code for
|
||||
Information Interchange, ANSI X3.4-1986.
|
||||
|
||||
Author's Address
|
||||
|
||||
Francois Yergeau
|
||||
Alis Technologies
|
||||
100, boul. Alexis-Nihon
|
||||
Suite 600
|
||||
Montreal QC H4M 2P2
|
||||
Canada
|
||||
|
||||
Tel: +1 (514) 747-2547
|
||||
Fax: +1 (514) 747-2561
|
||||
EMail: fyergeau@alis.com
|
||||