From ef239afc1c2e8d3b9d076295e8ae04451e8879d9 Mon Sep 17 00:00:00 2001 From: Mark Andrews Date: Wed, 18 Jun 2003 21:37:50 +0000 Subject: [PATCH] new draft --- ...-dnsext-dnssec-2535typecode-change-02.txt} | 137 +- .../draft-ietf-dnsext-wcard-clarify-00.txt | 888 +++++++++++ doc/draft/draft-jseng-idn-admin-01.txt | 1175 --------------- doc/draft/draft-jseng-idn-admin-03.txt | 1335 +++++++++++++++++ 4 files changed, 2319 insertions(+), 1216 deletions(-) rename doc/draft/{draft-ietf-dnsext-dnssec-2535typecode-change-01.txt => draft-ietf-dnsext-dnssec-2535typecode-change-02.txt} (73%) create mode 100644 doc/draft/draft-ietf-dnsext-wcard-clarify-00.txt delete mode 100644 doc/draft/draft-jseng-idn-admin-01.txt create mode 100644 doc/draft/draft-jseng-idn-admin-03.txt diff --git a/doc/draft/draft-ietf-dnsext-dnssec-2535typecode-change-01.txt b/doc/draft/draft-ietf-dnsext-dnssec-2535typecode-change-02.txt similarity index 73% rename from doc/draft/draft-ietf-dnsext-dnssec-2535typecode-change-01.txt rename to doc/draft/draft-ietf-dnsext-dnssec-2535typecode-change-02.txt index a28552912b..daea79c59a 100644 --- a/doc/draft/draft-ietf-dnsext-dnssec-2535typecode-change-01.txt +++ b/doc/draft/draft-ietf-dnsext-dnssec-2535typecode-change-02.txt @@ -1,9 +1,10 @@ -INTERNET-DRAFT Samuel Weiler -Expires: November 2003 May 22, 2003 +INTERNET-DRAFT Samuel Weiler +Expires: December 2003 June 12, 2003 +Updates: RFC 2535, [DS] Legacy Resolver Compatibility for Delegation Signer - draft-ietf-dnsext-dnssec-2535typecode-change-01.txt + draft-ietf-dnsext-dnssec-2535typecode-change-02.txt Status of this Memo @@ -17,7 +18,7 @@ Status of this Memo Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other - documents at any time. It is inappropriate to use Internet- Drafts + documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." 
@@ -43,6 +44,25 @@ Abstract these interactions be avoided by changing the type codes and mnemonics of the DNSSEC RRs (SIG, KEY, and NXT). +Changes between 01 and 02: + + SIG(0) still uses SIG, not RRSIG. Added 2931 reference. + + Domain names embedded in NSECs and RRSIGs are not compressible and + are not downcased. Added unknown-rrs reference. + + Simplified the last paragraph of section 3 (NSEC doesn't always + signal a negative answer). + + Changed the suggested type code assignments. + + Added 2119 reference. + + Added definitions of "unsecure delegation" and "unsecure referral", + since they're not clearly defined elsewhere. + + Moved 2065 to informative references, not normative. + 1. Introduction The DNSSEC protocol has been through many iterations whose syntax @@ -75,12 +95,23 @@ Abstract disincentive to sign zones with DS. The proposed solution allows for the incremental deployment of DS. -1.1 The Problem +1.1 Terminology - Delegation signer [DS] introduces new semantics for the NXT RR that + In this document, the term "unsecure delegation" means any + delegation for which no DS record appears at the parent. An + "unsecure referral" is an answer from the parent containing an NS + RRset and a proof that no DS record exists for that name. + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in [RFC2119]. + +1.2 The Problem + + Delegation Signer [DS] introduces new semantics for the NXT RR that are incompatible with the semantics in [RFC2535]. In [RFC2535], NXT records were only required to be returned as part of a - non-existence proof. In [DS], an unsecure referral returns, in + non-existence proof. With DS, an unsecure referral returns, in addition to the NS, a proof of non-existence of a DS RR in the form of an NXT and SIG(NXT). 
RFC 2535 didn't specify how a resolver was to interpret a response with both an NS and an NXT in the authority @@ -90,14 +121,14 @@ Abstract delegations being invisible to 2535-aware resolvers and violates the basic architectural principle that DNSSEC must do no harm -- the signing of zones must not prevent the resolution of unsecured - names. + delegations. 2. Possible Solutions This section presents several possible solutions. Section 3 recommends one and describes it in more detail. -2.1. Change SIG, KEY, and NXT +2.1. Change SIG, KEY, and NXT type codes To avoid the problem described above, legacy (RFC2535-aware) resolvers need to be kept from seeing unsecure referrals that @@ -177,49 +208,60 @@ Abstract 3. Protocol changes This document proposes changing the type codes of SIG, KEY, and - NXT. This solution is the cleanest and safest, largely because the - behavior of resolvers that receive unknown type codes is well - understood. This approach has also received the most testing. + NXT. This approach is the cleanest and safest of those discussed + above, largely because the behavior of resolvers that receive + unknown type codes is well understood. This approach has also + received the most testing. To avoid operational confusion, it's also necessary to change the mnemonics for these RRs. DNSKEY will be the replacement for KEY, with the mnemonic indicating that these keys are not for application use, per [RFC3445]. RRSIG (Resource Record SIGnature) - will replace SIG, and NSEC (Next SECure) will replace NXT. + will replace SIG, and NSEC (Next SECure) will replace NXT. These + new types completely replace the old types, except that SIG(0) + [RFC2931] will continue to use SIG. The new types will have exactly the same syntax and semantics as - specified for SIG, KEY, and NXT in [RFC2535] and [DS], and they - completely replace the old types. 
A resolver, if it receives the - old types, SHOULD treat them as unknown RRs, and SHOULD NOT assign - any special semantic value to them. It MUST NOT use them for - DNSSEC validations or other DNS operational decision making. For - example, a resolver MUST NOT use DNSKEYs to validate SIGs or use - KEYs to validate RRSIGs. Authoritative servers SHOULD NOT serve - SIG, KEY, or NXT records. If those records are included, they MUST - NOT receive special treatment. As an example, if a SIG is included - in a signed zone, there MUST be an RRSIG for it. + specified for SIG, KEY, and NXT in [RFC2535] and [DS] except for + the following: + + 1) Consistent with [UNKNOWN-RRs], domain names embedded in + RRSIG and NSEC RRs MUST NOT be compressed, + + 2) Embedded domain names in RRSIG and NSEC RRs are not downcased + for purposes of DNSSEC canonical form and ordering nor for + equality comparison, and + + 3) An RRSIG with a type covered field of zero has undefined + semantics. + + If a resolver receives the old types, it SHOULD treat them as + unknown RRs and SHOULD NOT assign any special semantic value to + them. It MUST NOT use them for DNSSEC validations or other DNS + operational decision making. For example, a resolver MUST NOT use + DNSKEYs to validate SIGs or use KEYs to validate RRSIGs. + Authoritative servers SHOULD NOT serve SIG, KEY, or NXT records. + If those records are included, they MUST NOT receive special + treatment. As an example, if a SIG is included in a signed zone, + there MUST be an RRSIG for it. + + As a clarification to previous documents, some positive responses, + particularly wildcard proofs and unsecure referrals, will contain + NSEC RRs. Resolvers MUST NOT treat answers with NSEC RRs as + negative answers merely because they contain an NSEC. - As a clarification to previous documents, many positive responses, - including wildcard proofs and insecure referrals, will contain NSEC - RRs. 
As a result, resolvers MUST NOT treat answers with NSEC RRs - as negative answers merely because they contain an NSEC. A - resolver SHOULD either ignore the NSEC, as a DNSSEC-unaware (or - 2535-aware) resolver would, or validate the NSEC and check its - applicability and interpretation as described in [RFC2535] and - [DS]. - 4. IANA Considerations This document updates the IANA registry for DNS Resource Record - Types by assigning types 46, 47, and 48 to the DNSKEY, RRSIG, and - NSEC RRs, respectively. + Types by assigning types 46, 47, and 48 to the RRSIG, NSEC, and + DNSKEY RRs, respectively. - Types 24, 25, and 30 (SIG, KEY, and NXT) should be marked as - Obsolete. + Types 24 (SIG) is retained for SIG(0) [RFC2931] use only. Types 25 + and 30 (KEY and NXT) should be marked as Obsolete. 5. Security Considerations - The change proposed here does not materially effect security. The + The change proposed here does not materially affect security. The implications of trying to use both new and legacy types together are not well understood, and attempts to do so would probably lead to unintended and dangerous results. @@ -235,9 +277,6 @@ Abstract 6. Normative references - [RFC2065] Eastlake, D. and C. Kaufman, "Domain Name System Security - Extensions", RFC 2065, January 1997. - [RFC2535] Eastlake, D., "Domain Name System Security Extensions", RFC 2535, March 1999. @@ -245,8 +284,17 @@ Abstract draft-ietf-dnsext-delegation-signer-14.txt, work in progress, May 2003. + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + [RFC2931] Eastlake, D., "DNS Request and Transaction Signatures + (SIG(0)s)", RFC 2931, September 2000. + 7. Informative References + [RFC2065] Eastlake, D. and C. Kaufman, "Domain Name System Security + Extensions", RFC 2065, January 1997. + [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)", RFC 2671, August 1999. @@ -260,6 +308,11 @@ Abstract [RFC3445] Massey, D., and S. Rose. 
Limiting the Scope of the KEY Resource Record (RR). RFC 3445, December 2002. + [UNKNOWN-RRs] Gustafsson, A. Handling of Unknown DNS Resource + Record Types. draft-ietf-dnsext-unknown-rrs-05.txt + Publication as RFC pending. + + 8. Acknowledgments The proposed solution and the analysis of alternatives had many @@ -268,7 +321,7 @@ Abstract Bill Manning, and Suzanne Woolf. Thanks to Jakob Schlyter and Mark Andrews for identifying the - incompatibility described in section 1.1. + incompatibility described in section 1.2. In addition to the above, the author would like to thank Scott Rose, Olafur Gudmundsson, and Sandra Murphy for their substantive @@ -284,3 +337,5 @@ Abstract weiler@tislabs.com + + diff --git a/doc/draft/draft-ietf-dnsext-wcard-clarify-00.txt b/doc/draft/draft-ietf-dnsext-wcard-clarify-00.txt new file mode 100644 index 0000000000..e5cb2b27b2 --- /dev/null +++ b/doc/draft/draft-ietf-dnsext-wcard-clarify-00.txt @@ -0,0 +1,888 @@ +Internet Engineering Task Force B. Halley +Internet-Draft Nominum + E. Lewis + ARIN + +June 17, 2003 Expires: December 17, 2003 + + Clarifying the Role of Wild Card Domains + in the Domain Name System + + +Status of this Memo + + This document is an Internet-Draft and is in full conformance with all + provisions of Section 10 of RFC2026. + + Internet-Drafts are working documents of the Internet Engineering Task + Force (IETF), its areas, and its working groups. Note that other + groups may also distribute working documents as Internet-Drafts. + + Internet-Drafts are draft documents valid for a maximum of six months + and may be updated, replaced, or obsoleted by other documents at any + time. It is inappropriate to use Internet-Drafts as reference + material or to cite them other than as "work in progress". + + The list of current Internet-Drafts can be accessed at + http://www.ietf.org/ietf/1id-abstracts.txt + + The list of Internet-Draft Shadow Directories can be accessed at + http://www.ietf.org/shadow.html. 
+ +Abstract + +The definition of wild cards is recast from the original in RFC 1034, +in words that are more specific and in line with RFC 2119. This document +is meant to supplement the definition in RFC 1034 and to alter neither +the spirit nor intent of that definition. + +1 Introduction + +The first section of this document will give a crisp overview of what +is being defined, as well as the motivation for what amounts to a +simple rewording of an original document. Examples are included to +help orient the reader. + +Wild card domain names are defined in Section 4.3.3. of RFC 1034 as +"instructions for synthesizing RRs." [RFC1034] The meaning of this is +that a specific, special domain name is used to construct responses in +instances in which the query name is not otherwise represented in a zone. + +A wild card domain name has a specific range of influence on query names +(QNAMEs) within a given class, which is rooted at the domain name +containing the wild card label, and is limited by explicit entries, zone +cuts and empty non-terminal domains (see section 1.3 of this document). + +Note that a wild card domain name has no special impact on the search +for a query type (QTYPE). If a domain name is found that matches the +QNAME (exact or a wild card) but the QTYPE is not found at that point, +the proper response is that there is no data available. The search +does not continue on to seek other wild cards that might match the QTYPE. +To illustrate, a wild card owning an MX RR does not 'cover' other names +in the zone that own an A RR. + +Why is this document needed? Empirical evidence suggests that the +words in RFC 1034 are not clear enough. There exist a number of +implementations that have strayed (each differently) from that definition. +There also exists a misconception among operators that the wild card can be +used to add a specific RR type to all names, such as the MX RR example +cited above. 
This document is also needed as input to efforts to extend +DNS, such as the DNS Security Extensions [RFC 2535]. Lack of a clear +base specification has proven to result in extension documents that +have unpredictable consequences. (This is true in general, not just +for DNS.) + +Another reason this clarification is needed is to answer questions +regarding authenticated denial of existence, a service introduced in the +DNS Security Extensions [RFC 2535]. Prior to the work leading up to this +document, it had been feared that a large number of proof records (NXTs) +might be needed in each reply because of the unknown number of potential +wild card domains that were thought to be applicable. One outcome of this +fear is a now discontinued document solving a problem that is now known +not to exist. I.e., this clarification has the impact of defending against +unwarranted protocol surgery. It is not "yet another" effort to just +rewrite the early specifications for the sake of purity. + +1.1 Document Limits + +This document limits itself to reinforcing the concepts in RFC 1034. +Any deviation from this should be brought to the attention of the editors. + +Two changes to the text of RFC 1034 that fall within the realm of +clarifying the wild card definition have been suggested. (Changes aren't +really clarifications.) The two suggestions are barring the ownership +by a wild card domain of an CNAME resource record set and barring the +ownership by a wild card domain of a NS resource record set. Both +of these have some merit, but do not belong in a document that has not +yet been reviewed by the working group. + +1.2 Existence + +The notion that a domain name 'exists' will arise numerous times in this +discussion. RFC 1034 raises the issue of existence in a number of places, +usually in reference to non-existence and often in reference to processing +involving wild card domain names. 
RFC 1034 does contain algorithms that +describe how domain names impact the preparation of an answer and does +define wild cards as a means of synthesizing answers. + +To help clarify the topic of wild cards, a positive definition of existence +is needed. Complicating matters, though, is the realization that existence +is relative. To an authoritative server, a domain name exists if the +domain name plays a role following the algorithms of preparing a response. +To a resolver, a domain name exists if there is any data available +corresponding to the name. The difference between the two is the synthesis +of records according to a wild card. + +For the purposes of this document, the point of view of an authoritative +server is adopted. A domain name is said to exist if it plays a role in +the execution of the algorithms in RFC 1034. + +1.3 An Example + +For example, consider this wild card domain name: *.example. Any query +name under example. is a candidate to be matched (answered) by this wild +card, i.e., to have an response returned that is synthesized from the wild +card's RR sets. Although any name is a candidate, not all queries will +match. + +To further illustrate this, consider this example: + + $ORIGIN example. + @ IN SOA + NS + NS + * TXT "this is a wild card" + MX 10 mailhost.example. + host1 A 10.0.0.1 + _ssh._tcp.host1 SRV + _ssh._tcp.host2 SRV + subdel NS + +The following queries would be synthesized from the wild card: + QNAME=host3.example. QTYPE=MX, QCLASS=IN + the answer will be a "host3.example. IN MX ..." + QNAME=host3.example. QTYPE=A, QCLASS=IN + the answer will reflect "no error, but no data" + because there is no A RR set at '*' + +The following queries would not be synthesized from the wild card: + QNAME=host1.example., QTYPE=MX, QCLASS=IN + because host1.example. exists + QNAME=_telnet._tcp.host1.example., QTYPE=SRV, QCLASS=IN + because _tcp.host1.example. 
exists (without data) + QNAME=_telnet._tcp.host2.example., QTYPE=SRV, QCLASS=IN + because host2.example. exists (without data) + QNAME=host.subdel.example., QTYPE=A, QCLASS=IN + because subdel.example. exists and is a zone cut + +To the server, the following domains are considered to exist in the zone: +*, host1, _tcp.host1, _ssh._tcp.host1, host2, _tcp.host2, _ssh._tcp.host2, +and subdel. To a resolver, many more domains appear to exist via the +synthesis of the wild card. + +1.4 Empty Non-terminals + +Empty non-terminals are domain names that own no data but have subdomains. +This is defined in section 3.1 of RFC 1034: + +# The domain name space is a tree structure. Each node and leaf on the +# tree corresponds to a resource set (which may be empty). The domain +# system makes no distinctions between the uses of the interior nodes and +# leaves, and this memo uses the term "node" to refer to both. + +The parenthesized "which may be empty" specifies that empty non-terminals +are explicitly recognized. According to the definition of existence in +this document, empty non-terminals do exist at the server. + +Carefully reading the above paragraph can lead to an interpretation that +all possible domains exist - up to the suggested limit of 255 octets for +a domain name [RFC 1035]. For example, www.example. may have an A RR, and +as far as is practically concerned, is a leaf of the domain tree. But the +definition can be taken to mean that sub.www.example. also exists, albeit +with no data. By extension, all possible domains exist, from the root on +down. As RFC 1034 also defines "an authoritative name error indicating +that the name does not exist" in section 4.3.1, this is not the intent +of the original document. 
+ +RFC1034's wording is to be clarified by adding the following paragraph: + + A node is considered to have an impact on the algorithms of 4.3.2 + if it is a leaf node with any resource sets or an interior node, + with or without a resource set, that has a subdomain that is a leaf + node with a resource set. A QNAME and QCLASS matching an existing + node never results in a response return code of authoritative name + error. + +The terminology in the above paragraph is chosen to remain as close as +possible to that in the original document. The term "with" is an alternate +form for "owning" in this case, hence "a leaf node owning resource sets, or an +interior node, owning or not owning any resource set, that has a leaf +node owning a resource set as a subdomain," is the proper interpretation +of the middle sentence. + +As an aside, an "authoritative name error" has been called NXDOMAIN in +some RFCs, such as RFC 2136 [RFC 2136]. NXDOMAIN is the mnemonic assigned +to such an error by at least one implementation of DNS. As this +mnemonic is specific to implementations, it is avoided in the remainder +of this document. + +1.5 Terminology + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", +"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this +document are to be interpreted as described in the document entitled +"Key words for use in RFCs to Indicate Requirement Levels." [RFC2119] + +Requirements are denoted by paragraphs that begin with the following +convention: 'R'.. + +2 Defining the Wild Card Domain Name + +A wild card domain name is defined by having the initial label be: + + 0000 0001 0010 1010 (binary) = 0x01 0x2a (hexadecimal) + +This defines domain names that may play a role in being a wild card, that +is, being a source for synthesized answers. Domain names conforming to +this definition that appear in queries and RDATA sections do not have +any special role. These cases will be described in more detail in +following sections. 
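As an illustrative sketch (not part of the draft), the initial-label definition above can be expressed as a test on the first two octets of an uncompressed wire-format name. The helper names `to_wire` and `is_wildcard_name` are hypothetical, and the text-to-wire conversion is a simplification that assumes an absolute name and ignores master-file escape sequences.

```python
# Hypothetical helpers illustrating the wild card label test; not from the draft.
WILDCARD_LABEL = b"\x01\x2a"  # length octet 1, followed by ASCII '*'


def to_wire(name: str) -> bytes:
    """Encode a dotted absolute name such as '*.example.' in uncompressed wire format.

    Simplification: master-file escape sequences are not handled.
    """
    labels = [label for label in name.rstrip(".").split(".") if label]
    encoded = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return encoded + b"\x00"  # terminating root label


def is_wildcard_name(wire: bytes) -> bool:
    """Return True if the initial label is exactly the one-octet label '*'."""
    return wire.startswith(WILDCARD_LABEL)


if __name__ == "__main__":
    for text in ("*.example.", "the*.example.com.", "sub.*.example."):
        print(text, is_wildcard_name(to_wire(text)))
```

Under this check `*.example.` qualifies, while `the*.example.com.` does not, because its initial label is four octets long rather than the single octet 0x2a.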
+ +R2.1 A domain name that is to be interpreted as a wild card MUST begin + with a label of '0000 0001 0010 1010' in binary. + +The first octet is the normal label type and length for a 1 octet long +label, the second octet is the ASCII representation [RFC 20] for the +'*' character. In RFC 1034, ASCII encoding is assumed to be the character +encoding. + +In the master file formats used in RFCs, a "*" is a legal representation +for the wild card label. Even if the "*" is escaped, it is still +interpreted as the wild card when it is the only character in the label. + +R2.2. A server MUST treat a wild card domain name as the basis of + synthesized answers regardless of any "escape" sequences in + the input format. + +RFC 1034 and RFC 1035 ignore the case in which a domain name might be +"the*.example.com." The interpretation is that this domain name in a +zone would only match queries for "the*.example.com" and not have any +other role. + +Note: By virtue of this definition, a wild card domain name may have a +subdomain. The subdomain (or sub-subdomain) itself may also be a wild +card. E.g., *.*.example. is a wild card, so is *.sub.*.example. +More discussion on this is given in Appendix A. + +3 Defining Existence + +As described in the Introduction, a precise definition of existence is +needed. + +R3.1 An authoritative server MUST treat a domain name as existing during + the execution of the algorithms in RFC 1034 when the domain name + conforms to the following definition. A domain name is defined + to exist if the domain name owns data and/or has a subdomain that + exists. + +Note that at a zone boundary, the domain name owns data, including the +NS RR set. At the delegating server, the NS RR set is not authoritative, +but that is of no consequence here. The domain name owns data, therefore, +it exists. 
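The existence rule in R3.1 can be sketched as a small computation, offered here only as an illustration. The sketch assumes a zone is given as a list of owner names (taken from the example zone in section 1.3) that all lie at or below the apex; the function names are hypothetical. A name exists if it owns data or has a subdomain that exists, so every ancestor of an owner name, down to the zone apex, exists as well.

```python
# Hypothetical sketch of R3.1; owner names are those of the section 1.3 example.
OWNERS = [
    "example.",
    "*.example.",
    "host1.example.",
    "_ssh._tcp.host1.example.",
    "_ssh._tcp.host2.example.",
    "subdel.example.",
]


def parent(name: str) -> str:
    """Strip the leftmost label: 'host1.example.' -> 'example.'."""
    return name.split(".", 1)[1]


def existing_names(owners, apex):
    """Return every name that exists: owners plus their ancestors up to the apex.

    Assumes every owner name is at or below the apex.
    """
    exists = {apex}
    for owner in owners:
        while owner != apex:
            exists.add(owner)      # owns data, or has an existing subdomain
            owner = parent(owner)  # its parent therefore exists too
    return exists


if __name__ == "__main__":
    for name in sorted(existing_names(OWNERS, "example.")):
        print(name)
```

In this sketch `host2.example.` and `_tcp.host2.example.` exist even though they own no data, which is the empty non-terminal case; a name such as `host3.example.` does not exist.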
+ +R3.2 An authoritative server MUST treat a domain name that has neither + a resource record set nor an existing subdomain as non-existent when + executing the algorithm in section 4.3.2. of RFC 1034. + +A note on terminology. A domain transcends zones, i.e., all DNS data is +in the root domain but segmented into zones of control. In this document, +there are references to a "domain name" in the context of existing "in a +zone." In this usage, a domain name is the root of a domain, not the entire +domain. The domain's root point is said to "exist in a zone" if the zone +is authoritative for the name. RR sets existing in a domain need not be +owned by the domain's root domain name, but may be owned by other domain +names in the domain. + +4 Impact of a Wild Card Domain In a Query Message + +When a wild card domain name appears in a question, e.g., the query name +is "*.example.", the response in no way differs from that for any other query. +In other words, the wild card label in a QNAME has no special meaning, +and query processing will proceed using '*' as a literal query name. + +R4.1 A wild card domain name acting as a QNAME MUST be treated as any + other QNAME; there MUST be no special processing accorded it. + +If a wild card domain name appears in the RDATA of a CNAME RR or any +other RR that has a domain name in it, the same rule applies. In the +instance of a CNAME RR, the wild card domain name is used in the same +manner as if it were the original QNAME. For other RRs, rules vary +regarding what is done with the domain name(s) appearing in them, but +in no case does the wild card hold special meaning. + +R4.2 A wild card domain name appearing in any RR's RDATA MUST be treated + as any other domain name in that situation; there MUST be no special + processing accorded it. + +5 Impact of a Wild Card Domain On a Response + +The description of how wild cards impact response generation is in RFC +1034, section 4.3.2. 
That passage contains the algorithm followed by a +server in constructing a response. Within that algorithm, step 3, part +'c' defines the behavior of the wild card. The algorithm is directly +quoted in lines that begin with a '#' sign. Commentary is interleaved. + +[Note that there are no requirements specifically listed in this section. The +text here is explanatory and interpretative. There is no change to +the algorithm specified in RFC 1034.] + +The context of part 'c' is that the search is progressing label by label +through the QNAME. (Note that the data being searched is the authoritative +data in the server; the cache is searched in step 4.) Step 3's part 'a' +covers the case that the QNAME has been matched in full, regardless of the +presence of a CNAME RR. Part 'b' covers crossing a cut point, resulting +in a referral. All that is left is to look for the wild card. + +Step 3 of the algorithm also assumes that the search is looking in the +zone closest to the answer, i.e., in the same class as QCLASS and as +close to the authority as possible on this server. If the zone is not +the authority, then a referral is given, possibly one indicating lameness. + +# c. If at some label, a match is impossible (i.e., the +# corresponding label does not exist), look to see if a +# the "*" label exists. + +The above paragraph refers to finding the domain name that exists in the +zone and that most encloses the QNAME. Such a domain name will mark the +boundary of candidate wild card domain names that might be used to +synthesize an answer. (Remember that at this point, if the most enclosing +name is the same as the QNAME, part 'a' would have recorded an exact +match.) The existence of the enclosing name means that no wild card name +higher in the tree is a candidate to answer the query. + +Once the closest enclosing node is identified, there's the matter of what +exists below it. It may have subdomains, but none will be closer to the +QNAME. 
One of the subdomains just might be a wild card. If it exists, +this is the only wild card eligible to be used to synthesize an answer +for the query. Even if the closest enclosing node conforms to the syntax +rule in section 2 for being a wild card domain name, the closest enclosing +node is not eligible to be a source of a synthesized answer. + +The only wild card domain name that is a candidate to synthesize an answer +will be the "*" subdomain of the closest enclosing domain name. Three +possibilities can happen. The "*" subdomain does not exist, the "*" +subdomain does but does not have an RR set of the same type as the QTYPE, +or it exists and has the desired RR set. + +For the sake of brevity, the closest enclosing node can be referred to as +the "closest encloser." The closest encloser is the most important concept +in this clarification. Describing the closest encloser is a bit tricky, +but it is an easy concept. + +To find the closest encloser, you have to first locate the zone that is +the authority for the query name. This eliminates the need to be concerned +that the closest encloser is a cut point. In addition, we can assume too +that the query name does not exist, hence the closest encloser is not equal +to the query name. We can assume away these two cases because they are +handled in steps a and b of section 4.3.2.'s algorithm. + +What is left is to identify the existing domain name that would have been +up the tree (closer to the root) from the query name. Knowing that an +exact match is impossible, if there is a "*" label descending from the +unique closest encloser, this is the one and only wild card from which +an answer can be synthesized for the query. + +To illustrate, using the example in section 1.2 of this document, the +following chart shows QNAMEs and the closest enclosers. In Appendix A +there is another chart showing unusual cases. + + QNAME Closest Encloser Wild Card Source + host3.example. example. *.example. 
+ _telnet._tcp.host1.example. _tcp.host1.example. no wild card + _telnet._tcp.host2.example. host2.example. no wild card + _telnet._tcp.host3.example. example. *.example. + _chat._udp.host3.example. example. *.example. + +Note that host1.subdel.example. is in a subzone, so the search for it ends +in a referral in part 'b', thus does not enter into finding a closest +encloser. + +The fact that a closest encloser will be the only superdomain that +can have a candidate wild card will have an impact when it comes to +designing authenticated denial of existence proofs. (This concept +is not introduced until DNS Security Extensions are considered in +upcoming sections.) + +# If the "*" label does not exist, check whether the name +# we are looking for is the original QNAME in the query +# or a name we have followed due to a CNAME. If the name +# is original, set an authoritative name error in the +# response and exit. Otherwise just exit. + +The above passage says that if there is not even a wild card domain name +to match at this point (failing to find an explicit answer elsewhere), +we are to return an authoritative name error at this point. If we were +following a CNAME, the specification is unclear, but seems to imply that +a no error return code is appropriate, with just the CNAME RR (or sequence +of CNAME RRs) in the answer section. + +# If the "*" label does exist, match RRs at that node +# against QTYPE. If any match, copy them into the answer +# section, but set the owner of the RR to be QNAME, and +# not the node with the "*" label. Go to step 6. + +This final paragraph covers the role of the QTYPE in the process. Note +that if no resource record set matches the QTYPE the result is that no data +is copied, but the search still ceases ("Go to step 6."). 
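The walk through step 3 described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the RFC 1034 algorithm in full: the zone is modelled as a dict from owner name to the set of RR types it owns (the data is the example zone of section 1.3), the function names and result labels are hypothetical, and CNAME processing, classes, and the cache are omitted.

```python
# Hypothetical model of the section 1.3 example zone: owner name -> RR types owned.
ZONE = {
    "example.": {"SOA", "NS"},
    "*.example.": {"TXT", "MX"},
    "host1.example.": {"A"},
    "_ssh._tcp.host1.example.": {"SRV"},
    "_ssh._tcp.host2.example.": {"SRV"},
    "subdel.example.": {"NS"},
}
APEX = "example."


def parent(name):
    """Strip the leftmost label: 'host1.example.' -> 'example.'."""
    return name.split(".", 1)[1]


def existing_names(zone, apex):
    """Owners plus all their ancestors up to the apex (empty non-terminals included)."""
    exists = {apex}
    for owner in zone:
        while owner != apex:
            exists.add(owner)
            owner = parent(owner)
    return exists


def lookup(qname, qtype, zone=ZONE, apex=APEX):
    """Return (outcome, name) for a query; a sketch of step 3, parts a, b and c."""
    if qname != apex and not qname.endswith("." + apex):
        raise ValueError("qname is outside the zone")
    exists = existing_names(zone, apex)

    # Part 'b': matching label by label from the apex down, a zone cut is met first.
    below_apex = []
    name = qname
    while name != apex:
        below_apex.append(name)
        name = parent(name)
    for name in reversed(below_apex):
        if "NS" in zone.get(name, ()):
            return ("referral", name)

    # Part 'a': the QNAME exists, so no wild card is consulted.
    if qname in exists:
        return ("answer" if qtype in zone.get(qname, ()) else "no data", qname)

    # Part 'c': find the closest encloser, then look only at its "*" subdomain.
    encloser = parent(qname)
    while encloser not in exists:
        encloser = parent(encloser)
    wildcard = "*." + encloser
    if wildcard not in exists:
        return ("name error", encloser)
    if qtype in zone.get(wildcard, ()):
        return ("synthesized", wildcard)
    return ("no data", wildcard)


if __name__ == "__main__":
    for q in (("host3.example.", "MX"), ("host3.example.", "A"),
              ("_telnet._tcp.host1.example.", "SRV"), ("host.subdel.example.", "A")):
        print(q, lookup(*q))
```

In the sketch, a query for `host3.example.` is answered from `*.example.`, while `_telnet._tcp.host1.example.` yields a name error because its closest encloser, `_tcp.host1.example.`, has no `*` subdomain. The search never continues to a wild card higher in the tree.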
+ +6 Authenticated Denial and Wild Cards + +In unsecured DNS, the only concern when there is no data to return to +a query is whether the domain name from which the answer comes exists or +not, whether or not a name error is indicated in the return code. In +either case the answer section is empty or contained just a sequence of +CNAME RR sets. + +In securing DNS, authenticated denial of existence is a service that is +provided. The chosen solution to provide this service is to generate +resource records indicating what is protected in a zone and to digitally +sign these. + +The resource records that do this, as defined in RFC 2535, are NXT RRs. + +There are three points to consider when clarifying the topic of wild card +domain names. One is the construction of the records. The second is +the inclusion of records in responses. The third is the interpretation +of the records in a response by the resolver. + +In short, authenticated denial has to be sure to prove that the closest +encloser does not equal the query name, whether there is a wild card +name directly under the closest encloser. + +6.1 Preparing Wild Card Domain Name Owned Non-existence Proofs + +During the creation of the authenticated denial records, the wild card +domain name plays no special role, in the same manner as the wild card +domain name playing no special role in a query. + +There are two considerations with regards to preparing non-existence +proofs. + +R6.1 Any mechanism used to provide authenticated denial MUST reveal the + closest enclosing existing domain name for the query. If this is not + provided, the resolver will not be able to ascertain the identity + of an appropriate wild card domain name. + +R6.2 If a zone is signed in such a way that offers authenticated denial + of existence, wild card domain name owned RR sets MUST be signed. + Otherwise the determination of the "closest encloser" is not possible. + +6.2 Role of Wild Cards in Answers + +There are three cases to address. 
The first is synthesizing from a wild card +domain name with data, the second is negatively synthesizing from an +existing wild card, and the third is denying that an exact match, +referral, or wild card exists to answer the query. + +6.2.1 Synthesizing From a Wild Card + +When preparing an answer from a wild card domain name, the answer needs +to include proof that the exact match of the QNAME and QCLASS does not +exist. This is needed because synthesis of the answer replaces the "*" +label with the QNAME without securing the result. The resolver will +realize that the answer was derived from a wild card, but cannot +detect whether an exact match was maliciously omitted. + +R6.3 When synthesizing a positive answer from a wild card domain name, the + answer MUST include proof that the exact match for the QNAME and + QCLASS does not exist. + +Note that a proof that the QTYPE does not exist at the QNAME and QCLASS is +not sufficient to justify synthesis from a wild card. + +6.2.2. Synthesizing Authoritative No Error, No Data From a Wild Card + +When synthesizing a negative answer that is derived from a wild card, +meaning that a wild card matched the QNAME (no exact match happened for +QNAME) but that there is no match for QTYPE there, at most two negative +proofs are needed, possibly only one. As in 6.2.1, a proof that the exact +match failed is needed. A second proof is needed to show that the wild +card domain name does not have the QTYPE. Depending on the method of +authenticated denial, this could be possible with one statement. + +R6.4 When synthesizing a negative answer from a wild card domain name, the + answer MUST include proof that the exact match of the QNAME and + QCLASS does not exist and that the QTYPE matches no RR set at the + wild card. If this answer can be optimized, an implementation + SHOULD reduce the number of records included in the response. + +6.2.3.
Answering With an Authoritative Name Error + +When answering with a result code of a name error, the answer needs to +provide proof that neither the exact match for QNAME and QCLASS exists +nor that a wild card domain name exists as a subdomain of the closest +enclosing domain name. + +R6.5 When preparing a reply with an authoritative name error, the answer + MUST include proof that the exact match for the QNAME and QCLASS + does not exist and that no wild card is available to provide a match. + +6.2.4. The Remaining Case (Authoritative No Error, No Data at QNAME) + +When answering negatively because there is a match for QNAME and QCLASS +but no match for the QTYPE, only a proof for that is needed. Just as +the search does not proceed onto a search for the wild card in this +case, neither does the construction of the negative answer proof. + +R6.6 When preparing a reply in which there is an exact match of the + QNAME and QCLASS, but there is no RR set matching the QTYPE, + the reply SHOULD NOT contain any proof regarding the wild card + domain name. + +6.3 Interpreting Negative Answers Involving Wild Cards + +There are three requirements for resolvers when it comes to handling +negative answers generated as described in section 6.2. + +R6.7 A resolver MUST confirm that the negative data relates to the + query submitted. + +It is incumbent upon the resolver to interpret the answer correctly. + +R6.8 A resolver MUST confirm that an answer synthesized from a wild + card domain name is done so only in an authoritative absence of + a domain name with the query name and query class. + +In the case of a wild card synthesized answer, the resolver has to +see that the query name and class has no node, proving that a synthesized +answer would be appropriate (subject to validation of it). 
+ +R6.9 A resolver MUST confirm that an authoritative name error is + valid by checking for proof both that no domain name matches the + query name and class and that the closest encloser + does not have a wild card domain name as an immediate descendant. + +Before concluding that an authoritative name error is justified, a +resolver has to determine that neither an exact match for the query +name and class exists nor an appropriate wild card domain name. + +6.4 Authenticated Denial, Wild Card Domain Names, and Opt-In + +When considering the Opt-In proposal [WIP], it is wise not to combine +opt-in and a wild card domain name in the same +zone. The reason is that the synthesis of an answer is done +by substituting the QNAME for the wild card domain name in the answer. +Because this is unsecured, and there is ambiguity regarding whether a +negative proof can be provided for the exact match (when it is outside +the opt-in secured area), a definitive proof of authenticated denial +is not possible. + +For a more complete discussion of this topic, please refer to the document +describing the Opt-In proposal, referenced above. + +7 Analytical Proof That NXT Names the Closest Encloser + +How does one know, and (more importantly) *prove* using NXT records, what +the closest encloser of a given QNAME is? This section answers that +question with a rigorous proof, because security is the topic. + +7.1 Background to the Proof + +We'd like to have empty non-terminals provably exist in secure zones. +In other words, if someone has: + + a.b.c 3600 IN A 10.0.0.1 + +in their zone, but does not have any records with owner names "c" or +"b.c", we'd like to be able to say (with proof) that "nodes 'c' and +'b.c' exist and yet have no RRs."
+ +We want this because it is the behavior mandated by the nameserver +algorithm in section 4.3.2 of RFC 1034, and because it is regarded by +most as a better, more "natural" behavior than the alternative of +treating such empty non-terminals as being non-existent. + +There are two ways to achieve this. One way is to instantiate all +the implied empty non-terminals, and then add NXT and SIG(NXT) to them. +This works, but is a burden to the server in storage and computation +resources. It especially complicates updates, since any deletion of +the last record at a node necessitates a computation to determine +which empty non-terminals are no longer relevant and thus must also be +deleted. + +The second way is to infer the existence of the empty non-terminals +from the names of the nodes with real data (i.e. the names in the NXT +chain). + +Using this technique, the "deepest existing ancestor" a.k.a. the "most +enclosing name" of any query name Q can be easily found, and proved to +exist. This allows great efficiency in the wild card matching +algorithm as well, since only one wild card possibility exists and must +subsequently be either proven to exist or proven not to exist. This +is a big improvement on the "empty non-terminals do not exist" +approach, which has many more possible candidate wild card names which +must be proven not to exist. + +7.2 Definitions and Preliminaries + +When we say "subdomain" anywhere below, we mean "is contained within the +domain (in the sense that RFC 1034 describes), or is equal to the domain". +I.e., we're treating it like "subset" in mathematics. + +X is a "superdomain" of Y iff. Y is a subdomain of X. + +A name is an "owner name in zone Z" if it is an owner name, is a subdomain +of the origin of zone Z, and is not glue (or otherwise beneath a zone cut +of zone Z). + +A name N is "directly in zone Z" iff. there is some owner name in Z equal +to N. 
+ +A name N is "inferred to be in zone Z", if it is not directly in zone Z, +but is a superdomain of some direct name of Z and is still a subdomain of +Z. I.e., it is an "empty non-terminal" required to make the path from the +zone origin to some name directly in Z. + +A name is "in zone Z" if it is directly in zone Z, or is inferred to be in +zone Z. + +Let "<" denote the DNSSEC name order relation. + +The "greatest common superdomain" of names A and B, denoted GCS(A,B), is +the greatest (according to the DNSSEC ordering) name X such that X is a +superdomain of both A and B. I.e. it is the "deepest common ancestor" of +A and B. GCS(A,B) always exists, because the root name is a superdomain +of all names. + +Let Q be a name which is a subdomain of the origin of zone Z. + +7.3 Bounds of Q in Z + +There is always a name directly in Z, call it "GLB(Q,Z)", which is the +greatest lower bound of Q. I.e. GLB(Q,Z) <= Q, and for all N in Z where +N <= Q, N <= GLB(Q,Z). + +There may or may not be a name directly in Z, call it "LUB(Q,Z)", which is +the least upper bound of Q. If there is no N directly in Z such that +N >= Q, then there is no LUB(Q,Z). If there is some N directly in Z where +N >= Q, then there is an LUB(Q,Z) >= Q such that if N >= Q, then +LUB(Q,Z) <= N. + +So, GLB(Q,Z) <= Q < LUB(Q,Z), if the least upper bound exists. + +GLB(Q,Z) will have a NXT record which: + + If GLB(Q,Z) = Q, proves that Q is directly in Z + + If GLB(Q,Z) != Q, proves that Q is not directly in Z + +The "next domain name" field of this NXT record is the LUB, unless it is +the zone origin (the DNSSEC "end of chain" marker) and Q != the origin of +Z, in which case there is no LUB. + +THEOREM 1: Let A, B, and Q be subdomains of Z. Let A <= B and B <= Q. Then + + GCS(Q, A) <= GCS(Q, B) + +Proof: + +Assume GCS(Q, A) > GCS(Q, B). 
Then A must have more labels in common with +Q than B, but since A and B are less than Q, that means that A > B by the +DNSSEC ordering, which is a contradiction since A <= B. + +THEOREM 2: Let A, B, and Q be subdomains of Z. Let A >= B and B >= Q. Then + + GCS(Q, A) <= GCS(Q, B) + +Proof: + +Assume GCS(Q, A) > GCS(Q, B). Then A must have more labels in common with +Q than B, but since A and B are greater than Q, that means that A < B by +the DNSSEC ordering, which is a contradiction since A >= B. + +7.4 Greatest Ancestor of Q in Z + +The "greatest ancestor of Q in Z", denoted GA(Q,Z), is the greatest N in Z, +directly or inferred, such that Q is a subdomain of N. GA(Q,Z) is also +called the "most enclosing name of Q in Z" or the "deepest ancestor of +Q in Z". + +GA(Q,Z) always exists. Since Q is a subdomain of the origin of Z, and the +origin of Z is "directly in zone Z", so there's always at least one N in Z +such that Q is a subdomain of N. + +THEOREM 3: Let Q be a subdomain of the origin of zone Z. If LUB(Q,Z) +exists, then: + + GA(Q,Z) = the greater of GCS(Q, GLB(Q,Z)) and GCS(Q, LUB(Q,Z)) + +otherwise + + GA(Q,Z) = GCS(Q, GLB(Q,Z)) + +Proof: + +We can eliminate the trivial case where Q is directly in Z, since in that +case GA(Q,Z) is obviously Q. + +For notational convenience, let + + L = GCS(Q, GLB(Q,Z)) + U = GCS(Q, LUB(Q,Z)) + +Assume L and U both exist. Assume there is an M in Z that is greater than +both L and U, and is a superdomain of Q. + +If M is directly in Z, then M > GLB(Q,Z). This is because if M were +<= GLB(Q,Z), then GCS(Q,M) would be <= L by Theorem 1. If M is directly +in Z, it cannot be >= Q since it is a superdomain of Q and M != Q. So, +we have GLB(Q,Z) < M < Q, which implies that GLB(Q,Z) is not the greatest +lower bound, which is a contradiction. + +If M is inferred to be in Z, then there is some N directly in Z and M is a +superdomain of N. Either N < Q or N > Q (since Q is not directly in Z). + +If N < Q, then N > GLB(Q,Z). 
If N were <= GLB(Q,Z), then the GCS(Q,N) +would be <= L by Theorem 1, but GCS(Q,N) = M, and M > L. We thus have a +contradiction, since this implies that GLB(Q,Z) is not the greatest lower +bound. + +If N > Q, then N < LUB(Q,Z). If N were >= LUB(Q,Z), then the GCS(Q,N) +would be <= U by Theorem 2, but GCS(Q,N) = M, and M > U. We thus have a +contradiction, since this implies that LUB(Q,Z) is not the least upper bound. + +Now we deal with the case where U doesn't exist. Again, assume M in Z that +is greater than L, and is a superdomain of Q. + +The cases where M is directly in Z, or where M is inferred and N < Q are as +above. Now we deal with the case where N > Q. First we note that since < +is a well-ordering of the names in Z, if there are any upper bounds to Q in +Z, then there must be a least upper bound. Now, if N existed, it would be +an upper bound of Q in Z, and hence a least upper bound would have to exist, +but there is no least upper bound of Q in Z by assumption, so we again have +a contradiction. + +Q.E.D. + +7.5 Conclusion of the Proof + +We've shown how to find the "closest encloser" of any given QNAME by looking +at the QNAME along with the owner name and "next domain name" field of the +NXT record which proves the QNAME doesn't exist. The technique works even +when the closest encloser is an inferred name. + +Knowing the closest encloser lets us do very simple wild card checking in +secure zones, since the only possible matching wild card is + + *.<closest encloser> + +We simply look up that name, and if found, proceed accordingly. If not, we +add the NXT record which proves it doesn't exist to the authority section. + +8 Security Considerations + +This document is refining the specifications to make it more likely that +security can be added to DNS. No functional additions are being made; the +document only refines what is considered proper so that the DNS, security of +the DNS, and extensions of the DNS are more predictable.
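As a footnote to section 7, the computation in Theorem 3 can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are invented here, the zone is a hypothetical one built around the "a.b.c" example from section 7.1, names are compared as lower-cased text rather than wire-format octets, and delegations and glue are ignored.

```python
import bisect


def labels(name):
    """Lower-cased labels of an absolute name, leftmost label first."""
    stripped = name.rstrip(".").lower()
    return tuple(stripped.split(".")) if stripped else ()


def canonical_key(name):
    """Sort key for the DNSSEC name order: labels are compared right to
    left, and a missing label sorts before any label that is present."""
    return tuple(reversed(labels(name)))


def gcs(a, b):
    """Greatest common superdomain: the longest common suffix of labels."""
    common = []
    for x, y in zip(canonical_key(a), canonical_key(b)):
        if x != y:
            break
        common.append(x)
    return ".".join(reversed(common)) + "."


def covering_nxt(qname, owner_names):
    """Return (owner, next) of the NXT record at GLB(Q,Z). Assumes that
    owner_names contains the zone origin and qname is at or below it."""
    chain = sorted(owner_names, key=canonical_key)
    keys = [canonical_key(n) for n in chain]
    i = bisect.bisect_right(keys, canonical_key(qname)) - 1
    # The last record in the chain points back to the origin.
    return chain[i], chain[(i + 1) % len(chain)]


def closest_encloser(qname, nxt_owner, nxt_next):
    """GA(Q,Z): the deeper of GCS(Q, GLB) and GCS(Q, LUB). When the next
    name is the zone origin (no LUB), the second candidate is just the
    origin and can never be deeper than the first."""
    lower = gcs(qname, nxt_owner)
    upper = gcs(qname, nxt_next)
    return max(lower, upper, key=lambda n: len(labels(n)))


# Hypothetical zone: "c.example." and "b.c.example." are empty non-terminals.
OWNERS = ["example.", "a.b.c.example.", "z.example."]

owner, nxt = covering_nxt("x.c.example.", OWNERS)
print(owner, nxt, closest_encloser("x.c.example.", owner, nxt))
```

In this example the closest encloser comes from the lower bound. For a query such as 0.b.c.example. it comes from the "next domain name" field instead, which is why both candidates have to be examined.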
+ +9 References + +Normative References + +[RFC 20] ASCII Format for Network Interchange, V.G. Cerf, Oct-16-1969 +[RFC 1034] Domain Names - Concepts and Facilities, P.V. Mockapetris, + Nov-01-1987 +[RFC 1035] Domain Names - Implementation and Specification, P.V. + Mockapetris, Nov-01-1987 +[RFC 2119] Key Words for Use in RFCs to Indicate Requirement Levels, S. + Bradner, March 1997 + +Non-normative References + +[RFC 2136] Dynamic Updates in the Domain Name System (DNS UPDATE), P. Vixie, + Ed., S. Thomson, Y. Rekhter, J. Bound, April 1997 +[RFC 2535] Domain Name System Security Extensions, D. Eastlake, March 1999 +[WIP] DNSSEC Opt-In, Internet Draft, R. Arends, M. Kosters, D. Blacka, 2002 + +10 Others Contributing to This Document + +Others who have directly caused text to appear in the document: Paul Vixie +and Olaf Kolkman. Many others have indirect influences on the content. + +11 Editors + +Name: Bob Halley +Affiliation: Nominum, Inc. +Address: 2385 Bay Road, Redwood City, CA 94063 USA +Phone: +1-650-381-6016 +EMail: Bob.Halley@nominum.com + +Name: Edward Lewis +Affiliation: ARIN +Address: 3635 Concorde Pkwy, Suite 200, Chantilly, VA 20151 USA +Phone: +1-703-227-9854 +Email: edlewis@arin.net + +Appendix A: Subdomains of Wild Card Domain Names + +In reading the definition of section 2 carefully, it is possible to +rationalize unusual names as legal. In the example given, *.example. +could have subdomains of *.sub.*.example. and even the more direct +*.*.example. (The implication here is that these domain names own +explicit resource record sets.) Although defining these names is not +easy to justify, it is important that implementations account for the +possibility. This section will give some further guidance on handling +these names. + +The first thing to realize is that by all definitions, subdomains of +wild card domain names are legal. In analyzing them, one realizes +that they cause no harm by their existence.
Because of this, they are +allowed to exist, i.e., there are no special case rules made to disallow +them. The reason for not preventing these names is that the prevention +would just introduce more code paths to put into implementations. + +The concept of "closest enclosing" existing names is important to keep in +mind. It is also important to realize that a wild card domain name can +be a closest encloser of a query name. For example, if *.*.example. is +defined in a zone, and the query name is a.*.example., then the closest +enclosing domain name is *.example. Keep in mind that the closest +encloser is not eligible to be a source of synthesized answers, just the +subdomain of it that has the first label "*". + +To illustrate this, the following chart shows some matches. Assume that +the names *.example., *.*.example., and *.sub.*.example. are defined +in the zone. + + QNAME Closest Encloser Wild Card Source + a.example. example. *.example. + b.a.example. example. *.example. + a.*.example. *.example. *.*.example. + b.a.*.example. *.example. *.*.example. + b.a.*.*.example. *.*.example. no wild card + a.sub.*.example. sub.*.example. *.sub.*.example. + b.a.sub.*.example. sub.*.example. *.sub.*.example. + a.*.sub.*.example. *.sub.*.example. no wild card + *.a.example. example. *.example. + a.sub.b.example. example. *.example. + +Recall that the closest encloser itself cannot be the wild card. Therefore +the match for b.a.*.*.example. has no applicable wild card. + +Finally, if a query name is sub.*.example., any answer available will come +from an exact name match for sub.*.example. No wild card synthesis is +performed in this case. + +Full Copyright Statement + + Copyright (C) The Internet Society 2003. All Rights Reserved. 
+ + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published and + distributed, in whole or in part, without restriction of any kind, + provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of developing + Internet standards in which case the procedures for copyrights defined + in the Internet Standards process must be followed, or as required to + translate it into languages other than English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT + NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN + WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + +-- diff --git a/doc/draft/draft-jseng-idn-admin-01.txt b/doc/draft/draft-jseng-idn-admin-01.txt deleted file mode 100644 index c0998e4b64..0000000000 --- a/doc/draft/draft-jseng-idn-admin-01.txt +++ /dev/null @@ -1,1175 +0,0 @@ -INTERNET DRAFT Editors: James SENG -draft-jseng-idn-admin-01.txt John KLENSIN -18th Oct 2002 Authors: K. KONISHI -Expires 18th April 2003 K. HUANG, H. QIAN, Y. 
KO - - Internationalized Domain Names Registration and Administration - Guideline for Chinese, Japanese and Korean - -Status of this Memo - - This document is an Internet-Draft and is in full conformance - with all provisions of Section 10 of RFC2026 except that the - right to produce derivative works is not granted. - - Internet-Drafts are working documents of the Internet - Engineering Task Force (IETF), its areas, and its working - groups. Note that other groups may also distribute working - documents as Internet-Drafts. - - Internet-Drafts are draft documents valid for a maximum of - six months and may be updated, replaced, or obsoleted by other - documents at any time. It is inappropriate to use Internet- - Drafts as reference material or to cite them other than as - "work in progress." - - The list of current Internet-Drafts can be accessed at - http://www.ietf.org/ietf/1id-abstracts.txt - - The list of Internet-Draft Shadow Directories can be accessed at - http://www.ietf.org/shadow.html. - -Abstract - -Achieving internationalized access to domain names raises many complex -issues. These include not only associated with basic protocol design -(i.e., how the names are represented on the network, compared, and -converted to appropriate forms) but also issues and options for -deployment, transition, registration and administration. - -The IETF IDN working group focused on the development of a standards -track specification for access to domain names in a broader range of -scripts than the original ASCII. It became clear during its efforts -that there was great potential for confusion, and difficulties in -deployment and transition, due to characters with similar appearances -or interpretations and that those issues could best be addressed -administratively, rather than through restrictions embedded in the -protocols. 
- -This document provides guidelines for zone administrators (including -but not limited to registry operators and registrars), and information -for all domain names holders, on the administration of those domain -names which contain characters drawn from Chinese, Japanese and Korean -scripts (CJK). Other language groups are encouraged to develop their -own guidelines as needed, based on these guideline if that is helpful. - -Comments on this document can be sent to the authors at -idn-admin@jdna.jp. - -Table of Contents - -0. Pre-Note for ASCII-version of this document 2 - -1. Introduction 3 - -2. Definitions 5 - -3. Administrative Framework 6 -3.1. Principles underlying these Guidelines 7 -3.2. Registration of IDL 8 -3.2.1. Language character variant table 9 -3.2.2 Formal syntax 10 -3.2.3. Registration Algorithm 10 -3.3. Deletion and Transfer of IDL and IDL Package 12 -3.4. Activation and De-activation of IDN variants 13 -3.5. Adding/Deleting language(s) association 13 -3.6. Versioning of the language character variant tables 13 - -4. Example of Guideline Adoption 14 - -i. Notes 17 - -ii. Acknowledgements 17 - -iii. Authors 18 - -iv. Appendex A 18 - -v. Normative References 19 - -vi. Non-normative References 19 - -vii. Other Issues 19 - - - -0. Pre-Note for ASCII-version of this document - -In order to make meanings clear, especially in examples, Han ideographs -are used in several places in this document. Of course, these -ideographs do not appear in its ASCII form of this document. So, for -the convenience of readers of the ASCII format and some readers not -familiar with recognizing and distinguishing Chinese characters, each -use of a particular character will be associated with both its Unicode -code point and an "asterisk tag" with its corresponding Chinese -Romanization [ISO7098] with the tone mark represented by a number 1 to -4. 
Those tags have no meaning outside this document; they are intended -simply to provide a quick visual and reading reference to facilitate -the combinations and transformations of characters in the guideline and -table excerpts. Appendix A would provide the Romanization of the -ideographs in Japanese (ISO 3602) and Korean (ISO 11941). - -1. Introduction - -Defining and specifying protocols for Internationalized Domain Names -has been one of the most controversial tasks initiated by the IETF in -recent years. Domain names are the fundamental naming architecture of -the Internet; many Internet protocols and applications rely on the -stability, continuity, and absence of ambiguity of the DNS. - -The introduction of internationalized domain names (IDN) amplifies the -difficulty of putting names into identifiers and the confusion between -scripts and languages. It impacts many internet protocols and -applications and creates more complexity in technical administration -and services. - -While the IETF IDN working group [IDN-WG] focused on the technical -problems of IDN, administrative guidelines are also important in order -to reduce unnecessary user confusion and domain name disputes among -domain name holders. - -The IDN working group has completed working group last call for the -following internet-drafts: - -1. Preparation of Internationalized Strings [STRINGPREP] -2. Internationalizing Host Names In Applications [IDNA] -3. Punycode version 0.3.3 [PUNYCODE] -4. A Stringprep Profile for Internationalized Domain Names [NAMEPREP] - -These drafts specify that the intersystem protocols that make up the -domain name system infrastructure remain unchanged. Instead, they -introduce internationalization (I18N) [Note1] in client software -(particularly via the IDNA protocol) using an ASCII Compatible Encoding -(ACE) known as Punycode. 
- -The domain name protocols [STD13] also specify that characters are to -be interpreted so that upper and lower case Latin-based characters are -considered equivalent. But with the introduction of Unicode characters -beyond US-ASCII, and the possibility to represent a single character in -multiple ways in ISO10646/Unicode [UNICODE], a normalization process, -known as Nameprep, has been proposed to handle the more complex -problems of character-matching for those additional characters. -Nameprep is also executed by client software as described in IDNA. - -While Nameprep normalizes domain names so that the users have an -improved chance of getting the right domain name from information -provided in other forms, as required for I18N, Nameprep does not handle -any localization (L10N). - -This becomes significant when a domain name holder attempts to use a -Unicode string forming a "name", "word", or "phrase" that may have -certain meaning in a certain language or when used as a domain name. -Such Unicode string may have different variants in the context of the -language or culture. - -Generally, these localized variants in CJK can be classified into four -categories, as described by Halpern et al. [C2C]: [Note2] - -a. Character (or Code) variants - -Character (or Code) variants refer to variants that are generated by -character-by-character (or code-by-code) substitution. - -An example in English would be "A" or "a" (U+0041 or U+0061). -Two examples in Chinese would be U+98DB *fei1* or U+98DE *fei1* -and U+6A5F *ji1* or U+673A *ji1*. - -Note that this does not mean the choice between U+6A5F and U+673A is -always symmetric like the one between "A" and "a" -- it is a choice only -for Chinese but not for Japanese. - -The variants for particular characters may be just to drop them. 
For -example, point and vowel characters in Hebrew (U+05B0 to U+05C4) and -Arabic (U+064B to U+0652) are optional; the variants for strings -containing them are constructed by simply dropping those points and -vowels. - -Code variants may also occur when different code points are assigned to -what visually or abstractly are the "same" character, possibly due -to compatibility issues, type face differences or script range. For -example, LATIN CAPITAL LETTER A (U+0041) normally has an appearance -identical to GREEK CAPITAL LETTER ALPHA (U+0391). CJK scripts have font -variants for compatibility (either U+4E0D or U+F967 may be used) and -"zVariant" (e.g. U+5154 and U+514E). - -The difficulty lies in defining which characters are the "same" and -which are not. - -b. Orthographic variants - -Orthographic variants refer to variants that are generated by -word-by-word substitution. - -An example in English would be "color" and "colour". - -It is possible for some of these orthographic variants to be generated -by character variants. For example, "airplane" in Chinese may be either -U+98DB U+6A5F *fei1 ji1* or U+98DE U+673A *fei1 ji1*. - -Other orthographic variants may not be generated by character variants. -For example, in Chinese, both U+767C *fa1* and U+9AEE *fa4* -are related to U+53D1 *fa1 or fa4* depending on the word. For hair, -U+5934 U+53D1 *tou2 fa4*, the variant should be U+982D U+9AEE -*tou2 fa4* but not U+982D U+767C *tou2 fa1*. - -c. Lexemic variants - -Lexemic variants refer to variants that can be generated when language -is considered, by word-by-word substitution. - -An example in English would be cab, taxi, or taxicab. - -An example in Chinese would be U+8CC7 U+8A0A *zi1 xun4* or -U+4FE1 U+606F *xin4 xi1*. - -Note that there is no relationship between U+8CC7 and U+4FE1 or U+8A0A -and U+606F, i.e., the sequence U+8CC7 U+606F *zi1 xi1* does not -exist in Chinese. - -d.
Contextual variants - -Contextual variants refer to variants that are generated by word-by- -word substitutions with context considered. - -In English, the word "plane" has different meanings and could be -replaced by with different equivalent words (synonyms) such as -"airplane" or "plane" (as in a flat-surface or device for smoothing -wood) depending on context. And, of course, "plain", which is -pronounced the same way, and indistinguishable in speech-to-text -contexts such as computer input systems for the visually impaired, is a -different word entirely. - -Similarly, the word U+6587 U+4EF6 *wen2 jian4* could be either -document U+6587 U+4EF6 *wen2 jian4* or data file U+6A94 U+6848 -*dang3 an4* depending on context. - -Although domain names were designed to be identifiers without any -language context, users have not been prevented from using strings in -domain names and interpreting them as "words" or "names". It is likely -that users will do this with IDN as well. Therefore, given the added -complications of using a much broader range of characters, precautions -will be required when deploying IDN to minimize confusion and fraud. - -The intention of these guidelines is to provide advice about the -deployment of IDNs, with language consideration, but focusing only on -the category of character variants to increase the possibility of -successful resolution and reduced confusion while accepting inherent -DNS limitations. - -2. Definitions - -Unless otherwise stated, the definitions of the terms used in this -document are consistent with "Terminology Used in Internationalization -in the IETF" [I18NTERMS]. - -"FQDN" refers to a fully-qualified domain name and "domain name label" -refers to a label of a FQDN. - -RFC3066 [RFC3066] defines a system for coding and representing -languages. - -ISO/IEC 10646 is a universal multiple-octet coded character set that is -a product of ISO/IEC JTC1/SC2/WG2, Work Item JTC1.02.18 (ISO/IEC 10646). 
-It is a multi-part standard: Part 1, published as ISO/IEC -10646-1:2000(E), covers the Architecture and Basic Multilingual Plane; Part -2, published as ISO/IEC 10646-2:2001(E), covers the supplementary -(additional) planes. - -The Unicode Consortium publishes "The Unicode Standard -- Version 3.0", -ISBN 0-201-61633-5. In March 2002, the Unicode Consortium published Unicode -Standard Annex #28. That annex defines Version 3.2 of The Unicode -Standard, which is fully synchronized with ISO/IEC 10646-1:2000 (with -Amendment 1). - -The term "Unicode character" is used here to refer to characters chosen -from The Unicode Standard Version 3.2 (and hence from ISO/IEC 10646). -In this document, the characters are identified by their positions (or -"code points"). The notation U+12AB, for example, indicates the -character at the position 12AB (hexadecimal) in the Unicode 3.2 table. - -Similarly, "Unicode string" refers to a string of Unicode characters. -The Unicode string is identified by the sequence of the Unicode -characters regardless of the encoding scheme. - -The term "IDN" is often used to refer to many different things: (a) an -abbreviation for "Internationalized Domain Name" (b) a fully-qualified -domain name that contains at least one label that contains characters -not appearing in ASCII (c) a label of a domain name that contains at -least one character beyond ASCII (d) a Unicode string to be processed -by Nameprep (e) an IDN Package (in the context of this document) (f) a -Nameprep processed string (g) a Nameprep and Punycode processed string -(h) the IETF IDN Working Group (i) the ICANN IDN Committee (j) other IDN -activities in other companies/organizations etc. - -Because of the potential confusion, this document shall use the term -"IDN" as an abbreviation for "Internationalized Domain Name" only.
- -Also, because this document provides a guideline to be applied on a per -zone basis, one label at a time, the term "Internationalized Domain -Name Label" or "IDL" will be used instead. - -In this document, the term "registration" refers to the process by -which a potential domain name holder requests that a label be placed in -the DNS, either as an individual name within a domain or as a -subdomain delegation from another domain name holder. A successful -registration would then lead to the label or delegation records being -placed in the relevant zone file. The guidelines presented here are -recommended for all zones, at any hierarchy level, in which CJK -characters are to appear, not just domains at the first or second level. - -CJK characters are characters commonly used in the Chinese, Japanese or -Korean languages including but not limited to ASCII (U+0020 to U+007F), -Han Ideograph (U+3400 to U+9FAF and U+20000 to U+2A6DF), Bopomofo -(U+3100 to U+312F and U+31A0 to U+31BF), Kana (U+3040 to U+30FF), Jamo -(U+1100 to U+11FF and U+3130 to U+318F), Hangul (U+AC00 to U+D7AF and -U+3130 to U+318F) and their respective compatibility forms. - -3. Administrative Framework - -Zone administrators are responsible for the administration of the -domain name labels under their control. A zone administrator might be -responsible for a large zone such as a Top Level Domain (TLD), generic -or country code, or a smaller one such as a typical second or third -level domain. A large zone would often be more complex than a smaller -one (sometimes it is just larger). However, normally, actual technical -administrative tasks -- such as addition, deletion, delegation and -transfer of zones between domain name holders -- are similar for all -zones. - -At the same time, different zones may have different policies and -processes. For example, a pay-per-domain policy and registry/registrar -model for .COM may not be applicable to such domains as .SG or .IBM.COM.
-The latter, for example, has very restricted policies about who is -permitted to have a domain name label under IBM.COM, the types of -string that are permitted, and different procedures for obtaining those -string. - -This document only provides guidelines for how CJK characters should be -handled within a zone, how language issues should be considered and -incorporated, and how domain name labels containing CJK characters -should be administered (including registration, deletion and transfer -of labels). It does not provide any guidance for handling of non-CKJ -characters or languages in zones. - -Other IDN policies, as the creation of new TLDs, or the cost structure -for registrations, are outside the scope of this document. Such -discussions should be conducted in forums outside the IETF as well. - -Technical implementation issues are not discussed here either. For -example, the decision as to whether various of the guidelines should be -implemented as registry or registrar actions is left to zone -administrators, possibly differing from zone to zone. - -3.1. Principles underlying these Guidelines - -In many places, this document would assumes "First-Come-First-Serve" -(FCFS) as a conflict policy in the event of a dispute although FCFS is -not listed as one of the principles. If other policies dominate -priorities and "rights", one can use these guidelines by replacing uses -of FCFS in this document by appropriate other policy rules specific to -the zone. In other cases, some of these guidelines may not be -applicable although, some alternatives for determining rights to labels --- such as use of UDRP or mutual exclusion -- might have little impact -on other aspects of these guidelines. - -(a) Each IDL to be registered should be associated with one or more -languages. 
- -Although some Unicode strings may be pure identifiers made up of an -assortment of characters from many languages and scripts, IDLs are -likely to be names or phrases that have certain meaning in some -language. While a zone administration might or might not require -"meaning" as a registration criterion, the possibility of meaning -provides a useful tool when trying to avoid user confusion. - -Zone administrators should administratively associate one or more -language with each IDL. These associations should either be pre- -determined by the zone administrator and applied to the entire zone or -chosen by the registrants on a per-IDL basis. The latter may be -necessary for some zones, but will make administration more difficult -and will increase the likelihood of conflicts in variant forms. - -A given zone might have multiple languages associated with it, or have -no language specified at all, but doing so may provide additional -opportunities for user confusion, and is therefore not recommended. - -The zone administrator must also verify the validity of the IDL -requested by using information associated with the chosen language and -possibly other rules as appropriate. - -(b) When an IDL is registered, all of the character variants for the -associated language(s) should be reserved for the registrant. Each -language associated with the IDL will lead to different character -variants. - -IDL reservations of the type described here normally do not appear in -the distributed DNS zone file. In other words, these reserved IDLs do -not resolve. Domain name holders could request these reserved IDLs to -be placed in the zone file and made active and resolvable as, e.g., -aliases or synonyms. - -Since different languages may imply different sets of variants, the -IDLs reserved for one IDL may overlap those reserved for another. 
In -this case, the reserved IDLs should be bound to one registration or the -other, or excluded from both, according to the applicable registration -or dispute resolution policy for the zone. - -(c) For a given base language, the IDL may have one or more recommended -variants that should be suggested to the domain name holder for active -registration as synonyms. - -Some language rules may prefer certain variants over others. To -increase the likelihood of correct and predictable resolution of the -IDL by end-users, the recommended variants should be active. - -(d) The IDL and its reserved variants with the language(s) association -must be atomic. - -The IDL and its reserved variants for the associated language(s) are to -be considered as a single unit -- an "IDL Package". For a given IDL, -that IDL package is defined by these guidelines and created upon -registration. - -The IDL Package is atomic: Transfer and deletion of IDL are performed -on the IDL Package as a whole. IDL, either active or reserved, within -the IDL Package must not be transferred or deleted individually. I.e., -any re-registration, transfers, or other actions that impact the IDL -should also impact the reserved variants. Separate registration or -other actions for the variants are not possible if these guidelines are -to accomplish their purpose. - -Conflict policy of the zone may result in violation of the IDL Package -atomicity. In such case, the conflict policy would take precedence. - -3.2. Registration of IDL - -Conforming to the principles described in 3.1, the registration of an -IDL would require at least two components, i.e., the character variant -tables for the language and the registration algorithm. - -3.2.1. Language character variant table - -Any lines starting with, or portions of lines after, the hash -symbol("#") are treated as comments. 
Comments have no significance in -the processing of the tables, nor are there any syntax requirements -between the hash symbol and the end of the line. Blank lines in the -tables are ignored completely. - -Every language should have a character variant table provided by a -relevant group (or organization or other body) and based on established -standards. The group that defines a particular character variant table -should document references to the appropriate standards in beginning of -table, tagged with the word "Reference" followed by an integer (the -reference number) followed by the description of the reference. For -example, - -Reference 1 CP936 (commonly known as GBK) -Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt -Reference 3 List of Simplified character Table (Simplified column) -Reference 4 zSimpVariant in Unihan.txt -Reference 5 variant that exists in GB2312, common simplified hanzi - -Each language character variant table must have a version number. This -is tagged with the word "Version" followed by an integer then followed -by the date in the format YYYYMMDD, where YYYY is the 4 digit Year, MM -is the 2 digit Month and DD is the 2 digit Day of the publication date -of the table - -Version 1 20020701 # July 2002 Version 1 - -The table has three fields, separated by semicolons. The fields are: -"valid code point"; "recommended variant(s)"; and "character -variant(s)". - -Only code points listed in the "valid code point" field are allowed to -be registered as part of a IDL associated with that language. - -There can be one or more "recommended variant(s)" (i.e., entries in the -"recommended variant(s)" column). If the "recommended variant(s)" -column is empty, then there is no corresponding variant. - -The "character variant(s)" column contains all variants of the code -point, including but not limited to the code point itself and the -"recommended variant(s)". 
- -If the variant is composed of a sequence of code points, then sequence -of code points is listed separated by a space in the "recommended -variant(s)" or "character variant(s)". - -If there are multiple variants, each variant must be separated by a -comma in the "recommended variant(s)" or "character variant(s)". - -Any code point listed in the "recommended variant(s)" column must be -allowed, by the rules for the relevant language, to be registered. -However, this is not a requirement for the entries in the "character -variant(s)" column; it is possible that some of those entries may not -be allowed to be registered. - -Every code point in the table should have a corresponding reference -number (associated with the references) specified to justify the entry. -The reference number is placed in parentheses after the code point. If -there is more than one reference, then the numbers are placed within a -single set of parentheses and separated by commas. - -3.2.2. Formal syntax - -This section uses the IETF "ABNF" metalanguage [ABNF] - -LanguageCharacterVariantTable = 1*ReferenceLine VersionLine 1*EntryLine -ReferenceLine = "Reference" SP RefNo SP RefDesciption [ Comment ] CRLF -RefNo = 1*DIGIT -RefDesciption = *[VCHAR] -VersionLine = "Version" SP VersionNo SP VersionDate [ Comment ] CRLF -VersionNo = 1*DIGIT -VersionDate = YYYYMMDD -EntryLine = VariantEntry/Comment CRLF -VariantEntry = ValidCodePoint [ "(" RefList ") ] ;" RecommendedVariant -";" CharacterVariant [ Comment ] -ValidCodePoint = CodePoint -RefList = RefNo 0*( "," RefNo ) -RecommendedVariant = CodePointSet 0*( "," CodePointSet ) -CharacterVariant = CodePointSet 0*( "," CodePointSet ) -CodePointSet = CodePoint 0* ( SP CodePoint ) -CodePoint = 4DIGIT [DIGIT] [DIGIT] -Comment = "#" *VCHAR - -YYYYMMDD is an integer representing a date where YYYY is the 4 digit -year, MM is the 2 digit month and DD is the 2 digit day. - -3.2.3. Registration Algorithm - -(An explanation of these steps follows them) - -1. 
IN <= IDL to be registered and - {L} <= Set of languages associated with IN -2. {V} <= Set of version numbers of the language character - variant tables derived from {L} -3. NP(IN) <= Nameprep processed IN and - check availability of NP(IN). - If not available, route to conflict policy. -4. For each AL in {L} -4.1. Check validity of NP(IN) in AL. If failed, stop processing. -4.2. PV(IN,AL) <= Set of available Nameprep processed recommended - variants of NP(IN) in AL -4.3. RV(IN,AL) <= Set of available Nameprep processed character - variants of NP(IN) in AL -4.4. End of Loop -5. {PV} <= Set of all PV(IN,AL) with optional processing. -6. {ZV} <= {PV} set-union NP(IN) -7. {RV} <= Set of all RV(IN,AL) set-minus {ZV} -8. Create IDL Package for IN using IN, {L}, {V}, {ZV} and {RV} -9. Put {ZV} into zone file - -Explanation - -Step 1 takes the IDL to be registered and the associated language(s) as -input to the process. - -Step 2 extract the set of version numbers of the associated language(s) -tables. - -Step 3 Nameprep processed the IDL. If the Nameprep processed IDL is -already registered or reserved, then the conflict policy is applied -here. For example, if FCFS is used, the registration process would stop -here. - -Step 4 goes through all languages associated with the proposed IDL, -checks for validity in each language, and generates the recommended -variants and the reserved variants. - -In step 4.1, IDL validation is done by checking that every code point -in the Nameprep processed IDL is a code point allowed by the "valid -code point" column of the character variant table for the language. If -one or more code points are invalid, the registration process must stop -here. - -Step 4.2 generates the list of recommended variants of the IDL by doing -a combination of all possible variants listed in "recommend variant(s)" -column for each code point in the Nameprep processed IDL. Generated -variants must be processed with Nameprep. 
If any of the recommended -variants of the IDL is registered or reserved, then the conflict policy -will be applied although this does not prevent the IDL from being -registered. For example, if FCFS is used, then the conflicting -variant(s) will be removed from the list. - -Step 4.3 generates the list of reserved variants by doing a combination -of all the possible variants listed in "character variant(s)" column -for each code point in the Nameprep processed IDL. Generated variants -must be Nameprep processed. If any of the variants are registered or -reserved, then the conflict policy will apply here although this does -not prevent the IDL from being registered. For example, if FCFS is -used, then the conflict variants will be removed from the list. - -The "combination" in Step 4.2 and Step 4.3 could achieve by a recursive -function similar to the following pseudo code: - -Function Combination(Str) - F <= first codepoint of Str - SStr <= Substring of Str, without the first code point - NSC <= {} - - If SStr is empty Then - For each V in (Variants of code point F) - NSC = NSC set-union (the string with the code point V) - End of Loop - Else - SubCom = Combination(SStr) - For each V in (Variants of code point F) - For each SC in SubCom - NSC = NSC set-union (the string with the - first code point V followed by the string SC) - End of Loop - End of Loop - Endif - - Return NSC - - -Step 5 generates the list of all recommended variants for all language. -Optionally, the algorithm may reduce the list of recommended variants -by prompting the user to select the recommended variants. - -Step 6 generates the list of variants including the Nameprep processed -IDL which to be activated and Step 7 generates the list of reserved -variants. - -Then an "IDL Package" for IDL is created in Step 8 with the original -IDL, the associated language(s), all the list of activated IDLs and the -list of variants. 
The version numbers of the language character -variants tables are also stored in the IDL Package. - -Lastly, the activated IDLs are converted using ToASCII [IDNA] with -UseSTD13ASCIIRules on and then put into the zone file. If the IDL is a -subdomain name, it will be delegated. The activated IDLs may be -delegated to a different domain name server so long it is owned by the -same domain name holder. - -3.3. Deletion and Transfer of IDL and IDL Package - -In normal domain administration, every domain name label is independent -of all other domain name labels. Registration, deletion and transfer -of domain name labels is done on a per domain name label basis. -Depending on the zone's administrative policies, aliases (e.g., "CNAME" -entries) may be bound to particular labels with rules about whether one -can be changed without the other. Current policies in gTLDs generally -prohibit registration of such aliases, in part to avoid needing to form -and enforce policies about these change (or binding) rules. - -However, with internationalization, each IDL is bound to a list of -variant IDLs (with the list depending on the associated language), -bound together in an IDL Package. - -Because all variants of the IDL should belong to a single domain name -holder, the IDL Package should be treated as a single entity. -Individual IDL, either active or reserved, within the IDL Package must -not be deleted or transferred independently of the other IDLs. -Specifically, if an IDL is to be deleted or transferred, that action -must be taken only as part of an action that affects the entire IDL -Package. - -If the local conflict policy requires IDL to be transferred and deleted -independently of the IDL Package, the conflict policy would take -precedence. In such event, the conflict policy should be associated -with a transfer or delete procedure taking IDL Package into -consideration. - -When an IDL Package is deleted, all the active and reserved variants -would be available again. 
IDL Package deletion does not change any -other IDL Packages, including IDL Packages that have variants that -conflict with the variants in the deleted IDL Package. This is to be -consistent with the atomicity and predictability of the IDL Package. - -3.4. Activation and De-activation of IDL variants - -As there are active IDLs and inactive IDLs within an IDL Package, -processes are required to activate or de-activate IDL variants in an -IDL Package. - -The activation algorithm is described below: - -1. IN <= IDL to be activated & PA <= IDL Package -2. NP(IN) <= Nameprep processed IN -3. If NP(IN) not in {RV} then stop -4. {RV} <= {RV} set-minus NP(IN) and {ZV} <= {ZV} set-union NP(IN) -5. Put {ZV} into the zone file - -Similarly, the deactivation algorithm: -1. IN <= IDL to be deactivated & PA <= IDL Package -2. NP(IN) <= Nameprep processed IN -3. If NP(IN) not in {ZV} then stop -4. {RV} <= {RV} set-union NP(IN) and {ZV} <= {ZV} set-minus NP(IN) -5. Put {ZV} into the zone file - -3.5. Adding/Deleting language(s) association - -The list of variants is generated from the IDL and tables for the -associated languages. If the language associations are changed, then -the lists of variants have to be updated. On the other hand, the IDL -Package is atomic and the list of variants must not be changed after -creation. - -Therefore, this document recommends deleting the IDL Package followed -by a registration with the new set of languages rather than attempting -to add or delete language(s) association within the IDL Package. Zone -administrators may find it desirable to devise procedures to prevent -other parties from capturing the labels in the IDL Package during these -operations. - -3.6. Versioning of the language character variant tables - -Language character variants tables are subjected to changes over time -and the changes may or may not be backward compatible. 
It is possible -that different version of the language character variants tables may -produce a different set of recommended variants and reserved variants. - -New IDL Packages should use the latest version of the language -character variants tables. - -Existing IDL Packages created using previous version of language -character variants tables are not affected when there a new version of -the character variants table is released. - -4. Example of Guideline Adoption - -To provide a meaningful example, some language character variant tables -have to be defined. Assume, then, that the following four language -character variants tables are defined (note that these tables are not a -representation of the actual table and they do not contain sufficient -entries to be used in any actual implementation): - -a) language character variants tables for zh-cn and zh-sg - -Reference 1 CP936 (commonly known as GBK) -Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt -Reference 3 List of Simplified character Table (Simplified column) -Reference 4 zSimpVariant in Unihan.txt -Reference 5 variant that exists in GB2312, common simplified hanzi - -Version 1 20020701 # July 2002 - -56E2(1);56E2(5);5718(2) # sphere, ball, circle; mass, lump -5718(1);56E2(4);56E2(2),56E3(2) # sphere, ball, circle; mass, lump -60F3(1);60F3(5); # think, speculate, plan, consider -654E(1);6559(5);6559(2) # teach -6559(1);6559(5);654E(2) # teach, class -6DF8(1);6E05(5);6E05(2) # clear -6E05(1);6E05(5);6DF8(2) # clear, pure, clean; peaceful -771E(1);771F(5);771F(2) # real, actual, true, genuine -771F(1);771F(5);771E(2) # real, actual, true, genuine -8054(1);8054(3);806F(2) # connect, join; associate, ally -806F(1);8054(3);8054(2),8068(2) # connect, join; associate, ally -96C6(1);96C6(5); # assemble, collect together - - -b) language variants table for zh-tw - -Reference 1 CP950 (commonly known as BIG5) -Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt -Reference 3 List of Simplified 
Character Table (Traditional column) -Reference 4 zTradVariant in Unihan.txt - -Version 1 20020701 # July 2002 - -5718(1);5718(4);56E2(2),56E3(2) # sphere, ball, circle; mass, lump -60F3(1);60F3(1); # think, speculate, plan, consider -6559(1);6559(1);654E(2) # teach, class -6E05(1);6E05(1);6DF8(2) # clear, pure, clean; peaceful -771F(1);771F(1);771E(2) # real, actual, true, genuine -806F(1);806F(3);8054(2),8068(2) # connect, join; associate, ally -96C6(1);96C6(1); # assemble, collect together - -c) language variants table for ja - -Reference 1 CP932 (commonly known as Shift-JIS) -Reference 2 zVariant in Unihan.txt -Reference 3 variant that exists in JIS X0208, commonly used Kanji - -Version 1 20020701 # July 2002 - -5718(1);5718(3);56E3(2) # sphere, ball, circle; mass, lump -60F3(1);60F3(3); # think, speculate, plan, consider -654E(1);6559(3);6559(2) # teach -6559(1);6559(3);654E(2) # teach, class -6DF8(1);6E05(3);6E05(2) # clear -6E05(1);6E05(3);6DF8(2) # clear, pure, clean; peaceful -771E(1);771E(1);771F(2) # real, actual, true, genuine -771F(1);771F(1);771E(2) # real, actual, true, genuine -806F(1);806F(1);8068(2) # connect, join; associate, ally -96C6(1);96C6(3); # assemble, collect together - -d) language variants table for ko - -Reference 1 CP949 (commonly known as EUC-KR) -Reference 2 zVariant in Unihan.txt - -Version 1 20020701 # July 2002 - -5718(1);56E2(1);56E3(2) # sphere, ball, circle; mass, lump -60F3(1);60F3(1); # think, speculate, plan, consider -654E(1);6559(1);6559(2) # teach -6DF8(1);6E05(1);6E05(2) # clear -771E(1);771F(1);771F(2) # real, actual, true, genuine -806F(1);8054(1);8068(2) # connect, join; associate, ally -96C6(1);96C6(1); # assemble, collect together - -Example 1: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4* - {L} = {zh-cn, zh-sg, zh-tw} - -NP(IN) = (U+6E05 U+771F U+6559) -PV(IN,zh-cn) = (U+6E05 U+771F U+6559) -PV(IN,zh-sg) = (U+6E05 U+771F U+6559) -PV(IN,zh-tw) = (U+6E05 U+771F U+6559) -{ZV} = {(U+6E05 U+771F U+6559)} -{RV} = 
{(U+6E05 U+771E U+6559), - (U+6E05 U+771E U+654E), - (U+6E05 U+771F U+654E), - (U+6DF8 U+771E U+6559), - (U+6DF8 U+771E U+654E), - (U+6DF8 U+771F U+6559), - (U+6DF8 U+771F U+654E)} - -Example 2: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4* - {L} = {ja} - -NP(IN) = (U+6E05 U+771F U+6559) -PV(IN,ja) = (U+6E05 U+771F U+6559) -{ZV} = {(U+6E05 U+771F U+6559)} -{RV} = {(U+6E05 U+771E U+6559), - (U+6E05 U+771E U+654E), - (U+6E05 U+771F U+654E), - (U+6DF8 U+771E U+6559), - (U+6DF8 U+771E U+654E), - (U+6DF8 U+771F U+6559), - (U+6DF8 U+771F U+654E)} - -Example 3: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4* - {L} = {zh-cn, zh-sg, zh-tw, ja, ko} - -NP(IN) = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4* -Invalid registration because U+6E05 is invalid in L = ko - -Example 4: IDL = (U+806F U+60F3 U+96C6 U+5718) - *lian2 xiang3 ji2 tuan2* - {L} = {zh-cn, zh-sg, zh-tw} - -NP(IN) = (U+806F U+60F3 U+96C6 U+5718) -PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2) -PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2) -PV(IN,zh-tw) = (U+806F U+60F3 U+96C6 U+5718) -{ZV} = {(U+8054 U+60F3 U+96C6 U+56E2), - (U+806F U+60F3 U+96C6 U+5718)} -{RV} = {(U+8054 U+60F3 U+96C6 U+56E3), - (U+8054 U+60F3 U+96C6 U+5718), - (U+806F U+60F3 U+96C6 U+56E2), - (U+806f U+60F3 U+96C6 U+56E3), - (U+8068 U+60F3 U+96C6 U+56E2), - (U+8068 U+60F3 U+96C6 U+56E3), - (U+8068 U+60F3 U+96C6 U+5718) - -Example 5: IDL = (U+8054 U+60F3 U+96C6 U+56E2) - *lian2 xiang3 ji2 tuan2* - {L} = {zh-cn, zh-sg} - -NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2) -PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2) -PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2) -{ZV} = {(U+8054 U+60F3 U+96C6 U+56E2)} -{RV} = {(U+8054 U+60F3 U+96C6 U+56E3), - (U+8054 U+60F3 U+96C6 U+5718), - (U+806F U+60F3 U+96C6 U+56E2), - (U+806f U+60F3 U+96C6 U+56E3), - (U+806F U+60F3 U+96C6 U+5718), - (U+8068 U+60F3 U+96C6 U+56E2), - (U+8068 U+60F3 U+96C6 U+56E3), - (U+8068 U+60F3 U+96C6 U+5718)} - -Example 6: IDL = (U+8054 U+60F3 U+96C6 U+56E2) - *lian2 xiang3 ji2 tuan2* - {L} = {zh-cn, 
zh-sg, zh-tw} - -NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2) -Invalid registration because U+8054 is invalid in L = zh-tw - -Example 7: IDL = (U+806F U+60F3 U+96C6 U+5718) - *lian2 xiang3 ji2 tuan2* - {L} = {ja,ko} - -NP(IN) = (U+806F U+60F3 U+96C6 U+5718) -PV(IN,ja) = (U+806F U+60F3 U+96C6 U+5718) -PV(IN,ko) = (U+806F U+60F3 U+96C6 U+5718) -{ZV} = {(U+806F U+60F3 U+96C6 U+5718)} -{RV} = {(U+806F U+60F3 U+96C6 U+56E3), - (U+8068 U+60F3 U+96C6 U+5718), - (U+8068 U+60F3 U+96C6 U+56E3)} - -i. Notes - -1. The terms "i18n" and "l10n", sometimes used in upper-case form (i.e., -"I18N" and "L10N"), have become popular in international standards -usage as abbreviations for "internationalization" and "localization", -respectively. The abbreviations were derived by using the first and -last letters of the words, with the number of characters that appear -between them. I.e., in "internationalization", there are 18 characters -between the initial "i" and the terminal "n". - -2. Every human language is unique and therefore, every linguistic and -localization issue is also unique. It is difficult or impossible to -make comparisons across multiple languages or to classify them into -categories. And any cross-language analogies are, by their very nature, -imperfect at best. - -For example, to classify Traditional Chinese/Simplified Chinese as -upper/lower case makes as much sense as to classify TC/SC as "spelling -variant" like "color" and "colour". Both comparisons are potentially -useful but neither is completely correct. - -3. The variants in CJK are very complex and require many different -layers of solution. This guideline is a one of the solution components, -but not sufficient, by itself, to solve the whole problem. - -ii. Acknowledgements - -The authors gratefully acknowledge the contributions of: - -V.CHEN, N.HSU, H.HOTTA, S.TASHIRO, Y.YONEYA and other Joint Engineering -Team members at the JET meeting in Bangkok. 
- -Yves Arrouye, an observer at the JET meeting, for his contribution on -the IDL Package. - -Soobok LEE -L.M TSENG -Patrik FALTSTROM -Paul HOFFMAN -Erin CHEN -LEE Xiaodong -Harald ALVESTRAND - -iii. Author(s) - -James SENG -PSB Certification -3 Science Park Drive -#03-12 PSB Annex -Singapore 118233 -Phone: +65 6885-1657 -Email: jseng@pobox.org.sg - -Kazunori KONISHI -JPNIC -Kokusai-Kougyou-Kanda Bldg 6F -2-3-4 Uchi-Kanda, Chiyoda-ku -Tokyo 101-0047 -JAPAN -Phone: +81 49-278-7313 -Email: konishi@jp.apan.net - -Kenny HUANG -TWNIC -3F, 16, Kang Hwa Street, Taipei -Taiwan -TEL : 886-2-2658-6510 -Email: huangk@alum.sinica.edu - -QIAN Hualin -CNNIC -No.6 Branch-box of No.349 Mailbox, Beijing 100080 -Peoples Republic of China -Email: Hlqian@cnnic.net.cn - -KO YangWoo -PeaceNet -Yangchun P.O. Box 81 Seoul 158-600 -Korea -Email: newcat@peacenet.or.kr - -John C KLENSIN -1770 Massachusetts Ave, No. 322 -Cambridge, MA 02140 -USA -Email: Klensin+ietf@jck.com - -iv. Appendix A - -[How to read the Han Ideograph provided in this document. -- Will -complete this section in next revision] - -v. Normative References - -[ABNF] Augmented BNF for Syntax Specifications: ABNF, RFC 2234, D. - Crocker and P. Overell, Eds., November 1997. - -[I18NTERMS] Terminology Used in Internationalization in the IETF, - draft-hoffman-i18n-terms-07.txt, September 2002, - Paul Hoffman, work in progress - -[RFC3066] Tags for the Identification of Languages, RFC3066, - Jan 2001, H. Alvestrand - -[IDNA] Internationalizing Domain Names in Applications, - draft-ietf-idn-idna, Feb 2002, Patrik Faltstrom, - Paul Hoffman, Adam M. Costella, work in progress - -[PUNYCODE] Punycode: An encoding of Unicode for use with IDNA, - draft-ietf-idn-punycode, Feb 2002, Adam M. 
Costello, - work in progress - -[STRINGPREP]Preparation of Internationalized Strings, - draft-hoffman-stringprep, Feb 2002, Paul Hoffman, - Marc Blanchet, work in progress - -[NAMEPREP] Nameprep: A Stringprep Profile for Internationalized - Domain Names, work in progress, draft-ietf-idn-nameprep, - Feb 2002, Paul Hoffman, Marc Blanchet, work in progress - -[UNIHAN] Unicode Han Database, Unicode Consortium - ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt - -[UNICODE] The Unicode Consortium, "The Unicode Standard -- Version - 3.0", ISBN 0-201-61633-5. Unicode Standard Annex #28, - (http://www.unicode.org/unicode/reports/tr28/) defines - Version 3.2 of The Unicode Standard. - -[ISO7098] ISO 7098;1991 Information and documentation -- Romanization - of Chinese, ISO/TC46/SC2. - -vi. Non-normative References - -[IDN-WG] IETF Internationalized Domain Names Working Group, - idn@ops.ietf.org, James Seng, Marc Blanchet. - http://www.i-d-n.net/ - -[STD13] Paul Mockapetris, "Domain names - concepts and facilities" - (RFC 1034) and "Domain names - implementation and - specification" (RFC 1035), STD 13, November 1987. - -[C2C] Pitfalls and Complexities of Chinese to Chinese Conversion, - http://www.cjk.org/cjk/c2c/c2c.pdf, Jack Halpern, Jouni - Kerman - -vii. Other Issues - -It is possible that many variants generated may have no meaning in the -associated language or languages. The intention is not to generate -meaningful "words" but to generate similar variants to be reserved. - -The language Character Variants tables are critical to the success of -the guideline. A badly designed table may either generate too many -meaningless variants or may not generate enough meaningful variants. -The principles to be used to generate the tables are not within the -scope of this document, nor are the tables themselves. - -This document recommends against registration of IDL in a particular -language until the language character variants table for that language -is available. 
- -Outstanding Issues - -(1) Erin suggested (if I (JcK) correctly understood her) that, if -multiple languages are associated with a given name, the recommended -variant list for a given code point be treated as the intersection of -the variant lists for each of the languages, not the union. As I -understand the current algorithm, it effectively takes the union. -Taking the intersection has the technical advantage that it would -significantly reduce the number of variant strings that must be -reserved. It also has the policy advantage of discouraging people -from registering with multiple languages if they don't need to - -otherwise, we will have everyone trying to register in all of the -possibly-relevant languages, which would make this effort a good deal -less effective than it might be. - -Taking the intersection is also consistent with a rule that appears to -exist now. As shown in Example 3, if an attempt is made to register a -name and associate it with multiple languages, it must be valid in all -of those languages or the registration attempt will fail. So we -intersect the validity criteria on a language basis, and should -probably intersect the variants. - -But that is an algorithm change, since we have to extract the variant -lists for each code point for each language, take the intersection, -and then process against that, rather than against each language in -turn. - -[JS - I disagree in taking the intersection of the set. No doubt by -doing intersection we will reduce the abuse of specifying multiple -language to increase the set of reserved variants, our goal is -precisely to reserve as much variants as possible for the domain name -holder, not vice versa. - -Suppose we have a string ABC with variants ABD ACD ABF in Chinese, ABE -ACD in Japanese and CBD ACD in Korean. - -Assuming a registrant register ABC in CJK, right now he will get the -reserved set of {ABC, ACD, ABF, ABE, CBD}. 
- -On the other hand, if we do intersection, this set will be reduced to -{ACD}, leaving other variants like ABF, ABE and CBD open for potential -conflict. And the only way he can protect this confusion is to -individually register ABF, ABE and CBD manually individually, -something we trying to prevent.] - -[Further explanation by Erin: - -I'm sorry maybe my previous suggestion is not clear enough. - -I mean if multiple languages are associated with a given nanme, the -range of valid code point sould be the intersection of all the -associated languages. - -But, if multiple languages are associated with a given nanme, the -recommended variants should be take the union and put into zone file. -The same, the character variant code also sould be take the union for -each of the languages.] - -(2) A note went by indicating that the plan was to drop the Han -characters from the IETF-submission version of this document. We can -post I-Ds in PDF and publish RFCs in PDF and/or Postscript, as long as -we provide ASCII. I find having the Han characters very useful, and -trust that those of you who can read them find them even more so. So -I would suggest that we hand off the pair of an ASCII document (with -the Han characters removed) and a PDF document (that looks like the -Word text we have been looking it) to the I-D editor. I've got full -Acrobat here and can presumably produce the thing if needed. - -(3) We still need to sort out the issue of whether reserving a -variant that may (in a current or future table) conflict with another -character, with the possibility of activating it is an invitation to -cybersquatting and other abuses. That isn't clear, let me try an -illustration: suppose we have a character X, with variants A, B, and C, -and a character Y, with variants D and C. Now, if Y is registered -first, then its package includes {Y*, D, C}, using the symbol "*" to -denote an active name. When X is registered, its package consists of -{X, A, B}. 
X's owner can't reserve or activate C, since it was -reserved to Y. But much of the reason for doing all of this work was -the concern that C can be confused with either Y or X. So doesn't -this create an opportunity for Y to threaten, or extort money from, X -by threatening to activate C? - -[JS -- The conflict of X & Y over C in this case could be resolved by -existing conflict policy. The revised guideline now makes it possible -to modify the IDL Package in the event of dispute] - -That problem gets worse, I think, if Erin's suggestion in (1) is not -adopted. And I continue to believe that the only solution that will -work is to prevent anyone from activating C. Or, more generally, at -any given time, there will be a set of language variant tables that -will be considered valid by the administrator of a particular zone. -The zone administrator would take the union of all of those tables, -using the 'valid code point' as the key as usual, and then permanently -reserve any character that appeared most than once in a variant column. -Small matter of programming. - -(4) In page 9, on the paragraph starting with "The character -variant(s) column contains ..." - -Page: 21 -This seems to be saying that the code points listed in the third -column will always be a proper superset of the union of the first and -second columns. If that is correct, it violates a fundamental -principle that I was taught about good programming and systems design --- minimization of duplication of information, since such duplicates -are error-prone. And, if I have not interpreted the intent correctly, -the text needs to be fixed. Somehow. - -[JS -- correct, it is duplicated. The duplication is bad from -system design view but it makes it 'complete' and easy to explain.] 
diff --git a/doc/draft/draft-jseng-idn-admin-03.txt b/doc/draft/draft-jseng-idn-admin-03.txt new file mode 100644 index 0000000000..24e66a2fdb --- /dev/null +++ b/doc/draft/draft-jseng-idn-admin-03.txt @@ -0,0 +1,1335 @@ +INTERNET DRAFT Editors: James SENG +draft-jseng-idn-admin-03.txt John C KLENSIN, Wendy RICKARD +16 June 2003 Authors: K. KONISHI +Expires December 2003 K. HUANG, H. QIAN, Y. KO + + Internationalized Domain Names Registration and Administration + Guideline for Chinese, Japanese, and Korean + +Status of This Memo + +This document is an Internet Draft and is in full conformance +with all provisions of Section 10 of RFC2026 except that the +right to produce derivative works is not granted. + + Internet Drafts are working documents of the Internet + Engineering Task Force (IETF), its areas, and its working + groups. Note that other groups may also distribute working + documents as Internet Drafts. + + Internet Drafts are draft documents valid for a maximum of + six months and may be updated, replaced, or rendered obsolete by + other documents at any time. It is inappropriate to use Internet + Drafts as reference material or to cite them other than as + "works in progress." + + The list of current Internet Drafts can be accessed at + http://www.ietf.org/ietf/1id-abstracts.txt. + + The list of Internet Draft Shadow Directories can be accessed at + http://www.ietf.org/shadow.html. + +Abstract + +Achieving internationalized access to domain names raises many complex +issues. These are associated not only with basic protocol design--such +as how names are represented on the network, compared, and converted to +appropriate forms--but also with issues and options for deployment, +transition, registration, and administration. + +The IETF Internationalized Domain Name (IDN) Working Group focused its +efforts on the development of a standards-track specification for access +to domain names in a range of scripts that is broader in scope than the +original ASCII. 
During its efforts, it became clear that the appearance +of characters with similar appearances and/or interpretations created +potential for confusion, as well as difficulties in deployment and +transition, and that those issues could best be addressed +administratively rather than through restrictions embedded in the +protocols. + +This document is an effort of the Joint Engineering Team (JET), a group +composed of members of CNNIC, TWNIC, KRNIC, and JPNIC as well as other +individual experts. It offers guidelines for zone administrators -- +including but not limited to registry operators and registrars -- and +information for all domain names holders on the administration of domain +names that contain characters drawn from Chinese, Japanese, and Korean +scripts. Other language groups are encouraged to develop their own +guidelines as needed, based on these guidelines if that is helpful. + +Table of Contents + +1. Introduction + +2. Definitions, Context, and Notation +2.1. Definitions and Context +2.2. Notation for Ideographs and Other Non-ASCII CJK Characters + +3. Scope of the Administrative Guidelines +3.1. Principles Underlying These Guidelines +3.2. Registration of IDL +3.2.1. Using the Language Variant Table +3.2.2. IDL Package +3.2.3. Procedure for Registering IDLs +3.3. Deletion and Transfer of IDL and IDL Package +3.4. Activation and Deactivation of IDL Variants +3.4.1. Activation Algorithm +3.4.2. Deactivation Algorithm +3.5. Managing Changes in Language Associations +3.6. Managing Changes to Language Variant Tables + +4. Examples of Guideline Use in Zones + +5. Syntax Description for the Language Variant Table +5.1 ABNF Syntax +5.2. Comments and Explanation of Syntax + +6. Security Considerations + +7. Index to Terminology + +8. Acknowledgments + +9. Authors’ Addresses + +10. Normative References + +11. Nonnormative References + +1. Introduction + +Domain names form the fundamental naming architecture of the Internet. 
+Countless Internet protocols and applications rely on them, not just for +stability and continuity, but also to avoid ambiguity. They were +designed to be identifiers without any language context. However, as +domain names have become visible to end users through Web URLs and +e-mail addresses, the strings in domain-name labels are being +increasingly interpreted as names, words, or phrases. It is likely that +users will do the same with languages of differing character sets--such +as Chinese, Japanese and Korean (CJK)--in which many words or concepts +are represented using short sequences of characters. + +The introduction of what are called Internationalized Domain Names (IDN) +amplifies both the difficulty of putting names into identifiers and the +confusion that exists between scripts and languages. It also affects a +number of Internet protocols and applications and creates additional +layers of complexity in terms of technical administration and services. +Given the added complications of using a much broader range of +characters than the original small ASCII subset, precautions are +necessary in the deployment of IDNs in order to minimize confusion and +fraud. + +The IETF IDN Working Group [IDN-WG] addressed the problem of handling +the encoding and decoding of Unicode strings into and out of Domain Name +System (DNS) labels with the goal that its solution would not put the +operational DNS at any risk. Its work resulted in one primary protocol +and three supporting ones, respectively: + +1. Internationalizing Host Names in Applications [IDNA] +2. Preparation of Internationalized Strings [STRINGPREP] +3. A Stringprep Profile for Internationalized Domain Names [NAMEPREP] +4. Punycode [PUNYCODE] + +IDNA--which calls on the others--normalizes and transforms strings that +are intended to be used as IDNs. 
In combination, the four provide the +minimum functions required for internationalization, such as performing +case mappings, eliminating character differences that would cause severe +problems, and specifying matching (equality). They also convert between +the resulting Unicode code points and an ASCII-based form that is more +suitable for storing in actual DNS labels. In this way, the IDNA +transformations improve a user’s chances of getting to the correct IDN. + +Addressing the issues around differing character sets, a primary +consideration and administrative challenge involves region-specific +definitions, interpretations, and the semantics of strings to be used in +IDNs. A Unicode string may have a specific meaning as a name, word, or +phrase in a particular language but that meaning could vary depending on +the country, region, culture, or other context in which the string is +used. It might also have different interpretations in different +languages that share some or all of the same characters. Therefore, +individual zones and zone administrators may find it necessary to impose +restrictions and procedures to reduce the likelihood of confusion--and +instabilities of reference--within their own environments. + +Over the centuries, the evolution of CJK characters--and the differences +in their use in different languages and even in different regions where +the same language is spoken--has given rise to the idea of "variants", +wherein one conceptual character can be identified with several +different Code Points in character sets for computer use. This document +provides a framework for handling such variants while minimizing the +possibility of serious user confusion in the obtaining or use of domain +names. However, the concept of variants is complex and may require many +different layers of solution; this guideline offers only one of the +solution components.
It is not sufficient by itself to solve the whole +problem, even with zone-specific tables as described below. + +Additionally, because of local language or writing-system differences, +it is impossible to create universally accepted definitions for which +potential variants are the same and which are not the same. It is even +more difficult to define a technical algorithm to generate variants that +are linguistically accurate--that is, that the variant forms produced +make as much sense in the language as the originally specified forms. +It is also possible that variants generated may have no meaning in the +associated language or languages. The intention is not to generate +meaningful "words" but to generate similar variants to be reserved. So +even though the method described in this document may not always be +linguistically accurate--or need to be--it increases the chances of +getting the right variants while accepting the inherent limitations of +the DNS and the complexities of human language. + +This document outlines a model for such conventions for zones in which +labels that contain CJK characters are to be registered and a system for +implementing that model. It provides a mechanism that allows each zone +to define its own local rules for permitted characters and sequences and +the handling of IDNs and their variants. + +2. Definitions, Context, and Notation + +2.1. Definitions and Context + +This document uses a number of special terms. In this section, +definitions and explanations are grouped topically. Some readers may +prefer to skip over this material, returning, perhaps via the index to +terminology in section 7, when needed. + +2.1.1. 
IDN: The term "IDN" has a number of different uses: (a) as an +abbreviation for "Internationalized Domain Name"; (b) as a fully +qualified domain name that contains at least one label that contains +characters not appearing in ASCII, specifically not in the subset of +ASCII recommended for domain names (the so-called "hostname" or "LDH" +subset, see RFC1035 [STD13]); (c) as a label of a domain name that +contains at least one character beyond ASCII; (d) as a Unicode string to +be processed by Nameprep; (e) as a string that is an output from +Nameprep; (f) as a string that is the result of processing through both +Nameprep and conversion into Punycode; (g) as the abbreviation of an IDN +(more properly, IDL) Package, in the terminology of this document; (h) +as the abbreviation of the IETF IDN Working Group; (i) as the +abbreviation of the ICANN IDN Committee; and (j) as standing for other +IDN activities in other companies/organizations. + +Because of the potential confusion, this document uses the term "IDN" as +an abbreviation for Internationalized Domain Name and, specifically, in +the second sense described in (b) above. It uses "IDL," defined +immediately below, to refer to Internationalized Domain Labels. + +2.1.2. IDL: This document provides a guideline to be applied on a +per-zone basis, one label at a time. Therefore, the term +"Internationalized Domain Label" or "IDL" will be used instead of the +more general term "IDN" or its equivalents. The processing +specifications of this document may be applied, in some zones, to ASCII +characters also, if those characters are specified as valid in a +Language Variant Table (see below). Hence, in some zones, an IDL may +contain or consist entirely of "LDH" characters. + +2.1.3. FQDN: A fully qualified domain name, one that explicitly +contains all labels, including a Top-Level Domain (TLD) name. In this +context, a TLD name is one whose label appears in a nameserver record in +the root zone.
The term "Domain Name Label" refers to any label of a +FQDN. + +2.1.4. Registration: In this document, the term "registration" refers +to the process by which a potential domain name holder requests that a +label be placed in the DNS either as an individual name within a domain +or as a subdomain delegation from another domain name holder. In the +case of a successful registration, the label or delegation records are +placed in the relevant zone file, or, more specifically, they are +"activated" or made "active" and additional IDLs may be reserved as part +of an "IDL Package" (see below). The guidelines presented here are +recommended for all zones--at any hierarchy level--in which CJK +characters are to appear and not just domains at the first or second +level. + +2.1.5. RFC3066: A system, widely used in the Internet, for coding and +representing names of languages. It is based on an International +Organization for Standardization (ISO) standard for coding language +names [ISO639], but expands it to provide additional precision. + +2.1.6. ISO/IEC 10646: The international standard universal +multiple-octet coded character set ("UCS") [IS10646]. The Code Point +definitions of this standard are identical to those of corresponding +versions of the Unicode standard (see below). Consequently, the +characters and their coding are often referred to as "Unicode +characters." + +2.1.7. Unicode Character: The term "Unicode character" is used here in +reference to characters chosen from the Unicode Standard Version 3.2 +[UNICODE] (and hence from ISO/IEC 10646). In this document, the +characters are identified by their positions, or "Code Points." The +notation U+12AB, for example, indicates the character at the position +12AB (hexadecimal) in the Unicode 3.2 table. For characters in +positions above FFFF--i.e., requiring more than sixteen bits to +represent--a five- to eight-character string is used, such as U+112AB for +the character in position 12AB of plane 1. + +2.1.8.
Unicode String: "Unicode string" refers to a string of Unicode +characters. The Unicode string is identified by the sequence of the +Unicode characters regardless of the encoding scheme. + +2.1.9. CJK Characters: CJK characters are characters commonly used in +the Chinese, Japanese, or Korean languages, including but not limited to +those defined in the Unicode Standard as ASCII (U+0020 to U+007F), Han +ideographs (U+3400 to U+9FAF and U+20000 to U+2A6DF), Bopomofo (U+3100 +to U+312F and U+31A0 to U+31BF), Kana (U+3040 to U+30FF), Jamo (U+1100 +to U+11FF and U+3130 to U+318F), Hangul (U+AC00 to U+D7AF and U+3130 to +U+318F), and the respective compatibility forms. The particular +characters that are permitted in a given zone are specified in the +Language Variant Table(s) for that zone. + +2.1.10. Label String: A generic term referring to a string of +characters that is a candidate for registration in the DNS or such a +string, once registered. A label string may or may not be valid +according to the rules of this specification and may even be invalid for +IDNA use. The term "label", by itself, refers to a string that has been +validated and may be formatted to appear in a DNS zone file. + +2.1.11. Language Variant Table: The key mechanisms of this +specification utilize a three-column table, called a Language Variant +Table, for each language permitted to be registered in the zone. Those +columns are known, respectively, as "Valid Code Point", "Preferred +Variant", and "Character Variant", which are defined separately below. +The Language Variant Tables are critical to the success of the guideline +described in this document. However, the principles to be used to +generate the tables are not within the scope of this document and should +be worked out by each registry separately (perhaps by adopting or +adapting the work of some other registry). In this document, "Table" +and "Variant Table" are used as short forms for Language Variant Table. + +2.1.12.
Valid Code Point: In a Language Variant Table, the list of Code +Points that is permitted for that language. Any other Code Points, or +any string containing them, will be rejected by this specification. The +Valid Code Point list appears as the first column of the Language +Variant Table. + +2.1.13. Preferred Variant: In a Language Variant Table, a list of Code +Points corresponding to each Valid Code Point and providing possible +substitutions for it. These substitutions are "preferred" in the sense +that the variant labels generated using them are normally registered in +the zone file, or "activated." The Preferred Code Points appear in +column 2 of the Language Variant Table. "Preferred Code Point" is used +interchangeably with this term. + +2.1.14. Character Variant: In a Language Variant Table, a second list +of Code Points corresponding to each Valid Code Point and providing +possible substitutions for it. Unlike the Preferred Variants, +substitutions based on Character Variants are normally reserved but not +actually registered (or "activated"). Character Variants appear in +column 3 of the Language Variant Table. The term "Code Point Variants" +is used interchangeably with this term. + +2.1.15. Preferred Variant Label: A label generated by use of Preferred +Variants (or Preferred Code Points). + +2.1.16. Character Variant Label: A label generated by use of Character +Variants. + +2.1.17. Zone Variant: A Preferred or Character Variant Label that is +actually to be entered (registered) into the DNS--that is, into the zone +file for the relevant zone. Zone Variants are also referred to as Zone +Variant Labels or Active (or Activated) Labels. + +2.1.18. IDL Package: A collection of IDLs as determined by these +Guidelines. All labels in the package are "reserved", meaning they +cannot be registered by anyone other than the holder of the Package. +These reserved IDLs may be "activated", meaning they are actually +entered into a zone file as a "Zone Variant". 
The IDL Package also +contains identification of the language(s) associated with the +registration process. The IDL and its variant labels form a single, +atomic unit. + +2.2. Notation for Ideographs and Other Non-ASCII CJK Characters + +For purposes of clarity, particularly in regard to examples, Han +ideographs appear in several places in this document. However, they do +not appear in the ASCII version of this document. For the convenience +of readers of the ASCII version--and some readers not familiar with +recognizing and distinguishing Chinese characters--most uses of these +characters will be associated with both their Unicode Code Points and an +"asterisk tag" with its corresponding Chinese Romanization [ISO7098], +with the tone mark represented by a number from 1 to 4. Those tags have +no meaning outside this document; they are a quick visual and reading +reference to help facilitate the combinations and transformations of +characters in the guideline and table excerpts. + +3. Scope of the Administrative Guidelines + +Zone administrators are responsible for the administration of the domain +name labels under their control. A zone administrator might be +responsible for a large zone, such as a top-level domain (TLD)--whether +generic or country code--or a smaller one, such as a typical second- or +third-level domain. A large zone is often more complex than its smaller +counterpart. However, actual technical administrative tasks--such as +addition, deletion, delegation, and transfer of zones between domain +name holders--are similar for all zones. + +This document provides guidelines for the ways CJK characters should be +handled within a zone, for how language issues should be considered and +incorporated, and for how Domain Name Labels containing CJK characters +should be administered (including registration, deletion, and transfer +of labels). It does not provide any guidance for the handling of +non-CJK characters or languages in zones.
+ +Other IDN policies--such as the creation of new top-level domains +(TLDs), the cost structure for registrations, and how the processes +described here get allocated between registrar and registry if the zone +makes that distinction--also are outside the scope of this document. + +Technical implementation issues are not discussed here either. For +example, deciding which guidelines should be implemented as registry +actions and which should be registrar actions is left to zone +administrators, with the possibility that it will differ from zone to +zone. + +3.1. Principles Underlying These Guidelines + +In many places, in the event of a dispute over rights to a name (or, +more accurately, DNS label string), this document assumes "first-come, +first-served" (FCFS) as a resolution policy even though FCFS is not +listed below as one of the principles for this document. If policies +are already in place governing priorities and "rights", one can use the +guidelines here by replacing uses of FCFS in this document with policies +specific to the zone. Some of the guidelines here may not be applicable +to other policies for determining rights to labels. Still other +alternatives--such as use of UDRP [WIPO-UDRP] or mutual exclusion--might +have little impact on other aspects of these guidelines. + +(a) Although some Unicode strings may be pure identifiers made up of an +assortment of characters from many languages and scripts, IDLs are +likely to be "words" or "names" or "phrases" that have specific meaning +in a language. While a zone administration might or might not require +"meaning" as a registration criterion, meaning could prove to be a +useful tool for avoiding user confusion. + + Each IDL to be registered should be associated administratively + with one or more languages. + +Language associations should either be predetermined by the zone +administrator and applied to the entire zone or be chosen by the +registrants on a per-IDL basis. 
The latter may be necessary for some +zones, but it will make administration more difficult and will increase +the likelihood of conflicts in variant forms. + + A given zone might have multiple languages associated with it or + it may have no language specified at all. Omitting specification + of a language may provide additional opportunities for user + confusion and is therefore NOT recommended. + +(b) Each language uses only a subset of Unicode characters. Therefore, +if an IDL is associated with a language, it is not permitted to contain +any Unicode character that is not within the valid subset for that +language. + + Each IDL to be registered must be verified against the valid subset + of Unicode for the language(s) associated with the IDL. That subset + is specified by the list of characters appearing in the first column + of the language and zone-specific tables as described later in this + document. + +If the IDL fails this test for any of its associated languages, the IDL +is not valid for registration. + +Note that this verification is not necessarily linguistically accurate, +because some languages have special rules. For example, some languages +impose restrictions on the order in which particular combinations of +characters may appear. Characters that are valid for the language--and +hence permitted by this specification--might still not form valid words +or even strings in the language. + +(c) When an IDL is associated with a language, it may have Character +Variants that depend on that language associated with it in addition to +any Preferred Variants. These variants are potential sources of +confusion with the Code Points in the original label string. +Consequently, the labels generated from them should be unavailable to +registrants of other names, words, or phrases. + + During registration, all labels generated from the Character + Variants for the associated language(s) of the IDL should be + reserved. 
+ +IDL reservations of the type described here normally do not appear in +the distributed DNS zone file. In other words, these reserved IDLs may +not resolve. Domain name holders could request that these reserved IDLs +be placed in the zone file and made active and resolvable. + +Zones will need to establish local policies about how they are to be +made active. Specifically, many zones, especially at the top level, +have prohibited or restricted the use of "CNAME"s--DNS +aliases--especially CNAMEs that point to nameserver delegation records +(NS records). And the use of long-term aliases for domain +hierarchies rather than single names ("DNAME records") is considered +problematic because of the recursion they can introduce into DNS +lookups. + +(d) When an IDL is a "name", "word", or "phrase", it will have Character +Variants depending on the associated language. Furthermore, one or more +of those Character Variants will be used more often than others for +linguistic, political, or other reasons. These more commonly used +variants are distinguished from ordinary Character Variants and are +known as Preferred Variant(s) for the particular language. + + To increase the likelihood of correct and predictable resolution of + the IDN by end users, all labels generated from the Preferred + Variants for the associated language(s) should be resolvable. + +In other words, the Preferred Variant Labels should appear in the +distributed DNS zone file. + +(e) IDLs associated with one or more languages may have a large number +of Character Variant Labels or Preferred Variant Labels. Some of these +labels may include combinations of characters that are meaningless or +invalid linguistically. It may therefore be appropriate for a zone to +adopt procedures that include only linguistically-acceptable labels in +the IDL Package.
+ + A zone administrator may impose additional rules and other + processing activities to limit the number of Character Variant + Labels or Preferred Variant Labels that are actually reserved or + registered. + +These additional rules and other processing activities are based on +policies and/or procedures imposed on a per-zone basis and therefore are +not within the scope of this document. Such policies or procedures +might be used, for example, to restrict the number of Preferred Variant +Labels actually reserved or to prevent certain words from being +registered at all. + +(f) There are some Character Variant Labels and Preferred Variant Labels +that are associated with each IDL. These labels are considered +"equivalent" to one another. To avoid confusion, they all should be +assigned to a single domain name holder. + + The IDL and its variant labels should be grouped together into a + single atomic unit, known in this document as an "IDL Package". + +The IDL Package is created upon registration and is atomic: Transfer and +deletion of an IDL are performed on the IDL Package as a whole. That is, +an IDL within the IDL Package may not be transferred or deleted +individually; any re-registration, transfers, or other actions that +impact the IDL should also affect the other variants. + +The name-conflict resolution policy associated with this zone could +result in a conflict with the principle of IDL Package atomicity. In +such a case, the policy must be defined to make the precedence clear. + +3.2. Registration of IDL + +To conform to the principles described in 3.1, this document introduces +two concepts: the Language Variant Table and the IDL Package. These are +described in the next two subsections, followed by a description of the +algorithm that is used to interpret the table and generate variant +labels. + +3.2.1. Using the Language Variant Table + +For each zone that uses a given language, each language should have its +own Language Variant Table.
The table consists of a header section that +identifies references and version information, followed by a section +with one row for each Code Point that is valid for the language and +three columns. + +a) The first column contains the subset of Unicode characters that is +valid to be registered ("Valid Code Point"). This is used to verify the +IDL to be registered (see 3.1b). As in the registration procedure +described later, this column is used as an index to examine characters +that appear in a proposed IDL to be processed. The collection of Valid +Code Points in the table for a particular language can be thought of as +defining the script for that language, although the normal definition of +a script would not include, for example, ASCII characters with CJK ones. + +b) The second column contains the Preferred Variant(s) of the +corresponding Unicode character in column one ("Valid Code Point"). +These variant characters are used to generate the Preferred Variant +Labels for the IDL. Those labels should be resolvable (see 3.1d). +Under normal circumstances, all of those Preferred Variant Labels will +be activated in the relevant zone file so that they will resolve when +the DNS is queried for them. + +c) The third column contains the Character Variant(s) for the +corresponding Valid Code Point. These are used to generate the +Character Variant Labels of the IDL, which are then to be reserved (see +3.1c). Registration--or activation--of labels generated from Character +Variants will normally be a registrant decision, subject to local +policy. + +Each entry in a column consists of one or more Code Points, expressed as +a numeric character number in the Unicode table and optionally followed +by a parenthetical reference. The first column--or Valid Code Point-- +may have only one Code Point specified in a given row. The other +columns may have more than one. + +Any row may be terminated with an optional comment, starting with "#".
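
The three-column layout described above can be sketched as an in-memory structure. This is only an illustration under stated assumptions: the names VariantRow, build_table, and is_valid_label are hypothetical, the two sample rows are illustrative and not taken from any registry's table, and the on-disk syntax of Section 5 is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VariantRow:
    """One row of a Language Variant Table (hypothetical in-memory form)."""
    valid: int                                          # column 1: exactly one Valid Code Point
    preferred: List[int] = field(default_factory=list)  # column 2: Preferred Variant(s)
    character: List[int] = field(default_factory=list)  # column 3: Character Variant(s)
    comment: str = ""                                   # optional trailing "#" comment


def build_table(rows: List[VariantRow]) -> Dict[int, VariantRow]:
    """Index rows by Valid Code Point; column 1 acts as the lookup key."""
    table: Dict[int, VariantRow] = {}
    for row in rows:
        if row.valid in table:
            raise ValueError(f"duplicate Valid Code Point U+{row.valid:04X}")
        table[row.valid] = row
    return table


def is_valid_label(label: str, table: Dict[int, VariantRow]) -> bool:
    """Check that every code point of the label appears in column 1 (see 3.1b)."""
    return all(ord(ch) in table for ch in label)


# Illustrative rows only (assumed values, not from a real table).
rows = [
    VariantRow(0x56FD, preferred=[0x56FD], character=[0x570B]),
    VariantRow(0x570B, preferred=[0x56FD], character=[0x570B]),
]
table = build_table(rows)
print(is_valid_label("\u56fd\u570b", table))  # both code points are in column 1
print(is_valid_label("\u56fdx", table))       # "x" has no row, so the label is rejected
```

Keying the table on the Valid Code Point mirrors the text's use of column one as the index into the table; a zone that associates several languages with a label would keep one such table per language.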
+ +The formal syntax of the table and more-precise definitions of some of +its organization appear in Section 5. + +The Language Variant Table should be provided by a relevant group, +organization, or body. However, the question of who is relevant or has +the authority to create this table and the rules that define it is +beyond the scope of this document. + +3.2.2. IDL Package + +The IDL Package is created on successful registration and consists of: + +a) the IDL registered + +b) the language(s) associated with the IDL + +c) the reserved IDLs + +d) active IDLs--that is, "Zone Variant Labels" that are to appear in + the DNS zone file + +3.2.3. Procedure for Registering IDLs + +An explanation follows each step. + +Step 1. IN <= IDL to be registered and + {L} <= Set of languages associated with IN + +Start the process with the label string (prospective IDL) to be +registered and the associated language(s) as input. + +Step 2. Generate the Nameprep-processed version of the IN, applying + all mappings and canonicalization required by IDNA. + +The prospective IDL is processed by using Nameprep to apply the +normalizations and exclusions globally required to use IDNA. If the +Nameprep processing fails, then the IDL is invalid and the registration +process must stop. + +Step 2.1. NP(IN) <= Nameprep processed IN +Step 2.2. Check availability of NP(IN). + If not available, route to conflict policy. + +The Nameprep-processed IDL is then checked against the contents of the +zone file and previously created IDL Packages. If it is already +registered or reserved, then a conflict exists that must be resolved by +applying whatever policy is applicable for the zone. For example, if +FCFS is used, the registration process terminates unless the conflict +resolution policy provides another alternative. + +Step 3. Process each language. 
+ For each language (AL) in {L} + +Step 3 goes through all languages associated with the proposed IDL and +checks each character (after Nameprep has been applied) for validity in +each of them. It then applies the Preferred Variants (column 2 values) +and the Character Variants (column 3 values) to generate candidate +labels. + +Step 3.1. Check validity of NP(IN) in AL. If failed, stop processing. + +In step 3.1, IDL validation is done by checking that every Code Point in +the Nameprep-processed IDL is a Code Point allowed by the "Valid Code +Point" column of the Language Variant Table for the language. This is +then repeated for any other languages (and hence, Language Variant +Tables) specified in the registration. If one or more Code Points are +not valid, the registration process terminates. + +Step 3.2. PV(IN,AL) <= Set of available Nameprep-processed Preferred + Variants of NP(IN) in AL + +Step 3.2 generates the list of Preferred Variant Labels of the IDL by +doing a combination (see Step 3.2A below) of all possible variants +listed in the "Preferred Variant(s)" column for each Code Point in the +Nameprep-processed IDL. The generated Preferred Variant Labels must be +processed through Nameprep. If the Nameprep processing fails for any +Preferred Variant Label (this is unlikely to occur if the Preferred +Variants [Code Points] are processed through Nameprep before being +placed in the table), then that variant label will be removed from the +list. The remaining Preferred Variant Labels in the list are then +checked to see whether they are already registered or reserved. If any +are registered or reserved, then the conflict resolution policy will +apply. In general, this will not prevent the originally requested IDL +from being registered unless the policy prevents such registration.
For +example, if FCFS is applied, then the conflicting variants will be +removed from the list, but the originally requested IDL and any +remaining variants will be registered (see steps 5 and 8 below). + +Step 3.2A Generating variant labels from Variant Code Points. + +Steps 3.2 and 3.3 require that the Preferred Variants and Character +Variants be combined with the original IDL to form sets of variant +labels. Conceptually, one starts with the original, Nameprep-processed, +IDL and examines each of its characters in turn. If a character is +encountered for which there is a corresponding Preferred Variant or +Character Variant, a new variant label is produced with the Variant Code +Point substituted for the original one. If variant labels already exist +as the result of the processing of characters that appeared earlier in +the original IDL, then the substitutions are made in them as well, +resulting in additional generated variant labels. This operation is +repeated separately for the Preferred Variants (in Step 3.2) and +Character Variants (in Step 3.3). Of course, equivalent results could +be achieved by processing the original IDL’s characters in order, +building the Preferred Variant Label set and Character Variant Label set +in parallel. + +This process will sometimes generate a very large number of labels. For +example, if only two of the characters in the original IDL are +associated with Preferred Variants and if the first of those characters +has three Preferred Variants and the second has two, one ends up with 12 +variant labels to be placed in the IDL Package and, normally, in the +zone file. Repeating the process for Character Variants, if any exist, +would further increase the number of labels. And if more than one +language is specified for the original IDL, then repetition of the +process for additional languages (see step 4, below) might further +increase the size of the set. 
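The label count in the example above can be checked with a short sketch. This is illustrative only and not part of the procedure: the placeholder characters and the `example_variants` mapping are hypothetical, and the sketch assumes, as the figure of 12 implies, that each original character counts as one of the choices for its position alongside its Preferred Variants.

```python
from itertools import product


def variant_labels(label, variants):
    """Return every label formed by substituting variants position by position.

    `variants` maps a character to its variant characters. The character
    itself is always kept as one of the choices for its position.
    """
    choices = []
    for ch in label:
        # dict.fromkeys removes duplicates while preserving order.
        options = list(dict.fromkeys([ch, *variants.get(ch, [])]))
        choices.append(options)
    return {"".join(combo) for combo in product(*choices)}


# Hypothetical four-character label: "A" has three variants, "B" has two,
# and "x" and "y" have none.
example_variants = {"A": ["1", "2", "3"], "B": ["7", "8"]}
labels = variant_labels("xAyB", example_variants)
print(len(labels))  # (3 + 1) * (2 + 1) = 12
```

The count grows multiplicatively with each additional character that has variants, which is why zone policy may need to cap the size of the generated set.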
+ +For illustrative purposes, the "combination" process could be achieved +by a recursive function similar to the following pseudocode: + +Function Combination(Str) + F <= first code point of Str + SStr <= Substring of Str, without the first code point + NSC <= {} + + If SStr is empty then + For each V in (Variants of code point F) + NSC <= NSC set-union (the string with the code point V) + End of Loop + Else + SubCom <= Combination(SStr) + For each V in (Variants of code point F) + For each SC in SubCom + NSC <= NSC set-union (the string with the first code point V + followed by the string SC) + End of Loop + End of Loop + Endif + + Return NSC + +Step 3.3. CV(IN,AL) <= Set of available Nameprep-processed Character + Variants of NP(IN) in AL + +This step generates the list of Character Variant Labels by doing a +combination (see Step 3.2A above) of all the possible variants listed in +the "Character Variant(s)" column for each Code Point in the +Nameprep-processed original IDL. As with the Preferred Variant Labels, +the generated Character Variant Labels must be processed by, and +acceptable to, Nameprep. If the Nameprep processing fails for a +Character Variant Label, then that variant label will be removed from +the list. The remaining Character Variant Labels are then checked to be +sure they are not registered or reserved. If one or more are, then the +conflict resolution policy is applied. As with Preferred Variant +Labels, a conflict that is resolved in favor of the earlier registrant +does not, in general, prevent the IDL from being registered, nor the +remaining variants from being reserved in step 6 below. + +Step 3.4. End of Loop + +Step 4. Let PVall be the set-union of all PV(IN,AL) + +Step 4 generates the Preferred Variant Labels for all languages.
+In this step, and again in step 6 below, the zone administrator may +impose additional rules and processing activities to restrict the number +of Preferred (tentatively to be reserved and activated) and Character +(tentatively to be reserved) Label Variants. These additional rules and +processing activities are zone policy specific and therefore are not +specified in this document. + +Step 5. {ZV} <= PVall set-union NP(IN) + +Step 5 generates the initial Zone Variants. The set includes all +Preferred Variants for all languages and the original Nameprep-processed +IDL. Unless excluded by further processing, these Zone Variants will be +activated--that is, placed into the DNS zone. Note that the "set-union" +operation will eliminate any duplicates. + +Step 6. Let CVall be the set-union of all CV(IN,AL), set-minus {ZV} + +Step 6 generates the Reserved Label Variants (the Character Variant +Label set). These labels are normally reserved but not activated. The +set includes all Character Variant Labels for all languages, but not the +Zone Variants defined in the previous step. The set-union and set-minus +operations eliminate any duplicates. + +Step 7. Create IDL Package for IN using IN, {L}, {ZV} and CVall + +In Step 7, the "IDL Package" is created using the original IDL, the +associated language(s), the Zone Variant Labels, and the Reserved +Variant Labels. If zone-specific additional processing or filtering is +to be applied to eliminate linguistically inappropriate or other forms, +it should be applied before the IDL Package is actually assembled. + +Step 8. Put {ZV} into zone file + +The activated IDLs are converted via ToASCII with UseSTD13ASCIIRules +[IDNA] before being placed into the zone file. This conversion results +in the IDLs being in the actual IDNA ("Punycode") form used in zone +files, while the IDLs have been carried in Unicode form up to this +point. If ToASCII fails for any of the activated IDLs, that IDL must +not be placed into the zone file. 
If the IDL is a subdomain name, it +will be delegated. + +3.3. Deletion and Transfer of IDL and IDL Package + +In traditional domain administration, every Domain Name Label is +independent of all other Domain Name Labels. Registration, deletion, +and transfer of labels is done on a per-label basis. However, with the +guidelines discussed here, each IDL is associated with specific +languages, with all label variants--both active (zone) and reserved-- +together in an IDL Package. This quite deliberately prohibits labels +that contain sufficient mixtures of characters from different scripts +to make them impossible as words in any given language. If a zone +chooses to not impose that restriction--that is, to permit labels to +be constructed by picking characters from several different languages +and scripts--then the guidelines described here would be inappropriate. + +As stated earlier, the IDL package should be treated as a single atomic +unit and all variants of the IDL should belong to a single domain-name +holder. If the local policy related to the handling of disagreements +requires a particular IDL to be transferred and deleted independently of +the IDL Package, the conflict policy would take precedence. In such an +event, the conflict policy should include a transfer or delete procedure +that takes the nature of IDL Packages into consideration. + +When an IDL Package is deleted, all of the Zone and Reserved Label +Variants again become available. The deletion of one IDL Package does +not change any other IDL Packages. + +3.4. Activation and Deactivation of IDL variants + +Because there are active (registered) IDLs and inactive (reserved but +not registered) IDLs within an IDL package, processes are required to +activate or deactivate IDL variants within an IDL Package. + +3.4.1. Activation Algorithm + +Step 1. IN <= IDL to be activated and PA <= IDL Package + +Start with the IDL to be activated and the IDL Package of which it is a +member. + +Step 2. 
NP(IN) <= Nameprep processed IN + +Process the IDL through Nameprep. This step should never cause a +problem, or even a change, since all labels that become part of the IDL +Package are processed through Nameprep in Step 3.2 or 3.3 of the +Registration procedure (section 3.2.3). + +Step 3. If NP(IN) not in {RV} then stop + +Verify that the Nameprep-processed version of the IDL appears as a +still-unactivated label in the IDL Package, i.e., in the list of +Reserved Label Variants, {RV}. It might be a useful "sanity check" to +also verify that it does not already appear in the zone file. + +Step 4. {RV} <= {RV} set-minus NP(IN) and {ZV} <= {ZV} set-union NP(IN) + +Within the IDL Package, remove the Nameprep-processed version of the IDL +from the list of Reserved Label Variants and add it to the list of +active (zone) label variants. + +Step 5. Put {ZV} into the zone file + +Actually register (activate) the Zone Variant Labels. + + +3.4.2. Deactivation Algorithm + +Step 1. IN <= IDL to be deactivated and PA <= IDL Package + +As with activation, start with the IDL to be deactivated and the IDL +Package of which it is a member. + +Step 2. NP(IN) <= Nameprep processed IN + +Get the Nameprep-processed version of the name (see discussion in the +previous section). + +Step 3. If NP(IN) not in {ZV} then stop + +Verify that the Nameprep-processed version of the IDL appears as an +activated (zone) label variant in the IDL Package. It might be a useful +"sanity check" at this point to also verify that it actually appears in +the zone file. + +Step 4. {RV} <= {RV} set-union NP(IN) and {ZV} <= {ZV} set-minus NP(IN) + +Within the IDL Package, remove the Nameprep-processed version of the IDL +from the list of Active (Zone) Label Variants and add it to the list of +Reserved (but inactive) Label Variants. + +Step 5. Put {ZV} into the zone file + +3.5. 
Managing Changes in Language Associations + +Since the IDL package is an atomic unit and the associated list of +variants must not be changed after creation, this document does not +include a mechanism for adding and deleting language associations within +the IDL package. Instead, it recommends deleting the IDL package +entirely, followed by a registration with the new set of languages. +Zone administrators may find it desirable to devise procedures that +prevent other parties from capturing the labels in the IDL Package +during these operations. + +3.6. Managing Changes to the Language Variant Tables + +Language Variant Tables are subject to changes over time, and these +changes may or may not be backward compatible. It is possible that +updated Language Variant Tables may produce a different set of Preferred +Variants and Reserved Variants. + +In order to preserve the atomicity of the IDL Package, when the Language +Variant Table is changed, IDL Packages created using the previous +version of the Language Variant Table must not be updated or affected. + +4. Examples of Guideline Use in Zones + +To provide a meaningful example, some Language Variant Tables must be +defined. Assume, then, for the purpose of giving examples, that the +following four Language Variant Tables are defined: + +Note: these tables are not a representation of the actual tables, and +they do not contain sufficient entries to be used in any actual +implementation. 
+ +a) Language Variant Table for zh-cn and zh-sg + +Reference 1 CP936 (commonly known as GBK) +Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt +Reference 3 List of Simplified character Table (Simplified column) +Reference 4 zSimpVariant in Unihan.txt +Reference 5 variant that exists in GB2312, common simplified hanzi + +Version 1 20020701 # July 2002 + +56E2(1);56E2(5);5718(2) # sphere, ball, circle; mass, lump +5718(1);56E2(4);56E2(2),56E3(2) # sphere, ball, circle; mass, lump +60F3(1);60F3(5); # think, speculate, plan, consider +654E(1);6559(5);6559(2) # teach +6559(1);6559(5);654E(2) # teach, class +6DF8(1);6E05(5);6E05(2) # clear +6E05(1);6E05(5);6DF8(2) # clear, pure, clean; peaceful +771E(1);771F(5);771F(2) # real, actual, true, genuine +771F(1);771F(5);771E(2) # real, actual, true, genuine +8054(1);8054(3);806F(2) # connect, join; associate, ally +806F(1);8054(3);8054(2),8068(2) # connect, join; associate, ally +96C6(1);96C6(5); # assemble, collect together + + +b) Language Variant Table for zh-tw + +Reference 1 CP950 (commonly known as BIG5) +Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt +Reference 3 List of Simplified Character Table (Traditional column) +Reference 4 zTradVariant in Unihan.txt + +Version 1 20020701 # July 2002 + +5718(1);5718(4);56E2(2),56E3(2) # sphere, ball, circle; mass, lump +60F3(1);60F3(1); # think, speculate, plan, consider +6559(1);6559(1);654E(2) # teach, class +6E05(1);6E05(1);6DF8(2) # clear, pure, clean; peaceful +771F(1);771F(1);771E(2) # real, actual, true, genuine +806F(1);806F(3);8054(2),8068(2) # connect, join; associate, ally +96C6(1);96C6(1); # assemble, collect together + +c) Language Variant Table for ja + +Reference 1 CP932 (commonly known as Shift-JIS) +Reference 2 zVariant in Unihan.txt +Reference 3 variant that exists in JIS X0208, commonly used Kanji + +Version 1 20020701 # July 2002 + +5718(1);5718(3);56E3(2) # sphere, ball, circle; mass, lump +60F3(1);60F3(3); # think, speculate, 
plan, consider +654E(1);6559(3);6559(2) # teach +6559(1);6559(3);654E(2) # teach, class +6DF8(1);6E05(3);6E05(2) # clear +6E05(1);6E05(3);6DF8(2) # clear, pure, clean; peaceful +771E(1);771E(1);771F(2) # real, actual, true, genuine +771F(1);771F(1);771E(2) # real, actual, true, genuine +806F(1);806F(1);8068(2) # connect, join; associate, ally +96C6(1);96C6(3); # assemble, collect together + +d) Language Variant Table for ko + +Reference 1 CP949 (commonly known as EUC-KR) +Reference 2 zVariant and K-source in Unihan.txt + +Version 1 20020701 # July 2002 + +5718(1);5718(1);56E3(2) # sphere, ball, circle; mass, lump +60F3(1);60F3(1); # think, speculate, plan, consider +654E(1);654E(1);6559(2) # teach +6DF8(1);6DF8(1);6E05(2) # clear +771E(1);771E(1);771F(2) # real, actual, true, genuine +806F(1);806F(1);8068(2) # connect, join; associate, ally +96C6(1);96C6(1); # assemble, collect together + +Example 1: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4* + {L} = {zh-cn, zh-sg, zh-tw} + +NP(IN) = (U+6E05 U+771F U+6559) +PV(IN,zh-cn) = (U+6E05 U+771F U+6559) +PV(IN,zh-sg) = (U+6E05 U+771F U+6559) +PV(IN,zh-tw) = (U+6E05 U+771F U+6559) +{ZV} = {(U+6E05 U+771F U+6559)} +CVall = {(U+6E05 U+771E U+6559), + (U+6E05 U+771E U+654E), + (U+6E05 U+771F U+654E), + (U+6DF8 U+771E U+6559), + (U+6DF8 U+771E U+654E), + (U+6DF8 U+771F U+6559), + (U+6DF8 U+771F U+654E)} + +Example 2: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4* + {L} = {ja} + +NP(IN) = (U+6E05 U+771F U+6559) +PV(IN,ja) = (U+6E05 U+771F U+6559) +{ZV} = {(U+6E05 U+771F U+6559)} +CVall = {(U+6E05 U+771E U+6559), + (U+6E05 U+771E U+654E), + (U+6E05 U+771F U+654E), + (U+6DF8 U+771E U+6559), + (U+6DF8 U+771E U+654E), + (U+6DF8 U+771F U+6559), + (U+6DF8 U+771F U+654E)} + +Example 3: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4* + {L} = {zh-cn, zh-sg, zh-tw, ja, ko} + +NP(IN) = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4* +Invalid registration because U+6E05 is invalid in L = ko + +Example 4: IDL = (U+806F U+60F3 U+96C6 U+5718) +
*lian2 xiang3 ji2 tuan2* + {L} = {zh-cn, zh-sg, zh-tw} + +NP(IN) = (U+806F U+60F3 U+96C6 U+5718) +PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2) +PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2) +PV(IN,zh-tw) = (U+806F U+60F3 U+96C6 U+5718) +{ZV} = {(U+8054 U+60F3 U+96C6 U+56E2), + (U+806F U+60F3 U+96C6 U+5718)} +CVall = {(U+8054 U+60F3 U+96C6 U+56E3), + (U+8054 U+60F3 U+96C6 U+5718), + (U+806F U+60F3 U+96C6 U+56E2), + (U+806F U+60F3 U+96C6 U+56E3), + (U+8068 U+60F3 U+96C6 U+56E2), + (U+8068 U+60F3 U+96C6 U+56E3), + (U+8068 U+60F3 U+96C6 U+5718)} + +Example 5: IDL = (U+8054 U+60F3 U+96C6 U+56E2) + *lian2 xiang3 ji2 tuan2* + {L} = {zh-cn, zh-sg} + +NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2) +PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2) +PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2) +{ZV} = {(U+8054 U+60F3 U+96C6 U+56E2)} +CVall = {(U+8054 U+60F3 U+96C6 U+5718), + (U+806F U+60F3 U+96C6 U+56E2), + (U+806F U+60F3 U+96C6 U+5718)} + +Example 6: IDL = (U+8054 U+60F3 U+96C6 U+56E2) + *lian2 xiang3 ji2 tuan2* + {L} = {zh-cn, zh-sg, zh-tw} + +NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2) +Invalid registration because U+8054 is invalid in L = zh-tw + +Example 7: IDL = (U+806F U+60F3 U+96C6 U+5718) + *lian2 xiang3 ji2 tuan2* + {L} = {ja,ko} + +NP(IN) = (U+806F U+60F3 U+96C6 U+5718) +PV(IN,ja) = (U+806F U+60F3 U+96C6 U+5718) +PV(IN,ko) = (U+806F U+60F3 U+96C6 U+5718) +{ZV} = {(U+806F U+60F3 U+96C6 U+5718)} +CVall = {(U+806F U+60F3 U+96C6 U+56E3), + (U+8068 U+60F3 U+96C6 U+5718), + (U+8068 U+60F3 U+96C6 U+56E3)} + +5. Syntax Description for the Language Variant Table + +The formal syntax for the Language Variant Table is as follows, using +the IETF "ABNF" metalanguage [ABNF]. Some comments on this syntax +appear immediately after it.
+ +5.1. ABNF Syntax + +LanguageVariantTable = 1*ReferenceLine VersionLine 1*EntryLine +ReferenceLine = "Reference" SP RefNo SP RefDescription [ Comment ] CRLF +RefNo = 1*DIGIT +RefDescription = *[VCHAR] +VersionLine = "Version" SP VersionNo SP VersionDate [ Comment ] CRLF +VersionNo = 1*DIGIT +VersionDate = YYYYMMDD +EntryLine = VariantEntry/Comment CRLF +VariantEntry = ValidCodePoint ";" + PreferredVariant ";" CharacterVariant [ Comment ] +ValidCodePoint = CodePoint +RefList = RefNo 0*( "," RefNo ) +PreferredVariant = CodePointSet 0*( "," CodePointSet ) +CharacterVariant = CodePointSet 0*( "," CodePointSet ) +CodePointSet = CodePoint 0*( SP CodePoint ) +CodePoint = 4*8HEXDIG [ "(" RefList ")" ] +Comment = "#" *VCHAR + +YYYYMMDD is an integer, written as decimal digits, representing a date, +where YYYY is the 4-digit year, MM is the 2-digit month, and DD is the +2-digit day. + +5.2. Comments and Explanation of Syntax + +Any lines starting with, or portions of lines after, the hash +symbol ("#") are treated as comments. Comments have no significance in +the processing of the tables; nor are there any syntax requirements +between the hash symbol and the end of the line. Blank lines in the +tables are ignored completely. + +Every language should have its own Language Variant Table provided by a +relevant group, organization, or other body. That table will normally +be based on some established standard or standards. The group that +defines a Language Variant Table should document references to the +appropriate standards at the beginning of the table, tagged with the +word "Reference" followed by an integer (the reference number) followed +by the description of the reference.
For example: + +Reference 1 CP936 (commonly known as GBK) +Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt +Reference 3 List of Simplified Character Table (Simplified column) +Reference 4 zSimpVariant in Unihan.txt +Reference 5 Variant that exists in GB2312, common simplified Hanzi + +Each Language Variant Table must have a version number and a release +date. This is tagged with the word "Version" followed by an integer +and then by the date in the format YYYYMMDD, where YYYY is the +4-digit year, MM is the 2-digit month, and DD is the 2-digit day of the +publication date of the table. + +Version 1 20020701 # July 2002 Version 1 + +The table has three columns, separated by semicolons: "Valid Code +Point"; "Preferred Variant(s)"; and "Character Variant(s)". + +The "Valid Code Point" is the subset of Unicode characters that are +valid to be registered. + +There can be more than one Preferred Variant; hence there could be +multiple entries in the "Preferred Variant(s)" column. If the +"Preferred Variant(s)" column is empty, then there is no corresponding +Preferred Variant; in other words, the Preferred Variant is null. +Unless local policy dictates otherwise, the procedures above will result +in only those labels that reflect the valid code point being activated +(registered) into the zone file. + +The "Character Variant(s)" column contains all Character Variants of the +Code Point. Since the Code Point is always a variant of itself, to +avoid redundancy, the Code Point is assumed to be part of the "Character +Variant(s)" and need not be repeated in the "Character Variant(s)" +column. + +If the variant in the "Preferred Variant(s)" or the "Character +Variant(s)" column is composed of a sequence of Code Points, then +the Code Points in the sequence are listed separated by spaces. + +If there are multiple variants in the "Preferred Variant(s)" or the +"Character Variant(s)" column, then the variants are separated by +commas.
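The column conventions above can be sketched as a small row parser. This is a minimal illustration, not a full implementation of the syntax in Section 5.1: the names `parse_code_point`, `parse_column`, and `parse_entry` are hypothetical, reference numbers are validated but discarded, and code points are assumed to be written in hexadecimal, as they are in the example tables.

```python
import re

# One code point such as "56E2(1)" or "8054(2,3)": hexadecimal digits,
# optionally followed by a parenthesised list of reference numbers.
CODE_POINT = re.compile(r"^([0-9A-Fa-f]{4,8})(?:\((\d+(?:,\d+)*)\))?$")
# A comma separates variants only when it is not inside a reference list.
VARIANT_SEPARATOR = re.compile(r",(?![^()]*\))")


def parse_code_point(text):
    """Return the integer value of one code point, ignoring its references."""
    match = CODE_POINT.match(text)
    if match is None:
        raise ValueError(f"malformed code point: {text!r}")
    return int(match.group(1), 16)


def parse_column(column):
    """Return a list of variants; each variant is a tuple of code points."""
    column = column.strip()
    if not column:
        return []  # an empty column means there is no variant
    variants = []
    for variant in VARIANT_SEPARATOR.split(column):
        # Code points within one variant sequence are separated by spaces.
        variants.append(tuple(parse_code_point(cp) for cp in variant.split()))
    return variants


def parse_entry(line):
    """Parse one table row; return None for blank or comment-only lines."""
    body = line.split("#", 1)[0].strip()
    if not body:
        return None
    fields = body.split(";")
    if len(fields) != 3:
        raise ValueError(f"expected three columns: {line!r}")
    valid = parse_code_point(fields[0].strip())
    return valid, parse_column(fields[1]), parse_column(fields[2])


row = parse_entry("5718(1);56E2(4);56E2(2),56E3(2)  # sphere, ball")
print(row)
```

Variants are kept as tuples so that a variant made of a sequence of Code Points stays distinct from several single-Code-Point variants.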
+ +Any Code Point listed in the "Preferred Variant(s)" column must be +allowed by the rules for the relevant language to be registered. +However, this is not a requirement for the entries in the "Character +Variant(s)" column; it is possible that some of those entries may not be +allowed to be registered. + +Every Code Point in the table should have a corresponding reference +number (associated with the references) specified to justify the entry. +The reference number is placed in parentheses after the Code Point. If +there is more than one reference, then the numbers are placed within a +single set of parentheses and separated by commas. + +6. Security Considerations + +As discussed in the Introduction, substantially-unrestricted use of +international (non-ASCII) characters in domain name labels may cause +user confusion and invite various types of attacks. In particular, in +the case of CJK languages, an attacker has an opportunity to divert or +confuse users as a result of different characters (or, more +specifically, assigned code points) with identical or similar semantics. +These Guidelines provide a partial remedy for those risks by supplying +a framework for prohibiting inappropriate characters from being +registered at all and for permitting "variant" characters to be grouped +together and reserved, so that they can only be registered in the DNS by +the same owner. However, the system it suggests is no better or worse +than the per-zone and per-language tables whose format and use this +document specifies. Specific tables, and any additional local +processing, will reflect per-zone decisions about the balance between +risk and flexibility of registrations. And, of course, errors in +construction of those tables may significantly reduce the quality of +protection provided. + +7. 
Index to Terminology + +As a convenience to the reader, this section lists all of the special +terminology used in this document, with a pointer to the section in +which it is defined. + +Activated Label 2.1.17 +Activation 2.1.4 +Active Label 2.1.17 +Character Variant 2.1.14 +Character Variant Label 2.1.16 +CJK Characters 2.1.9 +Code point 2.1.7 +Code Point Variant 2.1.14 +FQDN 2.1.3 +Hostname 2.1.1 +IDL 2.1.2 +IDL Package 2.1.18 +IDN 2.1.1 +Internationalized Domain Label 2.1.2 +ISO/IEC 10646 2.1.6 +Label String 2.1.10 +Language name codes 2.1.5 +Language Variant Table 2.1.11 +LDH Subset 2.1.1 +Preferred Code Point 2.1.13 +Preferred Variant 2.1.13 +Preferred Variant Label 2.1.15 +Registration 2.1.4 +Reserved 2.1.18 +RFC3066 2.1.5 +Table 2.1.11 +UCS 2.1.6 +Unicode Character 2.1.7 +Unicode String 2.1.8 +Valid Code Point 2.1.12 +Variant Table 2.1.11 +Zone Variant 2.1.17 + + +8. Acknowledgments + +The authors gratefully acknowledge the contributions of: + +- V. CHEN, N. HSU, H. HOTTA, S. TASHIRO, Y. YONEYA, and other Joint +Engineering Team members at the JET meeting in Bangkok, Thailand. + +- Yves Arrouye, an observer at the JET meeting in Bangkok, for his +contribution on the IDL Package. + +- Those who commented on, and made suggestions about, earlier versions, +including Harald ALVESTRAND, Erin CHEN, Patrik FALTSTROM, Paul HOFFMAN, +Soobok LEE, LEE Xiaodong, MAO Wei, Erik NORDMARK, and L.M. TSENG. + +9. 
Authors' Addresses + +James SENG +Infocomm Development Authority +8 Temasek Boulevard +#14-00 Suntec Tower Three +Singapore 038988 +Phone: +65 9638-7085 +E-mail: jseng@pobox.org.sg + +Kazunori KONISHI +JPNIC +Kokusai-Kougyou-Kanda Bldg 6F +2-3-4 Uchi-Kanda, Chiyoda-ku +Tokyo 101-0047 +Japan +Phone: +81 49-278-7313 +E-mail: konishi@jp.apan.net + +Kenny HUANG +TWNIC +3F, 16, Kang Hwa Street, Taipei +Taiwan +Phone: 886-2-2658-6510 +E-mail: huangk@alum.sinica.edu + +QIAN Hualin +CNNIC +No.6 Branch-box of No.349 Mailbox, Beijing 100080 +People's Republic of China +E-mail: Hlqian@cnnic.net.cn + +KO YangWoo +PeaceNet +Yangchun P.O. Box 81 Seoul 158-600 +Korea +E-mail: newcat@peacenet.or.kr + +John C KLENSIN +1770 Massachusetts Avenue, No. 322 +Cambridge, MA 02140 +U.S.A. +E-mail: Klensin+ietf@jck.com + +Wendy RICKARD +The Rickard Group +16 Seminary Ave +Hopewell, NJ 08525 +USA +E-mail: rickard@rickardgroup.com + +10. Normative References + +[ABNF] Augmented BNF for Syntax Specifications: ABNF, RFC 2234, D. + Crocker and P. Overell, eds., November 1997. + +[STD13] Paul Mockapetris, "Domain names--concepts and facilities" + (RFC 1034) and "Domain names--implementation and + specification" (RFC 1035), STD 13, November 1987. + + +[RFC3066] Tags for the Identification of Languages, RFC 3066, + January 2001, H. Alvestrand. + +[IDNA] Internationalizing Domain Names in Applications (IDNA), + RFC 3490, March 2003, Patrik Faltstrom, Paul Hoffman, + Adam M. Costello. + +[PUNYCODE] Punycode: A Bootstring encoding of Unicode for + Internationalized Domain Names in Applications (IDNA), + RFC 3492, March 2003, Adam M. Costello. + +[STRINGPREP] Preparation of Internationalized Strings ("stringprep"), + RFC 3454, December 2002, P. Hoffman, M. Blanchet. + +[NAMEPREP] Nameprep: A Stringprep Profile for Internationalized + Domain Names, RFC 3491, March 2003, P. Hoffman, M. Blanchet. + +[IS10646] A product of ISO/IEC JTC1/SC2/WG2, Work Item JTC1.02.18 + (ISO/IEC 10646).
It is a multipart standard: Part 1, + published as ISO/IEC 10646-1:2000(E), covers the + Architecture and Basic Multilingual Plane, and Part 2, + published as ISO/IEC 10646-2:2001(E), covers the + supplementary (additional) planes. + +[UNIHAN] Unicode Han Database, Unicode Consortium + ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt. + +[UNICODE] The Unicode Consortium, "The Unicode Standard--Version + 3.0," ISBN 0-201-61633-5. Unicode Standard Annex #28 + (http://www.unicode.org/unicode/reports/tr28/) defines + Version 3.2 of the Unicode Standard, which is definitive + for IDNA and this document. + +[ISO7098] ISO 7098:1991 Information and documentation--Romanization + of Chinese, ISO/TC46/SC2. + +11. Nonnormative References + +[IDN-WG] IETF Internationalized Domain Names Working Group, + idn@ops.ietf.org, James Seng, Marc Blanchet. + http://www.i-d-n.net/. + +[IESG-IDN] "IESG Statement on IDN", Internet Engineering Steering Group, + IETF, 11 February 2003, + http://www.ietf.org/IESG/STATEMENTS/IDNstatement.txt. + +[ISO639] "ISO 639:1988 (E/F)--Code for the representation of names + of languages"--International Organization for + Standardization, 1st edition, 1988-04-01.