From a831ffc8fec3a487183a381a6ae799116552f78b Mon Sep 17 00:00:00 2001
From: Andreas Gustafsson <source@isc.org>
Date: Thu, 15 Nov 2001 23:46:00 +0000
Subject: [PATCH] new draft

---
 doc/draft/draft-hall-dm-idns-00.txt | 2739 +++++++++++++++++++++++++++
 1 file changed, 2739 insertions(+)
 create mode 100644 doc/draft/draft-hall-dm-idns-00.txt

diff --git a/doc/draft/draft-hall-dm-idns-00.txt b/doc/draft/draft-hall-dm-idns-00.txt
new file mode 100644
index 0000000000..d3bc4b4e0d
--- /dev/null
+++ b/doc/draft/draft-hall-dm-idns-00.txt
@@ -0,0 +1,2739 @@
+
+
+  INTERNET-DRAFT                                      Eric A. Hall, Editor 
+  Document: draft-hall-dm-idns-00.txt                           Consultant 
+  Expires: May 2002                                          November 2001 
+      
+      
+                  The Internationalized Domain Name System 
+      
+      
+     Status of this Memo 
+      
+     This document is an Internet-Draft and is in full conformance with 
+     all provisions of Section 10 of RFC2026. 
+      
+     Internet-Drafts are working documents of the Internet Engineering 
+     Task Force (IETF), its areas, and its working groups. Note that 
+     other groups may also distribute working documents as Internet-
+     Drafts. 
+      
+     Internet-Drafts are draft documents valid for a maximum of six 
+     months and may be updated, replaced, or obsoleted by other 
+     documents at any time. It is inappropriate to use Internet-Drafts 
+     as reference material or to cite them other than as "work in 
+     progress." 
+      
+     The list of current Internet-Drafts can be accessed at 
+     http://www.ietf.org/ietf/1id-abstracts.txt. 
+      
+     The list of Internet-Draft Shadow Directories can be accessed at 
+     http://www.ietf.org/shadow.html. 
+      
+      
+  1.      Abstract 
+      
+     The principal intention of this specification is to facilitate the 
+     deployment of a completely internationalized domain name syntax 
+     and service which new protocols, applications and host systems can 
+     use, but without disrupting the existing infrastructure. Towards 
+     that end, this document describes a series of elective 
+     encapsulation services and protocol extensions which cumulatively 
+     allow internationalized domain names to be stored and transmitted 
+     in the existing DNS message and within application data streams, 
+     according to the compliance level of the participating systems. 
+      
+   
+   
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+      
+     Table of Contents 
+      
+     1.   Abstract..................................................1 
+     2.   Definitions and Terminology...............................3 
+     3.   Introduction..............................................4 
+       3.1.  Background.............................................4 
+       3.2.  Objectives.............................................5 
+       3.3.  Common Usage Scenarios.................................7 
+       3.4.  User Audiences.........................................9 
+       3.5.  Service Overview......................................11 
+       3.6.  Process Example.......................................13 
+     4.   The Internationalized Namespace..........................19 
+       4.1.  Internationalized Domain Names and Labels.............20 
+       4.2.  Internationalized Host Identifiers....................27 
+       4.3.  STD13 Domain Names....................................28 
+       4.4.  STD13 Host Identifiers................................29 
+     5.   Transfer Encodings and Label Types.......................30 
+       5.1.  The EDNS/UTF-8 Label Type.............................31 
+       5.2.  The STD13 Legacy Label Type...........................33 
+     6.   Application Guidelines...................................36 
+       6.1.  Input and Output Charsets.............................37 
+       6.2.  Protocol and Application Data.........................38 
+       6.3.  DNS Lookups and Resolver Calls........................40 
+     7.   Resolver Guidelines......................................42 
+       7.1.  Resolver APIs.........................................42 
+       7.2.  Query Processing Services.............................44 
+       7.3.  The Hosts Database....................................48 
+     8.   Server Guidelines........................................49 
+       8.1.  Internationalized Zones...............................50 
+       8.2.  Namespace Visibility Restrictions.....................51 
+       8.3.  The Master File Format................................52 
+     9.   Caching Guidelines.......................................53 
+     10.  Security Considerations..................................53 
+     11.  IANA Considerations......................................54 
+     12.  References...............................................54 
+     13.  Acknowledgements.........................................55 
+     14.  Editor's Address.........................................55 
+      
+   
+  Hall                    I-D Expires: May 2002               [page 2] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+      
+  2.      Definitions and Terminology 
+      
+     This document unites, enhances and clarifies several pre-existing 
+     technologies. Readers are expected to be familiar with the 
+     following specifications: 
+      
+          [AMC-ACE-Z] <draft-ietf-idn-amc-ace-z>, "AMC-ACE-Z version 
+            0.3.1" 
+      
+          [NAMEPREP] <draft-ietf-idn-nameprep>, "Preparation of 
+            Internationalized Host Names" 
+      
+          [STD13] (RFC 1034) "Domain names - concepts and facilities", 
+            (RFC 1035) "Domain names - implementation and 
+            specification" 
+      
+          [STD3] (RFC 1122) "Requirements for Internet Hosts -- 
+            Communication Layers", (RFC 1123) "Requirements for Internet 
+            Hosts -- Application and Support" 
+      
+          [BCP18] (RFC 2277) "IETF Policy on Character Sets and 
+            Languages" 
+      
+          [RFC2279] "UTF-8, a transformation format of ISO 10646" 
+      
+          [RFC2671] "Extension Mechanisms for DNS (EDNS0)" 
+      
+      
+     The following abbreviations are used throughout this document: 
+      
+          UCS (Universal Character Set) - The ISO/IEC 10646 character 
+            set repertoire, as represented by the Unicode 3.1 
+            specification. 
+      
+          ACE (ASCII-Compatible Encoding) - A transfer encoding which 
+            encodes UCS character codes into a seven-bit codespace 
+            which is compatible with US-ASCII. 
+      
+          UTF-8 (UCS Transformation Format, Eight-Bit) - A transfer 
+            encoding which encodes UCS characters into an eight-bit 
+            codespace which is compatible with DNS message formats. 
+      
+     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL 
+     NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" 
+     in this document are to be interpreted as described in RFC 2119. 
+   
+  Hall                    I-D Expires: May 2002               [page 3] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+      
+      
+  3.      Introduction 
+      
+     The domain name system (DNS) [STD13] currently defines a message, 
+     namespace and protocol. Although the DNS message is capable of 
+     transferring eight-bit character codes as protocol data, 
+     applications are currently limited to a subset of US-ASCII when 
+     they interact with the DNS namespace, and this restricted syntax 
+     is enforced by almost every TCP/IP application and protocol which 
+     utilizes domain names as embedded data (including, surprisingly, 
+     the DNS protocol). 
+      
+     In order to allow for the use of a larger range of characters in 
+     the namespace, this document extends and clarifies a variety of 
+     Internet specifications so that characters from the Universal 
+     Character Set (UCS) [ISO10646] may be used in domain names. This 
+     document also extends the DNS message structure to allow for the 
+     use of UTF-8 [RFC2279] encoded characters for the purpose of 
+     transferring these domain names, but also provides an ASCII-
+     compatible encoding (ACE) [AMC-ACE-Z] of these character codes 
+     which existing protocols and applications can use to access the 
+     internationalized domain names, and also provides identification 
+     mechanisms which allow the end-point systems to downwardly 
+     negotiate when needed. Finally, this document defines behavior for 
+     DNS systems which implement this architecture, including the end-
+     point applications which generate and store DNS domain names, and 
+     the resolvers, caches and servers which process them. 
+      
+     The mechanisms presented here are elective. Developers, zone 
+     administrators and network operators who wish to make use of the 
+     internationalized domain names may do so according to their own 
+     schedule. Those developers, administrators and operators who 
+     cannot or prefer not to implement the specified extensions can 
+     continue to use their legacy systems, and will still be able to 
+     access resources from the internationalized domain name system. 
+      
+      
+  3.1.    Background 
+      
+     From one perspective, DNS is already an "eight-bit clean" system, 
+     in that the structured DNS message is capable of storing and 
+     transmitting eight-bit data without any additional effort. 
+     However, this perspective only considers one particular facet of 
+     the domain name system, and ignores the more critical aspect of 
+   
+  Hall                    I-D Expires: May 2002               [page 4] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     the DNS namespace, which has rules that are entirely different 
+     from those which govern the message format. 
+      
+     The DNS namespace (or more appropriately, the view of the 
+     namespace which applications use and enforce) is governed by rules 
+     set forth in RFC952 [RFC952], STD3 [STD3], and STD13, which 
+     collectively define the characters that are eligible for use with 
+     host names. These rules are meant to provide a common template 
+     which may be applied to either the DNS namespace or a local hosts 
+     database, such that a query for "host.example.com" can be 
+     processed through either system. The valid characters currently 
+     defined are the letters, digits and the hyphen character 
+     from US-ASCII [ASCII] (additional rules also govern the valid 
+     order and length of a host name). Character code values outside of 
+     this range are valid in domain name messages, but are undefined 
+     when used in the namespace, and are subject to interpretation by 
+     the applications which generate them. 
+      
+     The host name rules are enforced by almost every application and 
+     protocol which uses DNS to identify a host or system. This 
+     includes network utilities such as ping and traceroute which 
+     simply identify systems by name, and complex protocols such as 
+     SMTP which use domain names to determine message-routing paths. 
+     Portions of the DNS protocol itself are also affected by these 
+     restrictions, such as the domain names which may be used for NS 
+     resource records with sub-domain delegation operations (since 
+     these servers are connection targets, they are also required to be 
+     compliant with the host name rules). 
+      
+     Because these domain names are so pervasive throughout the 
+     Internet (and even within proprietary applications that run on 
+     private networks), it is not possible to declare a "flag day" at 
+     which eight-bit domain names will be considered valid encodings of 
+     a particular character set. Instead, an extended namespace with a 
+     larger set of charset rules must be defined, an extended DNS 
+     protocol capable of supporting these domain names must be 
+     deployed, and a transitional mechanism which allows the old and 
+     new systems to interact must be established. This document 
+     attempts to meet these objectives. 
+      
+      
+  3.2.    Objectives 
+      
+     In broad terms, this document has one overall goal, which is to 
+     facilitate the creation and use of an internationalized domain 
+     name system around a UCS namespace, a collection of UTF-8 and 
+   
+  Hall                    I-D Expires: May 2002               [page 5] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     legacy-compatible encodings which are suitable for transferring 
+     internationalized domain names within DNS and the affected 
+     application data streams, and a negotiation mechanism which allows 
+     end-point systems to identify the encoding that they will use for 
+     a particular operation. 
+      
+     One of the objectives stated above is to internationalize the 
+     existing DNS namespace, by allowing UCS characters to be used in 
+     host names and sub-domain delegations in old and new zones 
+     equally. As such, this document does not define a new namespace, 
+     but instead defines mechanisms by which leaf-nodes and sub-domains 
+     may be created within the existing hierarchy. 
+      
+     UTF-8 was chosen as the primary transfer encoding of these domain 
+     names for several reasons. For one, there is a wide availability 
+     of tools and expertise surrounding UTF-8, and it is already widely 
+     deployed within development environments, operating systems and 
+     applications. Furthermore, BCP18 [BCP18] requires that new 
+     application protocols be able to use UTF-8 as application data, 
+     and for many applications, this specifically means domain names 
+     which are passed as data. All signs indicate that UTF-8 is 
+     currently and will continue to be the preferred eight-bit encoding 
+     on the Internet, and this specification embraces this position in 
+     its design. 
+      
+     However, most of the network services currently in use are bound 
+     by the legacy host naming restrictions, and those applications and 
+     protocols will also need to be able to interact with resources 
+     from the internationalized namespace, even though they will not be 
+     compliant with the UTF-8 encoding mechanisms defined in this 
+     document. In order to allow these systems to participate, this 
+     specification also embraces the use of ACE as a seven-bit 
+     backwards-compatible encoding for legacy systems to use. 
+      
+     Note that even though a single encoding could have been specified 
+     by this document, past and present requirements would not have 
+     been satisfied by a single choice. For example, supporting UTF-8 
+     alone would mean isolating legacy systems from resources in the 
+     UCS namespace, while supporting ACE alone would not have provided 
+     a truly internationalized namespace (the ACE encoded domain names 
+     would still appear in user data quite frequently). By allowing the 
+     UTF-8 and ACE encodings to coexist, the existing and emerging 
+     communities can both be served. 
+      
+     Because both encodings will be active during the same time period, 
+     this document also defines DNS protocol extensions which allow the 
+   
+  Hall                    I-D Expires: May 2002               [page 6] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     end-point systems to detect the encoding that is in use for a 
+     particular query/response pair. Note that these negotiation 
+     mechanisms not only allow new and legacy systems to interoperate, 
+     but they also provide a transition service for developers, zone 
+     administrators and end-users, in that ACE encoded domain names can 
+     be initially deployed within existing applications and DNS 
+     systems, while individual elements of the infrastructure can be 
+     upgraded without disturbing other components. 
+      
+      
+  3.3.    Common Usage Scenarios 
+      
+     Discussion of the mechanisms provided by this document depends upon 
+     the usage context of the domain names themselves. Domain names are 
+     extremely pervasive, and are used by almost every TCP/IP protocol 
+     and application in one form or another. However, most usages fall 
+     under one or more of the following scenarios: 
+      
+        *   Connection identifiers - Domain names are most commonly 
+            used as host-specific identifiers for outbound connection 
+            requests, whether this be for a command-line application 
+            such as ping, or as a host name which is stored in an 
+            application's configuration file. Another common usage 
+            scenario for connection identifiers is with reverse 
+            lookups, where a server is logging incoming connections by 
+            the corresponding domain name, or where a program such as 
+            netstat is displaying all of the application sessions which 
+            are currently active on a host. In both of these cases, 
+            domain names are passed through applications to a resolver, 
+            resulting in DNS queries and responses which eventually 
+            provide the requested DNS data. 
+      
+            A related use (but one which does not generate DNS 
+            messages) is determining the host name of the local system. 
+            This is commonly found with applications and protocols that 
+            need to display the domain name of the local system as part 
+            of a protocol operation (such as an SMTP greeting banner) 
+            or as application data. 
+      
+            Connection identifiers (and lookups in general) are 
+            probably the largest single use of domain names today, and 
+            this is likely to be the case with internationalized domain 
+            names as well. This document fully supports the use of 
+            internationalized domain names for lookup operations, as 
+            long as the calling application, the stub resolver, the 
+            local caching servers, and the authoritative servers for 
+   
+  Hall                    I-D Expires: May 2002               [page 7] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+            the specified domain name are compliant with this 
+            specification. If any of these components are not capable 
+            of supporting internationalized domain names in this 
+            manner, the ACE equivalent domain name will be negotiated 
+            for the operation at hand. 
+      
+        *   Protocol data - Some application protocols exchange domain 
+            names as protocol data, with those domain names either 
+            determining or altering a service-specific operation. 
+            Examples of this usage include SMTP envelopes ("RCPT TO 
+            <user@domain.dom>") where the domain name is used to 
+            determine whether or not a particular email message should 
+            be accepted for delivery, the HTTP HOST header field which 
+            identifies a specific document tree on a shared server, 
+            BOOTP/DHCP options, WHOIS input, and more. 
+      
+            Because these protocols treat domain names as protocol 
+            data, most of these protocols also have specific formatting 
+            requirements which must be addressed before UTF-8 domain 
+            names can be used by these protocols directly. This 
+            document is intended to facilitate the use of UTF-8 encoded 
+            domain names in this manner, although it is expected that 
+            most of the protocol development groups will need to 
+            develop negotiation mechanisms before these protocols can 
+            use internationalized domain names directly. Until such 
+            work is completed, ACE equivalent domain names can be used 
+            to provide these protocols with access to the 
+            internationalized namespace. 
+      
+        *   Structured application data - Structured application data 
+            is similar to protocol data in that it can trigger or 
+            affect some protocol action, although this will not always 
+            occur. For example, a web browser can process an embedded 
+            IMG link which may be present in a web page, while a user 
+            can manually follow an embedded email link which is also 
+            stored in the same web page; even though both usage models 
+            share the same structured data format (URLs), they are 
+            processed differently by the application. Similarly, email 
+            messages typically contain multiple domain names as 
+            structured data in the message headers, and some of these 
+            domain names will directly affect subsequent protocol 
+            operations, while others will not. 
+      
+            Because of this ambiguity, this document defines no 
+            specific treatment for structured application data. In some 
+            cases, no additional mechanisms will be required, while 
+   
+  Hall                    I-D Expires: May 2002               [page 8] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+            other scenarios will require negotiation mechanisms before 
+            an internationalized domain name can be used in the 
+            structured data (with ACE being required as the interim 
+            format). Each protocol development group is encouraged to 
+            analyze each usage independently, to classify the usage as 
+            a connection identifier, protocol data, or unstructured 
+            application data, and to determine the appropriate course 
+            of action for each usage accordingly. 
+      
+        *   Unstructured application data - Many application protocols 
+            provide free-text data which can contain domain names, but 
+            with those domain names existing as unstructured data. For 
+            example, an email message which is provided as a text/plain 
+            MIME body part may contain a domain name which identifies a 
+            system or service in the context of a specific application, 
+            but in an unstructured form ("your files were moved from 
+            server1 to server2"). Similarly, an email address may be 
+            provided in WHOIS output, but as unstructured data which 
+            does not affect the protocol. 
+      
+            Given the application-specific nature of this data, it 
+            cannot be managed by any global protocol or process. Where 
+            a protocol has rules or restrictions on the data itself, 
+            then those rules are maintained, but some formatting rules 
+            may need to be extended before internationalized domain 
+            names (or their equivalents) can be encoded in the 
+            application data. For example, internationalized domain 
+            names in email messages may need to be converted to a 
+            preferred display charset, while ACE equivalents may be 
+            necessary for protocols which only support US-ASCII. 
+      
+     Each of the above scenarios represents a distinct handling case 
+     where internationalized domain names may or may not be used 
+     directly. In some cases, the internationalized domain names may be 
+     used as soon as the applications and resolvers are configured to 
+     use them, while in other cases, measured and cautious deployment 
+     is required in order to prevent undue breakage. In the latter 
+     cases, however, the backwards-compatible ACE encoding is available 
+     so that the internationalized domain names can be used. 
+      
+      
+  3.4.    User Audiences 
+      
+     Another perspective on the changes which will result from 
+     deploying the mechanisms described in this document can be seen by 
+     analyzing how any such changes will affect the different 
+   
+  Hall                    I-D Expires: May 2002               [page 9] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     "audiences" who work with domain names, and who have their own 
+     unique context-specific usage requirements and objectives. The 
+     three main audiences discussed in this document are: 
+      
+        *   Developers. Protocol and application developers need to be 
+            able to incorporate internationalized domain names into 
+            their systems as easily as possible, although there are 
+            many factors which will affect such usage, including the 
+            input and output charsets and encodings which are available 
+            to the applications and protocols. Where feasible, this 
+            specification allows developers to choose any charset or 
+            encoding which may be required and suitable for use, 
+            although in most cases, a recommendation is also made for 
+            the use of UTF-8 in particular. 
+      
+            Developers may adopt internationalized domain names for 
+            connection identifiers and lookup operations fairly 
+            quickly, such that users can use those systems as soon as 
+            they have compliant systems (and they have a target domain 
+            name to communicate with). Implementing support for 
+            internationalized domain names in protocols and application 
+            data will require additional effort by the affected 
+            development groups. 
+      
+            Support for ACE will be harder to implement, since it is a 
+            relatively new and untested encoding syntax, with no 
+            existing developer tools. This will likely be the largest 
+            hurdle to overcome when developing applications for use 
+            with this service. 
+      
+        *   Zone administrators. Organizations that wish to deploy 
+            internationalized domain names should be able to do so 
+            easily, at a reasonable cost, and without suffering 
+            excessive pre-conditions. Towards this objective, the 
+            mechanisms described by this document allow organizations 
+            to deploy and use internationalized domain names within any 
+            zone immediately, without requiring any other zone to have 
+            been updated beforehand (although there are specific and 
+            strong suggestions for upgrading the Internet's high-load 
+            servers as soon as possible). 
+      
+            If an organization wishes to publish internationalized 
+            domain names for users to access and utilize, the 
+            authoritative servers for the affected zone must be 
+            compliant with the naming rules and message formats 
+            described by this document, which will almost certainly 
+   
+  Hall                    I-D Expires: May 2002              [page 10] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+            require the administrators of that zone to upgrade their 
+            servers. However, organizations may also choose to only 
+            deploy ACE encoded domain names if an immediate migration 
+            is not feasible, with the caveat that internationalized 
+            domain names in their native form will not be available 
+            from those zones. 
+      
+        *   Network operators. The systems and human users which 
+            generate DNS lookups are another area of concern, as these 
+            protocols, programs and users will expect these lookups to 
+            succeed, and will also expect that the visible namespace 
+            will be compatible with the capabilities of the requesting 
+            system at a minimum investment. This is a broad range of 
+            requirements. 
+      
+            At a minimum, applications must be capable of generating 
+            and accepting the internationalized domain names if they 
+            are to use those domain names (see the "Developers" 
+            discussion above for the application requirements). 
+            Similarly, the local resolvers, caches and forwarders on 
+            the user's network must also support the message formats if 
+            they are to relay internationalized domain names between 
+            their local applications and the remote zones being 
+            queried. If the applications, resolvers and caches do not 
+            support these requirements, intermediary systems will 
+            perform the down-level negotiation automatically on their 
+            behalf such that additional effort is not required on the 
+            user's part. 
+      
+     In summary, the developers, zone administrators and end-users can 
+     immediately participate in the internationalized namespace at no 
+     additional expense if they are content with using ACE encoded 
+     domain names, and can use internationalized domain names in their 
+     native form if they are willing to make the necessary investments. 
+     Furthermore, since the native and backwards-compatible encodings 
+     are not mutually exclusive, implementers of this specification 
+     have the option of adopting ACE for immediate use and then 
+     transitioning to internationalized domain names on a per-system, 
+     per-zone, or per-application basis, according to their schedule. 
+      
+      
+  3.5.    Service Overview 
+      
+     This document specifies a variety of extensions to several 
+     different protocols and services in order to facilitate the use of 
+     internationalized domain names anywhere this support exists or can 
+   
+  Hall                    I-D Expires: May 2002              [page 11] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     be implemented, and to provide a legacy-compatible domain name in 
+     all other situations. 
+      
+     More specifically, this document defines or clarifies behavior for 
+     the following elements: 
+      
+        *   Host name character restrictions. Legacy protocols and 
+            applications are currently restricted to the legacy host 
+            naming rules, which only allow for a subset of US-ASCII 
+            characters (letters, digits and the hyphen character). This 
+            document redefines the characters which are valid within a 
+            host name so that system identifiers, domain name parts of 
+            host names, and new network services can use most of the 
+            characters from the UCS. 
+      
+        *   DNS message format. This document defines an extended label 
+            format based on the extended label services provided by 
+            RFC2671 (Extension Mechanisms for DNS - EDNS0) [RFC2671], 
+            with this label format being used to encapsulate UTF-8 
+            encoded internationalized domain names in DNS messages. Any 
+            DNS message which carries the UTF-8 encoded domain names is 
+            required to use the EDNS/UTF-8 label type defined in this 
+            document. Any DNS message which carries legacy domain names 
+            (including the ACE encoded equivalent domain names) is 
+            required to use the traditional message format. 
+      
+        *   Application handling rules. Applications can use
+            internationalized domain names immediately for lookup
+            operations that do not directly affect external services or
+            protocols, can use ACE encoding sequences to specify
+            internationalized domain names in legacy protocol
+            operations, and can use both forms at the same time.
+      
+        *   Stub resolvers. Stub resolvers will most likely need to 
+            provide a series of internationalized APIs in order to 
+            fully support applications that generate internationalized 
+            domain name lookups. For example, these APIs will almost 
+            certainly be required in order for the resolver to 
+            determine that the calling application is compliant with 
+            the host name requirements defined by this document, and 
+            that the domain names should be encoded in the proper label 
+            format. Although this specification does not dictate these 
+            APIs, it encourages their use, and provides some guidance 
+            on the issues surrounding their use. 
+      
+   
+  Hall                    I-D Expires: May 2002              [page 12] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+        *   Forwarders, resolving servers and caches. The user-side 
+            servers which process internationalized domain names have 
+            several protocol-specific requirements, including the 
+            negotiated fall-back service when UTF-8 queries fail. 
+      
+        *   Authoritative servers. A key part of this specification is 
+            the simultaneous support for internationalized and legacy 
+            compatible domain names in the UCS namespace, thereby 
+            allowing a domain name to be entered into an authoritative 
+            zone database once, and for the appropriate response to be 
+            generated by a server according to the label encoding from 
+            the associated query. In order for this to work, this 
+            specification requires authoritative servers which serve 
+            internationalized domain names to comply with specific 
+            conditions. This specification also allows existing servers 
+            to serve ACE equivalent domain names when the authoritative 
+            servers cannot be upgraded, although this typically results 
+            in lower levels of functionality. 
+      
+     The elements listed above collectively define a completely 
+     internationalized domain name system, which is capable of 
+     servicing internationalized domain names in all compliant systems, 
+     and which is also capable of providing ACE encoded equivalent 
+     domain names when any component from the internationalized service 
+     is not available. 
+      
+      
+  3.6.    Process Example 
+      
+     This section illustrates a series of query/response transactions 
+     under which the processes and protocols defined in this document 
+     function. This example uses a reverse lookup for the PTR resource 
+     record associated with the "14.2.0.192.in-addr.arpa." domain name 
+     (forward lookups work similarly, but the issues are more fully 
+     demonstrated by PTR lookups). Each of the various technologies 
+     shown below is described in later sections of this document. The
+     sole purpose of this example is to provide an illustration of 
+     these mechanisms in order to facilitate better discussion. 
+      
+     Note that this illustration represents a worst-case scenario 
+     (thereby exercising most of the functionality provided by this 
+     specification), and does not represent a typical scenario. 
+      
+        a.  First, a PTR resource record for 14.2.0.192.in-addr.arpa. 
+            is added to the internationalized zone database on the 
+            replication master server for the 2.0.192.in-addr.arpa. 
+   
+  Hall                    I-D Expires: May 2002              [page 13] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+            zone, with the resource record data value of 
+            "host.<idn>.example.com." (where <idn> is an 
+            internationalized domain name compliant with the host 
+            naming rules provided in this document). Both of these 
+            domain names have a primary representation consisting of 
+            UCS characters in some local encoding, but are also 
+            available as UTF-8 and ACE encoded data so they can be 
+            encapsulated within DNS queries and responses. 
+      
+            Once the zone is reloaded and is replicated by the other 
+            authoritative servers for that zone, the domain names can 
+            be processed. 
+      
+        b.  An application on a remote system generates a DNS lookup 
+            for the PTR resource record associated with the 
+            14.2.0.192.in-addr.arpa. domain name. 
+      
+            If this is a legacy application, it issues the lookup using 
+            the only method it knows, which is to pass the domain name 
+            to the legacy resolver API. This would result in the 
+            resolver issuing a legacy DNS query for the PTR resource 
+            record associated with the specified domain name. 
+      
+            If this application is compliant with this specification, 
+            it performs the following steps: 
+      
+            1.   Verify that the resolver is capable of processing 
+                 queries for UTF-8 domain names by probing for an 
+                 internationalized API. If this step failed, then the 
+                 domain name would be converted to the legacy STD13 
+                 octet encoding in step 3.6.b.3 and passed to the 
+                 resolver's legacy API. 
+      
+            2.   Convert the domain name from its generated encoding to 
+                 the canonical UCS characters, and then normalize and 
+                 case-convert the UCS characters. 
+      
+            3.   Convert the normalized and lowercased UCS characters 
+                 to the charset or encoding used by the resolver's 
+                 internationalized API. 
+      
+            4.   Issue a lookup for the PTR resource record associated 
+                 with the internationalized domain name, via the 
+                 resolver's internationalized API. 
+      
+   
+  Hall                    I-D Expires: May 2002              [page 14] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+                 Note that even though the domain name is compatible 
+                 with the legacy host name rules, the domain name is 
+                 passed through the internationalized API so that 
+                 servers can tell whether or not the original 
+                 application is UTF-8 compliant, and can determine the 
+                 format of any internationalized domain names which are 
+                 to be returned in the response messages. This is 
+                 required in case the queried resource record includes 
+                 internationalized domain names as resource record data 
+                 (as would be the case with PTR resource records), and 
+                 is also required for the proper handling of any SOA or 
+                 NS resource records which may be returned as 
+                 additional data in the response. 
+      
+            For the purpose of this example, we will assume that each 
+            of these steps was successfully performed.
+      
+        c.  The client's stub resolver generates the query, with the 
+            Question Section of the query containing the UTF-8 encoded 
+            domain name encapsulated in an EDNS/UTF-8 extended label. 
+      
+        d.  The stub resolver sends the query to one of its configured 
+            resolving servers. 
+      
+        e.  The resolving server will either answer the query from its 
+            cache or forward the query to a name server which is 
+            authoritative for the namespace hierarchy, as per the 
+            normal query-resolution procedure. For the purpose of this 
+            example, we will assume that the server has no information 
+            about the specified domain name, so it forwards the query 
+            to one of the root zone's authoritative servers in order to 
+            begin the iterative resolution process. 
+      
+        f.  The queried server responds with a referral, providing 
+            delegation data for a zone in the path to the queried 
+            domain name. For the purposes of this example, we will use 
+            192.in-addr.arpa. as the delegation domain specified in the 
+            referral message. 
+      
+            The specific format of the referral will depend on whether 
+            or not the queried server understands the EDNS/UTF-8 label 
+            encoding. If the server is compliant with this 
+            specification (which it is, or else it wouldn't have 
+            answered with a referral), then the referral will also 
+            provide EDNS/UTF-8 encoded domain names in the Authority
+            and Additional-Data Sections of the referral. If the server 
+   
+  Hall                    I-D Expires: May 2002              [page 15] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+            was not compliant with this specification, it would return 
+            an error upon seeing the extended label type, which would 
+            cause the resolving server to restart the query using the 
+            legacy label type. 
+      
+        g.  The resolving server decodes the UTF-8 encoded domain names 
+            to their UCS character representation, caches the resource 
+            records in their UCS form, and sends the query to one of 
+            the authoritative servers for the referral zone. Note that 
+            the cache did not normalize or case-convert the UCS 
+            characters; only the end-systems perform this work. 
+      
+        h.  In this case, the queried server does not understand the 
+            EDNS/UTF-8 label format, and has returned a FORMERR 
+            response code. 
+      
+        i.  When these errors are encountered, the current resolver 
+            (whether this is the client's stub resolver or a caching 
+            server in the query path) must convert the query domain 
+            name from its current form to a legacy-compatible encoding 
+            (either ACE or STD13 octet sequences, depending on the UCS 
+            characters which have been encoded), and then has to 
+            reissue the query in that format. 
+      
+            In this case, the domain name only contains printable 
+            characters from US-ASCII, so the STD13 octet encoding is 
+            used for the fall-back query. Because the UCS domain name 
+            was normalized and lowercased before it was passed to the 
+            client's stub resolver, the legacy domain name will also be 
+            in this format (although it will be compared in a case-
+            neutral form by the recipient server). 
+      
+            Note that once this conversion takes place, the legacy 
+            label format is used for the remainder of the current query 
+            chain (this prevents excessive delays from multiple fall-
+            back operations, which could result in timeouts at the 
+            original resolver or application).  
+      
+   
+  Hall                    I-D Expires: May 2002              [page 16] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+        j.  The queried server returns a delegation referral for the 
+            2.0.192.in-addr.arpa. zone. Since the query arrived in the 
+            STD13 octet encoding, the server has no indicator of the 
+            client's capabilities, so the referral NS resource records 
+            will also be returned in legacy compatible form (either as 
+            STD13 octet sequences or as ACE encoded data, depending on 
+            the character codes provided in each label from each of the 
+            associated domain names). 
+      
+            Note that even though these NS resource records will be 
+            restricted to legacy-compatible host names and label types, 
+            they may contain and reference ACE domain names. In this 
+            regard, a legacy server in the delegation path does not 
+            prevent internationalized domain names from being delegated 
+            or resolved, but only prevents them from being processed as 
+            EDNS/UTF-8 extended labels. 
+      
+            Also note that once the authoritative servers for a zone 
+            have been discovered and cached, any subsequent UTF-8 
+            queries which are generated for the resources in that zone 
+            will be sent directly to one of those servers, bypassing 
+            the delegation hierarchy. As such, subsequent queries which 
+            are provided in EDNS/UTF-8 labels can be processed directly 
+            by the zone's authoritative servers, without the delegation 
+            servers disrupting the process. 
+      
+        k.  The resolving server decodes the STD13 octet sequences and 
+            ACE encoded domain names to their UCS character 
+            representations, caches the resource records, and resends 
+            the query to one of the authoritative servers for the 
+            referral zone. 
+      
+        l.  The queried server processes the request. Since this query 
+            arrived as an STD13 octet sequence, the server must compare 
+            the seven-bit characters from the domain name (which are all
+            of them, in this example) in a case-neutral form. Note that 
+            if the query had arrived as ACE or UTF-8 encoded domain 
+            names, the server would have decoded the specified domain 
+            name to its canonical UCS characters and performed a case-
+            exact match against the resulting characters. 
+      
+        m.  The queried server responds with the requested data. Note 
+            that the query was submitted in the legacy label form due 
+            to the fall-back processing which occurred in step 3.6.i, 
+            so the server will only respond to this query with STD13 
+   
+  Hall                    I-D Expires: May 2002              [page 17] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+            octet sequences or ACE encoded domain names, using the 
+            STD13 legacy label. 
+      
+        n.  The resolving server decodes the STD13 octet sequences and 
+            ACE encoded domain names to their UCS character 
+            representations, and caches the resource records. Since the 
+            query was originally received as an internationalized 
+            domain name (as indicated by the EDNS/UTF-8 extended label 
+            from the original query), the resolving server has to 
+            encode the answer data as UTF-8 before passing it back to 
+            the client's stub resolver. However, since the input was 
+            not provided in an encoded UCS form, the server has to 
+            normalize and case-convert the STD13 octet sequence in 
+            order to provide a valid internationalized domain name. 
+      
+        o.  The stub resolver decodes the UTF-8 encoded domain names 
+            which have been provided in the response message to their 
+            UCS character representation, and passes the data to the 
+            original calling application using the charset or encoding 
+            favored by the resolver. 
+      
+        p.  The application validates the received domain name by 
+            decoding the internationalized domain name to its canonical 
+            UCS characters, normalizing and down-casing the resulting 
+            domain name, and comparing the results with the answer data 
+            which was provided by the resolver. 
+      
+     As can be seen, the UTF-8 name resolution process is identical to 
+     the current resolution process, with the addition of a single 
+     fall-back query in step 3.6.i which resulted in one extra 
+     query/response pair (roughly equivalent to adding one extra 
+     delegation referral into the query path), and with several 
+     different encoding conversions, as required by the participating 
+     systems and services. This example also illustrates the 
+     requirements which are placed on developers, zone administrators, 
+     and network operators in order for typical connection identifier 
+     services to function with UTF-8 domain names. 
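+      
+     The fall-back behavior from steps 3.6.h and 3.6.i can be sketched
+     in Python as follows. This is an illustration only, and is not
+     part of the protocol: the Server class, the resolve() function,
+     and the string constants are hypothetical names invented for this
+     sketch, and no real DNS messages are generated.
+      

```python
from dataclasses import dataclass
from typing import List, Tuple

UTF8_LABEL = "EDNS/UTF-8"   # stands for the extended label carrying UTF-8
FORMERR = "FORMERR"
NOERROR = "NOERROR"


@dataclass
class Server:
    """Hypothetical stand-in for one name server in the query path."""
    name: str
    understands_utf8_labels: bool

    def query(self, qname: str, encoding: str) -> str:
        # A legacy server rejects the extended label type outright.
        if encoding == UTF8_LABEL and not self.understands_utf8_labels:
            return FORMERR
        return NOERROR


def legacy_encoding_for(qname: str) -> str:
    """Choose the legacy-compatible encoding for a fall-back query."""
    # Printable US-ASCII only: the STD13 octet encoding is sufficient.
    if all(0x21 <= ord(ch) <= 0x7E for ch in qname):
        return "STD13"
    return "ACE"


def resolve(qname: str, chain: List[Server]) -> List[Tuple[str, str]]:
    """Return one (server name, encoding) pair per query that was sent."""
    sent = []
    encoding = UTF8_LABEL
    for server in chain:
        sent.append((server.name, encoding))
        rcode = server.query(qname, encoding)
        if rcode == FORMERR and encoding == UTF8_LABEL:
            # Fall back once, then keep the legacy format for the
            # remainder of the query chain.
            encoding = legacy_encoding_for(qname)
            sent.append((server.name, encoding))
            server.query(qname, encoding)
    return sent
```

+      
+     With a three-server chain in which only the second server lacks
+     support for the extended label, the sketch issues four queries
+     for three servers, which is the single extra query/response pair
+     described above.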
+      
+     However, if each system and service had used UTF-8 for encoding 
+     purposes (including everything between the stub resolver's APIs 
+     and the authoritative servers for the target zone), then no 
+     additional queries or conversions would have been required (other 
+     than the direct UCS conversions required for validation and 
+     caching, the latter of which can be performed separately without 
+     affecting the processing path). In this regard, the example above 
+     illustrates how this system can function even when only a portion 
+   
+  Hall                    I-D Expires: May 2002              [page 18] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     of the participating systems utilize UTF-8, and also illustrates 
+     how effective the entire operation would be if all of the 
+     recommendations and requirements provided in this specification 
+     were adopted. 
+      
+     It is also important to reiterate here that any costs associated
+     with this compliance are incurred entirely at the discretion of
+     the affected parties. If they want to streamline the process, the
+     option is available to them, although the system also works when
+     very few optimizations are implemented.
+      
+      
+  4.      The Internationalized Namespace 
+      
+     In simple terms, this specification defines an internationalized 
+     namespace which consists of domain names and labels that contain 
+     UCS character codes, and also specifies a series of encoding 
+     formats which may be used whenever the UCS values need to be 
+     encapsulated for transmission within DNS messages or application 
+     data streams. 
+      
+     In this regard, the internationalized namespace is the UCS 
+     representation of the domain names and labels as they are used for 
+     comparison operations once a domain name arrives for processing, 
+     while the transfer encodings ensure that a domain name arrives at 
+     the destination system intact, so that it may be processed in its 
+     canonical form. 
+      
+     There are four conceptual elements to this model: 
+      
+        *   Character codes. Labels from internationalized domain names
+            have a single logical canonical representation as sequences
+            of UCS code point values. The UCS characters are used when
+            a particular label from a domain name is created by an
+            application or stored in a zone, hosts, or cache database,
+            and whenever two sets of domain names or labels need to be
+            compared. However, different kinds of domain names have
+            different rules which govern the character codes that may
+            be used.
+      
+        *   Storage encodings. Whenever a domain name is created or 
+            copied from the network, it must be stored in a format that 
+            is reversible to the canonical UCS character representation 
+            of that domain name. This specification does not mandate or 
+            require any particular storage encoding, and allows this 
+            decision to be made on a per-implementation basis, as long 
+   
+  Hall                    I-D Expires: May 2002              [page 19] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+            as the storage encoding supports character codes which can 
+            be converted to UCS equivalent values for comparison 
+            purposes. However, the use of UTF-8 for this purpose is 
+            encouraged, since it is the most common. 
+      
+        *   Transfer encodings. Whenever a domain name needs to be sent 
+            over the network, it must be packaged in a form which is 
+            compliant with the capabilities of the transfer protocol in 
+            use. This document specifies three transfer encodings which 
+            may be used to encode canonical UCS character codes in DNS 
+            messages or application streams, which are: the octet 
+            encoding from STD13, the ACE encoding from <ACE-Z>, and the 
+            UTF-8 encoding from RFC2279. Each encoding has different 
+            costs and benefits in different usage scenarios. 
+      
+        *   Comparison operations. When two domain names need to be 
+            compared, they also follow rules which are appropriate to 
+            the type of domain name being provided, and the transfer 
+            encoding which may have been used to provide the domain 
+            name to the system. 
+      
+     This document defines four distinct types of internationalized 
+     domain names which may exist in the internationalized namespace, 
+     and also describes how each of the above considerations affects
+     those domain names and their labels. These domain name types are 
+     described throughout the remainder of this section. 
+      
+      
+  4.1.    Internationalized Domain Names and Labels 
+      
+     This section describes the master template rules for all domain 
+     names and labels which may be used in the internationalized 
+     namespace, although subordinate rules and restrictions are also 
+     applied as secondary filters, depending on the intended usage of 
+     the domain name. 
+      
+     For example, domain names and labels which are to be used as 
+     internationalized host identifiers (either as host names, or as 
+     domain names which are used to specify a host) are restricted to a 
+     specific subset of UCS characters. Meanwhile, domain names and 
+     labels which are compliant with STD13's global rules are 
+     restricted to eight-bit code values, while the domain names and 
+     labels which are used as STD13 host identifiers are restricted to 
+     a specific subset of US-ASCII. 
+      
+   
+  Hall                    I-D Expires: May 2002              [page 20] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+      
+     The following diagram illustrates how the subordinate rules are 
+     applied and interpreted against the master restrictions: 
+      
+                      +-----------------------+ 
+                      | Internationalized DNs | 
+                      +-----------------------+ 
+                       any UCS character codes 
+                          /       | 
+                         /        | 
+                        /         | 
+                       /          | 
+          +-----------+     +-----------+     +------------+ 
+          | Int. Host |     | STD13 DNs +-----+ STD13 Host | 
+          +-----------+     +-----------+     +------------+ 
+          normalized        character         ASCII letters, 
+          subset of         codes 0x00        numbers, and 
+          UCS chars         through 0xFF      hyphen char 
+      
+     As can be seen, the internationalized domain names and labels 
+     rules allow any UCS character code to be stored, although each 
+     particular usage of the domain names and labels will have its
+     own secondary rules and restrictions. 
+      
+     In order to allow future documents to define additional rules as 
+     required for their usage, this document defines very few global 
+     rules on the core internationalized domain names and labels. 
+      
+      
+  4.1.1.  IDN syntax and structure 
+      
+     In this specification, an internationalized domain name consists 
+     of a variable number of labels, each of which contains a variable
+     number of UCS character codes, not all of which will have defined 
+     UCS character interpretations. 
+      
+     Furthermore, the encoding system which is used to store and 
+     interpret those values on a system is not relevant to this 
+     specification, and is therefore not defined. The characters in a 
+     label can be stored in memory or on disk as UTF-8, UCS-4, ACE, or 
+     any other storage encoding which is desired by the operators and 
+     implementers of the affected system, as long as that encoding 
+     system is reversible to the canonical UCS character code values, 
+     and is able to represent the necessary range of UCS characters 
+     (the "necessary range" varies by operation). 
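+      
+     The two conditions on a storage encoding (reversibility, and
+     coverage of the necessary range) can be sketched as a simple
+     round-trip test. The function name below is hypothetical, and
+     Python's "utf-32-be" codec is used here as a stand-in for UCS-4;
+     ACE is omitted because its algorithm is not defined in this
+     section.
+      

```python
def is_usable_storage_encoding(label: str, codec: str) -> bool:
    """True if `codec` can store `label` and return the same code points."""
    try:
        stored = label.encode(codec)
    except UnicodeEncodeError:
        # The codec cannot represent the necessary range of characters.
        return False
    # Reversible: decoding must yield the canonical UCS characters again.
    return stored.decode(codec) == label
```

+      
+     Under this test a label containing only U+0000 through U+00FF
+     could be stored as ISO 8859-1, while a label containing other UCS
+     characters would need an encoding such as UTF-8 or UCS-4.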
+      
+   
+  Hall                    I-D Expires: May 2002              [page 21] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     The only universal restrictions which apply to internationalized 
+     domain names and labels are those which govern length. This 
+     specification requires that labels from internationalized domain 
+     names MUST be restricted to a minimum length of two characters and 
+     a maximum length of 63 characters, inclusive. The exception to 
+     this rule is the root domain, which is always represented by a 
+     zero-length label. Note that this rule specifically refers to the 
+     canonical UCS characters, rather than any encoded form (encoded
+     labels and domain names will often be limited to fewer actual
+     characters, due to overhead from the encoding algorithm).
+      
+     A fully-qualified internationalized domain name is formed by 
+     joining a series of labels together, with the most-contextually 
+     specific label in the left-most position of the label sequence, 
+     and with the root domain occupying the right-most position. The 
+     sum total of all labels in an internationalized domain name MUST 
+     NOT exceed 255 characters, inclusive. Any number of labels MAY be 
+     stored in the domain name, but the sum total of their lengths MUST 
+     NOT exceed this limit. 
+      
+     However, labels which contain UCS character codes greater than 
+     U+007F will result in multi-byte UTF-8 and ACE encodings, so the 
+     maximum length of a label or an internationalized domain name is 
+     governed by their UTF-8 and ACE encoded lengths. Both encodings 
+     MUST result in an encoded length of 63 octets or less in order to 
+     be usable, with a maximum cumulative length of 255 octets. 
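+      
+     The length rules above can be sketched as follows. This is a
+     non-normative illustration under several assumptions: the
+     function and constant names are hypothetical, the root label is
+     not passed in, only the label contents are counted towards the
+     255 limit (separators and length octets are ignored), and only
+     the UTF-8 encoded length is checked because the ACE algorithm is
+     not defined in this section.
+      

```python
from typing import List

MIN_LABEL_CHARS = 2     # minimum label length, in UCS characters
MAX_LABEL_CHARS = 63    # maximum label length, in UCS characters
MAX_LABEL_OCTETS = 63   # maximum encoded label length
MAX_NAME_CHARS = 255    # maximum sum of all label lengths
MAX_NAME_OCTETS = 255   # maximum sum of all encoded label lengths


def check_lengths(labels: List[str]) -> List[str]:
    """Return a list of length-rule violations (empty if there are none)."""
    problems = []
    for label in labels:
        n_chars = len(label)                   # UCS code points
        n_octets = len(label.encode("utf-8"))  # UTF-8 encoded length
        if not MIN_LABEL_CHARS <= n_chars <= MAX_LABEL_CHARS:
            problems.append(f"label {label!r}: {n_chars} characters")
        elif n_octets > MAX_LABEL_OCTETS:
            problems.append(f"label {label!r}: {n_octets} UTF-8 octets")
    total_chars = sum(len(label) for label in labels)
    total_octets = sum(len(label.encode("utf-8")) for label in labels)
    if total_chars > MAX_NAME_CHARS:
        problems.append(f"name: {total_chars} characters")
    if total_octets > MAX_NAME_OCTETS:
        problems.append(f"name: {total_octets} UTF-8 octets")
    return problems
```

+      
+     For example, a label of 40 characters in the range U+0080 through
+     U+07FF satisfies the character limit, but encodes to 80 octets of
+     UTF-8 and is therefore not usable.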
+      
+      
+  4.1.2.  IDN transfer encodings 
+      
+     The UCS currently occupies a 21-bit range of character code
+     values, containing tens of thousands of assigned characters, and 
+     hundreds of thousands of unassigned characters. Due to the multi-
+     byte nature of the code point values, UCS characters cannot be 
+     passed as protocol or application data in most of the existing 
+     Internet protocols (including DNS messages), at least not without 
+     the help of some kind of encoding scheme. At the very least, the 
+     UCS character values have to be encoded as eight-bit sequences if 
+     they are to fit within existing eight-bit data structures, and 
+     have to be encoded as a subset of US-ASCII characters if they are 
+     to be usable with legacy protocols and applications which only use 
+     STD13's host identifier rules for their structured domain name 
+     data types. 
+      
+     With this objective in mind, this document defines three different 
+     transfer encoding systems which can be used to convert 
+   
+  Hall                    I-D Expires: May 2002              [page 22] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     internationalized domain names and labels into a form which is 
+     suitable for transfer in different data streams. These are the 
+     legacy STD13 octet encoding, ACE, and UTF-8. Each of these 
+     encoding schemes provides different benefits and capabilities to
+     the internationalized DNS effort. 
+      
+        *   STD13 octets. The STD13 octet encoding scheme provides a 
+            direct one-to-one mapping between eight-bit characters and 
+            their eight-bit values, but it is only capable of storing 
+            character codes in the range of U+0000 through U+00FF, 
+            which severely restricts its usefulness. 
+      
+        *   ACE. The ACE encoding scheme is capable of storing UCS 
+            character code values as seven-bit sequences in STD13 legacy
+            labels. While this makes it practically compatible with the 
+            legacy host identifier rules, the resulting data imposes 
+            additional labor on the Internet community, and the reuse 
+            of the legacy label also results in certain amounts of 
+            ambiguity with some DNS domain names and labels. 
+      
+        *   UTF-8. The UTF-8 encoding scheme is capable of encoding all 
+            UCS character code values as sequences of eight-bit data 
+            which are compatible with legacy DNS message restrictions, 
+            but the encoded output requires explicit support from 
+            internationalized applications and protocols. UTF-8 output 
+            uses a new label type in order to prevent additional 
+            ambiguity problems from arising. 
+      
+     The table below illustrates the UCS character code sequences which 
+     are supported by each of the different encoding schemes. 
+      
+                          STD13 
+                          Octets   ACE    UTF-8 
+                        +-------+-------+-------- 
+                        |       |       | 
+               US-ASCII |   Y   |       |   Y 
+                        |       |       | 
+              Eight-Bit |   Y   |   Y   |   Y 
+                        |       |       | 
+          Any UCS Chars |       |   Y   |   Y 
+                        |       |       | 
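+      
+     The table can also be read as a lookup from the highest character
+     code in a label to the set of permitted transfer encodings. The
+     sketch below is illustrative only; the function name and the
+     strings used to name the encodings are hypothetical.
+      

```python
from typing import Set


def permitted_encodings(label: str) -> Set[str]:
    """Map a label to the transfer encodings the table marks with 'Y'."""
    highest = max(map(ord, label), default=0)
    if highest <= 0x7F:
        # US-ASCII row: STD13 octets or UTF-8.
        return {"STD13", "UTF-8"}
    if highest <= 0xFF:
        # Eight-bit row: all three encodings.
        return {"STD13", "ACE", "UTF-8"}
    # Any other UCS characters: ACE or UTF-8 only.
    return {"ACE", "UTF-8"}
```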
+      
+   
+  Hall                    I-D Expires: May 2002              [page 23] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     More specifically, the character code sequence ranges and their 
+     valid encodings are: 
+      
+        *   US-ASCII. If a label only contains character codes from the 
+            range of U+0000 through U+007F, then it MAY be encoded as a 
+            legacy STD13 octet sequence or UTF-8, but MUST NOT be 
+            encoded as ACE. 
+      
+            Note that this specification explicitly prohibits seven-bit 
+            labels from being encoded as ACE data, since such an action 
+            would be redundant, would result in greater processing
+            overhead for those labels, and would create multiple
+            representations which introduce problems with caches on
+            legacy systems. Furthermore, certain security risks would
+            be introduced if this were allowed. For example, a
+            malicious user could register or purposefully create an
+            ACE encoded representation of the "example.com" label
+            sequence such that users mistakenly sent sensitive data to
+            malicious systems.
+      
+            In order to prevent these problems from occurring, this 
+            specification requires that any ACE-encoded label which 
+            consists entirely of seven-bit characters MUST be 
+            immediately discarded with extreme prejudice. This rule 
+            applies to every implementation of this specification, 
+            including any applications, resolvers, caches or servers 
+            which process labels. 
+      
+        *   Eight-bit codes. If a label contains character codes from
+            the eight-bit range of U+0000 through U+00FF, then it MAY
+            be encoded as STD13 octet sequences, ACE, or UTF-8. This
+            rule specifically requires that the label MUST contain at
+            least one character from the upper half of the eight-bit
+            range (U+0080 through U+00FF), MAY contain any number of
+            characters from the seven-bit range, but MUST NOT contain
+            characters with code values which are greater than U+00FF.
+      
+            Since the STD13 octet encoding and ACE both use the legacy 
+            STD13 label type, this specification relies on the input 
+            encoding of a domain name in order to determine the output 
+            encoding. In some cases, however, the input encoding will 
+            not be clear, or will not be specified, and this can result 
+            in some ambiguity with label sequences from this range. 
+      
+            For example, if the domain name provided in a query 
+            consists of seven-bit labels, then the STD13 octet sequence 
+            is the only valid encoding for the legacy STD13 label, 
+            meaning that ACE could not have been used in the query. If 
+            the specified domain name exists as a CNAME resource record 
+            which refers to a domain name that contains eight-bit 
+            character codes, then the proper output encoding for that 
+            domain name will not be clearly discernable. Moreover, the 
+            STD13 and ACE encodings will generate different results, 
+            since the STD13 octet sequence will only contain a single 
+            octet for the eight-bit character, while the ACE encoding 
+            will contain multiple octets of encoded data. 
+      
+            When this situation arises, systems MUST give preference to 
+            the ACE encoding, on the assumption that the referenced 
+            character is more likely to represent a UCS character than 
+            an eight-bit code value (the UCS characters in this range 
+            are Latin-1, which are the most common characters after the 
+            legacy US-ASCII set). Furthermore, the ACE encoded 
+            representation of these characters allows for a broader
+            range of subsequent operations (since it complies with the 
+            legacy host naming restrictions, it can be used with CNAME 
+            resource records that refer to hosts), while the STD13 
+            octet encoded representation does not. 
+      
+            It is possible to avoid this scenario on authoritative zone 
+            servers (and thus the affected caches) by allowing the 
+            operator to specify whether or not the input is Latin-1 UCS 
+            character data or binary data, with the server generating 
+            the proper output accordingly. Also note that the default 
+            encoding specified by this document is UTF-8, which does 
+            not suffer from the ambiguity problems described above. 
+      
+        *   Any UCS character codes. If a label contains any
+            character codes greater than U+00FF, then it MAY be encoded
+            as ACE or UTF-8, but MUST NOT be encoded as STD13 octet 
+            sequences. STD13 is not capable of representing character 
+            codes greater than U+00FF, so it cannot be used with any 
+            UCS characters beyond the eight-bit range. 
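+      
+     The range rules above can be summarized in a short Python sketch.
+     It is illustrative only and is not part of this specification;
+     the function name valid_encodings and the string tags used for
+     the three encodings are assumptions made for the example. An
+     empty label is treated as seven-bit, which also keeps ACE away
+     from the zero-length root label.
+      
```python
def valid_encodings(label: str) -> set:
    """Return the transfer encodings permitted for a label's code points."""
    # The highest code point decides which range rule applies.
    highest = max(map(ord, label), default=0)
    if highest <= 0x7F:
        # US-ASCII only: STD13 octets or UTF-8, never ACE.
        return {"STD13", "UTF-8"}
    if highest <= 0xFF:
        # At least one character in U+0080..U+00FF: all three are valid.
        return {"STD13", "ACE", "UTF-8"}
    # Any character above U+00FF: STD13 octets cannot represent it.
    return {"ACE", "UTF-8"}
```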
+      
+     Encodings are performed on a per-label basis. Each label MUST NOT 
+     be encoded more than once. Also note that recursive encodings 
+     result in applications discarding the domain name. 
+      
+     When the STD13 octet encoding is used to encode labels for 
+     transmission, the labels are encoded according to the rules 
+     specified in STD13, and are encapsulated in STD13 legacy labels. 
+      
+     When ACE is used to encode labels for transmission, the labels are 
+     encoded according to the rules specified in <ACE-Z>, and are 
+     encapsulated in STD13 legacy labels (this process is described in 
+     section 5.2). 
+      
+     When UTF-8 is used to encode labels for transmission, the labels 
+     are encoded according to the rules specified in RFC2279, and are 
+     encapsulated in EDNS/UTF-8 extended labels (the format of this 
+     label is described in section 5.1). 
+      
+     Note that a domain name MAY contain any combination of STD13 octet 
+     encoded labels and ACE encoded labels. However, if a domain name 
+     contains any UTF-8 encoded labels, then ALL of the labels from 
+     that domain name MUST be encoded as UTF-8 data. This rule 
+     primarily exists so that DNS compression services can be 
+     maintained consistently, but it also prevents mixed referrals 
+     which can trigger unnecessary fall-back processing, and also 
+     provides a single encoding representation to internationalized 
+     systems which benefits efficiency. 
+      
+     The root domain (as specified by the zero-length label at the 
+     right edge of the domain name) MUST NOT be encoded with ACE. More 
+     specifically, zero-length labels MUST NOT contain any character 
+     data of any kind, and since ACE labels have prefix strings, they 
+     are explicitly forbidden from being used for the root domain. 
+      
+      
+  4.1.3.  IDN comparison operations 
+      
+     When internationalized domain name labels are received from the
+     network as ACE or UTF-8 encoded data, the labels MUST be decoded
+     to their canonical UCS character representation, and the resulting
+     UCS characters MUST be compared as case-exact sequences to their
+     stored equivalents. Except where specifically required in this
+     specification (e.g., validity tests which are performed by
+     applications), normalization and case-conversion MUST NOT be
+     performed against the resulting UCS character codes prior to any
+     comparison operations being performed.
+      
+     However, internationalized domain name labels which are received 
+     as STD13 octet sequences MUST be given special treatment, as these 
+     domain names could have originated from legacy systems operating 
+     under STD13's rules. In this case, the seven-bit US-ASCII 
+     alphabetic characters (U+0041 through U+005A, and U+0061 through 
+     U+007A) from those labels MUST be compared in a case-neutral form. 
+     All other code values MUST be compared as case-exact code values 
+     (this particularly includes eight-bit characters, which were not 
+     defined by STD13). 
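+      
+     The two comparison modes described in this section can be
+     sketched in Python as follows. The sketch assumes that both labels
+     have already been decoded to UCS character strings; the names
+     fold_ascii, labels_equal and received_as_std13 are hypothetical
+     and are not defined by this specification.
+      
```python
def fold_ascii(label: str) -> str:
    """Lowercase only U+0041..U+005A; leave every other code value alone."""
    return "".join(
        chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch for ch in label
    )

def labels_equal(received: str, stored: str, received_as_std13: bool) -> bool:
    """Compare one received label against its stored equivalent."""
    if received_as_std13:
        # Legacy labels: case-neutral for US-ASCII letters only.
        return fold_ascii(received) == fold_ascii(stored)
    # Labels decoded from ACE or UTF-8: case-exact, no normalization.
    return received == stored
```
+      
+     Note that the eight-bit characters are never folded, so U+00C9
+     and U+00E9 compare as different values in both modes.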
+      
+      
+  4.2.    Internationalized Host Identifiers 
+      
+     Internationalized host identifiers are a subset of the 
+     internationalized domain names described in section 4.1, which 
+     only use a subset of the allowable UCS characters, but which reuse 
+     the global transfer encodings and comparison routines. 
+      
+     Most of the displayable characters from the UCS can be used in 
+     host identifiers, and there are no additional rules governing the 
+     ordering or length of their labels. However, the characters which 
+     are used in internationalized host identifiers MUST be normalized 
+     and case-converted before they are encoded for storage or 
+     transfer. This requires more effort on the part of applications 
+     and servers when the internationalized domain names are initially 
+     created, but results in less ambiguity and lower processing 
+     requirements for servers, caches and resolvers during subsequent 
+     comparison operations. 
+      
+     The restrictions which govern the creation of internationalized 
+     host identifiers are as follows: 
+      
+        a.  Labels MUST be restricted to the subset of characters which 
+            are permitted by <nameprep> [nameprep]. Characters which 
+            are prohibited by <nameprep> MUST NOT appear in any label 
+            of any internationalized host identifier. 
+      
+        b.  Labels MUST be normalized through <nameprep> before they 
+            are stored or encoded for transfer. Internationalized host 
+            identifiers will not be normalized as part of any 
+            comparison operation, so systems MUST normalize the labels 
+            before they are stored or transmitted. 
+      
+        c.  Labels MUST be converted to lowercase according to the 
+            case-mappings rules specified in <nameprep> before they are 
+            stored or encoded for transfer. Internationalized host 
+            identifiers will not be converted to lowercase as part of 
+            any comparison operation, so systems MUST convert the
+            labels to lowercase before they are stored or transmitted.
+      
+     According to the rules above, a label from an internationalized 
+     host identifier which was originally created with the UCS 
+     character sequence of <LATIN CAPITAL LETTER A><COMBINING ACUTE 
+     ACCENT><LATIN CAPITAL LETTER B> (U+0041 U+0301 U+0042) would be 
+     normalized and lowercased to <LATIN SMALL LETTER A WITH 
+     ACUTE><LATIN SMALL LETTER B> (U+00E1 U+0062). The normalized, 
+     lowercase form would be used as the canonical UCS character 
+     representation of that label when it was encoded for storage and 
+     transmission purposes, and would be the form which was used for 
+     comparison operations on any resolvers, caches and servers. 
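+      
+     The example above can be approximated with the Python standard
+     library. This sketch assumes that lowercasing followed by NFKC
+     normalization is an adequate stand-in for the case-mapping and
+     normalization steps of <nameprep>; it does not implement the
+     prohibited-character checks, and the function name
+     prepare_host_label is hypothetical.
+      
```python
import unicodedata

def prepare_host_label(label: str) -> str:
    """Lowercase and normalize a label before it is stored or transferred."""
    # Case-convert first, then compose, so the stored form is canonical.
    return unicodedata.normalize("NFKC", label.lower())
```
+      
+     With the input U+0041 U+0301 U+0042 this sketch produces
+     U+00E1 U+0062, matching the example above.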
+      
+     Internationalized host identifiers which are received from the 
+     network can contain labels which have been encoded as STD13 octet 
+     sequences, ACE or UTF-8. In all of these cases, the comparison 
+     rules defined in section 4.1.3 MUST be applied. 
+      
+      
+  4.3.    STD13 Domain Names 
+      
+     STD13 allows any eight-bit code values to be used in domain name 
+     labels. However, STD13 host identifiers (as described in section 
+     4.4 of this specification) are the most common form of STD13 
+     domain names, and have much tighter restrictions. 
+      
+     There are common uses of STD13 domain names which do not comply 
+     with the STD13 host identifier subset, however. One common example 
+     of this is SRV identifiers, which use an underscore character 
+     (U+005F) as part of their label syntax. Another common example is 
+     found when email addresses are provided in SOA and RP resource 
+     records, and where the left-hand side of the email address is 
+     stored as an STD13 domain name label which does not represent a 
+     host identifier. Furthermore, email addresses often contain extra 
+     characters which are not legal in STD13 host identifiers, such as 
+     a full-stop character (U+002E). For example, "joe.admin" could be 
+     stored as an STD13 domain name label in the fully-qualified domain 
+     name of "joe.admin.example.com.", which would represent the email 
+     address of "joe.admin@example.com" when that domain name was 
+     extracted from the SOA or RP resource record and processed. 
+      
+     Implementations of this specification MUST allow STD13 domain 
+     names to be created and stored, using the following rules: 
+      
+        a.  Labels MUST be restricted to the code values of U+0000 
+            through U+00FF. Restrictions on character content MUST NOT 
+            be applied (note that if this domain name will be used as 
+            part of an STD13 host identifier, the rules specified in 
+            section 4.4 MUST be used instead). 
+      
+        b.  Labels MUST NOT be normalized or lowercased before they are 
+            stored or encoded for transfer. 
+      
+        c.  Systems MUST allow STD13 domain names to be specified as 
+            exact sequences of eight-bit octet values, and MUST NOT 
+            treat these sequences as canonical UCS characters which are 
+            normalized or lowercased. STD13 defines an escaping 
+            mechanism whereby the decimal value of the octet is 
+            prefaced with a reverse-solidus (such as "\193"), which is 
+            suggested for this usage. 
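+      
+     The escaping mechanism mentioned in item c can be sketched in
+     Python as follows. The sketch assumes the STD13 master file
+     conventions, where a reverse-solidus followed by three decimal
+     digits denotes a single octet and a reverse-solidus followed by
+     any other character denotes that character literally. The
+     function name unescape_std13_label is hypothetical.
+      
```python
DIGITS = "0123456789"

def unescape_std13_label(text: str) -> bytes:
    """Convert an escaped presentation-form label into its exact octets."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            if ord(ch) > 0xFF:
                raise ValueError("STD13 labels are limited to U+0000..U+00FF")
            out.append(ord(ch))
            i += 1
            continue
        nxt = text[i + 1:i + 2]
        if nxt == "":
            raise ValueError("dangling reverse-solidus")
        if nxt in DIGITS:
            # Decimal escape: exactly three digits naming one octet.
            digits = text[i + 1:i + 4]
            if len(digits) != 3 or not all(d in DIGITS for d in digits):
                raise ValueError("decimal escape needs three digits")
            value = int(digits)
            if value > 255:
                raise ValueError("escaped value does not fit in one octet")
            out.append(value)
            i += 4
        else:
            # Literal escape, e.g. an escaped full-stop inside a label.
            if ord(nxt) > 0xFF:
                raise ValueError("STD13 labels are limited to U+0000..U+00FF")
            out.append(ord(nxt))
            i += 2
    return bytes(out)
```
+      
+     No normalization or case conversion is applied at any point, so
+     the octets are stored exactly as the operator specified them.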
+      
+     STD13 domain names which are received from the network can contain 
+     labels which have been encoded as STD13 octet sequences, ACE or 
+     UTF-8. In all of these cases, the comparison rules defined in 
+     section 4.1.3 MUST be applied. Note that some of these sequences 
+     can contain octet code values which have not been normalized or 
+     lowercased by the originating system, since these values can be 
+     used to specify binary domain names. 
+      
+      
+  4.4.    STD13 Host Identifiers 
+      
+     This document does not deprecate, replace or modify the host name 
+     rules defined by RFC952, STD3 or STD13 as they apply to legacy 
+     host identifiers. However, there are several issues which affect 
+     the usage of these domain names and their labels in this system. 
+      
+     The characters which are currently defined as valid in STD13
+     host identifiers are the uppercase and lowercase letters, digits
+     and hyphen character from US-ASCII. No other characters are
+     allowed to be used. Furthermore, the current rules also
+     prohibit the use of the hyphen character in the first or last 
+     character position of a host identifier label. 
+      
+     Implementations of this specification MUST allow STD13 host 
+     identifiers to be created and stored, using the following rules: 
+      
+        a.  Labels MUST be restricted to the code values of U+002D,
+            U+0030 through U+0039, U+0041 through U+005A, and U+0061
+            through U+007A.
+      
+        b.  Labels MUST NOT contain the code value of U+002D in either 
+            the first or last character position of the label. 
+      
+        c.  The alphabetic characters MUST be converted to lowercase 
+            before they are stored or transmitted. STD13 host 
+            identifiers are always compared in a case-neutral form. 
+      
+     STD13 host identifiers which are received from the network can 
+     contain labels which have been encoded as STD13 octet sequences
+     or UTF-8. In both cases, the comparison rules defined in section
+     4.1.3 MUST be applied. 
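+      
+     The creation rules for STD13 host identifiers can be sketched in
+     Python as follows. The sketch accepts the US-ASCII letters,
+     digits and hyphen described in the prose above; the function name
+     canonical_host_label is hypothetical, and the label length limits
+     defined by STD13 are not checked here.
+      
```python
import re

# US-ASCII letters, digits and hyphen only.
_HOST_LABEL = re.compile(r"[A-Za-z0-9-]+")

def canonical_host_label(label: str) -> str:
    """Validate an STD13 host identifier label and return its stored form."""
    if not _HOST_LABEL.fullmatch(label):
        raise ValueError("only US-ASCII letters, digits and hyphen are valid")
    if label.startswith("-") or label.endswith("-"):
        raise ValueError("hyphen is not allowed in the first or last position")
    # Rule c: alphabetic characters are lowercased before storage or transfer.
    return label.lower()
```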
+      
+      
+  5.      Transfer Encodings and Label Types 
+      
+     As was discussed in section 4.1.2, internationalized domain names 
+     and labels are required to be encoded as either eight-bit or 
+     seven-bit data whenever they are transmitted as protocol or 
+     application data. 
+      
+     The particular output encoding format which will be used for any 
+     given label will be primarily determined by the capabilities of 
+     the participating end-point systems. If the application or 
+     protocol which is relaying the domain name labels supports 
+     internationalized domain names directly then UTF-8 encoded labels 
+     can be used, but if the protocol or application is only capable of 
+     supporting STD13 host identifiers as domain name data, then the 
+     STD13 octet and/or ACE encoded labels will have to be used. 
+      
+     With DNS messages in particular, the "data type" is the label 
+     encapsulation in use. Although STD13 legacy labels allow for the 
+     use of eight-bit codes, multiple encodings for the same basic 
+     character data result in interpretation problems without some form 
+     of ancillary tagging service. For this reason, each encoding is 
+     represented differently by this specification. When the STD13 
+     legacy label contains STD13 octet sequences then no tagging is 
+     provided, but if the STD13 legacy label contains ACE encoded data 
+     then the encoded sequence is tagged with an ACE identifier (a 
+     character prefix which does not normally appear in labels). When 
+     UTF-8 domain names are provided, an EDNS/UTF-8 extended label is 
+     used to encapsulate the internationalized domain name. 
+      
+     Furthermore, the encoding which is used for any label in the 
+     message will also determine the label type which is used to 
+     encapsulate and transfer the entire domain name. If any label of
+     a domain name is encapsulated as an EDNS/UTF-8 extended label,
+     then all of the labels from that domain name are required to be
+     encapsulated for transfer in EDNS/UTF-8 extended labels.
+     Conversely, if a domain name contains ACE or STD13 octet encoded
+     labels, then all of the labels from
+     that domain name are required to be encapsulated for transfer 
+     using the STD13 legacy label format. 
+      
+     Note that other legacy applications and protocols will most likely 
+     be required to provide extended encodings or negotiation features 
+     before they can exchange internationalized domain names directly. 
+     However, new applications and protocols which are subsequently 
+     written to comply with BCP18 and this specification should not 
+     require any such effort, as they should be capable of transferring 
+     UTF-8 domain names from the beginning. 
+      
+      
+  5.1.    The EDNS/UTF-8 Label Type 
+      
+     Any internationalized domain name label which has been encoded as 
+     UTF-8 for transmission in a DNS message MUST be encapsulated as an
+     EDNS/UTF-8 label.
+      
+     The EDNS/UTF-8 extended label is an instance of EDNS extended 
+     label types (as defined by RFC2671). Extended labels are indicated 
+     by the leading bit pattern of 0b01 in the label type field (the 
+     first two bits from the "label length" octet of the STD13 legacy 
+     label type), with the remaining six bits of this octet indicating 
+     the extended label type in use. The EDNS/UTF-8 label type uses the 
+     binary value of 0b000011 for this indication (note that IANA may 
+     change this assignment). 
+      
+     EDNS/UTF-8 labels contain two subordinate units of data. The first
+     unit is a length octet which works exactly the same as the length
+     octet used by STD13 legacy labels: if the first two bits of this
+     octet are 0b00 then the rest of that octet provides the length of
+     the label data field, but if the first two bits of this octet are
+     0b11 then the label is a pointer to some other label, and the
+     remainder of the length octet provides an offset which points to
+     the length octet of the referenced label, as per the rules
+     provided in section 4.1.4 of RFC 1035 (STD13, part 2). The second
+     unit is the label data itself, encoded as UTF-8 octets.
+      
+     The structure of the EDNS/UTF-8 extended label is illustrated by 
+     the following figure. 
+      
+                              1 1 1 1 1 1 1 1 1 1 
+          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 
+         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
+         |0 1|0 0 0 0 1 1|    length     |  label data  ///  | 
+         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
+      
+          0b01 - The extended label identifier.
+      
+          0b000011 - The EDNS/UTF-8 extended label type identifier.
+      
+          Length - The number of octets in the label data, or the
+            offset to the length octet of another EDNS/UTF-8 label.
+      
+          Label data - The label data, encoded as UTF-8 octets.
+      
+     The following example shows the domain name of me.com, where the 
+     "e" in "me" is the UCS character <LATIN SMALL LETTER E WITH ACUTE> 
+     (U+00E9), which has the UTF-8 encoded octet sequence of 0xC3A9. 
+      
+            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
+         20 | 0  1  0  0  0  0  1  1|          0x03         | 
+            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
+         22 |        0x6D (m)       |      0xC3 (e')        | 
+            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
+         24 |      0xA9 (e')        | 0  1  0  0  0  0  1  1| 
+            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
+         26 |         0x03          |        0x63 (c)       | 
+            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
+         28 |        0x6F (o)       |        0x6D (m)       | 
+            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
+         30 | 0  1  0  0  0  0  1  1|         0x00          | 
+            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 
+      
+     Octet 20 identifies the EDNS/UTF-8 extended label type, while 
+     octet 21 indicates that the label is three octets long. Octet 22 
+     contains the UTF-8 value for lowercase "m", while octets 23 and 24 
+     contain the UTF-8 value for the UCS character <LATIN SMALL LETTER 
+     E WITH ACUTE> (encoded as 0xC3A9). 
+      
+     Similarly, octet 25 identifies another EDNS/UTF-8 extended label
+     type, octet 26 indicates that the label is three octets long, and
+     octets 27 through 29 contain the UTF-8 values for the lowercase
+     alphabetic sequence of "com".
+      
+     Finally, octet 30 identifies another EDNS/UTF-8 extended label 
+     type, while octet 31 indicates that the label is zero octets in 
+     length, thereby signifying the root zone (the end of the queried 
+     domain name). 
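+      
+     The octet layout walked through above can be reproduced with a
+     short Python sketch. It is illustrative only: the names
+     EXTENDED_UTF8_TYPE and encode_edns_utf8_name are hypothetical, the
+     type value is the provisional one given in this section, and
+     compression pointers are not generated. The 63-octet limit
+     follows from the six bits which remain for the length when the
+     first two bits of the length octet are 0b00.
+      
```python
# 0b01 (extended label) followed by 0b000011 (provisional UTF-8 type).
EXTENDED_UTF8_TYPE = 0b01000011

def encode_edns_utf8_name(labels):
    """Encode a list of UCS labels as EDNS/UTF-8 labels plus the root label."""
    out = bytearray()
    for label in labels:
        data = label.encode("utf-8")
        if not 0 < len(data) <= 63:
            raise ValueError("label data must be 1 to 63 octets long")
        # Type octet, length octet, then the UTF-8 label data.
        out += bytes([EXTENDED_UTF8_TYPE, len(data)]) + data
    # The zero-length root label also uses the extended label type.
    out += bytes([EXTENDED_UTF8_TYPE, 0x00])
    return bytes(out)
```
+      
+     Encoding the labels "m<U+00E9>" and "com" with this sketch yields
+     the same twelve octets that are shown at offsets 20 through 31 in
+     the example above.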
+      
+     Note that the use of the EDNS/UTF-8 extended label type serves 
+     multiple purposes. On the one hand, it provides a method of 
+     signaling the resolver's capabilities to the server, so that the 
+     server can determine which format it needs to use when returning 
+     answers, referrals or errors. Moreover, using an encapsulation 
+     format which is not backwards compatible prevents certain 
+     ambiguity problems which can result from overloading the STD13 
+     legacy label with multiple encodings. These problems are seen in 
+     certain situations with STD13 octet encoding and ACE, where a 
+     server cannot adequately determine which encoding a resolver 
+     desires. By using a separate extended label type for UTF-8, these
+     kinds of ambiguities are avoided. 
+      
+     There are additional benefits which come from using EDNS extended 
+     label types, which are best expressed as "future possibilities". 
+     Once the EDNS extended label mechanisms are widely deployed, it 
+     becomes feasible to specify additional encoding mechanisms as soon 
+     as the Internet community deems it desirable. In this regard, 
+     defining alternative encodings is much easier the second time. 
+      
+      
+  5.2.    The STD13 Legacy Label Type 
+      
+     Any internationalized domain name label which has been encoded as 
+     ACE or STD13 octet sequences for transmission in a DNS message 
+     MUST be encapsulated within an STD13 legacy label. 
+      
+     This document does not deprecate, replace or extend the STD13 
+     octet encoding or label encapsulation rules defined by STD13. 
+     However, this document does provide some guidance on the creation 
+     and interpretation of ACE encoded labels when they are stored in 
+     legacy labels, which is necessary in order for recipient systems 
+     to properly detect and decode the label contents. 
+      
+     Note that STD13 octet sequences and ACE data MAY both be provided
+     in the same domain name. As such, each STD13 legacy label from a
+     DNS message must be examined and processed independently.
+      
+      
+  5.2.1.  ACE encoded labels 
+      
+     ACE encoded labels always begin with the character sequence of 
+     <TBD> (this document uses "zz--" as a placeholder sequence until a 
+     formal assignment is made). Any label which contains ACE encoded 
+     data MUST begin with this character sequence prefix. Similarly, 
+     any label which begins with this character sequence MUST be 
+     recognized and processed as an ACE encoded label, according to the 
+     rules defined in this specification. 
+      
+     Encoding and encapsulating a label as ACE data is a three-part 
+     process, as follows: 
+      
+        a.  Encode the canonical UCS character data from the 
+            internationalized domain name label into ACE using the 
+            procedure defined in <ACE-Z>.
+      
+        b.  Preface the encoded output with the "zz--" prefix sequence, 
+            thereby indicating that this label contains ACE encoded UCS 
+            character data. 
+      
+        c.  Determine the length of the prefixed, encoded data and
+            store this value in the STD13 legacy label's length octet.
+      
+     Decoding an ACE label is the opposite of that process. 
+      
+     Note that whenever the ACE algorithm encounters a seven-bit 
+     character code in the input, it is passed through unmodified to 
+     the encoded output. If a label only contains seven-bit character 
+     codes, the label MUST NOT be encoded as ACE, and MUST be encoded 
+     as either STD13 octet sequences or UTF-8. Forcing a seven-bit 
+     label to be encoded as ACE serves no benefit, incurs additional 
+     processing on the end-point systems, and can also expose certain 
+     security risks. Any system which is capable of generating and
+     deciphering ACE encoded labels is required to treat ACE encoded
+     seven-bit labels as hostile, and MUST dispose of them immediately
+     without any further processing; systems are forbidden to even
+     return these labels in DNS error messages.
+      
+     Similarly, ACE MUST NOT be used to encode any zero-length labels 
+     (including but not specifically limited to the root domain), since 
+     the presence of prefix characters in these labels can invalidate 
+     their protocol-specific interpretations. 
+      
+     When an STD13 legacy label is received which has "zz--" in the 
+     first four character positions, the label MUST be treated as an 
+     ACE-encoded internationalized domain name, and MUST be decoded to 
+     its canonical UCS character values for further processing. 
+      
+     Note that STD13 legacy labels MUST be verified before the ACE 
+     encoded data is extracted (as per the rules defined in STD13 which 
+     govern the STD13 legacy label type), but systems which are 
+     compliant with this specification MUST perform all subsequent 
+     comparison, caching, or storage operations against the canonical 
+     UCS characters, and MUST NOT use the ACE encoded label sequence 
+     for any of these operations. 
+      
+     Note that legacy systems which are not compliant with this
+     specification will treat ACE encoded labels as any other STD13 
+     legacy label. 
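+      
+     The receive-side handling of STD13 legacy labels can be sketched
+     in Python as follows. The sketch is illustrative only: "zz--" is
+     the placeholder prefix used by this document, the name
+     decode_legacy_label is hypothetical, and the <ACE-Z> decoder is
+     not implemented here, so the caller supplies it as the ace_decode
+     argument. Labels without the prefix are read as STD13 octet
+     sequences, where each octet is the code value itself.
+      
```python
ACE_PREFIX = b"zz--"  # placeholder prefix until a formal assignment is made

def decode_legacy_label(raw: bytes, ace_decode):
    """Return (encoding, UCS string) for one STD13 legacy label."""
    if not raw.startswith(ACE_PREFIX):
        # No prefix: the octets are the code values U+0000..U+00FF.
        return "STD13", raw.decode("latin-1")
    # ace_decode is a caller-supplied stand-in for the <ACE-Z> decoder.
    text = ace_decode(raw[len(ACE_PREFIX):])
    if all(ord(ch) <= 0x7F for ch in text):
        # A seven-bit label must never arrive as ACE; discard it.
        raise ValueError("ACE label decodes to seven-bit characters only")
    return "ACE", text
```
+      
+     All subsequent comparison, caching and storage operations would
+     then use the returned UCS string rather than the received octets.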
+      
+      
+  5.2.2.  STD13 octet encoded labels 
+      
+     Any STD13 legacy labels which do not begin with the ACE prefix 
+     MUST be treated as STD13 octet encoding sequences. The rules for 
+     this process are defined by STD13's default label encapsulation 
+     services, although this document also provides some clarifications 
+     on the use of this encoding with internationalized domain names 
+     and labels. 
+      
+     Whenever the STD13 octet sequence is used to encode the labels 
+     from an internationalized domain name, the octet values of the 
+     canonical UCS characters are stored directly in the label. Because 
+     the DNS message is limited to octets, the range of UCS character 
+     codes which are eligible for use with STD13 octet sequences is 
+     limited to U+0000 through U+00FF. If any UCS character codes 
+     outside this range need to be transferred, the internationalized 
+     domain name label will have to be encoded as ACE or UTF-8. 
+      
+     Note that comparison operations for the seven-bit range of 
+     alphabetic character values MUST be performed in a case-neutral 
+     form, although eight-bit code values MUST NOT be normalized or 
+     case-converted as part of a comparison operation. These rules are 
+     required in order to ensure backwards compatibility with the STD13 
+     compliant systems which may be generating these labels as parts of 
+     an STD13 domain name while also supporting the normalization and 
+     case-conversion which may have been applied to the UCS characters 
+     in the storage or transfer encoding systems. 
+      
+      
+  6.      Application Guidelines 
+      
+     As was discussed in section 3.3, there are multiple scenarios in 
+     which an application can make use of internationalized domain 
+     names, ranging from simple lookups of connection identifiers to 
+     abstract encapsulations of unstructured application data. This is 
+     an extremely broad range of uses, which is complicated by the 
+     extreme pervasiveness of applications and protocols that use 
+     domain names for one or more of these purposes. 
+      
+     Furthermore, network applications face a complex array of input 
+     and output operations which will cumulatively affect the ability 
+     of that application to make use of the internationalized domain 
+     name system for various services and functions. These issues are 
+     illustrated by the figure below: 
+      
+                       [IDNs]              [IDNs] 
+                         |                   ^ 
+                         |                   | 
+                  +------V------+     +------+------+ 
+                  |    input    |     |   output    | 
+                  |   charset   |     |   charset   | 
+                  +-----------+-+     +-+-----------+ 
+                               \       / 
+                            +---+-----+---+ 
+                            | Application | 
+                            +---+-----+---+ 
+                               /       \ 
+                  +-----------+-+     +-+-----------+ 
+                  |   lookups   |     |   app data  <---> [IDNs] 
+                  +------+------+     +-------------+ 
+                         | 
+                  +------+------+ 
+                  |   resolver  <---> [IDNs] 
+                  +-------------+ 
+      
+     As can be seen, the ability of an application to completely adopt 
+     internationalized domain names will be determined by many factors, 
+     any one of which could prevent the application from completely 
+     incorporating the restrictions and recommendations prescribed by 
+     this specification. 
+      
+     In order to allow for a flexible adoption schedule, this 
+     specification defines very few mandates that applications must 
+     adopt, but instead focuses on recommendations which applications 
+     should comply with whenever they need to use internationalized 
+     domain names, and also provides recommendations for situations 
+     where the preferred behavior is not feasible. Applications which 
+     are compliant with all of the recommendations provided in this 
+     specification will be able to generate, store, transfer and 
+     resolve internationalized domain names throughout all of their 
+     operations, using UTF-8 as a common encoding for all of these 
+     operations. Meanwhile, applications which are not in complete 
+     compliance with this specification will still be able to make use 
+     of the internationalized domain names in these operations, 
+     although such access may be limited to using backwards-compatible 
+     encodings which require greater amounts of effort to implement and 
+     which provide fewer benefits. 
+      
+      
+  6.1.    Input and Output Charsets 
+      
+     If an application is unable to accept, process, store or display 
+     characters from the complete UCS repertoire, that application's 
+     support for internationalized domain names will be somewhat 
+     limited, by definition. 
+      
+     Although this document does not mandate any particular charset or 
+     encoding which all applications must use for all operations, 
+     applications SHOULD use coded character sets or encodings which 
+     can handle characters from a reasonable number of scripts. 
+      
+     In particular, the following areas have specific requirements: 
+      
+        *   Input charsets and encodings. Since UTF-8 is used as the 
+            default encoding for internationalized domain names 
+            throughout this specification (and others, such as BCP18), 
+            UTF-8 is also RECOMMENDED for use with input encodings of 
+            internationalized domain names in particular, although this 
+            is not required. Many platforms and development 
+            environments support UTF-8 as a local encoding of the UCS 
+            and it can be reasonably used with many types of input 
+            (such as configuration files), although many systems will 
+            require a specific encoding (such as UCS-2, or ISO/IEC 
+            8859-1) in situations which require memory access or 
+            keyboard input. 
+      
+            Regardless of the input encodings used, implementations 
+            MUST map domain names and labels to their canonical UCS 
+            characters for any normalization and case-conversion work 
+            which is subsequently required by any DNS lookups (see 
+            section 6.3). 
+      
+        *   Output choices will likely be limited to a system-preferred 
+            charset or encoding. In general, this document RECOMMENDS 
+            that output systems choose an output charset or encoding 
+            which reflects the data being provided. However, 
+            applications MUST NOT display unknown characters with 
+            generic replacement characters (such as boxes or circles) 
+            if it is known that the original characters are not 
+            available for display with the specified charset, as such 
+            characters will almost certainly trigger failure conditions 
+            in subsequent protocol operations. 
+      
+     In those situations where adequate input or output charsets or 
+     encodings are unavailable, applications MAY use ACE to encode 
+     internationalized domain names for the purpose of ensuring that 
+     the data is provided intact. Since ACE is capable of representing 
+     UCS characters as sequences of seven-bit characters, it is 
+     functionally usable as a last line of defense in almost any 
+     environment, with the caveat that ACE encoding sequences are 
+     extremely cryptic and will likely result in lower levels of 
+     usability and functionality. 
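+      
+     The output rule and the ACE fallback described above might be
+     combined as in the following Python sketch. It is not a
+     definitive implementation: the function name is hypothetical,
+     and Python's built-in "idna" codec is used purely as a stand-in
+     for an ACE, which is an assumption and is not the ACE
+     referenced by this specification.
+      
```python
def displayable_label(label: str, output_charset: str) -> str:
    """Return the label unchanged if the charset can show it, else as ACE."""
    try:
        # Strict encoding raises an error instead of substituting
        # replacement characters, which the text above prohibits.
        label.encode(output_charset, errors="strict")
        return label
    except UnicodeEncodeError:
        # Stand-in ACE (assumption): Python's "idna" codec.
        return label.encode("idna").decode("ascii")
```
+      
+     A label that the output charset can represent is passed through
+     intact, and any other label is shown in its seven-bit form
+     rather than with boxes or circles.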
+      
+      
+  6.2.    Protocol and Application Data 
+      
+     There are several interrelated issues which will determine an 
+     application's ability to provide or accept internationalized 
+     domain names as protocol or application data, although the 
+     principal determining factors for any such usage will generally be 
+     the capabilities of the underlying protocol itself. 
+      
+     If a protocol allows negotiation or tagging services in order to 
+     distinguish between different encodings, that protocol can likely 
+     be extended to support the use of UTF-8 as protocol or application 
+     data through command/response negotiation options or through data-
+     type tags. Older protocols which do not provide any negotiation 
+     services or which mandate the use of US-ASCII in all data will 
+     likely require the use of ACE encoded domain names as a short-term 
+     measure until the protocol is made compliant with BCP18. 
+      
+        *   Protocol data. If the protocol supports UTF-8 encoded 
+            internationalized domain names in commands or responses, 
+            then that encoding SHOULD be used wherever it is allowed. 
+            If UTF-8 is not supported by the protocol, STD13 octet 
+            sequences and/or ACE encoded equivalents of the 
+            internationalized domain name MUST be used. 
+      
+            In some cases, this negotiation can be performed on a per-
+            session basis. In other cases, it will need to be performed 
+            for each transaction within the session. In still other 
+            cases, the internationalized domain names will have to be 
+            tagged whenever they are provided as protocol or 
+            application data. 
+      
+            The DNS protocol is itself an example of a protocol which 
+            requires tagging in order for internationalized domain 
+            names to be exchanged within the existing DNS message (with 
+            these indicators taking the form of ACE encoding prefixes 
+            and EDNS/UTF-8 extended label type codes). Meanwhile, a 
+            protocol such as WHOIS can theoretically support a session-
+            wide negotiation option that allows the use of 
+            internationalized domain names as protocol and application 
+            data for the duration of that session. Conversely, a 
+            protocol such as SMTP will likely require the use of 
+            session-specific identifiers for some operations, while 
+            other operations may be able to use label tags (similar to 
+            the existing support for domain literals, which are 
+            identified by a pair of surrounding square brackets). 
+      
+            Regardless of the encodings which are used, implementations 
+            MUST map domain names and labels to their canonical UCS 
+            characters for any normalization and case-conversion work 
+            which is subsequently required as part of a DNS lookup (see 
+            section 6.3). 
+      
+        *   Structured application data. Structured application data 
+            such as URLs and email addresses MUST be processed 
+            according to the rules which govern those data formats. 
+            Applications MUST NOT perform any conversion or 
+            transliteration which is not explicitly prescribed by the 
+            governing documents, since non-standard usages are likely 
+            to result in misinterpreted data. 
+      
+        *   Unstructured application data. Domain names which appear as 
+            unstructured data in application content are beyond the 
+            control of this specification, and are generally subject to 
+            the encoding and formatting desires of the end-users who 
+            created the data. Generally speaking, it is RECOMMENDED 
+            that applications allow users to enter or view documents in 
+            whatever format they prefer, but that any conversion 
+            between multiple source and destination charsets and 
+            encodings use UCS as the translation intermediary, such 
+            that internationalized domain names are properly converted 
+            along with the rest of the application data. 
+      
+     In some cases, the application will need to probe the resolver 
+     before it can use internationalized domain names as data. For 
+     example, a participating system may need to determine the 
+     internationalized domain name of the local system so that it can 
+     provide this data in a protocol-specific banner message, and in 
+     these cases, the application will have to communicate with the 
+     resolver before this data can be provided. 
+      
+     Due to the usage-specific nature of internationalized domain names 
+     within protocol and application data streams, each development 
+     group will have to analyze the restrictions and capabilities which 
+     affect their specific services independently. 
+      
+      
+  6.3.    DNS Lookups and Resolver Calls 
+      
+     One of the most frequent uses for domain names is for lookup 
+     operations, such as for locating the IP addresses associated with 
+     a specified domain name, determining the domain name associated 
+     with a specified IP address, or performing a protocol-specific 
+     lookup operation for a specific resource record (such as the MX or 
+     SOA resource records associated with a specific domain). 
+      
+     Since these lookup operations do not directly affect external 
+     protocols or data, internationalized domain names can be used for 
+     lookup operations at the application's discretion. For example, 
+     applications such as ping and netstat only use domain names for 
+     display purposes, and can therefore make immediate use of 
+     internationalized domain names within their protocol operations. 
+     Similarly, a protocol can be limited to STD13 host identifiers as 
+     protocol identifiers which will require the application to provide 
+     internationalized domain names as ACE encoded sequences, but any 
+     lookup operations which are necessary for the internationalized 
+     domain names can still be performed in their native form. In these 
+     cases, the protocol operations and lookup operations are separate 
+     tasks with separate rules. 
+      
+     Similarly, applications are not required to use internationalized 
+     domain names and internationalized resolver APIs for every lookup. 
+     In some cases, it may be more efficient for an application to only 
+     use internationalized domain names for lookup operations against 
+     connection identifiers, and to use STD13 octet sequences or ACE 
+     encoded legacy lookups for domain names which were obtained as 
+     protocol or application data (this will be especially true in 
+     those cases where the protocol does not yet provide an 
+     internationalized domain name data-type). In those cases where an 
+     application prefers to use the legacy resolution path, the 
+     application MUST use the resolver's legacy APIs. For lookups 
+     against internationalized domain names, the application MUST use 
+     the resolver's internationalized APIs. 
+      
+     Note that this specification does not define a mandatory encoding 
+     which must be used between the applications and the local 
+     resolver. However, resolvers MUST provide at least one encoding 
+     which is capable of supporting the entire UCS repertoire of 
+     character codes, including character codes which are currently 
+     unassigned. Since UTF-8 is the default encoding which is used 
+     throughout this specification, it is also RECOMMENDED for use with 
+     resolver APIs, although this is not required. Resolvers MAY 
+     dictate a local encoding, with the only requirement being support 
+     for the entire range of UCS character codes. 
+      
+     Regardless of the data being provided or the charset or encoding 
+     which is used to provide that data, applications MUST normalize 
+     and case-convert any internationalized host identifiers which they 
+     generate or receive from a lookup operation. This process MUST 
+     use the canonical UCS characters of the domain name according to 
+     the rules specified in <nameprep> for every host identifier which 
+     is sent to or received from a resolver. 
+      
+     If the application knows that the requested data specifically 
+     refers to a host identifier, then the domain name data which is 
+     returned by the resolver MUST be normalized and case-converted, 
+     and the resulting domain name MUST be compared to the original 
+     domain name which was received prior to the normalization and 
+     case-conversion steps. If the processed domain name does not match 
+     the domain name which was received, the domain name MUST be 
+     discarded as malformed. 
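+      
+     The validation step above can be sketched in Python as follows.
+     This is a minimal illustration under stated assumptions: the
+     function names are hypothetical, and case folding followed by
+     NFKC normalization is only an approximation of the full rules
+     in <nameprep>, which this sketch does not implement.
+      
```python
import unicodedata


def prepare(name: str) -> str:
    """Approximate <nameprep>: case-fold, then apply NFKC normalization."""
    return unicodedata.normalize("NFKC", name.casefold())


def check_host_identifier(received: str) -> str:
    """Return the name if it is already in prepared form, else reject it."""
    if prepare(received) != received:
        # The processed name differs from what was received.
        raise ValueError("malformed host identifier: %r" % received)
    return received
```
+      
+     A name that arrives with upper-case letters or with decomposed
+     character sequences changes when it is processed, so it fails
+     the comparison and is discarded.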
+      
+     This step is necessary in order to ensure the integrity and 
+     veracity of internationalized domain names which are processed by 
+     applications, since there are multiple opportunities for errors to 
+     be introduced (such as mistyped entries in the resolver's hosts 
+     database, or malicious data which has been purposefully provided 
+     in a zone), and these errors can result in sensitive data being 
+     directed to the wrong network. Note that the above rule 
+     specifically applies to host identifiers and not to all 
+     internationalized domain names as a whole; applications MUST NOT 
+     arbitrarily normalize and case-convert any and all domain names, 
+     but MUST apply these steps to any and all domain names which are 
+     known to be used as host identifiers. 
+      
+     As part of the processing rules for DNS lookups, it is expected 
+     that an application can exchange internationalized domain names 
+     with the resolver using a charset or encoding which is capable of 
+     representing the entire UCS character code range. Towards this 
+     objective, applications SHOULD test the capabilities of the 
+     resolver prior to transferring internationalized domain names. In 
+     those situations where the resolver is unable to support this 
+     usage, the application MUST encode the internationalized domain 
+     name as STD13 octet sequences or ACE, and pass the resulting STD13 
+     host identifier to the resolver. 
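+      
+     One possible shape for this capability test is sketched below
+     in Python. The resolver methods (test_wide, get_wide_host and
+     get_legacy_host) are hypothetical names chosen for this
+     example, and Python's "idna" codec again stands in for an ACE
+     as an assumption. The sketch covers only the ACE branch, not
+     the STD13 octet sequence alternative.
+      
```python
def lookup_host(name, resolver):
    """Use the internationalized path if available, else fall back to ACE."""
    if resolver.test_wide():
        return resolver.get_wide_host(name)
    # Stand-in ACE (assumption): Python's "idna" codec.
    ace_name = name.encode("idna").decode("ascii")
    return resolver.get_legacy_host(ace_name)


class LegacyOnlyResolver:
    """Toy resolver without internationalized support, for illustration."""

    def test_wide(self):
        return False

    def get_legacy_host(self, name):
        # A real resolver would issue a query; this one echoes its input.
        return ("legacy", name)
```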
+      
+      
+  7.      Resolver Guidelines 
+      
+     Resolvers play a crucial role in the use of internationalized 
+     domain names, in that they provide the internationalized namespace 
+     which applications work with. As part of this service, resolvers 
+     provide encapsulation services for the internationalized domain 
+     names which are exchanged with the applications, resolve queries 
+     in the internationalized namespace on behalf of the applications, 
+     and provide lookup matching for entries which are stored in a 
+     local hosts database. Note that resolvers which cache answer data 
+     for subsequent operations are also governed by the caching 
+     restrictions provided in section 9. 
+      
+      
+  7.1.    Resolver APIs 
+      
+     Stub resolvers which communicate directly with applications that 
+     are compliant with this specification are strongly encouraged to 
+     provide a separate set of APIs for those applications to use 
+     whenever internationalized domain names need to be provided in 
+     queries or response messages. 
+      
+     The use of an internationalized API will generally facilitate 
+     smoother operations for the applications, in that it will allow 
+     the application to determine the capabilities of the resolver, to 
+     obtain the internationalized domain name of the local system, and 
+     to process queries for internationalized domain names as special 
+     data types. 
+      
+     Furthermore, the use of internationalized versus legacy APIs 
+     provides a way for resolvers to separate internationalized and 
+     legacy application query paths, such that the legacy APIs only 
+     result in STD13 legacy labels, while the internationalized APIs 
+     generate and trigger EDNS/UTF-8 extended labels. The output 
+     formatting of the DNS messages is controlled by tight 
+     restrictions, and the use of alternative APIs will likely result 
+     in simpler resolver implementations. 
+      
+     For example, it is suggested that applications use the 
+     internationalized APIs for all of the DNS lookups they generate, 
+     even if the domain name only contains seven-bit characters. This 
+     is required in case the queried domain name only exists with a 
+     CNAME or PTR resource record which references an internationalized 
+     domain name, and the server has to know which encoding to use for 
+     that query. If the client had not used the internationalized API 
+     for the original lookup of the domain name, the resolver may have 
+     chosen the wrong label type, and thus the response data would only 
+     be returned as ACE encoded data. 
+      
+     Conversely, older applications which generate malformed eight-bit 
+     queries through the legacy APIs will result in those queries being 
+     properly rejected by the DNS servers, preventing undue problems 
+     with these applications from occurring. For example, an older 
+     application may process an internationalized domain name through 
+     the system-default charset or encoding (such as MacRoman), which 
+     would result in the domain name being malformed when the 
+     application tried to do something important with that domain name 
+     (such as send an email message over SMTP). The use of multiple 
+     APIs causes these malformed applications to break, and the invalid 
+     domain names are kept out of the application protocol space. 
+      
+     Internationalized APIs are optional to the extent that an 
+     application MAY use an embedded resolver which is known to be 
+     capable of generating and processing internationalized domain 
+     names through the existing function calls. However, the use of 
+     separate APIs for internationalized domain names is encouraged. 
+      
+     Although this document does not mandate any specific APIs, the 
+     following functions SHOULD be provided for in some form: 
+      
+        *   Test Wide. Applications MUST be able to test the resolver 
+            for compliance with this specification. In those cases 
+            where this function is performed by some other function 
+            (such as one of the following), the capabilities of the 
+            resolver MUST be detectable even if the requested operation 
+            fails. For example, if an application issues a call for the 
+            internationalized domain name of the local system, the 
+            capability of the resolver to handle internationalized 
+            domain names MUST be uniquely represented even if the local 
+            host name cannot be determined. 
+      
+        *   Get Wide X-By-Y. Applications SHOULD be able to specify any 
+            resource record associated with any internationalized 
+            domain name as part of a lookup operation. Whether this 
+            service is provided as a series of lookup-specific APIs or 
+            as a general purpose API is up to the resolver. 
+      
+        *   Get Wide Local Name. Applications which utilize 
+            internationalized domain names as data will need to be able 
+            to determine the internationalized form of their local 
+            system name for some operations (such as a protocol-
+            specific welcome banner). When this function is called, the 
+            resulting data MUST be provided as the canonical UCS 
+            character code values, or their equivalent as represented 
+            by a locally mandated charset or encoding. 
+      
+            Note that an ACE equivalent of the system name SHOULD be 
+            returned when the relevant legacy API is queried. In those 
+            cases where the legacy and internationalized domain names 
+            both contain seven-bit character codes (possibly because 
+            the host name is only available in US-ASCII, or because the 
+            host name was assigned as ACE by an external configuration 
+            service), the internationalized host name MUST still be 
+            accessible through the internationalized function. 
+      
+     Note that this specification does not mandate a charset or 
+     encoding for use by the resolver APIs. However, wherever an 
+     internationalized API is presented, the resolver MUST utilize a 
+     charset or encoding which supports the entire UCS repertoire of 
+     character codes, including character codes which are currently 
+     unassigned. Since UTF-8 is the default charset for most of the 
+     operations specified in this document, it is also RECOMMENDED for 
+     this service, but is not required. 
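+      
+     Since this document does not mandate any specific APIs, the
+     following Python sketch shows only one hypothetical shape that
+     the three functions could take. Every class and method name
+     here is an assumption made for illustration. Names are
+     exchanged as Python strings, which can carry any UCS character
+     code.
+      
```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class WideResolver(ABC):
    """Hypothetical shape of an internationalized resolver API."""

    @abstractmethod
    def test_wide(self) -> bool:
        """Test Wide: report whether internationalized names are supported."""

    @abstractmethod
    def get_wide(self, name: str, rrtype: str) -> List[str]:
        """Get Wide X-By-Y: look up any record type for a name."""

    @abstractmethod
    def get_wide_local_name(self) -> str:
        """Get Wide Local Name: return the local system's name."""


class TableResolver(WideResolver):
    """Toy implementation backed by an in-memory table."""

    def __init__(self, records: Dict[Tuple[str, str], List[str]],
                 local_name: str = ""):
        self._records = records
        self._local_name = local_name

    def test_wide(self) -> bool:
        return True

    def get_wide(self, name: str, rrtype: str) -> List[str]:
        return self._records.get((name, rrtype), [])

    def get_wide_local_name(self) -> str:
        if not self._local_name:
            # The capability test still succeeds when this call fails.
            raise LookupError("local host name cannot be determined")
        return self._local_name
```
+      
+     Keeping the capability test separate from the lookup calls
+     means that a failed request for the local name does not hide
+     the fact that the resolver supports internationalized names.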
+      
+      
+  7.2.    Query Processing Services 
+      
+     Resolvers which are compliant with the recommendations provided in 
+     this specification will provide two query paths, one of which 
+     supports STD13 domain names and another which supports 
+     internationalized domain names. Technically, there is no 
+     requirement for two processing paths, although these paths will 
+     likely exist as conceptual paths even if they are not represented 
+     or implemented uniquely in all resolvers. 
+      
+     The legacy processing path is defined by STD13. This document does 
+     not update, modify or extend the rules that resolvers operate 
+     under when an STD13 compliant domain name is received from a 
+     legacy application through any legacy APIs which may exist. 
+     However, when an internationalized domain name is received from 
+     an internationalized application through any internationalized 
+     APIs, the processing rules defined in this section MUST be 
+     followed. 
+     Note that these rules apply to all resolvers, whether they are 
+     stub resolvers, forwarders or caching servers. 
+      
+     Generally speaking, the internationalized domain name resolution 
+     process has two major components: processing internationalized 
+     domain names as queries, and performing fall-back processing if an 
+     EDNS/UTF-8 query is rejected by an authoritative server. 
+      
+      
+  7.2.1.  Internationalized queries 
+      
+     Queries for internationalized domain names which are received 
+     through internationalized APIs can be expected to have originated 
+     at an application which is capable of accepting and processing 
+     internationalized domain names in the response messages. 
+      
+     Resolvers MUST encode the labels from the queried domain name as 
+     UTF-8 and encapsulate the resulting encoded labels into EDNS/UTF-8 
+     extended labels for transfer within DNS messages, per the 
+     instructions provided in section 5.1. 
+      
+     Any and all responses to these queries will also be encoded as 
+     UTF-8 and encapsulated in EDNS/UTF-8 extended labels. Resolvers 
+     MUST decode the provided response data, convert the labels to 
+     their canonical UCS character codes, and return the requested data 
+     to the calling application. 
+      
+     The resolver MUST NOT normalize or case convert internationalized 
+     domain names which may be received in queries or response 
+     messages. Since the queries have originated from applications 
+     which have indicated that they are compliant with this 
+     specification (via the API) while the responses will have 
+     originated from caches or servers which indicate that they are 
+     also compliant (via the EDNS/UTF-8 extended labels), those systems 
+     are assumed to have normalized and case-converted the domain names 
+     before they were generated or stored. Also note that applications 
+     will validate the host identifiers that they receive in response 
+     messages, so an additional check is expected to be performed on 
+     the answer data by those systems. 
+      
+      
+  7.2.2.  Fall-back processing 
+      
+     If a queried server is unable to process EDNS/UTF-8 extended 
+     labels, then it is required by STD13 to generate an error 
+     signifying the problem. Resolvers MUST interpret these errors, 
+     decode the UTF-8 queried domain name, re-encode it as STD13 octets 
+     and/or ACE per the instructions provided in section 5.2, and then 
+     reissue the query as an STD13 legacy label sequence. 
+      
+     The legacy DNS error responses which will trigger this series of 
+     events are FORMERR and NOTIMPL. Any other errors indicate that the 
+     EDNS/UTF-8 extended label was successfully processed but that the 
+     query was not matched, and those errors MUST be returned to the 
+     application. If the fallback processing results in any error 
+     responses whatsoever, then the resolver MUST return those errors 
+     to the calling application. 
+      
+     Any servers which subsequently receive the fall-back queries and 
+     which are compliant with this specification will process the 
+     queries as internationalized domain names, and will return the 
+     answer data as STD13 octet sequences or ACE encoded data, using 
+     the STD13 legacy label. 
+      
+     Generally speaking, fall-back processing serves two purposes: 
+      
+        *   Answering the initial query. If a UTF-8 domain name cannot 
+            be resolved because a server in the delegation path does 
+            not understand the EDNS/UTF-8 label type, the resolver can 
+            reissue the query as an ACE encoded legacy label type so 
+            that the query proceeds past the problematic server. 
+      
+        *   Seeding the resolver's cache. As a result of the above, the 
+            resolver will learn about the authoritative name servers 
+            for the target zone, and this information can be used for 
+            any subsequent queries for domain names within the 
+            specified zone (for as long as the data is cached, anyway). 
+            As such, any subsequent EDNS/UTF-8 queries which are issued 
+            for the portion of the namespace served by that zone will 
+            be sent directly to one of those authoritative servers 
+            where they can be answered directly. In this regard, 
+            subsequent lookups do not require fall-back processing if 
+            they are received during the cache window. 
+      
+     Regardless of whether or not fall-back processing has been 
+     performed, if the calling application issued the original query as 
+     an internationalized domain name, then the resolver MUST respond 
+     to the query in that form as well. This means that the resolver 
+     MUST convert any STD13 octet sequences or ACE encoded labels into 
+     their canonical UCS characters, convert the answer data into the 
+     resolver's native charset or encoding, and return the data to the 
+     calling process. The resolver MUST NOT perform any normalization 
+     or case-conversion during this process, as such an action can 
+     corrupt domain names which are not used for host identifiers. 
+      
+     If the original query was received through the resolver's legacy 
+     APIs, then the query MUST be generated and returned in the legacy 
+     format, and MUST NOT be converted to an internationalized domain 
+     name prior to the query or response being passed through. 
+      
+     Once fall-back processing occurs, the process MUST NOT be repeated 
+     for any additional queries in the current lookup operation. Any 
+     other queries from the current lookup operation MUST NOT be sent 
+     as EDNS/UTF-8 extended labels, since multiple fall-back operations 
+     can result in time-outs on the client systems. 
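+      
+     The fall-back rules in this section might be sketched in Python
+     as follows. The class name and the two query callables are
+     hypothetical. Each callable is assumed to take a domain name
+     and to return an (rcode, answer) pair, with any re-encoding of
+     the name left to the legacy callable.
+      
```python
FORMERR = 1   # "format error" response code from STD13
NOTIMPL = 4   # "not implemented" response code from STD13


class LookupOperation:
    """Tracks fall-back state for one lookup operation."""

    def __init__(self, send_extended, send_legacy):
        # Hypothetical callables: name -> (rcode, answer).
        self._send_extended = send_extended
        self._send_legacy = send_legacy
        self.fell_back = False

    def query(self, name):
        if not self.fell_back:
            rcode, answer = self._send_extended(name)
            if rcode not in (FORMERR, NOTIMPL):
                # Success or any other error goes back to the caller.
                return rcode, answer
            # Fall back once; later queries skip the extended attempt.
            self.fell_back = True
        # Whatever the legacy query returns, errors included, is final.
        return self._send_legacy(name)
```
+      
+     Recording the fall-back in the lookup state means that later
+     queries in the same operation go straight to the legacy form,
+     which avoids repeating the rejected query.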
+      
+     Because the fall-back process results in two lookups being issued 
+     against the rejecting zone, eliminating the fall-back processing 
+     as soon as possible will be an operational requirement for many 
+     organizations. Any caches or forwarders which are used by stub 
+     resolvers within an end-user network are practically required to 
+     be able to process the EDNS/UTF-8 queries, since those servers 
+     will receive every query which is issued by the stub resolvers. 
+     While this isn't a technical requirement (fall-back processing 
+     will get around the problematic servers), it will likely prove to 
+     be a consideration for network operators looking to support 
+     internationalized domain names on their local networks. 
+      
+     This document also strongly encourages the root and TLD servers to 
+     be upgraded as soon as possible (even if they do not intend to 
+     directly provide UTF-8 domain name delegations), in order to allow 
+     those servers to read and process the EDNS/UTF-8 extended labels, 
+     thereby reducing the number of fall-back queries which are sent to 
+     those servers. 
+      
+      
+   
+  Hall                    I-D Expires: May 2002              [page 47] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+  7.3.    The Hosts Database 
+      
+     Generally speaking, there are two areas of consideration for stub 
+     resolvers that provide local hosts databases for name resolution 
+     services. These are the input requirements for internationalized 
+     domain names which will be added to the hosts database, and the 
+     requirements which govern how queries will be compared to the 
+     entries in the hosts database. 
+      
+     Note that resolvers are not required to implement a hosts database 
+     or local lookup services (STD3 says "a host MAY also implement a 
+     host name translation mechanism that searches a local Internet 
+     host table"). However, wherever a hosts database is provided with 
+     an internationalized resolver, compliance with the rules specified 
+     in this section is required. 
+      
+     If a stub resolver offers the capability to compare 
+     internationalized domain names against a local hosts database, 
+     that database MUST be compatible with the internationalized domain 
+     name rules specified in section 4 of this document. 
+      
+     In particular, the resolver SHOULD allow internationalized domain 
+     names with any code values to be stored, even if the canonical UCS 
+     characters for those values are undefined or are illegal for use 
+     with internationalized host identifiers (this is required to 
+     support domain names which are not host identifiers). In those 
+     cases where an internationalized domain name specifies an exact 
+     sequence of octets for binary comparison, the hosts database MUST 
+     provide a mechanism for tagging the eight-bit characters so that 
+     they are not interpreted, processed or compared as the canonical 
+     UCS character equivalents of those codes. 
+      
+     However, entries which explicitly provide host identifiers MUST be 
+     normalized and case-converted prior to being stored. In order to 
+     satisfy both of these requirements, it is RECOMMENDED that hosts 
+     databases store internationalized host identifiers as untagged 
+     data, but that they also provide some sort of tagging service for 
+     character code values which are to be returned as-is. STD13 
+     defines an escaping mechanism whereby the decimal value of the 
+     octet is prefaced with a reverse-solidus (such as "\193"), which 
+     is suggested for this usage. 
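+      
+     A minimal, non-normative Python sketch of the suggested tagging 
+     follows. The function names tag_octets and split_tagged are 
+     hypothetical. The sketch assumes that a tagged octet is always 
+     written as a reverse-solidus followed by exactly three decimal 
+     digits, and it does not handle an escaped reverse-solidus which 
+     precedes digits. 
+      

```python
import re

# A reverse-solidus followed by exactly three ASCII decimal digits.
_ESCAPE = re.compile(r"\\([0-9]{3})")


def tag_octets(raw: bytes) -> str:
    """Write each octet as a \\DDD escape so that it is stored as-is."""
    return "".join(f"\\{octet:03d}" for octet in raw)


def split_tagged(label: str) -> list:
    """Split a stored label into untagged text (str) and tagged octets (int)."""
    parts = []
    pos = 0
    for match in _ESCAPE.finditer(label):
        value = int(match.group(1))
        if value > 255:
            raise ValueError(f"escape out of octet range: {match.group(0)}")
        if match.start() > pos:
            parts.append(label[pos:match.start()])
        parts.append(value)
        pos = match.end()
    if pos < len(label):
        parts.append(label[pos:])
    return parts
```

+      
+     With this split, the untagged runs can be handled as UCS 
+     characters while the integer items are compared only as octets. 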
+      
+     The storage format of the hosts database MAY use any charset or 
+     encoding the resolver deems most suitable for that platform, as 
+     long as the rules and restrictions provided above are followed. 
+     Since UTF-8 is used as the default encoding throughout this 
+   
+  Hall                    I-D Expires: May 2002              [page 48] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     specification, it is RECOMMENDED as the default encoding for hosts 
+     databases as well, although this is not required. 
+      
+     Not all of the applications which use a resolver are likely to be 
+     compliant with this specification, so resolvers MUST ensure that 
+     they are able to interpret and process any queries from the legacy 
+     APIs which provide the ACE equivalent of an internationalized 
+     domain name that is stored in the hosts database. When such a 
+     query arrives, the domain name MUST be converted to the canonical 
+     UCS character codes represented by the ACE encoded sequence and 
+     compared to entries in the hosts database in that form (tagged 
+     octets excluded). Any internationalized domain names which are 
+     required to be returned through the legacy APIs MUST be converted 
+     to STD13 octet sequences and/or ACE before they are returned. 
+      
+      
+  8.      Server Guidelines 
+      
+     When a zone administrator desires to provide internationalized 
+     domain names in a zone, they are presented with two options: they 
+     can add the STD13 octets or ACE encoded internationalized domain 
+     names to an existing zone, or they can use internationalized zone 
+     databases directly. Both of these usage scenarios have their own 
+     benefits and restrictions. 
+      
+     Using STD13 octet sequences and ACE with legacy servers allows for 
+     the immediate deployment of internationalized domain names on 
+     existing servers, and within hierarchies which include 
+     internationalized domain names. However, any such queries which 
+     originate at applications that are compliant with this 
+     specification will always initially fail, guaranteeing that fall-
+     back processing will always occur for those zones. 
+      
+     Conversely, using internationalized zones directly allows servers 
+     to process legacy, ACE and EDNS/UTF-8 queries equally, thereby 
+     providing greater value to the applications and resolvers which 
+     have been made compliant with this specification. However, 
+     internationalized zones have additional requirements (most 
+     notably, all of the servers which replicate the zone are required 
+     to be upgraded simultaneously, as discussed in section 8.1), and 
+     these will prove burdensome to some zone operators. 
+      
+     This specification focuses on the processing requirements for 
+     internationalized zones which support the use of internationalized 
+     domain names as explicit data, and which also support the 
+     necessary subordinate mechanisms such as EDNS/UTF-8 queries. When 
+     STD13 octet sequences or ACE encoded domain names are used with 
+   
+  Hall                    I-D Expires: May 2002              [page 49] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     legacy servers, the rules defined in STD13 for those servers MUST 
+     be used. 
+      
+     Note that each zone SHOULD be configurable independently. If a 
+     server hosts multiple zones, each of those zones SHOULD be 
+     operable as an independent entity, with any of them using ACE or 
+     internationalized domain names as necessary. This rule is 
+     necessary since each zone is likely to have different replication 
+     partners and configuration rules which will require different 
+     migration strategies. 
+      
+      
+  8.1.    Internationalized Zones 
+      
+     All domain names which are published by an internationalized zone 
+     MUST be compatible with the restrictions specified in section 4 of 
+     this document. In particular, the zone database MUST allow binary 
+     domain names to be stored as any octet value, but MUST also comply 
+     with the normalization and case-mapping rules when a domain name 
+     represents a host identifier. These restrictions MUST be applied 
+     as part of the process in which the domain name is being added to 
+     the zone database. In those cases where an internationalized 
+     domain name specifies an exact sequence of octets for binary 
+     comparison, the zone database MUST provide a mechanism for 
+     tagging the eight-bit characters so that they are not interpreted, 
+     processed or compared as the canonical UCS character equivalents 
+     of those codes. STD13 defines an escaping mechanism whereby the 
+     decimal value of the octet is prefaced with a reverse-solidus 
+     (such as "\193"), which is suggested for this usage. 
+      
+     Servers which are compliant with this specification MUST be 
+     capable of providing UTF-8 and ACE encoded representations of the 
+     UCS domain names which are stored in the zone. Servers MUST 
+     restrict output to only one label type for any protocol operation: 
+     queries containing STD13 legacy labels MUST be answered with STD13 
+     octet sequences and/or ACE encoded domain names, while EDNS/UTF-8 
+     queries MUST only be answered with UTF-8 encoded domain names. 
+     This rule covers basic operations such as simple queries as well 
+     as advanced operations such as zone transfers (see section 8.2). 
+     Similarly, external operations such as exporting the contents of 
+     the zone to a master file (as discussed in section 8.3) MUST 
+     result in a single encoding form being used for that specific 
+     operation. 
+      
+     Note that the underlying zone database technology which may be 
+     employed by any particular server is beyond the scope of this 
+   
+  Hall                    I-D Expires: May 2002              [page 50] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     document. Servers MAY use any database technology, charset or 
+     encoding deemed appropriate for the local environment, although 
+     the contents of the zone MUST be mapped to the canonical UCS 
+     character codes for all comparison operations (octet values 
+     excluded). Since UTF-8 is used as the default encoding throughout 
+     this specification, it is RECOMMENDED for use as the default 
+     encoding with zone databases as well, but is not required. 
+      
+     Servers MUST NOT normalize or case-map any UCS characters which 
+     are decoded from UTF-8 or ACE encoded labels, and MUST restrict 
+     comparison operations of these labels to precise matches of the 
+     UCS domain names which are stored in the zone database. However, 
+     the seven bit character codes from any labels which are received 
+     as STD13 octet sequences MUST be compared in a case-neutral form, 
+     and MUST NOT be normalized as part of the comparison operation. 
+      
+     When a zone is converted to support internationalized domain 
+     names, all of the servers which replicate that zone MUST be 
+     upgraded. This is required due to ambiguities that can occur with 
+     labels which may be encoded as either STD13 octet sequences or ACE 
+     data, and where the label only uses character codes from the 
+     eight-bit range (this problem is described in detail in section 
+     4.1.2). In order to ensure that all of the servers for a zone 
+     respond to queries for such labels correctly, all of the servers 
+     which replicate the zone MUST fully support this document and its 
+     requirements. 
+      
+      
+  8.2.    Namespace Visibility Restrictions 
+      
+     In all cases, the encoding format of the domain names which are 
+     returned in response to a query MUST be the same as the encoding 
+     format which was used by the query. If the query was provided as a 
+     sequence of legacy labels, then all of the domain names which are 
+     provided in the response message MUST be provided as legacy labels 
+     (containing either ACE or STD13 octet encoded values). 
+      
+     Similarly, if a query is provided as EDNS/UTF-8 encoded data, all 
+     domain names which are provided in the response message MUST be 
+     provided as UTF-8 encoded data in EDNS/UTF-8 extended labels. In 
+     some situations, this process may require the server to perform an 
+     extra conversion. 
+      
+     For example, assume that the <idn>.example.com. domain name has 
+     two associated MX resource records, one of which points to the UCS 
+     domain name of mail.<idn>.example.com, while the other points to 
+   
+  Hall                    I-D Expires: May 2002              [page 51] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     the ACE encoded domain name of mail.<ace>.example.net. (where the 
+     "<ace>" label is the ACE equivalent of an internationalized sub-
+     domain in the example.net. zone). If a UTF-8 query arrives for the 
+     MX resource records associated with the <idn>.example.com. domain 
+     name, both resource records MUST be returned as EDNS/UTF-8 data. 
+     In order for this requirement to be satisfied, the server will 
+     have to decode the <ace> label to its UCS canonical form for zone 
+     storage purposes, and encode the domain name as UTF-8 for 
+     transmission whenever an EDNS/UTF-8 answer set is required. 
+      
+     The visibility rules specified in this section are mandatory for 
+     every domain name which is provided in any message. If a system 
+     requests a zone transfer and uses the EDNS/UTF-8 extended label 
+     type in the request, all of the domain names in all of the 
+     messages which are sent as part of the zone transfer MUST be 
+     provided in their UTF-8 encoded form. Similarly, if a zone 
+     transfer is requested and uses the legacy label type, then all of 
+     the domain names from all of the messages which are sent as part 
+     of the zone transfer MUST be provided as either STD13 octet 
+     sequences or ACE encoded data, using the legacy label type. 
+      
+      
+  8.3.    The Master File Format 
+      
+     STD13 specifies a "master file" format which is used as a 
+     platform-neutral storage and transfer format for importing and 
+     exporting the contents of a particular zone. Note that the master 
+     file is not the same as the operating database for a zone; the 
+     master file format is used (or is useful) for copying a zone to 
+     another server, storing a copy of the zone database off-line, 
+     emailing a copy of the zone to another user or system, and 
+     performing other off-line actions against the database's contents. 
+     Once a zone is loaded on a server, however, any database 
+     technology can be used for managing the zones and generating 
+     response messages. 
+      
+     In order to facilitate the continued use of master files, any zone 
+     which is compliant with this specification MUST support the use of 
+     UTF-8 as an import and export encoding format for the master file 
+     associated with that zone. 
+      
+     Furthermore, compliant versions of a master file are required to 
+     have the "$UTF-8" control literal at the beginning of the first 
+     line of text in the master file if it contains UTF-8 encoded data. 
+     Master files from zones which do not contain UTF-8 encoded domain 
+   
+  Hall                    I-D Expires: May 2002              [page 52] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     names MUST NOT contain the "$UTF-8" control literal in the first 
+     print position of any line. 
+      
+     If the master file contains the "$UTF-8" control literal, all of 
+     the data within the master file MUST be encoded in UTF-8 as 
+     specified by RFC2279, and SHOULD be managed with UTF-8 compliant 
+     tools (such as UTF-8 text editors, mailers that support UTF-8 MIME 
+     encodings, and so forth). 
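+      
+     The control literal rules above might be checked as in the 
+     following non-normative Python sketch. The function name 
+     declares_utf8 is hypothetical. Note also that Python's UTF-8 
+     decoder is stricter than RFC2279 (it rejects five and six octet 
+     forms, for example), so this is an approximation of the 
+     requirement rather than an exact test. 
+      

```python
CONTROL_LITERAL = b"$UTF-8"


def declares_utf8(master_file: bytes) -> bool:
    """Return True if the file is marked as UTF-8, False if it is unmarked.

    Raises ValueError when the marker rules are violated.
    """
    lines = master_file.split(b"\n")
    if lines[0].startswith(CONTROL_LITERAL):
        # Marked files must be UTF-8 throughout. UnicodeDecodeError is a
        # subclass of ValueError, so one exception type covers both checks.
        master_file.decode("utf-8")
        return True
    for number, line in enumerate(lines, start=1):
        if line.startswith(CONTROL_LITERAL):
            raise ValueError(f"$UTF-8 literal in first position of line {number}")
    return False
```

+      
+     An importer could call such a check before parsing, and select 
+     either UTF-8 or legacy processing based on the result. 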
+      
+      
+  9.      Caching Guidelines 
+      
+     Whenever an internationalized domain name is stored in a cache, it 
+     MUST be stored in its canonical UCS character code form, 
+     regardless of whether the domain name was received as STD13 octet 
+     encoding sequences, UTF-8, or ACE data. Caches MUST NOT normalize 
+     or case convert any domain names that they store, as such a 
+     process could invalidate domain names that are not used for host 
+     identifiers. 
+      
+     Any subsequent queries which are processed through the cache MUST 
+     be compared against the stored UCS characters. Internationalized 
+     domain name labels which are decoded from UTF-8 or ACE labels MUST 
+     NOT be normalized or case-converted as part of the comparison 
+     operation, although labels which are provided as STD13 octet 
+     sequences MUST be compared as case-neutral octet values. 
+      
+     Caches MUST be capable of providing UTF-8 and ACE encoded 
+     representations of the UCS domain names which are stored in the 
+     cache, with the appropriate format determined by the format used 
+     in the corresponding query. However, answer data MUST be 
+     restricted to only one encoding form for any protocol operation, 
+     meaning that queries containing legacy labels MUST only be 
+     answered with STD13 octet sequences and/or ACE encoded labels, 
+     while UTF-8 queries MUST only be answered with UTF-8 encoded 
+     domain names. 
+      
+      
+  10.     Security Considerations 
+      
+     This document defines an extension to the domain name system, and 
+     as such, it inherits the weaknesses which already exist in DNS. 
+     Where possible, this specification strengthens DNS with multiple 
+     checks. For example, this specification requires that domain names 
+     be validated three times before they are used by applications: 
+     once on specification, once on entry at the authoritative zone or 
+   
+  Hall                    I-D Expires: May 2002              [page 53] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+     hosts database, and once again when the answer data is received by 
+     the requesting application. Despite these checks, the root 
+     weaknesses inherent in DNS are still present. 
+      
+     This document uses multiple encoding algorithms, although boundary 
+     conditions from the existing DNS are preserved for both the source 
+     and encoded representations. 
+      
+      
+  11.     IANA Considerations 
+      
+     This document requires the use of an EDNS extended label type 
+     identification code. This document uses the b000011 ELT code. 
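+      
+     For illustration only, the following Python sketch shows how the 
+     ELT code would appear in the first octet of an extended label, 
+     assuming the RFC2671 layout in which the two high-order bits of 
+     that octet are 01 and the remaining six bits carry the type code. 
+     The names used are hypothetical, and the remainder of the label 
+     format is not shown. 
+      

```python
EXTENDED_LABEL_PREFIX = 0b01  # two high-order bits, per RFC2671
UTF8_ELT_CODE = 0b000011      # the ELT code used by this document


def extended_label_type_octet(code: int) -> int:
    """Combine the extended label prefix with a six-bit type code."""
    if not 0 <= code <= 0b111111:
        raise ValueError("ELT code must fit in six bits")
    return (EXTENDED_LABEL_PREFIX << 6) | code
```

+      
+     Under that assumption, extended_label_type_octet(UTF8_ELT_CODE) 
+     yields 0x43. 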
+      
+      
+  12.     References 
+      
+          [AMC-ACE-Z] <draft-ietf-idn-amc-ace-z>, "AMC-ACE-Z version 
+            0.3.1" 
+      
+          [NAMEPREP] <draft-ietf-idn-nameprep>, "Preparation of 
+            Internationalized Host Names" 
+      
+          [RFC2119] "Key words for use in RFCs to Indicate Requirement 
+            Levels" 
+      
+          [RFC952] "DoD Internet host table specification" 
+      
+          [STD13] (RFC 1034) "Domain names - concepts and facilities", 
+            (RFC 1035) "Domain names - implementation and 
+            specification" 
+      
+          [STD3] (RFC 1122) "Requirements for Internet Hosts -- 
+            Communication Layers", (RFC 1123) "Requirements for Internet 
+            Hosts -- Application and Support" 
+      
+          [BCP18] (RFC 2277) "IETF Policy on Character Sets and 
+            Languages" 
+      
+          [RFC2279] "UTF-8, a transformation format of ISO 10646" 
+      
+          [RFC2671] "Extension Mechanisms for DNS (EDNS0)" 
+      
+          [ASCII] "ANSI X3.4-1968. USA Standard Code for Information 
+            Interchange" 
+      
+   
+  Hall                    I-D Expires: May 2002              [page 54] 
+  INTERNET-DRAFT        draft-hall-dm-idns-00.txt        November 2001 
+   
+   
+          [ISO10646] "ISO/IEC 10646-1:2000. International Standard -- 
+            Information technology -- Universal Multiple-Octet Coded 
+            Character Set (UCS) -- Part 1: Architecture and Basic 
+            Multilingual Plane" 
+      
+      
+  13.     Acknowledgements 
+      
+     This document is an assembly of multiple ideas and proposals which 
+     have been made on the IDN working group mailing list. Many of the 
+     ideas presented here have been proposed by multiple parties in one 
+     form or another, although Dan Oscarsson is credited for proposing 
+     a dual-mode operation which is capable of simultaneously 
+     supporting UTF-8 and legacy mode encodings. Other contributors to 
+     key elements from this specification (some of them unknowingly or 
+     unwillingly) include (alphabetically) Marc Blanchet, Adam 
+     Costello, Mark Davis, Martin Duerst, Patrik Faltstrom, Paul 
+     Hoffman, David Hopwood, and many others. 
+      
+      
+  14.     Editor's Address 
+      
+     Eric A. Hall 
+     ehall@ehsco.com 
+      
+      
+      
+   
+  Hall                    I-D Expires: May 2002              [page 55]