new/updated drafts

2026-06-10 18:40:00 -04:00 · 2001-03-05 12:18:56 +00:00 · fef2d3dce0
commit fef2d3dce0
parent 0015ab0974
6 changed files with 5174 additions and 2713 deletions
--- a/doc/draft/draft-ietf-idn-amc-ace-m-00.txt
+++ b/doc/draft/draft-ietf-idn-amc-ace-m-00.txt
--- a/doc/draft/draft-ietf-idn-mua-00.txt
+++ b/doc/draft/draft-ietf-idn-mua-00.txt
@@ -0,0 +1,374 @@
+Internet Draft                                             Maynard Kang
+draft-ietf-idn-mua-00.txt                                   i-EMAIL.net
+February 5, 2001                                
+Expires on August 5, 2001                               
+
+          Internationalizing Domain Names in Mail User Agents
+ 
+Status of this Memo
+
+This document is an Internet-Draft and is in full conformance with all
+provisions of Section 10 of RFC2026.
+
+Internet-Drafts are working documents of the Internet Engineering Task
+Force (IETF), its areas, and its working groups. Note that other
+groups may also distribute working documents as Internet-Drafts.
+
+Internet-Drafts are draft documents valid for a maximum of six months
+and may be updated, replaced, or obsoleted by other documents at any
+time. It is inappropriate to use Internet-Drafts as reference material
+or to cite them other than as "work in progress."
+
+
+     The list of current Internet-Drafts can be accessed at
+     http://www.ietf.org/ietf/1id-abstracts.txt
+
+     The list of Internet-Draft Shadow Directories can be accessed at
+     http://www.ietf.org/shadow.html.
+
+
+
+Abstract
+
+This document describes a way in which domain names used in Internet
+e-mail can be internationalized by making changes only to end-user Mail
+User Agents, thereby avoiding damage to other applications which handle
+Internet e-mail, such as Message Transfer Agents and Delivery Agents.
+
+1. Introduction
+
+One of the proposed solutions for internationalized domain names (IDN)
+involves updating only the user applications, with no changes required
+to the DNS protocol, servers and resolvers [IDNA]. Other solutions
+require changes to be made to the protocol, servers, resolvers and
+applications.
+
+The underlying principle of [IDNA] may be similarly applied to the
+Internet e-mail system today - by effecting changes to only the Mail
+User Agent (MUA) component of the e-mail system. Thus, existing
+Message Transfer Agents, Delivery Agents and other applications which 
+handle e-mail do not have to be changed at all.
+
+1.1 Definitions and Conventions
+
+Terms related to the character encoding model are used as defined in
+Unicode Technical Report 17 [UTR17].
+
+The terms "international character", "non-ASCII character" and 
+"multilingual character", which are used interchangeably, are taken 
+to mean any abstract character which is not included in the range 
+specified by [US-ASCII].
+
+1.2 Terminology
+
+The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
+and "MAY" in this document are to be interpreted as described in RFC 
+2119 [RFC2119].
+
+1.3. Design Philosophy
+
+As the Internet e-mail system is a diverse, distributed and 
+heterogeneous system with many vendors deploying a vast number of 
+applications, it is of utmost importance that interoperability amongst 
+these various components is maintained. Thus, the ideal solution would 
+be one which does not compromise or damage the operation of any of these 
+existing components once internationalized domain names are encountered.
+
+Also, solutions which call for changes to be made to many or even all
+components of the Internet e-mail system would require far too much
+time and effort to deploy, given that Internet e-mail has such a huge
+installed base.
+
+This solution adheres to both of the above principles, in that
+interoperability is preserved and implementation is cheap and quick.
+All that the user has to do to use IDNs in e-mail is update his or her
+MUA.
+
+1.4. IDN Summary
+
+This solution specifies an IDN architecture of arch-3 (just send ACE)
+and a transition strategy of trans-1 (always do current plus new
+architecture) as described in [IDNCOMP]. The choice of ACE format is not 
+defined in this document, but MUST be the same as that specified in 
+[IDNA] in order to maintain uniqueness and consistency.
+
+1.5. E-mail Internationalization Summary
+
+As many Internet e-mail standards such as the SMTP protocol [RFC821]
+and the e-mail message format [RFC822] only specify usage of the 7-bit
+ASCII character set [US-ASCII], international characters which use octet-
+based character encoding schemes (CES) cannot be used in e-mail 
+transmission, headers and bodies.
+
+Although this issue has been addressed in [RFC2045] for message bodies
+and [RFC2047] for message headers through the use of a Transfer Encoding
+Syntax (TES) such as Quoted-Printable or Base64, there is no similar 
+solution which extends the functionality of [RFC821] to include usage of
+international characters, except for [RFC1652] which allows transmission 
+of 8-bit data passed by the DATA command in an SMTP session.
+
+[RFC1652] however, does not fully address the problem of using IDNs in
+an SMTP session - the IDN may be used in areas within the SMTP session 
+other than the DATA command, such as the MAIL FROM and RCPT TO commands, 
+where an IDN may be part of the e-mail address(es) specified there.
+
+Hence, this would be a major stumbling block to deploying "just-send-
+8bit" IDNs for use in Internet e-mail, as these IDNs would not be able
+to be used in SMTP e-mail transmissions due to [RFC821] restrictions.
+
+2. Architectural Overview
+
+The end-user MUA may encounter IDNs in the scenarios below:
+
+(i)   When specifying the transmission server (i.e. SMTP server)
+(ii)  When specifying the retrieval server (i.e. POP3/IMAP4/any other
+      retrieval mechanism)
+(iii) When specifying e-mail addresses during composition of a message
+(iv)  When reading messages with e-mail addresses in them
+
+As in [IDNA], the MUA is updated to process IDNs which are input by
+users and IDNs which are displayed to users, in all of the scenarios
+above.
+
+For (i) and (ii), the IDN MUST be handled in the same manner as 
+specified in [IDNA]. The method of handling an IDN for (iii) and (iv) is
+described below in 2.1.
+
+2.1 Interfaces between E-mail components when composing/reading a mail
+
+The interfaces between e-mail components can be pictorially represented 
+as shown below.
+
+The example assumes the setup of a POP3/IMAP4 retrieval client and 
+server, but the exact nature of end-to-end e-mail transmission may vary
+accordingly (e.g. elm or pine would read directly from the mail store). 
+However, these variations do not impact an accurate description of this 
+solution to a large extent as no changes are required at these levels.
+
+        +------+                                       +------+
+        | User |                                       | User |
+        +------+                                       +---^--+
+          | User Input:          User Display: Characters/ |
+          | Keyboard/Pen/etc        Glyphs on CRT or other |
+    +-----v---------------+    Representation (e.g. sound) |
+    | Input Method Editor |                   +------------|-----+
+    +---------------------+                   | Rendering Engine |
+        | Input: Any localized/               +---------^--------+
+        | internationalized      Output: Any localized/ |
+        | charset                     internationalized |
+   +----v-----------------+                     charset |
+   | +------------------+ |                  +----------|-------------+
+   | | Mail Composition | |                  | +--------------+       |
+   | | Interface        | | Sender's         | | Mail Reading |       |
+   | +------------------+ | MUA              | | Interface    |       |
+   |    |                 |                  | +--------^-----+       |
+   |    | Nameprepped ACE |       Receiver's |          | Nameprepped |
+   |    v                 |              MUA |          | ACE         |
+   | +-------------+      |                  | +-------------------+  |
+   | | SMTP Client |      |                  | | POP3/IMAP4 Client |  |
+   | +-------------+      |                  | +-------------------+  |
+   +----|-----------------+                  +----------^-------------+
+        | Nameprepped                                   | Nameprepped
+        v ACE         Nameprepped       Nameprepped     | ACE
+     +-------------+  ACE   +------------+  ACE   +-------------------+
+     | SMTP Server | -----> | Mail Store | -----> | POP3/IMAP4 Server |
+     +-------------+        +------------+        +-------------------+
+
+2.1.1 Interface between User and Input Method Editor
+
+For ASCII characters, input is straightforward: the user types on the
+keyboard and the character for each key pressed is sent to the
+application.
+
+However, for international characters, the end-user has to use a script-
+specific Input Method Editor (IME), which may or may not be built into
+the OS, to interpret what the user communicates to the system and
+thereafter send the respective international characters to the
+application.
+
+For example, for input of Chinese characters, some users use IMEs
+which support the "Pinyin" input method. When a user types "zhongguo" 
+(in ASCII characters) on the keyboard and selects the characters which
+represent "China" (in Chinese) from a list, the IME sends the 
+international characters to the application in a user-determined 
+charset (e.g. GB2312).
+
+2.1.2 Interface between Input Method Editor and MUA Composition 
+      Interface
+
+The MUA mail composition interface (i.e. the "Compose Message"
+function of the MUA) SHOULD be able to accept IDNs using 8-bit character 
+encoding schemes, including those represented in any localized (e.g. 
+GB2312) or internationalized (e.g. UTF-8) charsets.
+
+This input typically takes place where e-mail addresses are entered
+such as the "From", "To", "Cc", "Bcc" fields, amongst others, as IDNs 
+may be used at the right-hand-side of the "@" sign in an e-mail address
+(domain-parts).
+
+The mail composition interface MAY allow ACE input for the same
+reasons as specified in [IDNA], but this is not recommended as ACE is
+opaque and ugly.
+
+2.1.3 Interface between MUA Composition Interface and SMTP Client
+
+The MUA composition interface communicates with the SMTP client in the
+MUA typically through internal function calls within the software itself
+or through an API. It is at this level where ACE conversion of any IDN
+encountered by the MUA composition interface takes place.
+
+Before converting the name parts of the IDN into ACE, the MUA MUST
+prepare each name part as specified in [NAMEPREP]. Thereafter, the MUA 
+MUST convert the name parts into ACE before passing any data to the SMTP
+client.
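
The preparation and conversion step above can be sketched as follows.
This memo does not fix the ACE format, so the sketch substitutes
Python's built-in "idna" codec (nameprep followed by Punycode, as later
standardized in RFC 3490) purely as a stand-in. The function name
to_ace_address is hypothetical and is not part of this memo.

```python
def to_ace_address(address: str) -> str:
    """Return the address with its domain part converted to ACE.

    Assumption: Python's "idna" codec stands in for whichever ACE
    [IDNA] selects; it prepares each name part and then encodes it.
    """
    # Only the domain (right of the last "@") is converted. The
    # local-part is outside the scope of this memo and is left as-is.
    local_part, at_sign, domain = address.rpartition("@")
    if not at_sign:
        raise ValueError("not an e-mail address: %r" % address)
    ace_domain = domain.encode("idna").decode("ascii")
    return local_part + "@" + ace_domain


# The result is plain ASCII and could be handed to the SMTP client.
print(to_ace_address("user@b\u00fccher.example"))
```

An address whose domain is already ASCII passes through unchanged, so
the same call can be applied to every address the composition interface
collects.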
+
+The SMTP client then prepares the e-mail for transmission using the
+SMTP protocol [RFC821], and thereafter establishes an SMTP connection 
+with the user-specified SMTP server to transmit the e-mail.
+
+It is important to note that an IDN specified in the parameters of any
+SMTP command MUST be represented in nameprepped ACE at this point in 
+time. This includes SMTP commands which require domain parameters (such 
+as the HELO and EHLO commands) and commands where e-mail addresses are 
+specified (such as the MAIL FROM, RCPT TO, DATA, VRFY, EXPN, SEND, SOML 
+and SAML commands).
+
+As for data passed by the DATA command, ACE conversion MUST be
+performed when the "domain" portion of an "addr-spec" or when a "domain" 
+itself, within the context of [RFC822], is encountered. This is 
+necessary as an updated MUA may originate a message which is read by a 
+non-updated MUA. If this happens, the non-updated MUA may face 
+operational problems dealing with IDNs that appear in the "addr-spec" 
+which are not in ACE.
+
+Any transfer encoding syntax to be applied to the mail headers as
+specified in [RFC2047] SHOULD be performed before nameprepped ACE 
+conversion. This is to reduce confusion between IDNs within "addr-spec" 
+and "domain" portions, in the context of [RFC822], and IDNs which appear 
+as arbitrary data in mail headers and bodies.
+
+2.1.4. Interface between POP3/IMAP4 client (or local mail store) and 
+       Mail Reading Interface
+
+The MUA mail reading interface (i.e. "Read mail" function of an MUA)
+typically displays e-mail data retrieved from either a POP3/IMAP4
+client or from a local mail store through internal function calls within 
+the MUA software or through an API.
+
+When e-mail containing an ACE-represented IDN is to be displayed, the
+MUA SHOULD convert the ACE-represented IDN contained within the
+"addr-spec" or "domain" portion specified in [RFC822] back into any 
+localized or internationalized charset of the user's choice, whenever 
+possible. In the event that it is impossible to achieve conversion back 
+into the selected localized charset (for example, conversion of RACE-
+represented Hangeul characters into ISO-8859-1 is impossible), the MUA 
+should prompt the user with an error message.
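
A minimal sketch of the display-side conversion, under the same
assumption as before: Python's "idna" codec stands in for the ACE that
this memo leaves unspecified. The function name domain_for_display and
the use of ValueError to signal the error condition are illustrative
choices only.

```python
def domain_for_display(ace_domain: str, user_charset: str) -> str:
    """Decode an ACE domain and check it fits the user's charset.

    Assumption: the "idna" codec is a stand-in ACE, and raising
    ValueError represents "prompt the user with an error message".
    """
    # Convert the ACE form back into abstract characters.
    unicode_domain = ace_domain.encode("ascii").decode("idna")
    try:
        # Probe whether the selected charset can represent the name.
        unicode_domain.encode(user_charset)
    except UnicodeEncodeError as exc:
        raise ValueError(
            "domain cannot be shown in charset %s" % user_charset
        ) from exc
    return unicode_domain


# ascii() keeps the printed output independent of the terminal charset.
print(ascii(domain_for_display("xn--bcher-kva.example", "iso-8859-1")))
```

A name containing Hangeul characters would decode successfully but fail
the charset probe for ISO-8859-1, which corresponds to the error case
described above.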
+
+It may be possible to save and retrieve information about the original
+charset of the ACE-converted IDN through the use of additional
+[RFC822] mail headers, but that is not (yet) addressed by this memo.
+
+Although it is possible to render ACE into properly decoded glyphs and
+display the actual abstract characters without any conversion to other
+charsets, the MUA SHOULD NOT do this as it is not the primary function
+of an MUA to render characters. This should be left to a rendering 
+engine which is separate from the MUA and typically embedded into the 
+OS. It is sufficient for the MUA to pass the appropriate charset to the
+rendering engine for proper display.
+
+3. ACE Length Considerations
+
+As [RFC821] in Section 4.5.3 restricts the maximum total length of a
+domain name to 64 characters, representation of IDNs using ACE may
+pose a potential problem. Most ACEs typically require 3-4 ASCII 
+characters to represent one international character (especially in the 
+case of CJK characters, where compression is less effective).
+
+That would leave only about 16-21 international characters for the
+whole IDN, including all name parts and dots. This is highly
+undesirable as names in some languages, such as Arabic, cannot be
+abbreviated and may require a greater length than that which is
+allowed by [RFC821].
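
As a rough check of this estimate, the 64-character limit can be
divided by the per-character expansion. The sketch below is only an
upper bound: it ignores any ACE prefix and the dots between name parts,
so real names would fit fewer characters. The names used are
hypothetical.

```python
RFC821_DOMAIN_LIMIT = 64  # maximum domain length, [RFC821] Section 4.5.3


def max_international_chars(ascii_per_char: int,
                            limit: int = RFC821_DOMAIN_LIMIT) -> int:
    """Upper bound on international characters that fit in the limit."""
    return limit // ascii_per_char


# Expansion factors of 3 and 4 ASCII characters per international one.
for expansion in (3, 4):
    print(expansion, max_international_chars(expansion))
```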
+
+To further complicate matters, several mailing list software packages
+such as ezmlm embed domain names into the local-part of an e-mail
+address during management of subscriptions, together with randomly-
+generated subscription information. This would leave an even smaller
+maximum ACE length, if interoperability with these mailing list
+packages were to be maintained, given that there is also a 64
+character restriction on local parts.
+
+4. Security Considerations
+
+As this memo is based on [IDNA], its security considerations are
+similar to those of [IDNA]. They include the security considerations
+of [NAMEPREP] as well.
+
+5. Other Considerations
+
+Although this document addresses end-user MUAs (e.g. elm, mutt, pine,
+Eudora, Outlook Express, etc.) to a large extent, the definition of an
+MUA could be extended to include web-based e-mail server software and
+automated programs such as mailing list management software.
+
+End-user MUAs may also include additional functionality where IDNs may
+be encountered, such as calendaring/scheduling, directory services and
+digital certificate storage. This is not (yet) addressed in this memo.
+
+6. Future Extensions
+
+It is possible to achieve internationalization of the entire e-mail
+address by representing international characters in the local-part
+of an "addr-spec" using nameprepped ACE conversion, in a fashion
+similar to that described in this memo.
+
+However, this is a different problem altogether and is currently beyond
+the scope of this memo.
+
+7. References
+
+[IDNA] Paul Hoffman & Patrik Faltstrom, "Internationalizing Host Names
+in Applications (IDNA)", draft-ietf-idn-idna.
+
+[UTR17] K. Whistler & M. Davis, Unicode Consortium, "Character Encoding
+Model", Unicode Technical Report #17, 
+http://www.unicode.org/unicode/reports/tr17/
+
+[US-ASCII] United States of America Standards Institute, "USA Code for 
+Information Interchange", X3.4, 1968.
+
+[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
+Requirement Levels", March 1997, RFC 2119.
+
+[IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name
+Proposals", draft-ietf-idn-compare.
+
+[RFC821] Jonathan B. Postel, "Simple Mail Transfer Protocol", August 
+1982, RFC 821.
+
+[RFC822] David H. Crocker, "Standard for the Format of ARPA Internet 
+Text Messages", August 1982, RFC 822.
+
+[RFC2045] N. Freed & N. Borenstein, "Multipurpose Internet Mail 
+Extensions (MIME) Part One: Format of Internet Message Bodies", 
+November 1996, RFC 2045.
+
+[RFC2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions) 
+Part Three: Message Header Extensions for Non-ASCII Text", November 
+1996, RFC 2047.
+
+[RFC1652] J. Klensin et al., "SMTP Service Extension for 8bit-
+MIMEtransport", July 1994, RFC 1652.
+
+
+[NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of
+Internationalized Host Names", draft-ietf-idn-nameprep.
+
+A. Author's Address
+
+Maynard Kang
+i-EMAIL.net Pte Ltd
+1 Kim Seng Promenade #12-07
+Great World City West Tower
+Singapore 237994
+E-mail: maynard@i-email.net
--- a/doc/draft/draft-ietf-idn-nameprep-00.txt
+++ b/doc/draft/draft-ietf-idn-nameprep-00.txt
@@ -1,855 +0,0 @@
-Internet Draft                                          Paul Hoffman
-draft-ietf-idn-nameprep-00.txt                            IMC & VPNC
-July 3, 2000                                           Marc Blanchet
-Expires in six months                                       ViaGenie
-
-             Preparation of Internationalized Host Names
-
-Status of this memo
-
-This document is an Internet-Draft and is in full conformance with all
-provisions of Section 10 of RFC2026.
-
-Internet-Drafts are working documents of the Internet Engineering Task
-Force (IETF), its areas, and its working groups. Note that other groups
-may also distribute working documents as Internet-Drafts.
-
-Internet-Drafts are draft documents valid for a maximum of six months
-and may be updated, replaced, or obsoleted by other documents at any
-time. It is inappropriate to use Internet-Drafts as reference material
-or to cite them other than as "work in progress."
-
-
-     The list of current Internet-Drafts can be accessed at
-     http://www.ietf.org/ietf/1id-abstracts.txt
-
-     The list of Internet-Draft Shadow Directories can be accessed at
-     http://www.ietf.org/shadow.html.
-
-
-Abstract
-
-This document describes how to prepare internationalized host names for
-transmission on the wire. The steps include excluding characters that
-are prohibited from appearing in internationalized host names, changing
-all characters that have case properties to be lowercase, and
-normalizing the characters. Further, this document lists the prohibited
-characters.
-
-
-1. Introduction
-
-When expanding today's DNS to include internationalized host names,
-those new names will be handled in many parts of the DNS. The IDN
-Working Group's requirements document [IDNReq] describes a framework for
-domain name handling as well as requirements for the new names. The IDN
-Working Group's comparison document [IDNComp] gives a framework for how
-various parts of the IDN solution work together.
-
-A user can enter a domain name into an application program in a myriad
-of fashions. Depending on the input method, the characters entered in
-the domain name may or may not be those that are allowed in
-internationalized host names. Thus, there must be a way to canonicalize
-the user's input before the name is resolved in the DNS.
-
-It is a design goal of this document to allow users to enter host names
-in applications and have the highest chance of getting the name correct.
-This means that the user should not be limited to only entering exactly
-the characters that might have been used, but to instead be able to
-enter characters that unambiguously canonicalize to characters in the
-desired host name. At the same time, this process must not introduce any
-chance that two host names could be represented by two distinct strings
-of characters that look identical to typical users. It is also a design
-goal to have all preprocessing of IDN done before going on the wire, so
-that no transformation is done in the DNS server space.
-
-This document describes the steps needed to convert a name part from one
-that is entered by the user to one that can be used in the DNS.
-
-1.1 Terminology
-
-The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
-"MAY" in this document are to be interpreted as described in RFC 2119
-[RFC2119].
-
-Examples in this document use the notation from the Unicode Standard
-[Unicode3] as well as the ISO 10646 [ISO10646] names. For example, the
-letter "a" may be represented as either "U+0061" or "LATIN SMALL LETTER
-A". In the lists of prohibited characters, the "U+" is left off to make
-the lists easier to read.
-
-1.2 IDN summary
-
-Using the terminology in [IDNComp], this document specifies all of the
-prohibited characters and the canonicalization for an IDN solution.
-Specifically, it covers the following sections from [IDNComp]:
-
-prohib-1: Identical and near-identical characters
-prohib-2: Separators
-prohib-3: Non-displaying and non-spacing characters
-prohib-4: Private use characters
-prohib-5: Punctuation
-prohib-6: Symbols
-canon-1.2: Normalization Form KC
-canon-2.1: Case folding in ASCII
-canon-2.2: Case folding in non-ASCII
-
-Note that this document does not cover:
-canon-1.1: Normalization Form C
-canon-2.3: Han folding
-
-1.3 Open issues
-
-This is the first draft of this document. Although there has been much
-discussion on the WG mailing list about the topics here, there has not
-yet been much agreement on some issues. Now that there is a document to
-talk about, that discussion can be more focussed.
-
-1.3.1 Where to do name preparation
-
-Section 2.1 says to do name preparation in the resolver. An argument can
-be made for doing name preparation in the application, before the
-application service interface. An advantage of that proposal is that
-resolvers would not need to do any name preparation. A disadvantage is
-that applications would have to be updated each time the IDN protocol is
-updated, such as if new characters are added to the repertoire of
-allowed characters. It seems likely that resolvers are more easily
-updated than all the individual applications that use internationalized
-host names.
-
-1.3.2 Choosing between normalization form C and KC
-
-Much of the discussion of normalization on the WG mailing list assumed
-that normalization form C would be used. Near the time that this
-document was written, people started considering form KC instead of C.
-This document used form KC, but the reasons for doing so could be
-contentious.
-
-1.3.3 Does the prohibition catch all bad characters?
-
-On the mailing list, doing prohibition in two steps was discussed: a
-short list of prohibited characters before case folding in order to
-prevent uppercase characters that have no lowercase equivalents from
-getting through, and then a full check on the output of normalization.
-In this draft, all checking is done before case folding, based on the
-(possibly wrong) assumption that none of the prohibited characters will
-re-appear after the case folding and normalization. If that assumption
-turns out to be wrong, a check for just those problematic characters can
-be added after normalization, or a full check against the prohibited
-characters can be added.
-
-
-2. Preparation Overview
-
-This section describes where name preparation happens and the steps that
-name preparation software must take.
-
-2.1 Where name preparation happens
-
-Part of the chart in section 1.4 of [IDNReq] looks like this:
-
-+---------------+
-| Application   |
-+---------------+
-      |  Application service interface
-      |  For ex. GethostbyXXXX interface
-+---------------+
-| Resolver      |
-+---------------+
-      |     <-----   DNS service interface
-+-------------------------------------------+
- 
-In this specification, the name preparation is done in the resolver,
-before the DNS service interface. That is, it is acceptable for software
-in the application service interface (such as a "GetHostByName" API) to
-pass the resolver a name that has not been prepared. However, the
-resolver MUST prepare the name as described in this specification before
-passing it to the DNS service interface.
-
-2.2 Name preparation steps
-
-The steps for preparing names are:
-
-1) Input from the application service interface -- This can be done in
-many ways and is not specified in this document
-
-2) Look for prohibited input -- Check for any characters that are not
-allowed in the input. If any are found, return an error to the
-application service interface. This step is necessary to prevent errors
-in the following two steps. This step fulfills prohib-1, prohib-2,
-prohib-3, prohib-4, prohib-5, and prohib-6 from [IDNComp].
-
-3) Fold case -- Change all uppercase characters into lowercase
-characters. Design note: this step could just as easily have been
-"change all lowercase characters into uppercase characters". However,
-the upper-to-lower folding was chosen because most users of the Internet
-today enter host names in lowercase. This step fulfills canon-2.1 and
-canon-2.2 from [IDNComp].
-
-4) Canonicalize -- Normalize the characters. This step fulfills canon-1.2
-from [IDNComp].
-
-5) Resolution of the prepared name -- This must be specified in a
-different IDN document.
-
-The above steps MUST be performed in the order given in order to comply
-with this specification.
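
Steps 2 through 4 above can be sketched as follows. The prohibited
ranges are only a small illustrative subset of the lists in section 3,
str.lower() is an approximation of the case folding this document
intends, and the function name prepare_name_part is hypothetical.

```python
import unicodedata

# Illustrative subset of the prohibited ranges from section 3
# (assumption: NOT the complete list).
PROHIBITED_RANGES = [
    (0x0000, 0x001F),  # control characters
    (0x0020, 0x0020),  # SPACE
    (0x007F, 0x009F),  # DELETE and control characters
    (0x00A0, 0x00A0),  # NO-BREAK SPACE
    (0x2000, 0x200B),  # spaces
    (0xE000, 0xF8FF),  # private use, plane 0
    (0xFF00, 0xFFEF),  # halfwidth and fullwidth forms
]


def prepare_name_part(name_part: str) -> str:
    """Apply steps 2-4: prohibit, fold case, then normalize (NFKC)."""
    # Step 2: reject prohibited input before any transformation.
    for ch in name_part:
        code_point = ord(ch)
        if any(lo <= code_point <= hi for lo, hi in PROHIBITED_RANGES):
            raise ValueError("prohibited character U+%04X" % code_point)
    # Step 3: fold case (upper to lower).
    folded = name_part.lower()
    # Step 4: canonicalize with Normalization Form KC.
    return unicodedata.normalize("NFKC", folded)


# ascii() keeps the printed output independent of the terminal charset.
print(ascii(prepare_name_part("B\u00dcCHER")))
```

Performing the prohibition check first mirrors the required ordering:
an error is returned before folding or normalization can alter the
input.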
-
-
-3. Prohibited Input
-
-Before the text can be processed, it must be checked for prohibited
-characters. There is a variety of prohibited characters, as described in
-this section.
-
-Note that one of the goals of IDN is to allow the widest possible set of
-host names as long as those host names do not cause other problems, such
-as possible ambiguity. Specifically, experience with current DNS names
-has shown that there is a desire for host names that include personal
-names, company names, and spoken phrases. A goal of this section is to
-prohibit as few characters that might be used in these contexts as
-possible while making sure that characters that might easily cause
-confusion or ambiguity are prohibited.
-
-Note that every character listed in this section MUST NOT be transmitted
-on the DNS service interface. Although the checking is being performed
-before case folding and canonicalization, those steps cannot result in
-any of these characters if these characters are not in the input stream.
-[[[NOTE: THIS STATEMENT NEEDS TO BE CHECKED ALGORITHMICALLY.]]] If a DNS
-server receives a request containing a prohibited character, then the
-IDN protocol MUST return an error message.
-
-
-Note that some characters listed in one section would also appear in
-other sections. Each character is only listed once.
-
-3.1 prohib-1: Identical and near-identical characters
-
-Many characters in [ISO10646] are identical or nearly identical to other
-characters. These were often included for compatibility with other
-character sets.
-
-The characters prohibited because they are identical or nearly identical
-to allowed characters are:
-
-00AD        SOFT HYPHEN
-00D7        MULTIPLICATION SIGN
-01C3        LATIN LETTER RETROFLEX CLICK
-02B0-02FF   [SPACING MODIFIER LETTERS]
-066D        ARABIC FIVE POINTED STAR
-1806        MONGOLIAN TODO SOFT HYPHEN
-2010        HYPHEN
-2011        NON-BREAKING HYPHEN
-2012        FIGURE DASH
-2013        EN DASH
-2014        EM DASH
-2160-217F   [ROMAN NUMERALS]
-FB1D-FB4F   [HEBREW PRESENTATION FORMS]
-FB50-FDFF   [ARABIC PRESENTATION FORMS A]
-FE20-FE2F   [COMBINING HALF MARKS]
-FE30-FE4F   [CJK COMPATIBILITY FORMS]
-FE50-FE6F   [SMALL FORM VARIANTS]
-FE70-FEFC   [ARABIC PRESENTATION FORMS B]
-FF00-FFEF   [HALFWIDTH AND FULLWIDTH FORMS]
-
-3.2 prohib-2: Separators
-
-Horizontal and vertical spacing characters would make it unclear where a
-host name begins and ends. The prohibited spacing characters are:
-
-0020        SPACE
-00A0        NO-BREAK SPACE
-1680        OGHAM SPACE MARK
-2000-200B   [SPACES]
-2028        LINE SEPARATOR
-2029        PARAGRAPH SEPARATOR
-202F        NARROW NO-BREAK SPACE
-3000        IDEOGRAPHIC SPACE
-
-Allowing periods and period-like characters as characters within a name
-part would also cause similar confusion. The prohibited periods,
-characters that look like periods, and characters that canonicalize to a
-period or to a period-like character are:
-
-002E        FULL STOP
-06D4        ARABIC FULL STOP
-2024        ONE DOT LEADER
-2025        TWO DOT LEADER
-2026        HORIZONTAL ELLIPSIS
-2488        DIGIT ONE FULL STOP
-2489        DIGIT TWO FULL STOP
-248A        DIGIT THREE FULL STOP
-248B        DIGIT FOUR FULL STOP
-248C        DIGIT FIVE FULL STOP
-248D        DIGIT SIX FULL STOP
-248E        DIGIT SEVEN FULL STOP
-248F        DIGIT EIGHT FULL STOP
-2490        DIGIT NINE FULL STOP
-2491        NUMBER TEN FULL STOP
-2492        NUMBER ELEVEN FULL STOP
-2493        NUMBER TWELVE FULL STOP
-2494        NUMBER THIRTEEN FULL STOP
-2495        NUMBER FOURTEEN FULL STOP
-2496        NUMBER FIFTEEN FULL STOP
-2497        NUMBER SIXTEEN FULL STOP
-2498        NUMBER SEVENTEEN FULL STOP
-2499        NUMBER EIGHTEEN FULL STOP
-249A        NUMBER NINETEEN FULL STOP
-249B        NUMBER TWENTY FULL STOP
-33C2        SQUARE AM
-33C2        SQUARE AM
-33C7        SQUARE CO
-33D8        SQUARE PM
-33D8        SQUARE PM
-
-3.3 prohib-3: Non-displaying and non-spacing characters
-
-There are many characters that cannot be seen in the ISO 10646 character
-set. These include control characters, non-breaking spaces, formatting
-characters, and tagging characters. These characters would certainly
-cause confusion if allowed in host names.
-
-0000-001F   [CONTROL CHARACTERS]
-007F        DELETE
-0080-009F   [CONTROL CHARACTERS]
-070F        SYRIAC ABBREVIATION MARK
-180B        MONGOLIAN FREE VARIATION SELECTOR ONE
-180C        MONGOLIAN FREE VARIATION SELECTOR TWO
-180D        MONGOLIAN FREE VARIATION SELECTOR THREE
-180E        MONGOLIAN VOWEL SEPARATOR
-200C        ZERO WIDTH NON-JOINER
-200D        ZERO WIDTH JOINER
-200E        LEFT-TO-RIGHT MARK
-200F        RIGHT-TO-LEFT MARK
-202A        LEFT-TO-RIGHT EMBEDDING
-202B        RIGHT-TO-LEFT EMBEDDING
-202C        POP DIRECTIONAL FORMATTING
-202D        LEFT-TO-RIGHT OVERRIDE
-202E        RIGHT-TO-LEFT OVERRIDE
-206A        INHIBIT SYMMETRIC SWAPPING
-206B        ACTIVATE SYMMETRIC SWAPPING
-206C        INHIBIT ARABIC FORM SHAPING
-206D        ACTIVATE ARABIC FORM SHAPING
-206E        NATIONAL DIGIT SHAPES
-206F        NOMINAL DIGIT SHAPES
-FEFF        ZERO WIDTH NO-BREAK SPACE
-FFF9        INTERLINEAR ANNOTATION ANCHOR
-FFFA        INTERLINEAR ANNOTATION SEPARATOR
-FFFB        INTERLINEAR ANNOTATION TERMINATOR
-FFFC        OBJECT REPLACEMENT CHARACTER
-FFFD        REPLACEMENT CHARACTER
-
-3.4 prohib-4: Private use characters
-
-Because private-use characters do not have defined meanings, they are
-prohibited. The private-use characters are:
-
-E000-F8FF   [PRIVATE USE, PLANE 0]
-
-3.5 prohib-5: Punctuation
-
-The following characters are reserved or delimiters in URLs [RFC2396]
-and [RFC2732]:
-
-" # $ % & + , . / : ; < = > ? @ [ ]
-
-3.5.1 Characters from URLs
-
-The following punctuation characters are prohibited because they are
-reserved or delimiters in URLs.
-
-0022        QUOTATION MARK
-0023        NUMBER SIGN
-0024        DOLLAR SIGN
-0025        PERCENT SIGN
-0026        AMPERSAND
-002B        PLUS SIGN
-002C        COMMA
-002E        FULL STOP
-002F        SOLIDUS
-003A        COLON
-003B        SEMICOLON
-003C        LESS-THAN SIGN
-003D        EQUALS SIGN
-003E        GREATER-THAN SIGN
-003F        QUESTION MARK
-0040        COMMERCIAL AT
-005B        LEFT SQUARE BRACKET
-005D        RIGHT SQUARE BRACKET
-
-3.5.2 Characters that canonicalize to characters from URLs
-
-The following punctuation characters are prohibited because their
-normalization contains one or more of the characters from section 3.5.1.
-
-037E        GREEK QUESTION MARK
-2048        QUESTION EXCLAMATION MARK
-2049        EXCLAMATION QUESTION MARK
-207A        SUPERSCRIPT PLUS SIGN
-207C        SUPERSCRIPT EQUALS SIGN
-208A        SUBSCRIPT PLUS SIGN
-208C        SUBSCRIPT EQUALS SIGN
-2100        ACCOUNT OF
-2101        ADDRESSED TO THE SUBJECT
-2105        CARE OF
-2106        CADA UNA
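
As an illustration of this rule, the following Python sketch applies
normalization form KC using the standard unicodedata module and reports
which of the section 3.5.1 characters appear in the result. The helper
name url_chars_after_nfkc is hypothetical, and the output reflects the
Unicode version bundled with the Python interpreter, which is an
assumption relative to the data this draft was written against.

```python
import unicodedata

# The URL reserved/delimiter characters listed in section 3.5.1.
URL_CHARS = set('"#$%&+,./:;<=>?@[]')


def url_chars_after_nfkc(ch: str) -> set:
    """Return the section 3.5.1 characters present in the NFKC form of ch."""
    return set(unicodedata.normalize("NFKC", ch)) & URL_CHARS


if __name__ == "__main__":
    # A few of the code points listed in section 3.5.2.
    for cp in (0x037E, 0x2048, 0x207A, 0x2105):
        ch = chr(cp)
        found = sorted(url_chars_after_nfkc(ch))
        print(f"{cp:04X} {unicodedata.name(ch)}: {found}")
```

A non-empty result means the character would turn into, or contain, a
URL delimiter after canonicalization, which is the reason given above
for prohibiting it on input.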
-
-3.5.3 Characters that look like characters from URLs
-
-The following are prohibited because they look indistinguishable from
-the characters listed in section 3.5.1.
-
-037E        GREEK QUESTION MARK
-0589        ARMENIAN FULL STOP
-060C        ARABIC COMMA
-061B        ARABIC SEMICOLON
-066A        ARABIC PERCENT SIGN
-201A        SINGLE LOW-9 QUOTATION MARK
-2030        PER MILLE SIGN
-2031        PER TEN THOUSAND SIGN
-2033        DOUBLE PRIME
-2039        SINGLE LEFT-POINTING ANGLE QUOTATION MARK
-203A        SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
-203D        INTERROBANG
-2044        FRACTION SLASH
-3001        IDEOGRAPHIC COMMA
-3002        IDEOGRAPHIC FULL STOP
-3003        DITTO MARK
-3008        LEFT ANGLE BRACKET
-3009        RIGHT ANGLE BRACKET
-3014        LEFT TORTOISE SHELL BRACKET
-3015        RIGHT TORTOISE SHELL BRACKET
-301A        LEFT WHITE SQUARE BRACKET
-301B        RIGHT WHITE SQUARE BRACKET
-
-3.5.4 Other punctuation
-
-The following punctuation characters are prohibited because they are
-unlikely to be used in names and may be confusing to users or to
-character-entry processes:
-
-005C        REVERSE SOLIDUS
-
-3.6 prohib-6: Symbols
-
-[UniData] has non-normative categories for symbols. The four symbol
-categories are:
-
-Symbol, Currency: Currency symbols could appear in company names and
-spoken phrases, so they are not prohibited.
-
-Symbol, Modifier: Stand-alone modifiers might appear in personal names,
-company names, and spoken phrases, so they are not prohibited.
-
-Symbol, Math: It is very unlikely that there are any significant
-personal names, company names, or spoken phrases that contain
-mathematical symbols. Further, many of these symbols are the same or
-similar to other punctuation, thereby leading to ambiguity. For this
-reason, math-specific symbols are prohibited. These prohibited math
-symbols are:
-
-00AC        NOT SIGN
-00B1        PLUS-MINUS SIGN
-2200-22FF   [MATHEMATICAL OPERATORS]
-
-Further, the following characters canonicalize to characters in the
-above math list, and therefore are also prohibited:
-
-00BC        VULGAR FRACTION ONE QUARTER
-00BD        VULGAR FRACTION ONE HALF
-00BE        VULGAR FRACTION THREE QUARTERS
-207B        SUPERSCRIPT MINUS
-208B        SUBSCRIPT MINUS
-2153        VULGAR FRACTION ONE THIRD
-2154        VULGAR FRACTION TWO THIRDS
-2155        VULGAR FRACTION ONE FIFTH
-2156        VULGAR FRACTION TWO FIFTHS
-2157        VULGAR FRACTION THREE FIFTHS
-2158        VULGAR FRACTION FOUR FIFTHS
-2159        VULGAR FRACTION ONE SIXTH
-215A        VULGAR FRACTION FIVE SIXTHS
-215B        VULGAR FRACTION ONE EIGHTH
-215C        VULGAR FRACTION THREE EIGHTHS
-215D        VULGAR FRACTION FIVE EIGHTHS
-215E        VULGAR FRACTION SEVEN EIGHTHS
-215F        FRACTION NUMERATOR ONE
-33A7        SQUARE M OVER S
-33A8        SQUARE M OVER S SQUARED
-33AE        SQUARE RAD OVER S
-33AF        SQUARE RAD OVER S SQUARED
-33C6        SQUARE C OVER KG
-
-Symbol, Other: This category covers a multitude of symbols, few of which
-would ever appear in personal names, company names, and spoken phrases.
-The rest of the prohibited symbols are:
-
-2190-21FF   [ARROWS]
-2300-23FF   [MISCELLANEOUS TECHNICAL]
-2400-243F   [CONTROL PICTURES]
-2440-245F   [OPTICAL CHARACTER RECOGNITION]
-2500-257F   [BOX DRAWING]
-2580-259F   [BLOCK ELEMENTS]
-25A0-25FF   [GEOMETRIC SHAPES]
-2600-267F   [MISCELLANEOUS SYMBOLS]
-2700-27BF   [DINGBATS]
-2800-287F   [BRAILLE PATTERNS]
-
-3.7 Additional prohibited characters
-
-3.7.1 Unassigned characters
-
-All characters not yet assigned in [ISO10646] are prohibited. Although
-this may at first seem trivial, it is extremely important because
-characters that may be assigned in the future might have properties that
-would cause them to be prohibited or might have case-folding properties.
-As is the case of all prohibited characters, if a DNS server receives a
-request containing an unassigned character, then the IDN protocol MUST
-return an error message.
-
-3.7.2 Surrogate characters
-
-So far, all proposals for binary encodings of internationalized name
-parts have specified UTF-8 as the encoding format. In such an encoding,
-surrogate characters MUST NOT be used. Therefore, for UTF-8 encodings,
-the following are prohibited:
-
-D800-DFFF   [SURROGATE CHARACTERS]
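
A minimal Python sketch of this restriction, assuming the input is held
as a Python string: the built-in UTF-8 codec in strict mode refuses to
encode code points in the surrogate range, so a failed encode can serve
as the check. The function name utf8_encodable is hypothetical.

```python
def utf8_encodable(text: str) -> bool:
    """True if every code point in text can be encoded as well-formed UTF-8."""
    try:
        text.encode("utf-8")  # strict mode rejects U+D800..U+DFFF
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(utf8_encodable("caf\u00e9"))  # ordinary non-ASCII text
    print(utf8_encodable("\ud800"))     # a lone surrogate code point
```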
-
-3.7.3 Uppercase characters with no lowercase mappings
-
-There are many uppercase characters in [ISO10646] which do not have
-lowercase equivalents in [UniData]. Therefore, they are prohibited on
-input because they would get through the case mapping step while still
-being in uppercase.
-
-The characters that are prohibited on input because they are uppercase
-but have no lowercase mappings are:
-
-03D2        GREEK UPSILON WITH HOOK SYMBOL
-03D3        GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
-03D4        GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL
-04C0        CYRILLIC LETTER PALOCHKA
-10A0-10C5   [GEORGIAN CAPITAL LETTERS]
-
-Note that many characters in the range U+2100 to U+213A, the letterlike
-symbols, also are uppercase but have no lowercase mappings. However,
-they are not listed here because the entire range is already prohibited
-in section 3.6.
-
-3.7.4 Radicals and Ideographic Description
-
-Some Han characters can be informally defined in terms of ideographic
-descriptions. However, ideographic descriptions can lead to multiple
-character streams leading to the same character in a fashion that does
-not canonicalize. Thus, the radicals for ideographic description and the
-ideographic description characters themselves are prohibited. These
-characters are:
-
-2E80-2EFF   [CJK RADICALS SUPPLEMENT]
-2F00-2FDF   [KANGXI RADICALS]
-2FF0-2FFF   [IDEOGRAPHIC DESCRIPTION CHARACTERS]
-
-3.8 Summary of prohibited characters
-
-The following is a collected list from the previous sections.
-
-0000-001F   [CONTROL CHARACTERS]
-0020        SPACE
-0022        QUOTATION MARK
-0023        NUMBER SIGN
-0024        DOLLAR SIGN
-0025        PERCENT SIGN
-0026        AMPERSAND
-002B        PLUS SIGN
-002C        COMMA
-002E        FULL STOP
-002F        SOLIDUS
-003A        COLON
-003B        SEMICOLON
-003C        LESS-THAN SIGN
-003D        EQUALS SIGN
-003E        GREATER-THAN SIGN
-003F        QUESTION MARK
-0040        COMMERCIAL AT
-005B        LEFT SQUARE BRACKET
-005C        REVERSE SOLIDUS
-005D        RIGHT SQUARE BRACKET
-007F        DELETE
-0080-009F   [CONTROL CHARACTERS]
-00A0        NO-BREAK SPACE
-00AC        NOT SIGN
-00AD        SOFT HYPHEN
-00B1        PLUS-MINUS SIGN
-00BC        VULGAR FRACTION ONE QUARTER
-00BD        VULGAR FRACTION ONE HALF
-00BE        VULGAR FRACTION THREE QUARTERS
-00D7        MULTIPLICATION SIGN
-01C3        LATIN LETTER RETROFLEX CLICK
-02B0-02FF   [SPACING MODIFIER LETTERS]
-037E        GREEK QUESTION MARK
-03D2        GREEK UPSILON WITH HOOK SYMBOL
-03D3        GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
-03D4        GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL
-04C0        CYRILLIC LETTER PALOCHKA
-0589        ARMENIAN FULL STOP
-060C        ARABIC COMMA
-061B        ARABIC SEMICOLON
-066A        ARABIC PERCENT SIGN
-066D        ARABIC FIVE POINTED STAR
-06D4        ARABIC FULL STOP
-070F        SYRIAC ABBREVIATION MARK
-10A0-10C5   [GEORGIAN CAPITAL LETTERS]
-1680        OGHAM SPACE MARK
-1806        MONGOLIAN TODO SOFT HYPHEN
-180B        MONGOLIAN FREE VARIATION SELECTOR ONE
-180C        MONGOLIAN FREE VARIATION SELECTOR TWO
-180D        MONGOLIAN FREE VARIATION SELECTOR THREE
-180E        MONGOLIAN VOWEL SEPARATOR
-2000-200B   [SPACES]
-200C        ZERO WIDTH NON-JOINER
-200D        ZERO WIDTH JOINER
-200E        LEFT-TO-RIGHT MARK
-200F        RIGHT-TO-LEFT MARK
-2010        HYPHEN
-2011        NON-BREAKING HYPHEN
-2012        FIGURE DASH
-2013        EN DASH
-2014        EM DASH
-201A        SINGLE LOW-9 QUOTATION MARK
-2024        ONE DOT LEADER
-2025        TWO DOT LEADER
-2026        HORIZONTAL ELLIPSIS
-2028        LINE SEPARATOR
-2029        PARAGRAPH SEPARATOR
-202A        LEFT-TO-RIGHT EMBEDDING
-202B        RIGHT-TO-LEFT EMBEDDING
-202C        POP DIRECTIONAL FORMATTING
-202D        LEFT-TO-RIGHT OVERRIDE
-202E        RIGHT-TO-LEFT OVERRIDE
-202F        NARROW NO-BREAK SPACE
-2030        PER MILLE SIGN
-2031        PER TEN THOUSAND SIGN
-2033        DOUBLE PRIME
-2039        SINGLE LEFT-POINTING ANGLE QUOTATION MARK
-203A        SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
-203D        INTERROBANG
-2044        FRACTION SLASH
-2048        QUESTION EXCLAMATION MARK
-2049        EXCLAMATION QUESTION MARK
-206A        INHIBIT SYMMETRIC SWAPPING
-206B        ACTIVATE SYMMETRIC SWAPPING
-206C        INHIBIT ARABIC FORM SHAPING
-206D        ACTIVATE ARABIC FORM SHAPING
-206E        NATIONAL DIGIT SHAPES
-206F        NOMINAL DIGIT SHAPES
-207A        SUPERSCRIPT PLUS SIGN
-207B        SUPERSCRIPT MINUS
-207C        SUPERSCRIPT EQUALS SIGN
-208A        SUBSCRIPT PLUS SIGN
-208B        SUBSCRIPT MINUS
-208C        SUBSCRIPT EQUALS SIGN
-2100        ACCOUNT OF
-2101        ADDRESSED TO THE SUBJECT
-2105        CARE OF
-2106        CADA UNA
-2153        VULGAR FRACTION ONE THIRD
-2154        VULGAR FRACTION TWO THIRDS
-2155        VULGAR FRACTION ONE FIFTH
-2156        VULGAR FRACTION TWO FIFTHS
-2157        VULGAR FRACTION THREE FIFTHS
-2158        VULGAR FRACTION FOUR FIFTHS
-2159        VULGAR FRACTION ONE SIXTH
-215A        VULGAR FRACTION FIVE SIXTHS
-215B        VULGAR FRACTION ONE EIGHTH
-215C        VULGAR FRACTION THREE EIGHTHS
-215D        VULGAR FRACTION FIVE EIGHTHS
-215E        VULGAR FRACTION SEVEN EIGHTHS
-215F        FRACTION NUMERATOR ONE
-2160-217F   [ROMAN NUMERALS]
-2190-21FF   [ARROWS]
-2200-22FF   [MATHEMATICAL OPERATORS]
-2300-23FF   [MISCELLANEOUS TECHNICAL]
-2400-243F   [CONTROL PICTURES]
-2440-245F   [OPTICAL CHARACTER RECOGNITION]
-2488        DIGIT ONE FULL STOP
-2489        DIGIT TWO FULL STOP
-248A        DIGIT THREE FULL STOP
-248B        DIGIT FOUR FULL STOP
-248C        DIGIT FIVE FULL STOP
-248D        DIGIT SIX FULL STOP
-248E        DIGIT SEVEN FULL STOP
-248F        DIGIT EIGHT FULL STOP
-2490        DIGIT NINE FULL STOP
-2491        NUMBER TEN FULL STOP
-2492        NUMBER ELEVEN FULL STOP
-2493        NUMBER TWELVE FULL STOP
-2494        NUMBER THIRTEEN FULL STOP
-2495        NUMBER FOURTEEN FULL STOP
-2496        NUMBER FIFTEEN FULL STOP
-2497        NUMBER SIXTEEN FULL STOP
-2498        NUMBER SEVENTEEN FULL STOP
-2499        NUMBER EIGHTEEN FULL STOP
-249A        NUMBER NINETEEN FULL STOP
-249B        NUMBER TWENTY FULL STOP
-2500-257F   [BOX DRAWING]
-2580-259F   [BLOCK ELEMENTS]
-25A0-25FF   [GEOMETRIC SHAPES]
-2600-267F   [MISCELLANEOUS SYMBOLS]
-2700-27BF   [DINGBATS]
-2800-287F   [BRAILLE PATTERNS]
-2E80-2EFF   [CJK RADICALS SUPPLEMENT]
-2F00-2FDF   [KANGXI RADICALS]
-2FF0-2FFF   [IDEOGRAPHIC DESCRIPTION CHARACTERS]
-3000        IDEOGRAPHIC SPACE
-3001        IDEOGRAPHIC COMMA
-3002        IDEOGRAPHIC FULL STOP
-3003        DITTO MARK
-3008        LEFT ANGLE BRACKET
-3009        RIGHT ANGLE BRACKET
-3014        LEFT TORTOISE SHELL BRACKET
-3015        RIGHT TORTOISE SHELL BRACKET
-301A        LEFT WHITE SQUARE BRACKET
-301B        RIGHT WHITE SQUARE BRACKET
-33A7        SQUARE M OVER S
-33A8        SQUARE M OVER S SQUARED
-33AE        SQUARE RAD OVER S
-33AF        SQUARE RAD OVER S SQUARED
-33C2        SQUARE AM
-33C6        SQUARE C OVER KG
-33C7        SQUARE CO
-33D8        SQUARE PM
-D800-DFFF   [SURROGATE CHARACTERS]
-E000-F8FF   [PRIVATE USE, PLANE 0]
-FB1D-FB4F   [HEBREW PRESENTATION FORMS]
-FB50-FDFF   [ARABIC PRESENTATION FORMS A]
-FE20-FE2F   [COMBINING HALF MARKS]
-FE30-FE4F   [CJK COMPATIBILITY FORMS]
-FE50-FE6F   [SMALL FORM VARIANTS]
-FE70-FEFC   [ARABIC PRESENTATION FORMS B]
-FEFF        ZERO WIDTH NO-BREAK SPACE
-FF00-FFEF   [HALFWIDTH AND FULLWIDTH FORMS]
-FFF9        INTERLINEAR ANNOTATION ANCHOR
-FFFA        INTERLINEAR ANNOTATION SEPARATOR
-FFFB        INTERLINEAR ANNOTATION TERMINATOR
-FFFC        OBJECT REPLACEMENT CHARACTER
-FFFD        REPLACEMENT CHARACTER
-Unassigned characters
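
The collected list lends itself to a range-table lookup. The sketch
below is illustrative only: PROHIBITED_RANGES holds a small subset of
the table above (a real implementation would carry all of it), the
function name first_prohibited is hypothetical, and the test for
unassigned characters relies on the Unicode data shipped with Python
rather than on the version of [ISO10646] this draft refers to.

```python
import unicodedata
from typing import Optional

# Partial, illustrative subset of the section 3.8 table, written as
# inclusive (first, last) code point ranges.
PROHIBITED_RANGES = [
    (0x0000, 0x0020),  # control characters and SPACE
    (0x0022, 0x0026),  # QUOTATION MARK .. AMPERSAND
    (0x002B, 0x002C),  # PLUS SIGN, COMMA
    (0x002E, 0x002F),  # FULL STOP, SOLIDUS
    (0x003A, 0x0040),  # COLON .. COMMERCIAL AT
    (0x005B, 0x005D),  # LEFT SQUARE BRACKET .. RIGHT SQUARE BRACKET
    (0x007F, 0x00A0),  # DELETE, control characters, NO-BREAK SPACE
    (0x2000, 0x200F),  # spaces, joiners, directional marks
    (0x2028, 0x202F),  # separators, embeddings, NARROW NO-BREAK SPACE
    (0xD800, 0xDFFF),  # surrogate characters
    (0xE000, 0xF8FF),  # private use, plane 0
    (0xFEFF, 0xFEFF),  # ZERO WIDTH NO-BREAK SPACE
    (0xFFF9, 0xFFFD),  # interlinear annotation, replacement characters
]


def first_prohibited(name_part: str) -> Optional[int]:
    """Return the index of the first prohibited character, or None."""
    for i, ch in enumerate(name_part):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in PROHIBITED_RANGES):
            return i
        if unicodedata.category(ch) == "Cn":  # unassigned code point
            return i
    return None
```

Because FULL STOP is itself in the table, a check like this would be
applied to one name part at a time, after the host name has been split
on its separators.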
-
-
-4. Case Folding
-
-After it has been verified that the input text has none of the
-characters prohibited for case folding, the case-folding step itself is
-quite straightforward. For each character in the input, if there is a
-lowercase mapping for that character in [UniData], the input character
-is changed to the mapped lowercase letter.
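
A minimal sketch of the per-character rule in Python. It is an
approximation under a stated assumption: str.lower() applies Python's
own lowercase mapping, which for a few characters (for example U+0130)
yields more than one code point, whereas the lowercase field in
[UniData] maps to a single character. The function name fold_case is
hypothetical.

```python
def fold_case(text: str) -> str:
    """Lowercase each character independently; unmapped characters pass through."""
    return "".join(ch.lower() for ch in text)


if __name__ == "__main__":
    print(fold_case("ExAmPlE"))
    print(fold_case("\u00c9COLE"))  # LATIN CAPITAL LETTER E WITH ACUTE + "COLE"
```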
-
-
-5. Canonicalization
-
-After case folding, the input string is normalized using form KC, as
-described in [UTR15].
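
In Python, form KC is available through the standard unicodedata
module. The sketch below is a thin wrapper, assuming the interpreter's
bundled Unicode data is an acceptable stand-in for the tables in
[UTR15]; the name normalize_kc is hypothetical.

```python
import unicodedata


def normalize_kc(text: str) -> str:
    """Apply Unicode normalization form KC to text."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # Compatibility ligature U+FB01 becomes the two letters "fi".
    print(normalize_kc("\ufb01le"))
    # "e" followed by COMBINING ACUTE ACCENT composes to U+00E9.
    print(normalize_kc("e\u0301") == "\u00e9")
    # VULGAR FRACTION ONE HALF expands to digit, FRACTION SLASH, digit.
    print([f"{ord(c):04X}" for c in normalize_kc("\u00bd")])
```

The last example shows why section 3.6 prohibits the vulgar fractions
on input: their normalized form contains FRACTION SLASH, which is
itself a prohibited character.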
-
-6. IDN Table Revisions
-
-A table consisting of all characters allowed and prohibited and the
-rules for case folding and canonicalization will be created based on the
-content of the [UniData] and on the content of this document. This table
-will be the authority for implementations to follow and will be
-normatively referenced by this document. Such a table will enable the
-IDN protocol to have versions independent of the revisions to Unicode
-and/or to ISO 10646 because the revision of IDN and its deployment may
-not be in sync with revisions to Unicode and ISO 10646.
-
-In a future draft of this document, IANA will be asked to keep this
-table, with an initial version number of 1. Each new version of the
-table will have a new, higher version number.
-
-
-7. Security Considerations
-
-Much of the security of the Internet relies on the DNS. Thus, any change
-to the characteristics of the DNS can change the security of much of the
-Internet.
-
-Host names are used by users to connect to Internet servers. The
-security of the Internet would be compromised if a user entering a
-single internationalized name could be connected to different servers
-based on different interpretations of the internationalized host name.
-
-
-8. References
-
-[IDNComp] Paul Hoffman, "Comparison of Internationalized Domain Name
-Proposals", draft-ietf-idn-compare.
-
-[IDNReq] James Seng, "Requirements of Internationalized Domain Names",
-draft-ietf-idn-requirement.
-
-[ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information
-technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part
-1: Architecture and Basic Multilingual Plane.  Five amendments and a
-technical corrigendum have been published up to now. UTF-16 is described
-in Annex Q, published as Amendment 1. 17 other amendments are currently
-at various stages of standardization. [[[ THIS REFERENCE NEEDS TO BE
-UPDATED AFTER DETERMINING ACCEPTABLE WORDING ]]]
-
-[Normalize] Character Normalization in IETF Protocols,
-draft-duerst-i18n-norm-03
-
-[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
-Requirement Levels", March 1997, RFC 2119.
-
-[RFC2396] Tim Berners-Lee, et al., "Uniform Resource Identifiers (URI):
-Generic Syntax", August 1998, RFC 2396.
-
-[RFC2732] Robert Hinden, et al., "Format for Literal IPv6 Addresses in
-URL's", December 1999, RFC 2732.
-
-[STD13] Paul Mockapetris, "Domain names - implementation and
-specification", November 1987, STD 13 (RFC 1035).
-
-[Unicode3] The Unicode Consortium, "The Unicode Standard -- Version
-3.0", ISBN 0-201-61633-5. Described at
-<http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>.
-
-[UniData] The Unicode Consortium. UnicodeData File.
-<ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt>.
-
-[UTR15] Mark Davis and Martin Duerst. Unicode Normalization Forms.
-Unicode Technical Report #15.
-<http://www.unicode.org/unicode/reports/tr15/>.
-
-
-A. Acknowledgements
-
-Many people from the IETF IDN Working Group and the Unicode Technical
-Committee contributed ideas that went into the first draft of this
-document. Mark Davis was particularly helpful in some of the early
-ideas.
-
-
-B. Changes From Previous Versions of this Draft
-
-This is the -00 version, so there are no changes.
-
-
-C. IANA Considerations
-
-There are no specific IANA considerations in this draft, but there will
-be in a future draft of this document.
-
-
-D. Author Contact Information
-
-Paul Hoffman
-Internet Mail Consortium and VPN Consortium
-127 Segre Place
-Santa Cruz, CA  95060 USA
-paul.hoffman@imc.org and paul.hoffman@vpnc.org
-
-Marc Blanchet
-Viagenie inc.
-2875 boul. Laurier, bur. 300
-Ste-Foy, Quebec, Canada, G1V 2M2
-Marc.Blanchet@viagenie.qc.ca
--- a/doc/draft/draft-ietf-idn-nameprep-02.txt
+++ b/doc/draft/draft-ietf-idn-nameprep-02.txt
--- a/doc/draft/draft-ietf-idn-uri-00.txt
+++ b/doc/draft/draft-ietf-idn-uri-00.txt
@ -0,0 +1,269 @@
+INTERNET-DRAFT                                          Martin Duerst
+draft-ietf-idn-uri-00                             W3C/Keio University
+Expires July 2001                                     January 6, 2001
+
+
+           Internationalized Domain Names in URIs and IRIs
+
+Status of this Memo
+
+This document is an Internet-Draft and is in full conformance with all
+provisions of Section 10 of RFC2026.
+
+Internet-Drafts are working documents of the Internet Engineering Task
+Force (IETF), its areas, and its working groups.  Note that other
+groups may also distribute working documents as Internet-Drafts.
+
+Internet-Drafts are draft documents valid for a maximum of six months
+and may be updated, replaced, or obsoleted by other documents at any
+time.  It is inappropriate to use Internet-Drafts as reference
+material or to cite them other than as "work in progress."
+
+The list of current Internet-Drafts can be accessed at
+http://www.ietf.org/ietf/1id-abstracts.txt.
+
+The list of Internet-Draft Shadow Directories can be accessed at
+http://www.ietf.org/shadow.html.
+
+
+Abstract
+
+This document is a first draft for the provisions necessary to
+upgrade the definitions of URIs [RFC 2396] and IRIs (Internationalized
+Resource Identifiers, [IRI]) to work with internationalized domain
+names.
+
+
+1. Introduction
+
+Internet domain names serve to identify hosts and services on the
+Internet in a convenient way. The IETF IDN working group is currently
+working on extending the character repertoire usable in domain names
+beyond a subset of US-ASCII.
+
+One of the most important places where domain names appear is in
+Uniform Resource Identifiers (URIs, [RFC 2396], as modified by
+[RFC 2732]). However, in the current definition of the generic URI
+syntax, the restrictions on domain names are 'hard-coded'. This
+document proposes to relax these restrictions by updating the syntax,
+and defines how internationalized domain names are encoded in URIs.
+
+URIs themselves are restricted to a subset of US-ASCII. However,
+there is a proposal for relieving these restrictions by creating
+a new protocol element called an IRI (Internationalized Resource
+Identifier [IRI]). While IRIs in general allow the use of non-ASCII
+characters, the syntax of IRIs has the same restriction for domain
+names as the syntax of URIs. This document proposes to relax these
+restrictions, too, in a way that is compatible with the new syntax
+for URIs. This means that encoding an internationalized domain name in
+an URI and encoding the same name in an IRI will produce an URI and an
+IRI that can be converted into each other using the procedures defined
+in [IRI] for these conversions.
+
+2. URI syntax changes
+
+The syntax of URIs [RFC 2396] currently contains the following rules
+relevant to domain names:
+
+       hostname      = *( domainlabel "." ) toplabel [ "." ]
+       domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
+       toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
+
+The latter two rules are changed as follows:
+
+       domainlabel   = escalphanum | escalphanum *( escalphanum | "-" )
+                       escalphanum
+       toplabel      = escalpha | escalpha *( escalphanum | "-" )
+                       escalphanum
+
+and the following rules are added:
+
+       escalphanum   = escaped8 | alphanum
+       escalpha      = escaped8 | alpha
+       escaped8      = "%" hexdig8 HEXDIG
+       hexdig8       = <<HEXDIG greater than 7>>
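
One way to sanity-check the revised grammar is to transcribe it into a
regular expression. The Python sketch below is such a transcription
under two assumptions not stated in the rules above: hexadecimal digits
are accepted in either case, and the escaped octets are not checked for
being well-formed UTF-8. All names (ESCAPED8, is_valid_hostname, and so
on) are hypothetical.

```python
import re

# escaped8 = "%" hexdig8 HEXDIG, where hexdig8 is a hex digit above 7.
ESCAPED8 = r"%[89A-Fa-f][0-9A-Fa-f]"
ESCALPHANUM = rf"(?:{ESCAPED8}|[A-Za-z0-9])"
ESCALPHA = rf"(?:{ESCAPED8}|[A-Za-z])"

# domainlabel = escalphanum | escalphanum *( escalphanum | "-" ) escalphanum
DOMAINLABEL = rf"{ESCALPHANUM}(?:(?:{ESCALPHANUM}|-)*{ESCALPHANUM})?"
# toplabel = escalpha | escalpha *( escalphanum | "-" ) escalphanum
TOPLABEL = rf"{ESCALPHA}(?:(?:{ESCALPHANUM}|-)*{ESCALPHANUM})?"

# hostname = *( domainlabel "." ) toplabel [ "." ]
HOSTNAME = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?")


def is_valid_hostname(host: str) -> bool:
    """True if host matches the revised hostname rule in its entirety."""
    return HOSTNAME.fullmatch(host) is not None
```

Under these rules "b%C3%BCcher.example" is accepted, while
"b%41cher.example" is rejected because %41 escapes a US-ASCII
character.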
+
+The %HH escaping is used to encode characters outside the repertoire
+of US-ASCII. This is done by first encoding the characters in UTF-8
+[RFC 2279], resulting in a sequence of octets, and then escaping these
+octets.
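
The two-step procedure can be sketched in Python as follows. This is a
minimal illustration rather than a normative algorithm: the function
name escape_host is hypothetical, name preparation is assumed to have
been applied already, and uppercase hexadecimal digits are an arbitrary
choice.

```python
def escape_host(host: str) -> str:
    """Percent-escape the UTF-8 octets of every non-ASCII character in host."""
    out = []
    for ch in host:
        if ord(ch) < 0x80:
            out.append(ch)  # US-ASCII characters are left unescaped
        else:
            # Step 1: encode the character in UTF-8.
            # Step 2: escape each resulting octet as %HH.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    print(escape_host("b\u00fccher.example"))
```

Every octet of a multi-octet UTF-8 sequence has its high bit set, so
each escape produced here begins with a hexadecimal digit greater than
7, as the escaped8 rule requires.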
+
+Using UTF-8 assures that this encoding interoperates with IRIs (see
+Section 3). It is also aligned with the recommendations in [RFC 2277]
+and [RFC 2718], and is consistent with the URN syntax [RFC 2141] as
+well as recent URL scheme definitions that define encodings of
+non-ASCII characters based on UTF-8 (e.g., IMAP URLs [RFC 2192] and
+POP URLs [RFC 2384]).
+
+Please note that the use of UTF-8 for encoding internationalized
+domain names in URIs is independent of the encoding chosen for these
+names in the DNS protocol. If something other than UTF-8 is chosen
+for the latter, a future version of this document may give
+instructions for the conversion if deemed necessary.
+
+The above syntax rules do not extend the possible domain names based
+on US-ASCII characters. This may have to be changed in case the IDN
+WG should decide to allow such extensions.
+
+The above rules also do not allow escaping of US-ASCII characters,
+although this is allowed in the other parts of an URI (except for the
+special provisions in case of reserved characters). Allowing such
+escaping would make the syntax rules quite a bit more complicated,
+would mean that the restrictions on US-ASCII characters can be
+circumvented by using escaping, or would lead to much simpler syntax
+rules that don't express these restrictions anymore. Even in case
+escaping of US-ASCII characters is allowed in order to simplify
+processing, it should be noted that it is always better not to escape
+US-ASCII characters in domain names because of the possibility that
+a resolver cannot unescape them. At least purely US-ASCII domain names
+would then always be resolved by such a processor.
+
+While only the restrictions on US-ASCII characters are expressed in the 
+rules above, all the other restrictions on internationalized
+domain names that will be defined by the IDN WG MUST be respected.
+
+The work of the IDN WG currently includes some procedures for name
+preparation. Before encoding an internationalized domain name in an
+URI, this preparation step SHOULD be applied. However, the resolver
+MUST also apply name preparation.
+
+
+3. IRI syntax changes
+
+The syntax of IRIs [IRI] currently contains the following rules
+relevant to domain names:
+
+       hostname      = *( domainlabel "." ) toplabel [ "." ]
+       domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
+       toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
+
+The latter two rules are changed as follows:
+
+       domainlabel   = intalphanum | intalphanum *( intalphanum | "-" )
+                       intalphanum
+       toplabel      = intalpha | intalpha *( intalphanum | "-" )
+                       intalphanum
+
+and the following rules are added:
+
+       intalphanum   = ichar | alphanum | escaped8
+       intalpha      = ichar | alpha | escaped8
+       escaped8      = "%" hexdig8 HEXDIG
+       hexdig8       = <<HEXDIG greater than 7>>
+
+where ichar, as in [IRI], is:
+
+       ichar         =  << any character of UCS [ISO10646] beyond
+                           U+0080, subject to limitations in Section
+                           3.1. of [IRI] >>
+
+With respect to the allowed domain names based on US-ASCII characters,
+the same considerations as in Section 2 apply.
+
+As in Section 2, all the other restrictions on internationalized
+domain names that will be defined by the IDN WG MUST be respected.
+Also, before encoding an internationalized domain name in an IRI,
+name preparation SHOULD be applied. However, the IRI resolver MUST
+also apply name preparation.
+
+It is expected that the rules in Section 3.1 of [IRI] will be less
+restrictive than the rules for internationalized domain names, so that
+no escaping is necessary. Nevertheless, escaping is allowed for cases
+where not all characters can be directly represented.
+
+
+4. Security Considerations
+
+Besides the security considerations of [RFC 2396] and [IRI] and those
+applying to the various aspects of internationalized domain names in
+general, there are currently no known security problems.
+
+
+Acknowledgements
+
+To be done.
+
+
+Copyright
+
+Copyright (C) The Internet Society, 1997. All Rights Reserved.
+
+This document and translations of it may be copied and furnished to
+others, and derivative works that comment on or otherwise explain it
+or assist in its implementation may be prepared, copied, published
+and distributed, in whole or in part, without restriction of any
+kind, provided that the above copyright notice and this paragraph
+are included on all such copies and derivative works.  However, this
+document itself may not be modified in any way, such as by removing
+the copyright notice or references to the Internet Society or other
+Internet organizations, except as needed for the purpose of
+developing Internet standards in which case the procedures for
+copyrights defined in the Internet Standards process must be
+followed, or as required to translate it into languages other
+than English.
+
+The limited permissions granted above are perpetual and will not be
+revoked by the Internet Society or its successors or assigns.
+
+This document and the information contained herein is provided on an
+"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+
+Author's address
+
+          Martin J. Duerst
+          W3C/Keio University
+          5322 Endo, Fujisawa
+          252-8520 Japan
+          duerst@w3.org
+          http://www.w3.org/People/D%C3%BCrst/
+          Tel/Fax: +81 466 49 1170
+
+          Note: Please write "Duerst" with u-umlaut wherever
+                possible, e.g. as "D&#252;rst" in XML and HTML.
+
+
+References
+
+[IRI] L. Masinter, M. Duerst, "Internationalized Resource Identifiers
+  (IRI)", Internet Draft, January 2001,
+  <http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-06.txt>,
+  work in progress.
+
+[ISO10646] ISO/IEC, Information Technology - Universal Multiple-Octet
+  Coded Character Set (UCS) - Part 1: Architecture and Basic
+  Multilingual Plane, Oct. 2000, with amendments.
+
+[RFC 2119] S. Bradner, "Key words for use in RFCs to Indicate
+  Requirement Levels", March 1997.
+
+[RFC 2141] R. Moats, "URN Syntax", May 1997.
+
+[RFC 2192] C. Newman, "IMAP URL Scheme", September 1997.
+
+[RFC 2277] H. Alvestrand, "IETF Policy on Character Sets and
+  Languages", January 1998.
+
+[RFC 2279] F. Yergeau. "UTF-8, a transformation format of ISO 10646.",
+  January 1998.
+
+[RFC 2384] R. Gellens, "POP URL Scheme", August 1998.
+
+[RFC 2396] T.Berners-Lee, R.Fielding, L.Masinter. "Uniform Resource
+  Identifiers (URI): Generic Syntax." August, 1998.
+
+[RFC 2640] B. Curtis, "Internationalization of the File Transfer
+  Protocol", July 1999.
+
+[RFC 2718] L. Masinter, H. Alvestrand, D. Zigmond, R. Petke,
+  "Guidelines for new URL Schemes", November 1999.
+
+[RFC 2732] R. Hinden, B. Carpenter, L. Masinter, "Format for Literal
+  IPv6 Addresses in URL's", December 1999.
+
+
+
--- a/doc/draft/draft-macgowan-dnsext-label-intel-manage-00.txt
+++ b/doc/draft/draft-macgowan-dnsext-label-intel-manage-00.txt