mirror of
https://github.com/isc-projects/bind9.git
synced 2026-06-10 18:40:00 -04:00
new/updated drafts
parent
0015ab0974
commit
fef2d3dce0
6 changed files with 5174 additions and 2713 deletions
1741
doc/draft/draft-ietf-idn-amc-ace-m-00.txt
Normal file
File diff suppressed because it is too large
374
doc/draft/draft-ietf-idn-mua-00.txt
Normal file
|
|
@@ -0,0 +1,374 @@
|
|||
Internet Draft Maynard Kang
|
||||
draft-ietf-idn-mua-00.txt i-EMAIL.net
|
||||
February 5, 2001
|
||||
Expires on August 5, 2001
|
||||
|
||||
Internationalizing Domain Names in Mail User Agents
|
||||
|
||||
Status of this Memo
|
||||
|
||||
This document is an Internet-Draft and is in full conformance with all
|
||||
provisions of Section 10 of RFC2026.
|
||||
|
||||
Internet-Drafts are working documents of the Internet Engineering Task
|
||||
Force (IETF), its areas, and its working groups. Note that other
|
||||
groups may also distribute working documents as Internet-Drafts.
|
||||
|
||||
Internet-Drafts are draft documents valid for a maximum of six months
|
||||
and may be updated, replaced, or obsoleted by other documents at any
|
||||
time. It is inappropriate to use Internet-Drafts as reference material
|
||||
or to cite them other than as "work in progress."
|
||||
|
||||
|
||||
The list of current Internet-Drafts can be accessed at
|
||||
http://www.ietf.org/ietf/1id-abstracts.txt
|
||||
|
||||
The list of Internet-Draft Shadow Directories can be accessed at
|
||||
http://www.ietf.org/shadow.html.
|
||||
|
||||
|
||||
|
||||
Abstract
|
||||
|
||||
This document describes a way in which domain names used in Internet e-mail
|
||||
can be internationalized by making changes only to end-user Mail User
|
||||
Agents and, by doing so, avoid damaging other applications which handle
|
||||
Internet e-mail, such as Message Transfer Agents and Delivery Agents.
|
||||
|
||||
1. Introduction
|
||||
|
||||
One of the proposed solutions for internationalized domain names (IDN)
|
||||
involves only updating the user applications with no changes required
|
||||
to the DNS protocol, servers and resolvers [IDNA] compared to other
|
||||
solutions which require changes to be made to protocol, servers,
|
||||
resolvers and applications.
|
||||
|
||||
The underlying principle of [IDNA] may be similarly applied to the
|
||||
Internet e-mail system today - by effecting changes to only the Mail
|
||||
User Agent (MUA) component of the e-mail system. Thus, existing
|
||||
Message Transfer Agents, Delivery Agents and other applications which
|
||||
handle e-mail do not have to be changed at all.
|
||||
|
||||
1.1 Definitions and Conventions
|
||||
|
||||
Usage of terms related to the character encoding model is in
|
||||
reference to Unicode Technical Report 17 [UTR17].
|
||||
|
||||
The terms "international character", "non-ASCII character" and
|
||||
"multilingual character", which are used interchangeably, are taken
|
||||
to mean any abstract character which is not included in the range
|
||||
specified by [US-ASCII].
|
||||
|
||||
1.2 Terminology
|
||||
|
||||
The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
|
||||
and "MAY" in this document are to be interpreted as described in RFC
|
||||
2119 [RFC2119].
|
||||
|
||||
1.3. Design Philosophy
|
||||
|
||||
As the Internet e-mail system is a diverse, distributed and
|
||||
heterogeneous system with many vendors deploying a vast number of
|
||||
applications, it is of utmost importance that interoperability amongst
|
||||
these various components is maintained. Thus, the ideal solution would
|
||||
be one which does not compromise or damage the operation of any of these
|
||||
existing components once internationalized domain names are encountered.
|
||||
|
||||
Also, solutions which call for changes to be made to many or even all
|
||||
components of the Internet e-mail system would require far too much
|
||||
time and effort to deploy, given that Internet e-mail has such a huge
|
||||
installed base.
|
||||
|
||||
This solution adheres to both of the above principles, in that
|
||||
interoperability is preserved and that the cost and speed of
|
||||
implementation is low. All that the user has to do to use IDNs in e-mail
|
||||
is update his or her MUA.
|
||||
|
||||
1.4. IDN Summary
|
||||
|
||||
This solution specifies an IDN architecture of arch-3 (just send ACE)
|
||||
and a transition strategy of trans-1 (always do current plus new
|
||||
architecture) as described in [IDNCOMP]. The choice of ACE format is not
|
||||
defined in this document, but MUST be the same as that specified in
|
||||
[IDNA] in order to maintain uniqueness and consistency.
|
||||
|
||||
1.5. E-mail Internationalization Summary
|
||||
|
||||
As many Internet e-mail standards such as the SMTP protocol [RFC821]
|
||||
and the e-mail message format [RFC822] only specify usage of the 7-bit
|
||||
ASCII character set [US-ASCII], international characters which use octet-
|
||||
based character encoding schemes (CES) cannot be used in e-mail
|
||||
transmission, headers and bodies.
|
||||
|
||||
Although this issue has been addressed in [RFC2045] for message bodies
|
||||
and [RFC2047] for message headers through the use of a Transfer Encoding
|
||||
Syntax (TES) such as Quoted-Printable or Base64, there is no similar
|
||||
solution which extends the functionality of [RFC821] to include usage of
|
||||
international characters, except for [RFC1652] which allows transmission
|
||||
of 8-bit data passed by the DATA command in an SMTP session.
|
||||
|
||||
[RFC1652] however, does not fully address the problem of using IDNs in
|
||||
an SMTP session - the IDN may be used in areas within the SMTP session
|
||||
other than the DATA command, such as the MAIL FROM and RCPT TO commands,
|
||||
where an IDN may be part of the e-mail address(es) specified there.
|
||||
|
||||
Hence, this would be a major stumbling block to deploying "just-send-
|
||||
8bit" IDNs for use in Internet e-mail, as these IDNs would not be able
|
||||
to be used in SMTP e-mail transmissions due to [RFC821] restrictions.
|
||||
|
||||
2. Architectural Overview
|
||||
|
||||
The end-user MUA may encounter IDNs in the scenarios below:
|
||||
|
||||
(i) When specifying the transmission server (i.e. SMTP server)
|
||||
(ii) When specifying the retrieval server (i.e. POP3/IMAP4/any other
|
||||
retrieval mechanism)
|
||||
(iii) When specifying e-mail addresses during composition of a message
|
||||
(iv) When reading messages with e-mail addresses in them
|
||||
|
||||
As with [IDNA], the MUA is updated in a similar fashion to process IDNs
|
||||
which are input by users and process IDNs which are displayed to users,
|
||||
in all of the scenarios above.
|
||||
|
||||
For (i) and (ii), the IDN MUST be handled in the same manner as
|
||||
specified in [IDNA]. The method of handling an IDN for (iii) and (iv) is
|
||||
described below in 2.1.
|
||||
|
||||
2.1 Interfaces between E-mail components when composing/reading a mail
|
||||
|
||||
The interfaces between e-mail components can be pictorially represented
|
||||
as shown below.
|
||||
|
||||
The example assumes the setup of a POP3/IMAP4 retrieval client and
|
||||
server, but the exact nature of end-to-end e-mail transmission may vary
|
||||
accordingly (e.g. elm or pine would read directly from the mail store).
|
||||
However, these variations do not impact an accurate description of this
|
||||
solution to a large extent as no changes are required at these levels.
|
||||
|
||||
+------+                                         +------+
| User |                                         | User |
+------+                                         +---^--+
     | User Input:       User Display: Characters/   |
     | Keyboard/Pen/etc  Glyphs on CRT or other      |
+----v----------------+  Representation (e.g. sound) |
| Input Method Editor |                    +---------|--------+
+---------------------+                    | Rendering Engine |
     | Input: Any localized/               +---------^--------+
     | internationalized      Output: Any localized/ |
     | charset                internationalized      |
+----v-----------------+      charset                |
| +------------------+ |                  +----------|-------------+
| | Mail Composition | |                  | +--------------+       |
| | Interface        | | Sender's         | | Mail Reading |       |
| +------------------+ | MUA              | | Interface    |       |
|    |                 |                  | +--------^-----+       |
|    | Nameprepped ACE |       Receiver's |          | Nameprepped |
|    v                 |       MUA        |          | ACE         |
| +-------------+      |                  | +-------------------+  |
| | SMTP Client |      |                  | | POP3/IMAP4 Client |  |
| +-------------+      |                  | +-------------------+  |
+----|-----------------+                  +----------^-------------+
     | Nameprepped                                   | Nameprepped
     v ACE   Nameprepped           Nameprepped       | ACE
+-------------+  ACE   +------------+  ACE   +-------------------+
| SMTP Server | -----> | Mail Store | -----> | POP3/IMAP4 Server |
+-------------+        +------------+        +-------------------+
|
||||
|
||||
2.1.1 Interface between User and Input Method Editor
|
||||
|
||||
For ASCII characters, input is straightforward: the user types on the
|
||||
keyboard and whichever character is pressed is sent to the
|
||||
application.
|
||||
|
||||
However, for international characters, the end-user has to use a script-
|
||||
specific Input Method Editor (IME), which may or may not be built into
|
||||
the OS, to interpret what the user communicates to the system and
|
||||
thereafter send the respective international characters to the
|
||||
application.
|
||||
|
||||
For example, for input of Chinese characters, some users use IMEs
|
||||
which support the "Pinyin" input method. When a user types "zhongguo"
|
||||
(in ASCII characters) on the keyboard and selects the characters which
|
||||
represent "China" (in Chinese) from a list, the IME sends the
|
||||
international characters to the application in a user-determined
|
||||
charset (e.g. GB2312).
|
||||
|
||||
2.1.2 Interface between Input Method Editor and MUA Composition
|
||||
Interface
|
||||
|
||||
The MUA mail composition interface (i.e. the "Compose Message"
|
||||
function of the MUA) SHOULD be able to accept IDNs using 8-bit character
|
||||
encoding schemes, including those represented in any localized (e.g.
|
||||
GB2312) or internationalized (e.g. UTF-8) charsets.
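As a minimal sketch of accepting such input, the Python fragment below decodes the octets handed over by the input method into abstract characters, given the charset in use. The function name decode_domain_input and the sample name part are hypothetical and are not defined by this memo; the charset labels are those of Python's standard codecs.

```python
def decode_domain_input(raw: bytes, charset: str) -> str:
    """Decode octets received from the input method into Unicode text."""
    try:
        return raw.decode(charset)
    except (UnicodeDecodeError, LookupError) as exc:
        # Undecodable octets or an unknown charset: report to the caller.
        raise ValueError(f"cannot decode input as {charset}") from exc


# The same name part, delivered in a localized and in an internationalized
# charset, yields the same abstract characters.
name_part = "\u4e2d\u56fd"  # "China" in Chinese
assert decode_domain_input(name_part.encode("gb2312"), "gb2312") == name_part
assert decode_domain_input(name_part.encode("utf-8"), "utf-8") == name_part
```

Decoding to a single internal form first means the later nameprep and ACE steps do not need to know which charset the user chose.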
|
||||
|
||||
This input typically takes place where e-mail addresses are entered
|
||||
such as the "From", "To", "Cc", "Bcc" fields, amongst others, as IDNs
|
||||
may be used at the right-hand-side of the "@" sign in an e-mail address
|
||||
(domain-parts).
|
||||
|
||||
The mail composition interface MAY allow ACE input for the same
|
||||
reasons as specified in [IDNA], but this is not recommended as ACE is opaque
|
||||
and ugly.
|
||||
|
||||
2.1.3 Interface between MUA Composition Interface and SMTP Client
|
||||
|
||||
The MUA composition interface communicates with the SMTP client in the
|
||||
MUA typically through internal function calls within the software itself
|
||||
or through an API. It is at this level where ACE conversion of any IDN
|
||||
encountered by the MUA composition interface takes place.
|
||||
|
||||
Before converting the name parts of the IDN into ACE, the MUA MUST
|
||||
prepare each name part as specified in [NAMEPREP]. Thereafter, the MUA
|
||||
MUST convert the name parts into ACE before passing any data to the SMTP
|
||||
client.
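The conversion described above can be sketched as follows. This memo does not define the ACE format, so the sketch substitutes Python's built-in "idna" codec, which implements the later IDNA standard (RFC 3490: nameprep followed by Punycode); that substitution, the function name address_to_ace and the sample addresses are assumptions made for illustration only.

```python
def address_to_ace(address: str) -> str:
    """Return the address with its domain part in nameprepped ACE form."""
    # Split at the last "@": the left side is the local part, which this
    # memo leaves untouched; the right side is the domain part.
    local_part, at_sign, domain = address.rpartition("@")
    if not at_sign or not domain:
        raise ValueError("not an addr-spec: " + address)
    # The "idna" codec prepares each name part and converts it to ACE.
    ace_domain = domain.encode("idna").decode("ascii")
    return f"{local_part}@{ace_domain}"


print(address_to_ace("user@b\u00fccher.example"))
```

With the codec assumed here, the address printed is user@xn--bcher-kva.example; an ACE chosen by [IDNA] at the time of this memo would produce a different string.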
|
||||
|
||||
The SMTP client then prepares the e-mail for transmission using the
|
||||
SMTP protocol [RFC821], and thereafter establishes an SMTP connection
|
||||
with the user-specified SMTP server to transmit the e-mail.
|
||||
|
||||
It is important to note that an IDN specified in the parameters of any
|
||||
SMTP command MUST be represented in nameprepped ACE at this point in
|
||||
time. This includes SMTP commands which require domain parameters (such
|
||||
as the HELO and EHLO commands) and commands where e-mail addresses are
|
||||
specified (such as the MAIL FROM, RCPT TO, DATA, VRFY, EXPN, SEND, SOML
|
||||
and SAML commands).
|
||||
|
||||
As for data passed by the DATA command, ACE conversion MUST be
|
||||
performed when the "domain" portion of an "addr-spec" or when a "domain"
|
||||
itself, within the context of [RFC822], is encountered. This is
|
||||
necessary as an updated MUA may originate a message which is read by a
|
||||
non-updated MUA. If this happens, the non-updated MUA may face
|
||||
operational problems dealing with IDNs that appear in the "addr-spec"
|
||||
which are not in ACE.
|
||||
|
||||
Any transfer encoding syntax to be applied to the mail headers as
|
||||
specified in [RFC2047] SHOULD be performed before nameprepped ACE
|
||||
conversion. This is to reduce confusion between IDNs within "addr-spec"
|
||||
and "domain" portions, in the context of [RFC822], and IDNs which appear
|
||||
as arbitrary data in mail headers and bodies.
|
||||
|
||||
2.1.4. Interface between POP3/IMAP4 client (or local mail store) and
|
||||
Mail Reading Interface
|
||||
|
||||
The MUA mail reading interface (i.e. "Read mail" function of an MUA)
|
||||
typically displays e-mail data retrieved from either a POP3/IMAP4
|
||||
client or from a local mail store through internal function calls within
|
||||
the MUA software or through an API.
|
||||
|
||||
When e-mail containing an ACE-represented IDN is to be displayed, the
|
||||
MUA SHOULD convert the ACE-represented IDN contained within the
|
||||
"addr-spec" or "domain" portion specified in [RFC822] back into any
|
||||
localized or internationalized charset of the user's choice, whenever
|
||||
possible. In the event that it is impossible to achieve conversion back
|
||||
into the selected localized charset (for example, conversion of RACE-
|
||||
represented Hangeul characters into ISO-8859-1 is impossible), the MUA
|
||||
should prompt the user with an error message.
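A minimal sketch of this reading-side conversion is given below. It again assumes Python's built-in "idna" codec (RFC 3490) as a stand-in for the ACE, since this memo does not fix one; the function name domain_for_display is hypothetical. The error raised when the target charset cannot represent the name corresponds to the error message mentioned above.

```python
def domain_for_display(ace_domain: str, charset: str) -> bytes:
    """Convert an ACE domain back into octets in the user's chosen charset."""
    # Decode the ACE form into abstract characters.
    text = ace_domain.encode("ascii").decode("idna")
    try:
        return text.encode(charset)
    except UnicodeEncodeError as exc:
        # The chosen charset cannot represent these characters,
        # for example Hangul in ISO-8859-1.
        raise ValueError(
            f"{ace_domain} cannot be displayed in {charset}"
        ) from exc


latin = domain_for_display("xn--bcher-kva.example", "iso-8859-1")
assert latin == "b\u00fccher.example".encode("iso-8859-1")
```

An MUA would catch the ValueError and show the user an error, possibly falling back to displaying the ACE form itself.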
|
||||
|
||||
It may be possible to save and retrieve information about the original
|
||||
charset of the ACE-converted IDN through the use of additional
|
||||
[RFC822] mail headers, but that is not (yet) addressed by this memo.
|
||||
|
||||
Although it is possible to render ACE into properly decoded glyphs and
|
||||
display the actual abstract characters without any conversion to other
|
||||
charsets, the MUA SHOULD NOT do this as it is not the primary function
|
||||
of an MUA to render characters. This should be left to a rendering
|
||||
engine which is separate from the MUA and typically embedded into the
|
||||
OS. It is sufficient for the MUA to pass the appropriate charset to the
|
||||
rendering engine for proper display.
|
||||
|
||||
3. ACE Length Considerations
|
||||
|
||||
As [RFC821] in Section 4.5.3 restricts the maximum total length of a
|
||||
domain name to 64 characters, representation of IDNs using ACE may
|
||||
pose a potential problem. Most ACEs typically require 3-4 ASCII
|
||||
characters to represent one international character (especially in the
|
||||
case of CJK characters, where compression is less effective).
|
||||
|
||||
That would leave only about 16-24 characters for the whole IDN,
|
||||
including all name parts and dots. This is highly undesirable as some
|
||||
languages such as Arabic are unable to be abbreviated and the domain
|
||||
names may require a larger length than that which is allowed by
|
||||
[RFC821].
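The length concern can be checked mechanically. The sketch below measures the ACE form of a domain name against the 64-character limit cited above; it assumes Python's built-in "idna" codec (RFC 3490) in place of the ACE that this memo leaves undefined, and the names MAX_DOMAIN_LENGTH and ace_fits are hypothetical.

```python
MAX_DOMAIN_LENGTH = 64  # limit cited above from [RFC821], Section 4.5.3


def ace_fits(domain: str) -> bool:
    """Report whether the ACE form of the whole domain fits the limit."""
    ace = domain.encode("idna")  # name parts and dots, all in ASCII
    return len(ace) <= MAX_DOMAIN_LENGTH


short_name = "b\u00fccher.example"
long_name = ".".join(["b\u00fccher"] * 6) + ".example"
assert ace_fits(short_name)
assert not ace_fits(long_name)
```

Here each six-character name part grows to thirteen ASCII characters, so six of them plus the dots already exceed the limit. The expansion factor depends on the ACE and on the script, so the figures are illustrative only.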
|
||||
|
||||
To further complicate matters, several mailing list packages such as
|
||||
ezmlm embed domain names into the local-parts portion of an e-mail
|
||||
address during management of subscriptions, together with randomly-
|
||||
generated subscription information. This would leave an even smaller
|
||||
maximum ACE length, if interoperability with such mailing list software
|
||||
were to be maintained, given that there is also a 64 character
|
||||
restriction on local parts.
|
||||
|
||||
4. Security Considerations
|
||||
|
||||
As this memo is based on [IDNA], security considerations are similar
|
||||
to those faced by [IDNA]. This includes security considerations from
|
||||
[NAMEPREP] as well.
|
||||
|
||||
5. Other Considerations
|
||||
|
||||
Although this document addresses end-user MUAs (e.g. elm, mutt, pine,
|
||||
Eudora, Outlook Express, etc.) to a large extent, the definition of an
|
||||
MUA could be extended to include web-based e-mail server software and
|
||||
automated programs such as mailing list management software.
|
||||
|
||||
End-user MUAs may also include additional functionality where IDNs may
|
||||
be encountered, such as calendaring/scheduling, directory services and
|
||||
digital certificate storage. This is not (yet) addressed in this memo.
|
||||
|
||||
6. Future Extensions
|
||||
|
||||
It is possible to achieve internationalization of the entire e-mail
|
||||
address by representation of international characters in the local-parts
|
||||
of an "addr-spec" using nameprepped ACE conversion in a similar fashion
|
||||
as described in this memo.
|
||||
|
||||
However, this is a different problem altogether and is currently beyond
|
||||
the scope of this memo.
|
||||
|
||||
7. References
|
||||
|
||||
[IDNA] Paul Hoffman & Patrik Faltstrom, "Internationalizing Host Names
|
||||
in Applications (IDNA)", draft-ietf-idn-idna.
|
||||
|
||||
[UTR17] K. Whistler & M. Davis, Unicode Consortium, "Character Encoding
|
||||
Model", Unicode Technical Report #17,
|
||||
http://www.unicode.org/unicode/reports/tr17/
|
||||
|
||||
[US-ASCII] United States of America Standards Institute, "USA Code for
|
||||
Information Interchange", X3.4, 1968.
|
||||
|
||||
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
|
||||
Requirement Levels", March 1997, RFC 2119.
|
||||
|
||||
[IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name
|
||||
Proposals", draft-ietf-idn-compare.
|
||||
|
||||
[RFC821] Jonathan B. Postel, "Simple Mail Transfer Protocol", August
|
||||
1982, RFC 821.
|
||||
|
||||
[RFC822] David H. Crocker, "Standard for the Format of ARPA Internet
|
||||
Text Messages", August 1982, RFC 822.
|
||||
|
||||
[RFC2045] N. Freed & N. Borenstein, "Multipurpose Internet Mail
|
||||
Extensions (MIME) Part One: Format of Internet Message Bodies",
|
||||
November 1996, RFC 2045.
|
||||
|
||||
[RFC2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions)
|
||||
Part Three: Message Header Extensions for Non-ASCII Text", November
|
||||
1996, RFC 2047.
|
||||
|
||||
[RFC1652] J. Klensin et al., "SMTP Service Extension for 8bit-
|
||||
MIMEtransport", July 1994, RFC 1652.
|
||||
|
||||
|
||||
[NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of
|
||||
Internationalized Host Names", draft-ietf-idn-nameprep.
|
||||
|
||||
A. Author's Address
|
||||
|
||||
Maynard Kang
|
||||
i-EMAIL.net Pte Ltd
|
||||
1 Kim Seng Promenade #12-07
|
||||
Great World City West Tower
|
||||
Singapore 237994
|
||||
E-mail: maynard@i-email.net
|
||||
|
|
@@ -1,855 +0,0 @@
|
|||
Internet Draft Paul Hoffman
|
||||
draft-ietf-idn-nameprep-00.txt IMC & VPNC
|
||||
July 3, 2000 Marc Blanchet
|
||||
Expires in six months ViaGenie
|
||||
|
||||
Preparation of Internationalized Host Names
|
||||
|
||||
Status of this memo
|
||||
|
||||
This document is an Internet-Draft and is in full conformance with all
|
||||
provisions of Section 10 of RFC2026.
|
||||
|
||||
Internet-Drafts are working documents of the Internet Engineering Task
|
||||
Force (IETF), its areas, and its working groups. Note that other groups
|
||||
may also distribute working documents as Internet-Drafts.
|
||||
|
||||
Internet-Drafts are draft documents valid for a maximum of six months
|
||||
and may be updated, replaced, or obsoleted by other documents at any
|
||||
time. It is inappropriate to use Internet-Drafts as reference material
|
||||
or to cite them other than as "work in progress."
|
||||
|
||||
|
||||
The list of current Internet-Drafts can be accessed at
|
||||
http://www.ietf.org/ietf/1id-abstracts.txt
|
||||
|
||||
The list of Internet-Draft Shadow Directories can be accessed at
|
||||
http://www.ietf.org/shadow.html.
|
||||
|
||||
|
||||
Abstract
|
||||
|
||||
This document describes how to prepare internationalized host names for
|
||||
transmission on the wire. The steps include excluding characters that
|
||||
are prohibited from appearing in internationalized host names, changing
|
||||
all characters that have case properties to be lowercase, and
|
||||
normalizing the characters. Further, this document lists the prohibited
|
||||
characters.
|
||||
|
||||
|
||||
1. Introduction
|
||||
|
||||
When expanding today's DNS to include internationalized host names,
|
||||
those new names will be handled in many parts of the DNS. The IDN
|
||||
Working Group's requirements document [IDNReq] describes a framework for
|
||||
domain name handling as well as requirements for the new names. The IDN
|
||||
Working Group's comparison document [IDNComp] gives a framework for how
|
||||
various parts of the IDN solution work together.
|
||||
|
||||
A user can enter a domain name into an application program in a myriad
|
||||
of fashions. Depending on the input method, the characters entered in
|
||||
the domain name may or may not be those that are allowed in
|
||||
internationalized host names. Thus, there must be a way to canonicalize
|
||||
the user's input before the name is resolved in the DNS.
|
||||
|
||||
It is a design goal of this document to allow users to enter host names
|
||||
in applications and have the highest chance of getting the name correct.
|
||||
This means that the user should not be limited to only entering exactly
|
||||
the characters that might have been used, but to instead be able to
|
||||
enter characters that unambiguously canonicalize to characters in the
|
||||
desired host name. At the same time, this process must not introduce any
|
||||
chance that two host names could be represented by two distinct strings
|
||||
of characters that look identical to typical users. It is also a design
|
||||
goal to have all preprocessing of IDN done before going on the wire, so
|
||||
that no transformation is done in the DNS server space.
|
||||
|
||||
This document describes the steps needed to convert a name part from one
|
||||
that is entered by the user to one that can be used in the DNS.
|
||||
|
||||
1.1 Terminology
|
||||
|
||||
The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
|
||||
"MAY" in this document are to be interpreted as described in RFC 2119
|
||||
[RFC2119].
|
||||
|
||||
Examples in this document use the notation from the Unicode Standard
|
||||
[Unicode3] as well as the ISO 10646 [ISO10646] names. For example, the
|
||||
letter "a" may be represented as either "U+0061" or "LATIN SMALL LETTER
|
||||
A". In the lists of prohibited characters, the "U+" is left off to make
|
||||
the lists easier to read.
|
||||
|
||||
1.2 IDN summary
|
||||
|
||||
Using the terminology in [IDNComp], this document specifies all of the
|
||||
prohibited characters and the canonicalization for an IDN solution.
|
||||
Specifically, it covers the following sections from [IDNComp]:
|
||||
|
||||
prohib-1: Identical and near-identical characters
|
||||
prohib-2: Separators
|
||||
prohib-3: Non-displaying and non-spacing characters
|
||||
prohib-4: Private use characters
|
||||
prohib-5: Punctuation
|
||||
prohib-6: Symbols
|
||||
canon-1.2: Normalization Form KC
|
||||
canon-2.1: Case folding in ASCII
|
||||
canon-2.2: Case folding in non-ASCII
|
||||
|
||||
Note that this document does not cover:
|
||||
canon-1.1: Normalization Form C
|
||||
canon-2.3: Han folding
|
||||
|
||||
1.3 Open issues
|
||||
|
||||
This is the first draft of this document. Although there has been much
|
||||
discussion on the WG mailing list about the topics here, there has not
|
||||
yet been much agreement on some issues. Now that there is a document to
|
||||
talk about, that discussion can be more focussed.
|
||||
|
||||
1.3.1 Where to do name preparation
|
||||
|
||||
Section 2.1 says to do name preparation in the resolver. An argument can
|
||||
be made for doing name preparation in the application, before the
|
||||
application service interface. An advantage of that proposal is that
|
||||
resolvers would not need to do any name preparation. A disadvantage is
|
||||
that applications would have to be updated each time the IDN protocol is
|
||||
updated, such as if new characters are added to the repertoire of
|
||||
allowed characters. It seems likely that resolvers are more easily
|
||||
updated than all the individual applications that use internationalized
|
||||
host names.
|
||||
|
||||
1.3.2 Choosing between normalization form C and KC
|
||||
|
||||
Much of the discussion of normalization on the WG mailing list assumed
|
||||
that normalization form C would be used. Near the time that this
|
||||
document was written, people started considering form KC instead of C.
|
||||
This document used form KC, but the reasons for doing so could be
|
||||
contentious.
|
||||
|
||||
1.3.3 Does the prohibition catch all bad characters?
|
||||
|
||||
The mailing list discussion proposed doing prohibition in two steps: a
|
||||
short list of prohibited characters before case folding in order to
|
||||
prevent uppercase characters that have no lowercase equivalents from
|
||||
getting through, and then a full check on the output of normalization.
|
||||
In this draft, all checking is done before case folding, based on the
|
||||
(possibly wrong) assumption that none of the prohibited characters will
|
||||
re-appear after the case folding and normalization. If that assumption
|
||||
turns out to be wrong, a check for just those problematic characters can
|
||||
be added after normalization, or a full check against the prohibited
|
||||
characters can be added.
|
||||
|
||||
|
||||
2. Preparation Overview
|
||||
|
||||
This section describes where name preparation happens and the steps that
|
||||
name preparation software must take.
|
||||
|
||||
2.1 Where name preparation happens
|
||||
|
||||
Part of the chart in section 1.4 of [IDNReq] looks like this:
|
||||
|
||||
+---------------+
|
||||
| Application |
|
||||
+---------------+
|
||||
| Application service interface
|
||||
| For ex. GethostbyXXXX interface
|
||||
+---------------+
|
||||
| Resolver |
|
||||
+---------------+
|
||||
| <----- DNS service interface
|
||||
+-------------------------------------------+
|
||||
|
||||
In this specification, the name preparation is done in the resolver,
|
||||
before the DNS service interface. That is, it is acceptable for software
|
||||
in the application service interface (such as a "GetHostByName" API) to
|
||||
pass the resolver a name that has not been prepared. However, the
|
||||
resolver MUST prepare the name as described in this specification before
|
||||
passing it to the DNS service interface.
|
||||
|
||||
2.2 Name preparation steps
|
||||
|
||||
The steps for preparing names are:
|
||||
|
||||
1) Input from the application service interface -- This can be done in
|
||||
many ways and is not specified in this document
|
||||
|
||||
2) Look for prohibited input -- Check for any characters that are not
|
||||
allowed in the input. If any are found, return an error to the
|
||||
application service interface. This step is necessary to prevent errors
|
||||
in the following two steps. This step fulfills prohib-1, prohib-2,
|
||||
prohib-3, prohib-4, prohib-5, and prohib-6 from [IDNComp].
|
||||
|
||||
3) Fold case -- Change all uppercase characters into lowercase
|
||||
characters. Design note: this step could just as easily have been
|
||||
"change all lowercase characters into uppercase characters". However,
|
||||
the upper-to-lower folding was chosen because most users of the Internet
|
||||
today enter host names in lowercase. This step fulfills canon-2.1 and
|
||||
canon-2.2 from [IDNComp].
|
||||
|
||||
4) Canonicalize -- Normalize the characters. This step fulfills canon-1.2
|
||||
from [IDNComp].
|
||||
|
||||
5) Resolution of the prepared name -- This must be specified in a
|
||||
different IDN document.
|
||||
|
||||
The above steps MUST be performed in the order given in order to comply
|
||||
with this specification.
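Steps 2 to 4 can be sketched in Python as shown below. This is an illustration under stated assumptions, not the normative procedure: PROHIBITED_RANGES holds only a small excerpt of the lists in section 3, str.lower() stands in for the case folding of step 3, and the Unicode data is whatever version ships with the Python interpreter. The names PROHIBITED_RANGES and prepare_name_part are hypothetical.

```python
import unicodedata

# Excerpt only: a few of the ranges listed in section 3 of this document.
PROHIBITED_RANGES = [
    (0x0000, 0x001F),  # control characters
    (0x0020, 0x0020),  # SPACE
    (0x002E, 0x002E),  # FULL STOP
    (0x00A0, 0x00A0),  # NO-BREAK SPACE
    (0x00AD, 0x00AD),  # SOFT HYPHEN
    (0x2000, 0x200B),  # spaces
    (0x2010, 0x2014),  # hyphens and dashes
    (0xFF00, 0xFFEF),  # halfwidth and fullwidth forms
]


def prepare_name_part(name_part: str) -> str:
    """Apply steps 2, 3 and 4, in that order, to one name part."""
    # Step 2: look for prohibited input and return an error if found.
    for ch in name_part:
        cp = ord(ch)
        if any(low <= cp <= high for low, high in PROHIBITED_RANGES):
            raise ValueError(f"prohibited character U+{cp:04X}")
    # Step 3: fold case (uppercase to lowercase).
    folded = name_part.lower()
    # Step 4: canonicalize using Normalization Form KC.
    return unicodedata.normalize("NFKC", folded)


assert prepare_name_part("Example") == "example"
assert prepare_name_part("E\u0301cole") == "\u00e9cole"
```

The second assertion shows folding followed by normalization: the uppercase letter with a combining accent becomes a single precomposed lowercase character.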
|
||||
|
||||
|
||||
3. Prohibited Input
|
||||
|
||||
Before the text can be processed, it must be checked for prohibited
|
||||
characters. There is a variety of prohibited characters, as described in
|
||||
this section.
|
||||
|
||||
Note that one of the goals of IDN is to allow the widest possible set of
|
||||
host names as long as those host names do not cause other problems, such
|
||||
as possible ambiguity. Specifically, experience with current DNS names
|
||||
has shown that there is a desire for host names that include personal
|
||||
names, company names, and spoken phrases. A goal of this section is to
|
||||
prohibit as few characters that might be used in these contexts as
|
||||
possible while making sure that characters that might easily cause
|
||||
confusion or ambiguity are prohibited.
|
||||
|
||||
Note that every character listed in this section MUST NOT be transmitted
|
||||
on the DNS service interface. Although the checking is being performed
|
||||
before case folding and canonicalization, those steps cannot result in
|
||||
any of these characters if these characters are not in the input stream.
|
||||
[[[NOTE: THIS STATEMENT NEEDS TO BE CHECKED ALGORITHMICALLY.]]] If a DNS
|
||||
server receives a request containing a prohibited character, then the
|
||||
IDN protocol MUST return an error message.
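One way the algorithmic check called for in the note above might be carried out is sketched below: every code point in the Basic Multilingual Plane that is not itself prohibited is folded and normalized, and any whose output contains a prohibited character is reported. The sketch makes several assumptions: str.lower() approximates the case folding step, the Unicode data is the version bundled with Python rather than the one this document references, and the caller supplies the prohibited set. The function name find_reappearing is hypothetical.

```python
import unicodedata


def find_reappearing(prohibited: set, limit: int = 0x10000) -> list:
    """List allowed code points whose prepared form contains a prohibited one."""
    offenders = []
    for cp in range(limit):
        # Skip characters already rejected, and surrogate code points.
        if cp in prohibited or 0xD800 <= cp <= 0xDFFF:
            continue
        # Apply case folding and then Normalization Form KC.
        prepared = unicodedata.normalize("NFKC", chr(cp).lower())
        if any(ord(ch) in prohibited for ch in prepared):
            offenders.append(cp)
    return offenders


# Illustration with a one-element set: only SPACE is treated as prohibited.
for cp in find_reappearing({0x0020})[:5]:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}")
```

A result obtained from a partial set such as the one above says nothing about the full specification; the check is only meaningful when the complete prohibited lists of this section are supplied.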
|
||||
|
||||
|
||||
Note that some characters listed in one section would also appear in
|
||||
other sections. Each character is only listed once.
|
||||
|
||||
3.1 prohib-1: Identical and near-identical characters
|
||||
|
||||
Many characters in [ISO10646] are identical or nearly identical to other
|
||||
characters. These were often included for compatibility with other
|
||||
character sets.
|
||||
|
||||
The characters prohibited because they are identical or nearly identical
|
||||
to allowed characters are:
|
||||
|
||||
00AD SOFT HYPHEN
|
||||
00D7 MULTIPLICATION SIGN
|
||||
01C3 LATIN LETTER RETROFLEX CLICK
|
||||
02B0-02FF [SPACING MODIFIER LETTERS]
|
||||
066D ARABIC FIVE POINTED STAR
|
||||
1806 MONGOLIAN TODO SOFT HYPHEN
|
||||
2010 HYPHEN
|
||||
2011 NON-BREAKING HYPHEN
|
||||
2012 FIGURE DASH
|
||||
2013 EN DASH
|
||||
2014 EM DASH
|
||||
2160-217F [ROMAN NUMERALS]
|
||||
FB1D-FB4F [HEBREW PRESENTATION FORMS]
|
||||
FB50-FDFF [ARABIC PRESENTATION FORMS A]
|
||||
FE20-FE2F [COMBINING HALF MARKS]
|
||||
FE30-FE4F [CJK COMPATIBILITY FORMS]
|
||||
FE50-FE6F [SMALL FORM VARIANTS]
|
||||
FE70-FEFC [ARABIC PRESENTATION FORMS B]
|
||||
FF00-FFEF [HALFWIDTH AND FULLWIDTH FORMS]
|
||||
|
||||
3.2 prohib-2: Separators
|
||||
|
||||
Horizontal and vertical spacing characters would make it unclear where a
|
||||
host name begins and ends. The prohibited spacing characters are:
|
||||
|
||||
0020 SPACE
|
||||
00A0 NO-BREAK SPACE
|
||||
1680 OGHAM SPACE MARK
|
||||
2000-200B [SPACES]
|
||||
2028 LINE SEPARATOR
|
||||
2029 PARAGRAPH SEPARATOR
|
||||
202F NARROW NO-BREAK SPACE
|
||||
3000 IDEOGRAPHIC SPACE
|
||||
|
||||
Allowing periods and period-like characters as characters within a name
|
||||
part would also cause similar confusion. The prohibited periods,
|
||||
characters that look like periods, and characters that canonicalize to a
|
||||
period or to a period-like character are:
|
||||
|
||||
002E FULL STOP
|
||||
06D4 ARABIC FULL STOP
|
||||
2024 ONE DOT LEADER
|
||||
2025 TWO DOT LEADER
|
||||
2026 HORIZONTAL ELLIPSIS
|
||||
2488 DIGIT ONE FULL STOP
|
||||
2489 DIGIT TWO FULL STOP
|
||||
248A DIGIT THREE FULL STOP
|
||||
248B DIGIT FOUR FULL STOP
|
||||
248C DIGIT FIVE FULL STOP
|
||||
248D DIGIT SIX FULL STOP
|
||||
248E DIGIT SEVEN FULL STOP
|
||||
248F DIGIT EIGHT FULL STOP
|
||||
2490 DIGIT NINE FULL STOP
|
||||
2491 NUMBER TEN FULL STOP
|
||||
2492 NUMBER ELEVEN FULL STOP
|
||||
2493 NUMBER TWELVE FULL STOP
|
||||
2494 NUMBER THIRTEEN FULL STOP
|
||||
2495 NUMBER FOURTEEN FULL STOP
|
||||
2496 NUMBER FIFTEEN FULL STOP
|
||||
2497 NUMBER SIXTEEN FULL STOP
|
||||
2498 NUMBER SEVENTEEN FULL STOP
|
||||
2499 NUMBER EIGHTEEN FULL STOP
|
||||
249A NUMBER NINETEEN FULL STOP
|
||||
249B NUMBER TWENTY FULL STOP
|
||||
33C2 SQUARE AM
|
||||
33C7 SQUARE CO
|
||||
33D8 SQUARE PM
|
||||
|
||||
3.3 prohib-3: Non-displaying and non-spacing characters
|
||||
|
||||
There are many characters that cannot be seen in the ISO 10646 character
|
||||
set. These include control characters, non-breaking spaces, formatting
|
||||
characters, and tagging characters. These characters would certainly
|
||||
cause confusion if allowed in host names.
|
||||
|
||||
0000-001F [CONTROL CHARACTERS]
|
||||
007F DELETE
|
||||
0080-009F [CONTROL CHARACTERS]
|
||||
070F SYRIAC ABBREVIATION MARK
|
||||
180B MONGOLIAN FREE VARIATION SELECTOR ONE
|
||||
180C MONGOLIAN FREE VARIATION SELECTOR TWO
|
||||
180D MONGOLIAN FREE VARIATION SELECTOR THREE
|
||||
180E MONGOLIAN VOWEL SEPARATOR
|
||||
200C ZERO WIDTH NON-JOINER
|
||||
200D ZERO WIDTH JOINER
|
||||
200E LEFT-TO-RIGHT MARK
|
||||
200F RIGHT-TO-LEFT MARK
|
||||
202A LEFT-TO-RIGHT EMBEDDING
|
||||
202B RIGHT-TO-LEFT EMBEDDING
|
||||
202C POP DIRECTIONAL FORMATTING
|
||||
202D LEFT-TO-RIGHT OVERRIDE
|
||||
202E RIGHT-TO-LEFT OVERRIDE
|
||||
206A INHIBIT SYMMETRIC SWAPPING
|
||||
206B ACTIVATE SYMMETRIC SWAPPING
|
||||
206C INHIBIT ARABIC FORM SHAPING
|
||||
206D ACTIVATE ARABIC FORM SHAPING
|
||||
206E NATIONAL DIGIT SHAPES
|
||||
206F NOMINAL DIGIT SHAPES
|
||||
FEFF ZERO WIDTH NO-BREAK SPACE
|
||||
FFF9 INTERLINEAR ANNOTATION ANCHOR
|
||||
FFFA INTERLINEAR ANNOTATION SEPARATOR
|
||||
FFFB INTERLINEAR ANNOTATION TERMINATOR
|
||||
FFFC OBJECT REPLACEMENT CHARACTER
|
||||
FFFD REPLACEMENT CHARACTER
|
||||
|
||||
3.4 prohib-4: Private use characters
|
||||
|
||||
Because private-use characters do not have defined meanings, they are
|
||||
prohibited. The private-use characters are:
|
||||
|
||||
E000-F8FF [PRIVATE USE, PLANE 0]
|
||||
|
||||
3.5 prohib-5: Punctuation
|
||||
|
||||
The following characters are reserved or delimiters in URLs [RFC2396]
|
||||
and [RFC2732]:
|
||||
|
||||
" # $ % & + , . / : ; < = > ? @ [ ]
|
||||
|
||||
3.5.1 Characters from URLs
|
||||
|
||||
The following punctuation characters are prohibited because they are
|
||||
reserved or delimiters in URLs.
|
||||
|
||||
0022 QUOTATION MARK
|
||||
0023 NUMBER SIGN
|
||||
0024 DOLLAR SIGN
|
||||
0025 PERCENT SIGN
|
||||
0026 AMPERSAND
|
||||
002B PLUS SIGN
|
||||
002C COMMA
|
||||
002E FULL STOP
|
||||
002F SOLIDUS
|
||||
003A COLON
|
||||
003B SEMICOLON
|
||||
003C LESS-THAN SIGN
|
||||
003D EQUALS SIGN
|
||||
003E GREATER-THAN SIGN
|
||||
003F QUESTION MARK
|
||||
0040 COMMERCIAL AT
|
||||
005B LEFT SQUARE BRACKET
|
||||
005D RIGHT SQUARE BRACKET
|
||||
|
||||
3.5.2 Characters that canonicalize to characters from URLs
|
||||
|
||||
The following punctuation characters are prohibited because their
|
||||
normalization contains one or more of the characters from section 3.5.1.
|
||||
|
||||
037E GREEK QUESTION MARK
|
||||
2048 QUESTION EXCLAMATION MARK
|
||||
2049 EXCLAMATION QUESTION MARK
|
||||
207A SUPERSCRIPT PLUS SIGN
|
||||
207C SUPERSCRIPT EQUALS SIGN
|
||||
208A SUBSCRIPT PLUS SIGN
|
||||
208C SUBSCRIPT EQUALS SIGN
|
||||
2100 ACCOUNT OF
|
||||
2101 ADDRESSED TO THE SUBJECT
|
||||
2105 CARE OF
|
||||
2106 CADA UNA
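
The effect described in this section can be observed directly. The
following is a minimal sketch in Python, assuming that the
normalization meant here is form KC as applied in section 5; the names
URL_RESERVED and reserved_after_normalization are illustrative and are
not defined by this draft.

```python
import unicodedata

# The reserved and delimiter characters listed in section 3.5.
URL_RESERVED = set('"#$%&+,./:;<=>?@[]')

def reserved_after_normalization(ch):
    """Return the URL-reserved characters that appear once ch is
    normalized with form KC, as a sorted list."""
    normalized = unicodedata.normalize("NFKC", ch)
    return sorted(set(normalized) & URL_RESERVED)
```

For example, U+2100 ACCOUNT OF normalizes to the three characters
"a/c", so a name part containing it would end up containing a SOLIDUS.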
|
||||
|
||||
3.5.3 Characters that look like characters from URLs
|
||||
|
||||
The following are prohibited because they look indistinguishable from
|
||||
the characters listed in section 3.5.1.
|
||||
|
||||
037E GREEK QUESTION MARK
|
||||
0589 ARMENIAN FULL STOP
|
||||
060C ARABIC COMMA
|
||||
061B ARABIC SEMICOLON
|
||||
066A ARABIC PERCENT SIGN
|
||||
201A SINGLE LOW-9 QUOTATION MARK
|
||||
2030 PER MILLE SIGN
|
||||
2031 PER TEN THOUSAND SIGN
|
||||
2033 DOUBLE PRIME
|
||||
2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
|
||||
2044 FRACTION SLASH
|
||||
203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
|
||||
203D INTERROBANG
|
||||
3001 IDEOGRAPHIC COMMA
|
||||
3002 IDEOGRAPHIC FULL STOP
|
||||
3003 DITTO MARK
|
||||
3008 LEFT ANGLE BRACKET
|
||||
3009 RIGHT ANGLE BRACKET
|
||||
3014 LEFT TORTOISE SHELL BRACKET
|
||||
3015 RIGHT TORTOISE SHELL BRACKET
|
||||
301A LEFT WHITE SQUARE BRACKET
|
||||
301B RIGHT WHITE SQUARE BRACKET
|
||||
|
||||
3.5.4 Other punctuation
|
||||
|
||||
The following punctuation characters are prohibited because they are unlikely to
|
||||
be used in names and may be confusing to users or to character-entry
|
||||
processes:
|
||||
|
||||
005C REVERSE SOLIDUS
|
||||
|
||||
3.6 prohib-6: Symbols
|
||||
|
||||
[UniData] has non-normative categories for symbols. The four symbol
|
||||
categories are:
|
||||
|
||||
Symbol, Currency: Currency symbols could appear in company names and
|
||||
spoken phrases, so they are not prohibited.
|
||||
|
||||
Symbol, Modifier: Stand-alone modifiers might appear in personal names,
|
||||
company names, and spoken phrases, so they are not prohibited.
|
||||
|
||||
Symbol, Math: It is very unlikely that there are any significant
|
||||
personal names, company names, or spoken phrases that contain
|
||||
mathematical symbols. Further, many of these symbols are the same or
|
||||
similar to other punctuation, thereby leading to ambiguity. For this
|
||||
reason, math-specific symbols are prohibited. These prohibited math
|
||||
symbols are:
|
||||
|
||||
00AC NOT SIGN
|
||||
00B1 PLUS-MINUS SIGN
|
||||
2200-22FF [MATHEMATICAL OPERATORS]
|
||||
|
||||
Further, the following characters canonicalize to characters in the
|
||||
above math list, and therefore are also prohibited:
|
||||
|
||||
00BC VULGAR FRACTION ONE QUARTER
|
||||
00BD VULGAR FRACTION ONE HALF
|
||||
00BE VULGAR FRACTION THREE QUARTERS
|
||||
207B SUPERSCRIPT MINUS
|
||||
208B SUBSCRIPT MINUS
|
||||
2153 VULGAR FRACTION ONE THIRD
|
||||
2154 VULGAR FRACTION TWO THIRDS
|
||||
2155 VULGAR FRACTION ONE FIFTH
|
||||
2156 VULGAR FRACTION TWO FIFTHS
|
||||
2157 VULGAR FRACTION THREE FIFTHS
|
||||
2158 VULGAR FRACTION FOUR FIFTHS
|
||||
2159 VULGAR FRACTION ONE SIXTH
|
||||
215A VULGAR FRACTION FIVE SIXTHS
|
||||
215B VULGAR FRACTION ONE EIGHTH
|
||||
215C VULGAR FRACTION THREE EIGHTHS
|
||||
215D VULGAR FRACTION FIVE EIGHTHS
|
||||
215E VULGAR FRACTION SEVEN EIGHTHS
|
||||
215F FRACTION NUMERATOR ONE
|
||||
33A7 SQUARE M OVER S
|
||||
33A8 SQUARE M OVER S SQUARED
|
||||
33AE SQUARE RAD OVER S
|
||||
33AF SQUARE RAD OVER S SQUARED
|
||||
33C6 SQUARE C OVER KG
|
||||
|
||||
Symbol, Other: This category covers a multitude of symbols, few of which
|
||||
would ever appear in personal names, company names, and spoken phrases.
|
||||
The rest of the prohibited symbols are:
|
||||
|
||||
2190-21FF [ARROWS]
|
||||
2300-23FF [MISCELLANEOUS TECHNICAL]
|
||||
2400-243F [CONTROL PICTURES]
|
||||
2440-245F [OPTICAL CHARACTER RECOGNITION]
|
||||
2500-257F [BOX DRAWING]
|
||||
2580-259F [BLOCK ELEMENTS]
|
||||
25A0-25FF [GEOMETRIC SHAPES]
|
||||
2600-267F [MISCELLANEOUS SYMBOLS]
|
||||
2700-27BF [DINGBATS]
|
||||
2800-287F [BRAILLE PATTERNS]
|
||||
|
||||
3.7 Additional prohibited characters
|
||||
|
||||
3.7.1 Unassigned characters
|
||||
|
||||
All characters not yet assigned in [ISO10646] are prohibited. Although
|
||||
this may at first seem trivial, it is extremely important because
|
||||
characters that may be assigned in the future might have properties that
|
||||
would cause them to be prohibited or might have case-folding properties.
|
||||
As is the case for all prohibited characters, if a DNS server receives a
|
||||
request containing an unassigned character, then the IDN protocol MUST
|
||||
return an error message.
|
||||
|
||||
3.7.2 Surrogate characters
|
||||
|
||||
So far, all proposals for binary encodings of internationalized name
|
||||
parts have specified UTF-8 as the encoding format. In such an encoding,
|
||||
surrogate characters MUST NOT be used. Therefore, for UTF-8 encodings,
|
||||
the following are prohibited:
|
||||
|
||||
D800-DFFF [SURROGATE CHARACTERS]
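
Strict UTF-8 encoders typically enforce this rule already. As an
illustration only (Python; the helper name encodes_as_utf8 is
hypothetical), a lone surrogate code point is rejected at encoding
time:

```python
def encodes_as_utf8(text):
    """Return True if text can be encoded as strict UTF-8."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        # Python refuses to encode code points in D800-DFFF.
        return False
    return True
```

An implementation whose encoder is more lenient would need an explicit
range check instead.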
|
||||
|
||||
3.7.3 Uppercase characters with no lowercase mappings
|
||||
|
||||
There are many uppercase characters in [ISO10646] which do not have
|
||||
lowercase equivalents in [UniData]. Therefore, they are prohibited on
|
||||
input because they would get through the case mapping step while still
|
||||
being in uppercase.
|
||||
|
||||
The characters that are prohibited on input because they are uppercase
|
||||
but have no lowercase mappings are:
|
||||
|
||||
03D2 GREEK UPSILON WITH HOOK SYMBOL
|
||||
03D3 GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
|
||||
03D4 GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL
|
||||
04C0 CYRILLIC LETTER PALOCHKA
|
||||
10A0-10C5 [GEORGIAN CAPITAL LETTERS]
|
||||
|
||||
Note that many characters in the range U+2100 to U+213A, the letterlike
|
||||
symbols, also are uppercase but have no lowercase mappings. However,
|
||||
they are not listed here because the entire range is already prohibited
|
||||
in section 3.6.
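
A list of this kind can be derived mechanically from character
properties. The sketch below is in Python and makes two assumptions:
that "uppercase" means general category Lu, and that the character
tables built into the Python runtime are an acceptable stand-in for
[UniData]. Those tables track a newer version of Unicode than this
draft, so the output can differ from the list above. The function
names are illustrative.

```python
import unicodedata

def is_uppercase_without_lowercase(ch):
    """True if ch has category Lu and lowercasing leaves it unchanged."""
    return unicodedata.category(ch) == "Lu" and ch.lower() == ch

def scan(first, last):
    """List the code points in [first, last] that are uppercase but
    have no lowercase mapping in the runtime's Unicode tables."""
    return [
        f"U+{cp:04X}"
        for cp in range(first, last + 1)
        if is_uppercase_without_lowercase(chr(cp))
    ]
```

Running scan over a block such as 0x0370 to 0x03FF gives a candidate
list that can then be compared by hand against the entries above.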
|
||||
|
||||
3.7.4 Radicals and Ideographic Description
|
||||
|
||||
Some Han characters can be informally defined in terms of ideographic
|
||||
descriptions. However, ideographic descriptions can lead to multiple
|
||||
character streams leading to the same character in a fashion that does
|
||||
not canonicalize. Thus, the radicals for ideographic description and the
|
||||
ideographic description characters themselves are prohibited. These
|
||||
characters are:
|
||||
|
||||
2E80-2EFF [CJK RADICALS SUPPLEMENT]
|
||||
2F00-2FDF [KANGXI RADICALS]
|
||||
2FF0-2FFF [IDEOGRAPHIC DESCRIPTION CHARACTERS]
|
||||
|
||||
3.8 Summary of prohibited characters
|
||||
|
||||
The following is a collected list from the previous sections.
|
||||
|
||||
0000-001F [CONTROL CHARACTERS]
|
||||
0020 SPACE
|
||||
0022 QUOTATION MARK
|
||||
0023 NUMBER SIGN
|
||||
0024 DOLLAR SIGN
|
||||
0025 PERCENT SIGN
|
||||
0026 AMPERSAND
|
||||
002B PLUS SIGN
|
||||
002C COMMA
|
||||
002E FULL STOP
|
||||
002F SOLIDUS
|
||||
003A COLON
|
||||
003B SEMICOLON
|
||||
003C LESS-THAN SIGN
|
||||
003D EQUALS SIGN
|
||||
003E GREATER-THAN SIGN
|
||||
003F QUESTION MARK
|
||||
0040 COMMERCIAL AT
|
||||
005B LEFT SQUARE BRACKET
|
||||
005C REVERSE SOLIDUS
|
||||
005D RIGHT SQUARE BRACKET
|
||||
007F DELETE
|
||||
0080-009F [CONTROL CHARACTERS]
|
||||
00A0 NO-BREAK SPACE
|
||||
00AC NOT SIGN
|
||||
00AD SOFT HYPHEN
|
||||
00B1 PLUS-MINUS SIGN
|
||||
00BC VULGAR FRACTION ONE QUARTER
|
||||
00BD VULGAR FRACTION ONE HALF
|
||||
00BE VULGAR FRACTION THREE QUARTERS
|
||||
00D7 MULTIPLICATION SIGN
|
||||
01C3 LATIN LETTER RETROFLEX CLICK
|
||||
02B0-02FF [SPACING MODIFIER LETTERS]
|
||||
037E GREEK QUESTION MARK
|
||||
03D2 GREEK UPSILON WITH HOOK SYMBOL
|
||||
03D3 GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
|
||||
03D4 GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL
|
||||
04C0 CYRILLIC LETTER PALOCHKA
|
||||
0589 ARMENIAN FULL STOP
|
||||
060C ARABIC COMMA
|
||||
061B ARABIC SEMICOLON
|
||||
066A ARABIC PERCENT SIGN
|
||||
066D ARABIC FIVE POINTED STAR
|
||||
06D4 ARABIC FULL STOP
|
||||
070F SYRIAC ABBREVIATION MARK
|
||||
10A0-10C5 [GEORGIAN CAPITAL LETTERS]
|
||||
1680 OGHAM SPACE MARK
|
||||
1806 MONGOLIAN TODO SOFT HYPHEN
|
||||
180B MONGOLIAN FREE VARIATION SELECTOR ONE
|
||||
180C MONGOLIAN FREE VARIATION SELECTOR TWO
|
||||
180D MONGOLIAN FREE VARIATION SELECTOR THREE
|
||||
180E MONGOLIAN VOWEL SEPARATOR
|
||||
2000-200B [SPACES]
|
||||
200C ZERO WIDTH NON-JOINER
|
||||
200D ZERO WIDTH JOINER
|
||||
200E LEFT-TO-RIGHT MARK
|
||||
200F RIGHT-TO-LEFT MARK
|
||||
2010 HYPHEN
|
||||
2011 NON-BREAKING HYPHEN
|
||||
2012 FIGURE DASH
|
||||
2013 EN DASH
|
||||
2014 EM DASH
|
||||
201A SINGLE LOW-9 QUOTATION MARK
|
||||
2024 ONE DOT LEADER
|
||||
2025 TWO DOT LEADER
|
||||
2026 HORIZONTAL ELLIPSIS
|
||||
2028 LINE SEPARATOR
|
||||
2029 PARAGRAPH SEPARATOR
|
||||
202A LEFT-TO-RIGHT EMBEDDING
|
||||
202B RIGHT-TO-LEFT EMBEDDING
|
||||
202C POP DIRECTIONAL FORMATTING
|
||||
202D LEFT-TO-RIGHT OVERRIDE
|
||||
202E RIGHT-TO-LEFT OVERRIDE
|
||||
202F NARROW NO-BREAK SPACE
|
||||
2030 PER MILLE SIGN
|
||||
2031 PER TEN THOUSAND SIGN
|
||||
2033 DOUBLE PRIME
|
||||
2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
|
||||
203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
|
||||
203D INTERROBANG
|
||||
2044 FRACTION SLASH
|
||||
2048 QUESTION EXCLAMATION MARK
|
||||
2049 EXCLAMATION QUESTION MARK
|
||||
206A INHIBIT SYMMETRIC SWAPPING
|
||||
206B ACTIVATE SYMMETRIC SWAPPING
|
||||
206C INHIBIT ARABIC FORM SHAPING
|
||||
206D ACTIVATE ARABIC FORM SHAPING
|
||||
206E NATIONAL DIGIT SHAPES
|
||||
206F NOMINAL DIGIT SHAPES
|
||||
207A SUPERSCRIPT PLUS SIGN
|
||||
207B SUPERSCRIPT MINUS
|
||||
207C SUPERSCRIPT EQUALS SIGN
|
||||
208A SUBSCRIPT PLUS SIGN
|
||||
208B SUBSCRIPT MINUS
|
||||
208C SUBSCRIPT EQUALS SIGN
|
||||
2100 ACCOUNT OF
|
||||
2101 ADDRESSED TO THE SUBJECT
|
||||
2105 CARE OF
|
||||
2106 CADA UNA
|
||||
2153 VULGAR FRACTION ONE THIRD
|
||||
2154 VULGAR FRACTION TWO THIRDS
|
||||
2155 VULGAR FRACTION ONE FIFTH
|
||||
2156 VULGAR FRACTION TWO FIFTHS
|
||||
2157 VULGAR FRACTION THREE FIFTHS
|
||||
2158 VULGAR FRACTION FOUR FIFTHS
|
||||
2159 VULGAR FRACTION ONE SIXTH
|
||||
215A VULGAR FRACTION FIVE SIXTHS
|
||||
215B VULGAR FRACTION ONE EIGHTH
|
||||
215C VULGAR FRACTION THREE EIGHTHS
|
||||
215D VULGAR FRACTION FIVE EIGHTHS
|
||||
215E VULGAR FRACTION SEVEN EIGHTHS
|
||||
215F FRACTION NUMERATOR ONE
|
||||
2160-217F [ROMAN NUMERALS]
|
||||
2190-21FF [ARROWS]
|
||||
2200-22FF [MATHEMATICAL OPERATORS]
|
||||
2300-23FF [MISCELLANEOUS TECHNICAL]
|
||||
2400-243F [CONTROL PICTURES]
|
||||
2440-245F [OPTICAL CHARACTER RECOGNITION]
|
||||
2488 DIGIT ONE FULL STOP
|
||||
2489 DIGIT TWO FULL STOP
|
||||
248A DIGIT THREE FULL STOP
|
||||
248B DIGIT FOUR FULL STOP
|
||||
248C DIGIT FIVE FULL STOP
|
||||
248D DIGIT SIX FULL STOP
|
||||
248E DIGIT SEVEN FULL STOP
|
||||
248F DIGIT EIGHT FULL STOP
|
||||
2490 DIGIT NINE FULL STOP
|
||||
2491 NUMBER TEN FULL STOP
|
||||
2492 NUMBER ELEVEN FULL STOP
|
||||
2493 NUMBER TWELVE FULL STOP
|
||||
2494 NUMBER THIRTEEN FULL STOP
|
||||
2495 NUMBER FOURTEEN FULL STOP
|
||||
2496 NUMBER FIFTEEN FULL STOP
|
||||
2497 NUMBER SIXTEEN FULL STOP
|
||||
2498 NUMBER SEVENTEEN FULL STOP
|
||||
2499 NUMBER EIGHTEEN FULL STOP
|
||||
249A NUMBER NINETEEN FULL STOP
|
||||
249B NUMBER TWENTY FULL STOP
|
||||
2500-257F [BOX DRAWING]
|
||||
2580-259F [BLOCK ELEMENTS]
|
||||
25A0-25FF [GEOMETRIC SHAPES]
|
||||
2600-267F [MISCELLANEOUS SYMBOLS]
|
||||
2700-27BF [DINGBATS]
|
||||
2800-287F [BRAILLE PATTERNS]
|
||||
2E80-2EFF [CJK RADICALS SUPPLEMENT]
|
||||
2F00-2FDF [KANGXI RADICALS]
|
||||
2FF0-2FFF [IDEOGRAPHIC DESCRIPTION CHARACTERS]
|
||||
3000 IDEOGRAPHIC SPACE
|
||||
3001 IDEOGRAPHIC COMMA
|
||||
3002 IDEOGRAPHIC FULL STOP
|
||||
3003 DITTO MARK
|
||||
3008 LEFT ANGLE BRACKET
|
||||
3009 RIGHT ANGLE BRACKET
|
||||
33A7 SQUARE M OVER S
|
||||
33A8 SQUARE M OVER S SQUARED
|
||||
33AE SQUARE RAD OVER S
|
||||
33AF SQUARE RAD OVER S SQUARED
|
||||
33C2 SQUARE AM
|
||||
33C6 SQUARE C OVER KG
|
||||
33C7 SQUARE CO
|
||||
33D8 SQUARE PM
|
||||
D800-DFFF [SURROGATE CHARACTERS]
|
||||
E000-F8FF [PRIVATE USE, PLANE 0]
|
||||
FB1D-FB4F [HEBREW PRESENTATION FORMS]
|
||||
FB50-FDFF [ARABIC PRESENTATION FORMS A]
|
||||
FE20-FE2F [COMBINING HALF MARKS]
|
||||
FE30-FE4F [CJK COMPATIBILITY FORMS]
|
||||
FE50-FE6F [SMALL FORM VARIANTS]
|
||||
FE70-FEFC [ARABIC PRESENTATION FORMS B]
|
||||
FEFF ZERO WIDTH NO-BREAK SPACE
|
||||
FF00-FFEF [HALFWIDTH AND FULLWIDTH FORMS]
|
||||
FFF9 INTERLINEAR ANNOTATION ANCHOR
|
||||
FFFA INTERLINEAR ANNOTATION SEPARATOR
|
||||
FFFB INTERLINEAR ANNOTATION TERMINATOR
|
||||
FFFC OBJECT REPLACEMENT CHARACTER
|
||||
FFFD REPLACEMENT CHARACTER
|
||||
Unassigned characters
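
One possible way to apply the collected list is a sorted range table
with a binary search. The sketch below is in Python and carries only a
small excerpt of the list for brevity; the names PROHIBITED_RANGES,
is_prohibited and find_prohibited are illustrative and not part of
this draft, and unassigned characters are not handled.

```python
from bisect import bisect_right

# Excerpt of the collected list as inclusive (first, last) pairs.
# A complete table would carry every entry from this section.
PROHIBITED_RANGES = sorted([
    (0x0000, 0x001F),  # control characters
    (0x0020, 0x0020),  # SPACE
    (0x002E, 0x002E),  # FULL STOP
    (0x00A0, 0x00A0),  # NO-BREAK SPACE
    (0x2000, 0x200B),  # spaces
    (0x200C, 0x200F),  # joiners and directional marks
    (0x2160, 0x217F),  # Roman numerals
    (0xD800, 0xDFFF),  # surrogate characters
    (0xE000, 0xF8FF),  # private use, plane 0
    (0xFF00, 0xFFEF),  # halfwidth and fullwidth forms
])
_STARTS = [first for first, _ in PROHIBITED_RANGES]

def is_prohibited(code_point):
    """True if code_point falls inside one of the table's ranges."""
    i = bisect_right(_STARTS, code_point) - 1
    return i >= 0 and code_point <= PROHIBITED_RANGES[i][1]

def find_prohibited(name_part):
    """Return (index, 'U+XXXX') for each prohibited character found."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(name_part)
        if is_prohibited(ord(ch))
    ]
```

Reporting the position and code point of each offending character,
rather than a bare yes or no, makes it easier to produce a useful
error message.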
|
||||
|
||||
|
||||
4. Case Folding
|
||||
|
||||
After it has been verified that the input text has none of the
|
||||
characters prohibited for case folding, the case-folding step itself is
|
||||
quite straightforward. For each character in the input, if there is a
|
||||
lowercase mapping for that character in [UniData], the input character
|
||||
is changed to the mapped lowercase letter.
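
A minimal sketch of this step in Python follows. It assumes that the
runtime's built-in case tables are an acceptable stand-in for
[UniData]. Because Python's lower() can expand one character into
several, the sketch keeps only one-to-one results and leaves other
characters unchanged. The name fold_case is illustrative.

```python
def fold_case(text):
    """Map each character to its lowercase form when a single-character
    lowercase mapping exists; leave all other characters unchanged."""
    out = []
    for ch in text:
        lowered = ch.lower()
        # Only accept one-to-one mappings, as in a simple mapping table.
        out.append(lowered if len(lowered) == 1 else ch)
    return "".join(out)
```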
|
||||
|
||||
|
||||
5. Canonicalization
|
||||
|
||||
After case folding, the input string is normalized using form KC, as
|
||||
described in [UTR15].
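
In Python, form KC is available from the standard unicodedata module,
so the step can be sketched as below. The function name canonicalize
is illustrative, and the runtime's Unicode version may be newer than
the one this draft assumes.

```python
import unicodedata

def canonicalize(text):
    """Normalize text with form KC (compatibility decomposition
    followed by canonical composition)."""
    return unicodedata.normalize("NFKC", text)
```

For instance, the two-character sequence "e" followed by U+0301
COMBINING ACUTE ACCENT becomes the single character U+00E9, and the
ligature U+FB01 becomes the two letters "fi".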
|
||||
|
||||
6. IDN Table Revisions
|
||||
|
||||
A table consisting of all characters allowed and prohibited and the
|
||||
rules for case folding and canonicalization will be created based on the
|
||||
content of [UniData] and on the content of this document. This table
|
||||
will be the authority for implementations to follow and will be
|
||||
normatively referenced by this document. Such a table will enable the
|
||||
IDN protocol to have versions independent of the revisions to Unicode
|
||||
and/or to ISO 10646 because the revision of IDN and its deployment may
|
||||
not be in sync with revisions to Unicode and ISO 10646.
|
||||
|
||||
In a future draft of this document, IANA will be asked to keep this
|
||||
table, with an initial version number of 1. Each new version of the
|
||||
table will have a new, higher version number.
|
||||
|
||||
|
||||
7. Security Considerations
|
||||
|
||||
Much of the security of the Internet relies on the DNS. Thus, any change
|
||||
to the characteristics of the DNS can change the security of much of the
|
||||
Internet.
|
||||
|
||||
Host names are used by users to connect to Internet servers. The
|
||||
security of the Internet would be compromised if a user entering a
|
||||
single internationalized name could be connected to different servers
|
||||
based on different interpretations of the internationalized host name.
|
||||
|
||||
|
||||
8. References
|
||||
|
||||
[IDNComp] Paul Hoffman, "Comparison of Internationalized Domain Name
|
||||
Proposals", draft-ietf-idn-compare.
|
||||
|
||||
[IDNReq] James Seng, "Requirements of Internationalized Domain Names",
|
||||
draft-ietf-idn-requirement.
|
||||
|
||||
[ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information
|
||||
technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part
|
||||
1: Architecture and Basic Multilingual Plane. Five amendments and a
|
||||
technical corrigendum have been published up to now. UTF-16 is described
|
||||
in Annex Q, published as Amendment 1. 17 other amendments are currently
|
||||
at various stages of standardization. [[[ THIS REFERENCE NEEDS TO BE
|
||||
UPDATED AFTER DETERMINING ACCEPTABLE WORDING ]]]
|
||||
|
||||
[Normalize] Character Normalization in IETF Protocols,
|
||||
draft-duerst-i18n-norm-03
|
||||
|
||||
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
|
||||
Requirement Levels", March 1997, RFC 2119.
|
||||
|
||||
[RFC2396] Tim Berners-Lee, et al., "Uniform Resource Identifiers (URI):
|
||||
Generic Syntax", August 1998, RFC 2396.
|
||||
|
||||
[RFC2732] Robert Hinden, et al., "Format for Literal IPv6 Addresses in
URL's", December 1999, RFC 2732.
|
||||
|
||||
[STD13] Paul Mockapetris, "Domain names - implementation and
|
||||
specification", November 1987, STD 13 (RFC 1035).
|
||||
|
||||
[Unicode3] The Unicode Consortium, "The Unicode Standard -- Version
|
||||
3.0", ISBN 0-201-61633-5. Described at
|
||||
<http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>.
|
||||
|
||||
[UniData] The Unicode Consortium. UnicodeData File.
|
||||
<ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt>.
|
||||
|
||||
[UTR15] Mark Davis and Martin Duerst. Unicode Normalization Forms.
|
||||
Unicode Technical Report #15.
|
||||
<http://www.unicode.org/unicode/reports/tr15/>.
|
||||
|
||||
|
||||
A. Acknowledgements
|
||||
|
||||
Many people from the IETF IDN Working Group and the Unicode Technical
|
||||
Committee contributed ideas that went into the first draft of this
|
||||
document. Mark Davis was particularly helpful in some of the early
|
||||
ideas.
|
||||
|
||||
|
||||
B. Changes From Previous Versions of this Draft
|
||||
|
||||
This is the -00 version, so there are no changes.
|
||||
|
||||
|
||||
C. IANA Considerations
|
||||
|
||||
There are no specific IANA considerations in this draft, but there will
|
||||
be in a future draft of this document.
|
||||
|
||||
|
||||
D. Author Contact Information
|
||||
|
||||
Paul Hoffman
|
||||
Internet Mail Consortium and VPN Consortium
|
||||
127 Segre Place
|
||||
Santa Cruz, CA 95060 USA
|
||||
paul.hoffman@imc.org and paul.hoffman@vpnc.org
|
||||
|
||||
Marc Blanchet
|
||||
Viagenie inc.
|
||||
2875 boul. Laurier, bur. 300
|
||||
Ste-Foy, Quebec, Canada, G1V 2M2
|
||||
Marc.Blanchet@viagenie.qc.ca
|
||||
1988
doc/draft/draft-ietf-idn-nameprep-02.txt
Normal file
File diff suppressed because it is too large
Load diff
269
doc/draft/draft-ietf-idn-uri-00.txt
Normal file
|
|
@ -0,0 +1,269 @@
|
|||
INTERNET-DRAFT Martin Duerst
|
||||
draft-ietf-idn-uri-00 W3C/Keio University
|
||||
Expires July 2001 January 6, 2001
|
||||
|
||||
|
||||
Internationalized Domain Names in URIs and IRIs
|
||||
|
||||
Status of this Memo
|
||||
|
||||
This document is an Internet-Draft and is in full conformance with all
|
||||
provisions of Section 10 of RFC2026.
|
||||
|
||||
Internet-Drafts are working documents of the Internet Engineering Task
|
||||
Force (IETF), its areas, and its working groups. Note that other
|
||||
groups may also distribute working documents as Internet-Drafts.
|
||||
|
||||
Internet-Drafts are draft documents valid for a maximum of six months
|
||||
and may be updated, replaced, or obsoleted by other documents at any
|
||||
time. It is inappropriate to use Internet-Drafts as reference
|
||||
material or to cite them other than as "work in progress."
|
||||
|
||||
The list of current Internet-Drafts can be accessed at
|
||||
http://www.ietf.org/ietf/1id-abstracts.txt.
|
||||
|
||||
The list of Internet-Draft Shadow Directories can be accessed at
|
||||
http://www.ietf.org/shadow.html.
|
||||
|
||||
|
||||
Abstract
|
||||
|
||||
This document is a first draft for the provisions necessary to
|
||||
upgrade the definitions of URIs [RFC 2396] and IRIs (Internationalized
|
||||
Resource Identifiers, [IRI]) to work with internationalized domain
|
||||
names.
|
||||
|
||||
|
||||
1. Introduction
|
||||
|
||||
Internet domain names serve to identify hosts and services on the
|
||||
Internet in a convenient way. The IETF IDN working group is currently
|
||||
working on extending the character repertoire usable in domain names
|
||||
beyond a subset of US-ASCII.
|
||||
|
||||
One of the most important places where domain names appear is
Uniform Resource Identifiers (URIs, [RFC 2396], as modified by
[RFC 2732]). However, in the current definition of the generic URI
|
||||
syntax, the restrictions on domain names are 'hard-coded'. This
|
||||
document proposes to relax these restrictions by updating the syntax,
|
||||
and defines how internationalized domain names are encoded in URIs.
|
||||
|
||||
URIs themselves are restricted to a subset of US-ASCII. However,
|
||||
there is a proposal for relieving these restrictions by creating
|
||||
a new protocol element called an IRI (Internationalized Resource
|
||||
Identifier [IRI]). While IRIs in general allow the use of non-ASCII
|
||||
characters, the syntax of IRIs has the same restriction for domain
|
||||
names as the syntax of URIs. This document proposes to relax these
|
||||
restrictions, too, in a way that is compatible with the new syntax
|
||||
for URIs. This means that encoding an internationalized domain name in
|
||||
an URI and encoding the same name in an IRI will produce an URI and an
|
||||
IRI that can be converted into each other using the procedures defined
|
||||
in [IRI] for these conversions.
|
||||
|
||||
2. URI syntax changes
|
||||
|
||||
The syntax of URIs [RFC 2396] currently contains the following rules
|
||||
relevant to domain names:
|
||||
|
||||
hostname = *( domainlabel "." ) toplabel [ "." ]
|
||||
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
|
||||
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
|
||||
|
||||
The latter two rules are changed as follows:
|
||||
|
||||
domainlabel = escalphanum | escalphanum *( escalphanum | "-" )
|
||||
escalphanum
|
||||
toplabel = escalpha | escalpha *( escalphanum | "-" )
|
||||
escalphanum
|
||||
|
||||
and the following rules are added:
|
||||
|
||||
escalphanum = escaped8 | alphanum
|
||||
escalpha = escaped8 | alpha
|
||||
escaped8 = "%" hexdig8 HEXDIG
|
||||
hexdig8 = <<HEXDIG greater than 7>>
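
The modified rules can be checked with a regular expression. The
sketch below is in Python and transcribes the grammar above directly.
It assumes that HEXDIG matches hexadecimal digits in either case; the
names ESCAPED8, HOSTNAME and is_valid_hostname are illustrative.

```python
import re

# escaped8: "%" followed by a hex digit greater than 7, then any hex digit.
ESCAPED8 = r"%[89A-Fa-f][0-9A-Fa-f]"
ESCALPHANUM = rf"(?:{ESCAPED8}|[A-Za-z0-9])"
ESCALPHA = rf"(?:{ESCAPED8}|[A-Za-z])"

# A label is one character, or starts and ends with an (esc)alphanum
# with (esc)alphanums and hyphens in between.
DOMAINLABEL = rf"{ESCALPHANUM}(?:(?:{ESCALPHANUM}|-)*{ESCALPHANUM})?"
TOPLABEL = rf"{ESCALPHA}(?:(?:{ESCALPHANUM}|-)*{ESCALPHANUM})?"

# hostname = *( domainlabel "." ) toplabel [ "." ]
HOSTNAME = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?")

def is_valid_hostname(host):
    """True if host matches the modified hostname rule."""
    return HOSTNAME.fullmatch(host) is not None
```

Note that "%41bc.example" is rejected: the first hex digit of an
escape must be greater than 7, so escaped US-ASCII characters do not
match.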
|
||||
|
||||
The %HH escaping is used to encode characters outside the repertoire
|
||||
of US-ASCII. This is done by first encoding the characters in UTF-8
|
||||
[RFC 2279], resulting in a sequence of octets, and then escaping these
|
||||
octets.
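
The two-stage encoding can be sketched in Python as follows. The
function name escape_label is illustrative; the sketch leaves US-ASCII
characters untouched and does not perform name preparation.

```python
def escape_label(label):
    """Encode non-ASCII characters as UTF-8 and escape each resulting
    octet as %HH; US-ASCII characters are passed through unchanged."""
    out = []
    for ch in label:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)
```

For example, "d" followed by u-umlaut and "rst" becomes "d%C3%BCrst".
Every octet that UTF-8 produces for a non-ASCII character has its high
bit set, so the first hex digit of each escape is always greater than
7, which is what the hexdig8 rule requires.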
|
||||
|
||||
Using UTF-8 ensures that this encoding interoperates with IRIs (see
Section 3). It is also aligned with the recommendations in [RFC 2277]
and [RFC 2718], and is consistent with the URN syntax [RFC 2141] as
well as recent URL scheme definitions that define encodings of
non-ASCII characters based on UTF-8 (e.g., IMAP URLs [RFC 2192] and
POP URLs [RFC 2384]).
|
||||
|
||||
Please note that the use of UTF-8 for encoding internationalized
|
||||
domain names in URIs is independent of the choice of encoding chosen
|
||||
for these names in the DNS protocol. If something other than UTF-8
is chosen for the latter, a future version of this document may give
|
||||
instructions for the conversion if deemed necessary.
|
||||
|
||||
The above syntax rules do not extend the possible domain names based
|
||||
on US-ASCII characters. This may have to be changed in case the IDN
|
||||
WG should decide to allow such extensions.
|
||||
|
||||
The above rules also do not allow escaping of US-ASCII characters,
|
||||
although this is allowed in the other parts of an URI (except for the
|
||||
special provisions in case of reserved characters). Allowing such
|
||||
escaping would make the syntax rules quite a bit more complicated,
|
||||
would mean that the restrictions on US-ASCII characters can be
|
||||
circumvented by using escaping, or would lead to much simpler syntax
|
||||
rules that don't express these restrictions anymore. Even in case
|
||||
escaping of US-ASCII characters is allowed in order to simplify
|
||||
processing, it should be noted that it is always better not to escape
|
||||
US-ASCII characters in domain names because of the possibility that
|
||||
a resolver cannot unescape them. At least purely US-ASCII domain names
|
||||
would then always be resolved by such a processor.
|
||||
|
||||
While only the restrictions on US-ASCII characters are expressed in the
|
||||
rules above, all the other restrictions on internationalized
|
||||
domain names that will be defined by the IDN WG MUST be respected.
|
||||
|
||||
The work of the IDN WG currently includes some procedures for name
|
||||
preparation. Before encoding an internationalized domain name in an
|
||||
URI, this preparation step SHOULD be applied. However, the resolver
|
||||
MUST also apply name preparation.
|
||||
|
||||
|
||||
3. IRI syntax changes
|
||||
|
||||
The syntax of IRIs [IRI] currently contains the following rules
|
||||
relevant to domain names:
|
||||
|
||||
hostname = *( domainlabel "." ) toplabel [ "." ]
|
||||
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
|
||||
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
|
||||
|
||||
The latter two rules are changed as follows:
|
||||
|
||||
domainlabel = intalphanum | intalphanum *( intalphanum | "-" )
|
||||
intalphanum
|
||||
toplabel = intalpha | intalpha *( intalphanum | "-" )
|
||||
intalphanum
|
||||
|
||||
and the following rules are added:
|
||||
|
||||
intalphanum = ichar | alphanum | escaped8
|
||||
intalpha = ichar | alpha | escaped8
|
||||
escaped8 = "%" hexdig8 HEXDIG
|
||||
hexdig8 = <<HEXDIG greater than 7>>
|
||||
|
||||
where ichar, as in [IRI], is:
|
||||
|
||||
ichar = << any character of UCS [ISO10646] beyond
|
||||
U+0080, subject to limitations in Section
|
||||
3.1. of [IRI] >>
|
||||
|
||||
With respect to the allowed domain names based on US-ASCII characters,
|
||||
the same considerations as in Section 2 apply.
|
||||
|
||||
As in Section 2, all the other restrictions on internationalized
|
||||
domain names that will be defined by the IDN WG MUST be respected.
|
||||
Also, before encoding an internationalized domain name in an IRI,
|
||||
name preparation SHOULD be applied. However, the IRI resolver MUST
|
||||
also apply name preparation.
|
||||
|
||||
It is expected that the rules in Section 3.1 of [IRI] will be less
|
||||
restrictive than the rules for internationalized domain names, so that
|
||||
no escaping is necessary. Nevertheless, escaping is allowed for cases
|
||||
where not all characters can be directly represented.
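
As an illustration of how the escaped form relates to the directly
represented form, the Python sketch below replaces each run of
escaped8 sequences in a host name with the characters it encodes. It
is not the conversion procedure of [IRI], only a sketch under the
assumption that the escaped octets are valid UTF-8; the name
uri_host_to_iri_host is hypothetical.

```python
import re

# One or more consecutive escaped8 sequences (first hex digit > 7).
_ESCAPED8_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")

def uri_host_to_iri_host(host):
    """Replace runs of %HH escapes (high-bit octets only) with the
    characters they encode in UTF-8. Raises UnicodeDecodeError if a
    run is not valid UTF-8."""
    def decode(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        return raw.decode("utf-8")
    return _ESCAPED8_RUN.sub(decode, host)
```

Escapes of US-ASCII characters, which the rules above do not allow,
are deliberately left untouched.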
|
||||
|
||||
|
||||
4. Security Considerations
|
||||
|
||||
Besides the security considerations of [RFC 2396] and [IRI] and those
|
||||
applying to the various aspects of internationalized domain names in
|
||||
general, there are currently no known security problems.
|
||||
|
||||
|
||||
Acknowledgements
|
||||
|
||||
To be done.
|
||||
|
||||
|
||||
Copyright
|
||||
|
||||
Copyright (C) The Internet Society, 1997. All Rights Reserved.
|
||||
|
||||
This document and translations of it may be copied and furnished to
|
||||
others, and derivative works that comment on or otherwise explain it
|
||||
or assist in its implementation may be prepared, copied, published
|
||||
and distributed, in whole or in part, without restriction of any
|
||||
kind, provided that the above copyright notice and this paragraph
|
||||
are included on all such copies and derivative works. However, this
|
||||
document itself may not be modified in any way, such as by removing
|
||||
the copyright notice or references to the Internet Society or other
|
||||
Internet organizations, except as needed for the purpose of
|
||||
developing Internet standards in which case the procedures for
|
||||
copyrights defined in the Internet Standards process must be
|
||||
followed, or as required to translate it into languages other
|
||||
than English.
|
||||
|
||||
The limited permissions granted above are perpetual and will not be
|
||||
revoked by the Internet Society or its successors or assigns.
|
||||
|
||||
This document and the information contained herein is provided on an
|
||||
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
|
||||
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
|
||||
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
|
||||
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
|
||||
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
|
||||
|
||||
|
||||
Author's address
|
||||
|
||||
Martin J. Duerst
|
||||
W3C/Keio University
|
||||
5322 Endo, Fujisawa
|
||||
252-8520 Japan
|
||||
duerst@w3.org
|
||||
http://www.w3.org/People/D%C3%BCrst/
|
||||
Tel/Fax: +81 466 49 1170
|
||||
|
||||
Note: Please write "Duerst" with u-umlaut wherever
|
||||
possible, e.g. as "Dürst" in XML and HTML.
|
||||
|
||||
|
||||
References
|
||||
|
||||
[IRI] L. Masinter, M. Duerst, "Internationalized Resource Identifiers
|
||||
(IRI)", Internet Draft, January 2001,
|
||||
<http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-06.txt>,
|
||||
work in progress.
|
||||
|
||||
[ISO10646] ISO/IEC, Information Technology - Universal Multiple-Octet
|
||||
Coded Character Set (UCS) - Part 1: Architecture and Basic
|
||||
Multilingual Plane, Oct. 2000, with amendments.
|
||||
|
||||
[RFC 2119] S. Bradner, "Key words for use in RFCs to Indicate
|
||||
Requirement Levels", March 1997.
|
||||
|
||||
[RFC 2141] R. Moats, "URN Syntax", May 1997.
|
||||
|
||||
[RFC 2192] C. Newman, "IMAP URL Scheme", September 1997.
|
||||
|
||||
[RFC 2277] H. Alvestrand, "IETF Policy on Character Sets and
|
||||
Languages".
|
||||
|
||||
[RFC 2279] F. Yergeau. "UTF-8, a transformation format of ISO 10646.",
|
||||
January 1998.
|
||||
|
||||
[RFC 2384] R. Gellens, "POP URL Scheme", August 1998.
|
||||
|
||||
[RFC 2396] T.Berners-Lee, R.Fielding, L.Masinter. "Uniform Resource
|
||||
Identifiers (URI): Generic Syntax." August, 1998.
|
||||
|
||||
[RFC 2640] B. Curtis, "Internationalization of the File Transfer
|
||||
Protocol", July 1999.
|
||||
|
||||
[RFC 2718] L. Masinter, H. Alvestrand, D. Zigmond, R. Petke,
|
||||
"Guidelines for new URL Schemes", November 1999.
|
||||
|
||||
[RFC 2732] R. Hinden, B. Carpenter, L. Masinter, "Format for Literal
|
||||
IPv6 Addresses in URL's", December 1999.
|
||||
|
||||
|
||||
|
||||