<?xml version="1.0" encoding="US-ASCII"?>

<!-- This version is identical to unicode-escapes-07c (the source for
     the version approved by IESG), except that all tracking comments
	 and crefs other than "please remove" instructions have been
	 removed.  The file name still contains a tracking suffix, now
	 -07cx.  
	 -->
	
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [

<!ENTITY rfc3629 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml'>
<!ENTITY rfc3492 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3492.xml'>
<!ENTITY rfc5234 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5234.xml'>
<!ENTITY rfc2277 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2277.xml'>
<!ENTITY rfc2119 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>

]>

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc rfcedstyle="yes" ?>
<?rfc subcompact="no" ?>

<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>

<!-- Expand crefs and put them inline -->
<?rfc comments='yes' ?>
<?rfc inline='yes' ?>  

<rfc number="5137" category="bcp" seriesNo="137">
  <front>
    <title abbrev="Unicode Escapes">
      ASCII Escaping of Unicode Characters</title>
<author initials="J.C." surname="Klensin" fullname="John C Klensin">
  <organization/>
  <address>
    <postal>
      <street>1770 Massachusetts Ave, #322</street>
      <city>Cambridge</city> <region>MA</region> <code>02140</code>
      <country>USA</country> </postal>
    <phone>+1 617 245 1457 </phone>
    <email>john-ietf@jck.com</email>
  </address>
</author>

  <date month="February" year="2008"/>
    <keyword>Text</keyword>
    <keyword>Internationalization</keyword>
    <keyword>ASCII</keyword>
    <keyword>Unicode</keyword>
    <keyword>UTF-8</keyword>
    <keyword>Encoding</keyword>
	
    <abstract>
      <t> There are a number of circumstances in which an escape
		 mechanism is needed in conjunction with a protocol to encode
		 characters that cannot be represented or transmitted
		 directly.  With ASCII coding, the traditional escape has
		 been either the decimal or hexadecimal numeric value of the
		 character, written in a variety of different ways.  The move
		 to Unicode, where characters occupy two or more octets and
		 may be coded in several different forms, has further
		 complicated the question of escapes.  This document describes
		 some options now in use and discusses considerations for
		 selecting one for use in new IETF protocols and in protocols
		 that are now being internationalized. </t>
    </abstract>

  </front>

  <middle>
<section title="Introduction">
   <section title="Context and Background">
         <t> There are a number of circumstances in which an escape
		 mechanism is needed in conjunction with a protocol to encode
		 characters that cannot be represented or transmitted
		 directly.  With ASCII <xref target="ASCII"/> coding, the
		 traditional escape has 
		 been either the decimal or hexadecimal numeric value of the
		 character, written in a variety of different ways.  For
		 example, in different contexts, we have seen %dNN or %NN for
		 the decimal form,  %NN, %xNN, X'nn', and %X'NN' for the hexadecimal
		 form.  "%NN" has become popular in recent years to represent
		 a hexadecimal value without further qualification, perhaps as
		 a consequence of its use in URLs and their prevalence.
		 There are even some applications around in which octal 
		 forms are used and, while they do not generalize well, the
		 MIME Quoted-Printable and Encoded-word forms can be thought
		 of as yet another set of escapes.
		 So, even for the fairly simple cases of ASCII and
		 standards built by extending ASCII, such as the
		 ISO 8859
		 family, we have been living with several different
		 escaping forms, each the result of some history.</t>
	  <t>When one moves to
		 Unicode <xref target="Unicode"/> <xref target="ISO10646"/>,
		 where characters occupy two or
		 more octets and may be coded in several different forms, the
		 question of escapes becomes even more complicated.
		 Unicode represents characters as code points: numeric values
		 from 0 to hex 10FFFF.
		 When referencing code points in flowing text, they are
		 represented using the so-called "U+" notation, as values
		 from U+0000 to U+10FFFF. When serialized into octets, these
		 code points can be represented in different forms:
		 <list style="symbols">
			<t>in UTF-8 with one to four octets
			   <xref target="RFC3629"/></t>
			<t>in UTF-16 with two or four octets (or one or two
			   seizets -- 16-bit units)</t>
			<t>in UTF-32 with exactly four octets (or one 32-bit
			   unit)</t>
			</list>
		 When escaping characters, we have seen fairly extensive use
		 of hexadecimal representations of both the serialized forms
		 and variations on the U+ notation, known as code point
		 escapes.</t>
	  <t>In accordance with existing best-practices
		 recommendations <xref target="RFC2277"/>, new protocols that
		 are required to carry textual content for human use SHOULD
		 be designed in 
		 such a way that the full repertoire of Unicode characters
		 may be represented in that text.</t>
<?rfc needLines="5" ?>	  
<t>This document proposes that existing protocols being
		 internationalized, and those that need an escape mechanism, SHOULD
		 use some contextually appropriate 
		 variation on references to code points as described in
		 <xref target="CodePointEncodings"/> unless other
		 considerations outweigh those described here. </t>
	  <t>This recommendation is not applicable to protocols that
		 already accept native UTF-8 or some other encoding of
		 Unicode.  In general, when protocols are internationalized,
		 it is preferable to accept those forms rather than using
		 escapes.  This recommendation applies to cases, including
		 transition arrangements, in which that is not practical.</t>
	  <t>In addition to the protocol contexts addressed in this
		 specification, escapes to represent Unicode characters also appear
		 in presentations to users, i.e., in user interfaces (UI).  The
		 formats specified in, and the reasoning of, this document may
		 be applicable in UI contexts as well, but this is not a
		 proposal to standardize UI or presentation forms.</t>
	  <t>This document does not make general recommendations for
		 processing Unicode strings or for their contents.  It
		 assumes that the strings that one might want to escape are
		 valid and reasonable and that the definition of "valid and
		 reasonable" is the province of other documents.
		 Recommendations about general treatment of Unicode strings may
		 be found in many places, including the Unicode Standard
		 itself and the
		 W3C Character Model <xref target="W3C-CharMod"/>, as well as
		 specific rules in individual protocols.</t>
	  </section>

 <section title="Terminology" anchor="terminology">
 <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in
   <xref target="RFC2119"/>.</t>
   <t>Additional Unicode-specific terminology appears in
	  <xref target="UnicodeGlossary"/>, but is not necessary for
	  understanding this specification.</t>
   </section>

   <section title="Discussion List">
	  <t> Discussion of this document should be addressed to the
		 discuss@apps.ietf.org mailing list.</t>
	  </section>
   </section>

 <section title="Encodings that Represent Unicode Code Points: Code Position versus UTF-8 or UTF-16 Octets"
		 anchor="CodePointEncodings">

		  <t> There are two major families of ways to escape
			 Unicode characters.  One uses the code point in some
			 representation (see the next section), 
			 the other encodes the octets of the UTF-8 encoding or
			 some other encoding in some representation.  Some other
			 options are 
			 possible, but they have been rare in practice.  This
			 specification recommends that, in the absence of
			 compelling reasons to do otherwise, the Unicode code
			 points SHOULD be used rather than a representation of UTF-8 (or
			 UTF-16) octets.  There are several reasons for this,
			 including: 
		  <list style="symbols">
			 <t>One reason for the success of many IETF protocols is
				that they use human-interpretable text forms to
				communicate, rather than encodings that generally
				require computer programs (or hand simulation of
				algorithms) to decode.   This suggests that the
				presentation form should reference the Unicode tables
				for characters and should do so as simply as possible.</t>
			 <t>Because of the nature of UTF-8, a human who wants to
				interpret a decimal or hexadecimal numeral
				representation of UTF-8 octets needs one or more
				decoding steps to determine a Unicode code point that
				can be used to look up the character in a table.
				That may
				be appropriate in some cases where the goal is really
				to represent the UTF-8 form but, in general, it just
				obscures desired information and makes errors more
				likely and debugging harder.</t>
			 <t>Except for characters in the ASCII subset of Unicode
				(U+0000 through U+007F), the code point
				form is generally more compact than forms based on
				coding UTF-8 octets, sometimes much more compact.</t>
			 </list></t>
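		  <t>The decoding-step argument above can be illustrated
			 with a short, non-normative sketch.  Python is used
			 only for illustration; the function names are
			 hypothetical, and the %NN octet syntax and the
			 delimited backslash-u syntax are simply two of the
			 many possible spellings.  The octet form has to be
			 run through a UTF-8 decoder before a code point is
			 available, while the code point form can be read
			 directly.</t>
		  <figure><artwork><![CDATA[
```python
def utf8_octet_escape(ch):
    # Escape each UTF-8 octet of one character as %NN (hexadecimal).
    return "".join(f"%{octet:02X}" for octet in ch.encode("utf-8"))


def code_point_escape(ch):
    # Escape the code point itself, using at least four hex digits
    # between explicit apostrophe delimiters.
    return f"\\u'{ord(ch):04X}'"


def code_point_from_octet_escape(escaped):
    # Recovering the code point from the octet form takes two steps:
    # collect the octets, then run a UTF-8 decoder over them.
    octets = bytes(int(pair, 16) for pair in escaped.split("%")[1:])
    return ord(octets.decode("utf-8"))


if __name__ == "__main__":
    for ch in ("\u00E9", "\u20AC", "\U0001D11E"):
        print(f"U+{ord(ch):04X}",
              utf8_octet_escape(ch),
              code_point_escape(ch))
```
]]></artwork></figure>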
		  <t>The same considerations that apply to representation of
			 the octets of UTF-8 encoding also apply to more compact
			 ACE encodings such as 
			 the "bootstring" encoding <xref target="RFC3492"/> with
			 or without its "Punycode" profile.</t>
		  <t>Similar considerations apply to UTF-16 encoding, such as
			 the \uNNNN form used in
			 Java (see <xref target="Java-form"/>).  While those forms
			 are equivalent to code point references for the Basic
			 Multilingual Plane (BMP, Plane 0), a two-stage decoding
			 process is needed to handle surrogates to access higher
			 planes.</t>
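		  <t>The two-stage process for UTF-16 can be sketched as
			 follows.  This is a non-normative illustration in
			 Python, with hypothetical function names; the
			 arithmetic is the standard UTF-16 surrogate
			 calculation.  A reader who sees two escaped 16-bit
			 units must first recognize them as a surrogate pair
			 and then combine them before a code point is
			 known.</t>
		  <figure><artwork><![CDATA[
```python
def utf16_units(code_point):
    # Return the UTF-16 code units (one or two) for a code point.
    if code_point < 0x10000:
        return [code_point]
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


def utf16_unit_escape(code_point):
    # Escape each UTF-16 unit as backslash, "u", and four hex digits.
    return "".join(f"\\u{unit:04X}" for unit in utf16_units(code_point))


def combine_surrogates(high, low):
    # Second decoding stage: rebuild the code point from a pair.
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```
]]></artwork></figure>
		  <t>For a character in the BMP the escaped unit and the
			 code point coincide; for U+1D11E the escape consists
			 of two units that individually identify no
			 character.</t>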
		  </section>

    <section title="Referring to Unicode Characters">
	   <t> Regardless of what decisions are made about escapes for
		  Unicode characters in protocol or similar contexts, text
		  referring to a Unicode code point SHOULD use the
		  U+NNNN[N[N]] syntax, as specified in
		  the Unicode Standard, where the NNNN... string consists of
		  hexadecimal digits.  Text actually containing a Unicode
		  character SHOULD use a syntax more suitable for automated
		  processing. </t>
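	   <t>As a non-normative illustration, the U+ notation can be
		  produced and read with a few lines of Python.  The
		  function names are hypothetical, and the restriction to
		  uppercase hexadecimal digits is an assumption of this
		  sketch that follows common usage.</t>
	   <figure><artwork><![CDATA[
```python
import re

# "U+" followed by four to six hexadecimal digits.
_U_PLUS = re.compile(r"U\+([0-9A-F]{4,6})")


def to_u_plus(ch):
    # Format one character as U+NNNN, using at least four digits.
    return f"U+{ord(ch):04X}"


def from_u_plus(text):
    # Parse U+NNNN[N[N]] and return the code point as an integer.
    match = _U_PLUS.fullmatch(text)
    if match is None:
        raise ValueError("not in U+ notation: " + repr(text))
    code_point = int(match.group(1), 16)
    if code_point > 0x10FFFF:
        raise ValueError("beyond U+10FFFF: " + repr(text))
    return code_point
```
]]></artwork></figure>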
	   </section>
	   
	<section title="Syntax for Code Point Escapes"
			anchor="EscapeSyntax">
	   <t> There are many options for code point escapes, some of
		  which are summarized below.  All are equivalent in content
		  and semantics -- the differences lie in syntax.   The best
		  choice of syntax for a particular protocol or other
		  application depends on that application: one form may
		  simply "fit" better in a given context than others.  It is
		  clear, however, that hexadecimal values are preferable to
		  other alternatives: Systems based on decimal or octal
		  offsets SHOULD NOT be used.</t>
	   <t>Since this specification does not recommend one specific
		  syntax, protocol specifications that use escapes MUST
		  define the syntax they are using, including any necessary
		  escapes to permit the escape sequence to be used
		  literally.</t> 
	   <t>The application designer selecting a format should consider
		  at least the following factors:
		  <list style="symbols">
			 <t> If similar or related protocols already use one form,
				it may be best to select that form for consistency and
				predictability.</t>
			 <t> A Unicode code point can fall in the range from
				U+0000 to U+10FFFF.  Different escape systems may use
				four, five, six, or eight hexadecimal digits.   To
				avoid clever syntax tricks and the consequent risk of
				confusion and errors, forms that use explicit string
				delimiters are generally preferred over other
				alternatives.  In many contexts, symmetric paired
				delimiters are easier to recognize and understand than
				visually unrelated ones.</t>
			 <t>Syntax forms starting in "\u", without explicit
				delimiters, have been used in several different escape
				systems, including the four or eight digit syntax
				of C <xref target="ISO-C"/> 
				(see <xref target="C-form"/>), the UTF-16 encoding
				of Java <xref target="Java"/>
				(see <xref target="Java-form"/>), and some arrangements that
				may follow the "\u" with four, five, or six digits.
				The possible confusion about which option is actually
				being used may argue against use of any of these
				forms.</t>
			 <t> Forms that require decoding surrogate pairs share
				most of the problems that appear with encoding of
				UTF-8 octets.  Internet protocols SHOULD NOT use
				surrogate pairs.</t>
			 </list></t>
	</section>
<?rfc needLines="10" ?>
	<section
		title="Recommended Presentation Variants for Unicode Code Point
			Escapes">
		  <t>There are a number of different ways to represent a
			 Unicode code point position.  None of them appears to
			 be "best" for all contexts.   In addition, when an escape
			 is needed for the escape mechanism itself, the optimal
			 one of those might differ from one context to another.</t>
		  <t> Some forms that are in popular use and that might
			 reasonably be considered for use in a given protocol
			 are described below and identified with a current-use
			 context when feasible.  The two in this section are
			 recommended for use in Internet Protocols.  Other popular
			 ones appear in <xref target="NotRecommended"/> with some
				discussion of their disadvantages.</t>
	<section title="Backslash-U with Delimiters"
			 anchor="BackslashU">
	     <t>One of the recommended forms is a variation of the
				many forms that start in "\u"
				(see, e.g., <xref target="C-form"/>, below), but uses
				   explicit delimiters for the reasons discussed
				   in <xref target="EscapeSyntax"/>.</t>
			<t> Specifically, in ABNF <xref target="RFC5234"/>,
	<list style="hanging">
	   <t hangText="EmbeddedUnicodeChar =">
		  %x5C.75.27 4*6HEXDIG %x27
		  <vspace blankLines="0"/>
		  ; starting with lowercase "\u" and "'" and ending with "'".
		  <vspace blankLines="0"/>
		  ;  Note that the encodings are considered to be abstractions
		  <vspace blankLines="0"/>
		  ;  for the relevant characters, not designations of specific
		  <vspace blankLines="0"/>
		  ;  octets.</t>
	   <t hangText="HEXDIG =">
		  "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" /
		   "9" / "A" / "B" / "C" / "D" / "E" / "F"
	      <vspace blankLines="0"/>
	      ; effectively identical with definition in RFC 5234.</t>
	   </list>
			Designers of protocols using this form should
			specify a way to escape the introducing backslash ("\"),
			if needed. "\\" is one obvious possibility, but not
			the only one.</t>
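	<t>A minimal, non-normative sketch of this form in Python
	   follows.  The function names are hypothetical.  Two choices
	   are assumptions of the sketch rather than requirements of
	   this specification: a literal backslash is escaped by
	   doubling it, and the decoder accepts lowercase hexadecimal
	   digits because ABNF string literals are case-insensitive.
	   Only non-ASCII characters and the backslash are escaped.</t>
	<figure><artwork><![CDATA[
```python
import re

# Either backslash, "u", apostrophe, 4*6HEXDIG, apostrophe,
# or a doubled backslash (the escape convention assumed here).
_ESCAPE = re.compile(r"\\u'([0-9A-Fa-f]{4,6})'|\\\\")


def escape_text(text):
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")      # assumed: double the backslash
        elif ord(ch) < 0x80:
            out.append(ch)          # ASCII passes through unchanged
        else:
            out.append(f"\\u'{ord(ch):04X}'")
    return "".join(out)


def unescape_text(text):
    def replace(match):
        if match.group(1) is None:  # matched a doubled backslash
            return "\\"
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            raise ValueError("beyond U+10FFFF: " + match.group(0))
        return chr(code_point)
    return _ESCAPE.sub(replace, text)
```
]]></artwork></figure>
	<t>Because the closing apostrophe marks the end of the digit
	   string, the decoder never has to guess whether a following
	   hexadecimal digit belongs to the escape.</t>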
	</section>

	<section title="XML and HTML">
		<t>The other recommended form is the one used in XML.  It
		   uses the form  <![CDATA["&#xNNNN;"]]>.  Like the Perl
		   form (<xref target="PerlString"/>), this form has a clear
		   ending delimiter, reducing 
		   ambiguity.  HTML uses a similar form, but the semicolon may
		   be omitted in some cases.  If that is done, the advantages
		   of the delimiter disappear, so the HTML form without
		   the semicolon SHOULD NOT be used.  However, this format is
		   often considered ugly and awkward outside of its
		   native HTML, XML, and similar contexts.</t>
<?rfc needLines="5" ?>
		<t>In ABNF:
		   <list style="hanging">
			  <t hangText="EmbeddedUnicodeChar = ">
				 <![CDATA[
				 %x26.23.78 2*6HEXDIG %x3B]]>
				 <vspace blankLines="0"/>
				 <![CDATA[; starts with "&#x" and ends with ";"]]>
			  </t>
			  </list>
			  Note that a literal "&amp;" can be expressed by
			  <![CDATA["&#x26;"]]>
			  when using this style.</t>
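		<t>A non-normative Python sketch of this form follows.  The
		   function names are hypothetical, and the sketch handles
		   only the hexadecimal character reference given in the
		   ABNF above; it is not an XML or HTML parser and does not
		   deal with any other markup-significant characters.</t>
		<figure><artwork><![CDATA[
```python
import re

# "&#x", 2*6HEXDIG, ";" as in the ABNF above.
_XML_ESCAPE = re.compile(r"&#x([0-9A-Fa-f]{2,6});")


def escape_xml_style(text):
    out = []
    for ch in text:
        if ch == "&" or ord(ch) >= 0x80:
            # The ampersand itself must be escaped so that it cannot
            # be mistaken for the start of an escape.
            out.append(f"&#x{ord(ch):02X};")
        else:
            out.append(ch)
    return "".join(out)


def unescape_xml_style(text):
    # chr() raises ValueError for values beyond U+10FFFF.
    return _XML_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```
]]></artwork></figure>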
		</section>
		</section>

		<section title="Forms that Are Normally Not Recommended"
				 anchor="NotRecommended">
		   
		  <section title="The C Programming Language: Backslash-U"
				   anchor="C-form">
			 <t>  The forms
				<list style="empty">
				<t>\UNNNNNNNN (for any Unicode character) and</t>
			    <t>\uNNNN (for Unicode characters in plane 0)</t>
				</list>
				are utilized in the C Programming
			Language <xref target="ISO-C"/> when an ASCII escape for embedded
			Unicode characters is needed.</t>
			<t>There are disadvantages of this form that may be
			   significant.  First, the use of a case variation
			   (between "u" for the four-digit form and "U" for the
			   eight-digit form) may not seem natural in environments
			   where uppercase and lowercase characters are
			   generally considered equivalent and might be
			   confusing to people  
			   who are not very familiar with Latin-based alphabets
			   (although those people might have even more trouble
			   reading relevant English text and explanations).
			   Second, as discussed in <xref target="EscapeSyntax"/>,
			   the very fact that there are several different 
			   conventions that start in \u or \U may become a source
			   of confusion as people make incorrect assumptions about
			   what they are looking at.</t>
	</section>
	
	<section title="Perl: A Hexadecimal String" anchor="PerlString">
	   <t> Perl uses the form \x{NNNN...}.  The advantage of
		   this form is that there are explicit delimiters,
		   which avoids both the ambiguity of undelimited
		   variable-length strings and the case-change mechanism
		   that the C form (<xref target="C-form"/>) uses to
		   distinguish between Plane 0 and
		   more general forms.  Some other programming
		   languages would tend to favor X'NNNN...' forms for
		   hexadecimal strings and perhaps U'NNNN...' for
		   Unicode-specific strings, but those forms do not
		   seem to be in use around the IETF.</t>
		   <t> Note that there is a possible ambiguity in how
			  two-character or low-numbered sequences in this
			  notation are understood, i.e., that octets in the range
			  \x{00} through \x{FF} may be construed as being in the
			  local character set, not as Unicode code points.
			  Because of this apparent ambiguity, and because IETF
			  documents do not contain provision for pragmas (see
			  <xref target="PERLUniIntro"/> for more information about
			  the "encoding" pragma in Perl and other details), the
			  Perl forms should be used with extreme caution, if at
			  all.</t>
		</section>
	<section title="Java: Escaped UTF-16" anchor="Java-form">
		<t>Java <xref target="Java"/> uses the form \uNNNN, but as a
		   reference to UTF-16 values, not to Unicode code points.  While
		   it uses a syntax similar to that described
		   in <xref target="C-form"/>, this relationship to UTF-16
		   makes it, in many respects, more similar to the encodings
		   of UTF-8 discussed above than to an escape that designates
		   Unicode code points.  Note that the UTF-16 form, and hence,
		   the Java escape notation, can represent
		   characters outside Plane 0 (i.e., above U+FFFF)
		   only by the use of surrogate pairs, raising some of the same
		   issues as the use of UTF-8 octets discussed above.
		   For characters in Plane 0, the Java form is
		   indistinguishable from the Plane 0-only form
		   described in <xref target="C-form"/>.  If only for that
		   reason, it SHOULD NOT be used as an escape except in those
		   Java contexts in which it is natural.</t>
		</section>
	   </section>

    <section title="Security Considerations" anchor="Security">
    <t> This document proposes a set of rules for encoding
	   Unicode characters when other considerations do not apply.
	   Since all of the recommended encodings are unambiguous and
	   normalization issues are 
	   not involved, it should not introduce any security issues that
	   are not present as a result of simple use of non-ASCII
	   characters, no matter how they are encoded.  The mechanisms
	   suggested should slightly lower the risks of confusing users
	   with encoded characters by making the identity of the
	   characters being used somewhat more obvious than some of the
	   alternatives.</t>
	<t>An escape mechanism such as the one specified in this
	   document can allow characters to be represented in more
	   than one way. Where software interprets the escaped form, there is
	   a risk that security checks, and any necessary checks for,
	   e.g., minimal or normalized forms, are done at the wrong
	   point.</t>
    </section>
	
    <section title="Acknowledgments" anchor="Acknowledgments">
    <t> This document was produced in response to a series of
	   discussions within the IETF Applications Area and as part of
	   work on email internationalization and internationalized domain
	   name updates.  It is a synthesis of a large number of
	   discussions; the comments of the participants in those
	   discussions are gratefully acknowledged.   The help of Mark Davis in
	   constructing a list of alternative presentations and selecting
	   among them was especially important.</t>
	<t> Tim Bray, Peter Constable, Stephane Bortzmeyer, Chris Newman,
	   Frank Ellermann, Clive D.W. Feather,
	   Philip Guenther, Bjoern Hoehrmann, 
	   Simon Josefsson, Bill McQuillan,
	   der Mouse,
	   Phil Pennock, and Julian Reschke
	   provided careful reading and some corrections 
	   and suggestions on the various versions of this document.  Taken together, their
	   suggestions motivated the significant revision of this document
	   and its recommendations between version -00 and version
	   -01 and further improvements in the subsequent versions.</t> 
    </section>

  </middle>
 
  <back>
<references title="Normative References">

	  &rfc3629;   <!-- UTF-8 -->
	  &rfc2119;   <!-- requirement levels -->
	  &rfc5234;   <!-- ABNF -->


<reference anchor="Unicode">
	<front>
		 <title abbrev="Unicode 5.0">
	 The Unicode Standard, Version 5.0
		 </title>
	<author>
	<organization>The Unicode Consortium</organization>
	<address />
	</author>
	<date year="2006" />
	</front>
	<annotation>
	(Addison-Wesley, 2006. ISBN 0-321-48091-0).</annotation>
</reference>

<reference anchor="ISO10646">
	<front>
	<title> Information Technology -- Universal Multiple-
		Octet Coded Character Set (UCS)</title>
	<author>
	<organization>International Organization for Standardization</organization>
	<address />
	</author>
	<date year="2003" month="December" />
	</front>
	<seriesInfo name="ISO/IEC" value="10646:2003" />
</reference>
 
    </references>

<references title="Informative References">

   &rfc2277;   <!-- IETF Charset policy -->
   &rfc3492;   <!-- punycode -->

<reference anchor="UnicodeGlossary"
		   target="http://www.unicode.org/glossary">
	<front>
		 <title abbrev="Unicode Glossary">
	 Glossary of Unicode Terms
		 </title>
	<author>
	<organization>The Unicode Consortium</organization>
	<address />
	</author>
	<date year="2007" day="26" month="June" />
	</front>
</reference>   

<reference anchor='ASCII'>
        <front>
          <title>USA Code for Information Interchange</title>
          <author>
            <organization abbrev="ANSI">
              American National Standards Institute
                (formerly United States of America Standards Institute)
	    </organization>
          </author>
          <date year="1968"/>
        </front>
      <seriesInfo name="ANSI" value="X3.4-1968" />
      <annotation>ANSI X3.4-1968 has been replaced by newer
	  versions with slight modifications, but the 1968 version
	  remains definitive for the Internet. </annotation>
      </reference>

<reference anchor="W3C-CharMod"
		   target="http://www.w3.org/TR/charmod/">
        <front>
          <title>Character Model for the World Wide Web 1.0</title>
          <author initials="M.J." surname="Duerst" fullname="Martin J.
				  Duerst">
            <organization abbrev="W3C">W3C</organization>
          </author>
          <date year="2005" month="February" day="15"/>
        </front>
       <seriesInfo name="W3C" value="Recommendation" />
	   </reference>

<reference anchor="ISO-C">
	<front>
	<title>Information technology --  Programming languages -- C</title> 
	<author>
	<organization>International Organization for Standardization</organization>
	<address />
	</author>
	<date year="1999" />  <!-- Confirmed Jan 2005 -->
	</front>
	<seriesInfo name="ISO/IEC" value="9899:1999" />
</reference>

<reference anchor="Java"
  target="http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#95413p">
	<front>
	<title>Java Language Specification, Third Edition</title> 
	<author>
	<organization>Sun Microsystems, Inc.</organization>
	<address />
	</author>
	<date year="2005" />
	</front>
</reference>

<reference anchor="PERLUniIntro"
   target="http://perldoc.perl.org/perluniintro.html">
   <!-- N.B.: per Phil Pennock 'PerlIntro' usually refers to a
   different document -->
   <front>
	  <title>perluniintro</title>
	  <author surname="Hietaniemi" initials="J"
			  fullname="Jarkko Hietaniemi">
		 <organization /> <address />
		 </author>
	  <date year="2002"/>
	  </front>
   <seriesInfo name="Perl documentation" value=" 5.8.8" />
   </reference>
		
</references>
<?rfc needLines="30" ?>
	   <appendix title="Formal Syntax for Forms Not Recommended">
	  <t>While the syntax for the escape forms that are not
		 recommended above (see <xref target="NotRecommended"/>) is
		 not given inline in the hope of discouraging their use, it
		 is provided in this appendix in the hope that those who
		 choose to use them will do so consistently.  The reader is
		 cautioned that some of these forms are not defined precisely
		 in the original specifications and that others have evolved
		 over time in ways that are not precisely consistent.
		 Consequently, these definitions are not normative and may not
		 even precisely match reasonable interpretations of their
		 sources.</t>
	  <t>The definition of "HEXDIG" for the forms that follow appears
		 in <xref target="BackslashU"/>.</t>

	  <section title="The C Programming Language Form">
		 <t>In ABNF <xref target="RFC5234"/>,
	<list style="hanging">
	   <t hangText="EmbeddedUnicodeChar =">
		  BMP-form / Full-form</t>
	   <t hangText="BMP-form =">
		  %x5C.75 4HEXDIG    ; starting with lowercase "\u"
		  <vspace blankLines="0"/>
		  ;  The encodings are considered to be abstractions for the 
		  <vspace blankLines="0"/>
		  ; relevant characters, not designations of specific octets.
		  </t>
	   <t hangText="Full-form =">
		  %x5C.55 8HEXDIG   ; starting with uppercase "\U"
		  </t>
	   </list></t>
	</section>
	
	<section title="Perl Form">
	   <t>
		   <list style="hanging">
			  <t hangText="EmbeddedUnicodeChar = ">
				 <![CDATA[
				 %x5C.78 "{" 2*6HEXDIG "}"  ; starts with "\x" ]]>
			  </t>
			  </list></t>
	</section>
	<section title="Java Form">
	   <t>
		  <list style="hanging">
			  <t hangText="EmbeddedUnicodeChar = ">
				 <![CDATA[
				 %x5C.75 4HEXDIG  ; starts with "\u" ]]>
			  </t>
			  </list></t>
        </section>
	</appendix>


</back>
</rfc>
