% -*- coding: utf-8 -*-
% This file is part of TeX by Topic
% Copyright 2007-2014 Victor Eijkhout
% Translated by LiYanrui@bbs.ctex.org
% Translated by zoho@bbs.ctex.org
\documentclass{book}

\input{preamble}
\setcounter{chapter}{1}

\begin{document}

\chapter{Category Codes and Internal States}\label{mouth}

When characters are read, \TeX\ assigns them category codes.
The reading mechanism has three internal states, and transitions
between these states are governed by the category codes of the
characters in the input.
This chapter describes how \TeX\ reads its input and
how the category codes of characters influence the
reading behaviour. Spaces and line ends are also discussed.

\label{cschap:endlinechar}\label{cschap:ignorespaces}\label{cschap:catcode}
\label{cschap:char32}\label{cschap:obeylines}\label{cschap:obeyspaces}
\begin{inventory}
\item [\cs{endlinechar}]
      The character code of the end-of-line character
      appended to input lines.
      \IniTeX\ default:~13.
\item [\cs{par}]
      Command to close off a paragraph and go into vertical mode.
      It is generated by empty lines.

\item [\cs{ignorespaces}]
      Command that reads and expands until something is
      encountered that is not a \gr{space token}.

\item [\cs{catcode}]
      Query or set category codes.

\item [\cs{ifcat}]
      Test whether two characters have the same category code.

\item [\cs{\textvisiblespace}]
      Control space.
      Insert the same amount of space that a space token would
      when \cs{spacefactor}${}=1000$.

\item [\cs{obeylines}]
      Macro in plain \TeX\ to make line ends significant.

\item [\cs{obeyspaces}]
      Macro in plain \TeX\ to make (most) spaces significant.
\end{inventory}

\section{Introduction}

\TeX's input processor scans input lines from a file or terminal, and
makes tokens out of the characters.
The input processor can be viewed as
a simple finite state automaton with three internal states;
depending on the state its scanning behaviour may differ.
This automaton will be treated here both from the point of view of the
internal states and of the category codes governing the
transitions.

\section{Initial processing}

Input from a file (or from the user terminal, but this
will not be mentioned specifically
most of the time) is handled one line at a time.
Here follows a discussion of what exactly is an input line
for \TeX.

Computer systems differ with respect to
\index{line!input}\index{line!end}\index{machine independence}
the exact definition of an input line.
The carriage return/line feed sequence terminating a line is most common,
but some systems use just a line feed, and
some systems with fixed record length (block) storage do not have
a line terminator at all. Therefore \TeX\ has its
own way of terminating an input line, which proceeds as follows.

\begin{enumerate}
\item An input line is read from an input file (minus the
line terminator, if any).
\item Trailing spaces are removed (this is for the systems
with block storage, and it prevents confusion because these
spaces are hard to see in an editor).
\item The \csterm endlinechar\par, by default \gram{return}
(code~13), is appended.
If the value of \cs{endlinechar} is negative
\label{append:elc}%
or more than~255 (this was 127 in versions of \TeX\ older
than version~3; see page~\pageref{2vs3} for more differences),
no character is appended.
The effect then is the same as
if the line were to end with a comment character.
\end{enumerate}
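
As a minimal sketch of the last step (plain \TeX\ is assumed, and the
sample text is purely illustrative), setting \cs{endlinechar} to~$-1$
inside a group suppresses the appended character on the lines that are
read afterwards, so that consecutive lines are joined without an
intervening space:
\begin{verbatim}
{\endlinechar=-1
abc
def
}
\end{verbatim}
The first line was read before the assignment took effect, so it still
ends with the \gram{return} character; the lines \n{abc} and \n{def}
are read without one and should be typeset as \n{abcdef}.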


Computers may also differ in the character encoding
(the most common schemes are \ascii{} and \ebcdic{}), so \TeX\
converts the characters that are read from the file to its
own character codes. These codes are then used exclusively,
so that \TeX\ will perform the same on any system.
For more on this, see Chapter~\ref{char}.

\section{Category codes}

Each of the 256 character codes (0--255) has an
associated \indexterm{category code}, though not necessarily always the same one.
There are 16 categories, numbered 0--15.
When scanning the input, \TeX\
thus forms character-code--category-code pairs.
The input processor sees only these pairs; from them are formed
character tokens, control sequence tokens, and parameter tokens.
These tokens are then passed to \TeX's expansion and execution
processes.

A~character token is a character-code--category-code
pair that is passed unchanged.
A~control sequence token consists of one or more characters
preceded by an escape character; see below.
Parameter tokens are also explained below.

This is the list of the categories, together with a brief
description. More elaborate explanations follow in this and
later chapters.
\begin{enumerate} \message{set counter}%\SetCounter:item=-1
\setcounter{enumi}{-1}
\item\label{ini:esc}\index{category!0} Escape character; this signals
  the start of a control sequence. \IniTeX\ makes the backslash
  \verb-\- (code~92) an escape character.

\item\index{category!1} Beginning of group; such a character causes
  \TeX\ to enter a new level of grouping. The plain format makes the
  open brace \verb-{- a beginning-of-group character.

\item\index{category!2} End of group; \TeX\ closes the current level
  of grouping. Plain \TeX\ has the closing brace \verb-}- as
  end-of-group character.

\item\index{category!3} Math shift; this is the opening and closing
  delimiter for math formulas. Plain \TeX\ uses the dollar
  sign~\verb-$- for this.

\item\index{category!4} Alignment tab; the column (row) separator in
  tables made with \cs{halign} (\cs{valign}). In plain \TeX\ this is
  the ampersand~\verb-&-.

\item\index{category!5}\label{ini:eol} End of line; a character that
  \TeX\ considers to signal the end of an input line.
  \IniTeX\ assigns this code to the \gram{return}, that is, code~13.
  Not coincidentally, 13~is also the value that \IniTeX\ assigns to
  the \cs{endlinechar} parameter; see above.

\item\index{category!6} Parameter character; this indicates parameters
  for macros. In plain \TeX\ this is the hash sign~\verb-#-.

\item\index{category!7} Superscript; this precedes superscript
  expressions in math mode. It is also used to denote character codes
  that cannot be entered in an input file; see below. In plain
  \TeX\ this is the circumflex~\verb-^-.

\item\index{category!8} Subscript; this precedes subscript expressions
  in math mode. In plain \TeX\ the underscore~\verb-_- is used for
  this.

\item\index{category!9} Ignored; characters of this category are
  removed from the input, and have therefore no influence on further
  \TeX\ processing. In plain \TeX\ this is the \gr{null} character,
  that is, code~0.

\item\index{category!10}\label{ini:sp} Space; space characters receive
  special treatment. \IniTeX\ assigns this category to the \ascii{}
  \gr{space} character, code~32.

\item\index{category!11}\label{ini:let} Letter; in \IniTeX\ only the
  characters \n{a..z}, \n{A..Z} are in this category. Often, macro
  packages make some `secret' character (for instance~\n@) into a
  letter.

\item\index{category!12}\label{ini:other} Other; \IniTeX\ puts
  everything that is not in the other categories into this
  category. Thus it includes, for instance, digits and punctuation.

\item\index{category!13} Active; active characters function as a
  \TeX\ command, without being preceded by an escape character. In
  plain \TeX\ this is only the tie character~\verb-~-, which is
  defined to produce an unbreakable space; see page~\pageref{tie}.

\item\index{category!14}\label{ini:comm} Comment character; from a
  comment character onwards, \TeX\ considers the rest of an input line
  to be comment and ignores it. In \IniTeX\ the per cent sign \verb-%-
  is made a comment character.

\item\index{category!15}\label{ini:invalid} Invalid character; this
  category is for characters that should not appear in the
  input. \IniTeX\ assigns the \ascii\ \gr{delete} character, code~127,
  to this category.
\end{enumerate}

The user can change the mapping
of character codes to category codes
with the \csterm catcode\par\ command (see Chapter~\ref{gramm}
for the explanation of concepts such as~\gr{equals}):
\begin{disp}\cs{catcode}\gram{number}\gr{equals}\gram{number}.\end{disp}
In such a statement, the first number is often given in the form
\begin{disp}\verb>`>\gr{character}\quad or\quad \verb>`\>\gr{character}\end{disp}
both of which denote the character code of the character
(see pages \pageref{char:code} and~\pageref{int:denotation}).

The plain format defines
\csterm active\par
\begin{verbatim}
\chardef\active=13
\end{verbatim}
so that one can write statements such as
\begin{verbatim}
\catcode`\{=\active
\end{verbatim}
The \cs{chardef} command is treated
on pages \pageref{chardef} and~\pageref{num:chardef}.
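
A hedged sketch of what such an assignment makes possible (plain \TeX\
is assumed; the choice of the asterisk and of its definition is only an
illustration, not something the plain format does): once a character is
active, it can be given a meaning with \cs{def}, just like a control
sequence.
\begin{verbatim}
{\catcode`\*=\active
 \def*{$\bullet$}% the asterisk now acts as a macro
 a * b\par}
\end{verbatim}
The group keeps both the category code change and the definition local.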

The \LaTeX\ format has the control sequences
\begin{verbatim}
\def\makeatletter{\catcode`@=11 }
\def\makeatother{\catcode`@=12 }
\end{verbatim}
in order to switch on and off the `secret' character~\n@
(see below).
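
A typical use, sketched under the assumption of a \LaTeX\ document
preamble (the names \verb>\my@count> and \verb>\mystep> are
hypothetical): internal macros containing~\n@ are defined between the
two switches, so that they cannot be typed or redefined by accident in
ordinary text.
\begin{verbatim}
\makeatletter
\newcount\my@count
\def\mystep{\advance\my@count by 1 }
\makeatother
\end{verbatim}
After \cs{makeatother}, the input \verb>\my@count> would be read as the
control word \verb>\my> followed by the character tokens \n{@count}.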

The \cs{catcode} command can also be used to query category
codes: in
\begin{verbatim}
\count255=\catcode`\{
\end{verbatim}
it yields a number, which is here assigned to count register~255.
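
To inspect such a value, it can be written to the terminal. The
following sketch assumes the plain \TeX\ category codes; the values in
the comments follow from the list of categories given earlier.
\begin{verbatim}
\message{\the\catcode`\\}% 0, escape
\message{\the\catcode`\{}% 1, beginning of group
\message{\the\catcode`\a}% 11, letter
\message{\the\catcode`\1}% 12, other
\end{verbatim}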

Category codes can be tested by
\begin{disp}\cs{ifcat}\gr{token$_1$}\gr{token$_2$}\end{disp}
\TeX\ expands whatever is after \cs{ifcat} until two
unexpandable tokens are found; these are then compared
with respect to their category codes. Control sequence
tokens are considered to have category code~16\index{category!16},
which makes them all equal to each other, and unequal to
all character tokens.
Conditionals are treated further in Chapter~\ref{if}.
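
A small sketch of such tests, assuming the plain \TeX\ category codes
(the words typeset by each branch are merely illustrative):
\begin{verbatim}
\ifcat a1 same\else different\fi        % letter, other: different
\ifcat ab same\else different\fi        % two letters: same
\ifcat\relax\par same\else different\fi % two control sequences: same
\end{verbatim}
Because \cs{ifcat} expands the tokens that follow it, a macro or active
character that is itself to be tested has to be preceded by
\cs{noexpand}.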

\section{From characters to tokens}

The input processor
of \TeX\ scans input lines from a file or from the
user terminal, and converts the characters in the input
to tokens. There are three types of tokens.
\begin{itemize}
\item Character tokens: any character that is
passed on its own to \TeX's
further levels of processing with an appropriate
category code attached.
\item Control sequence tokens, of which there are two kinds:
an escape character (that is, a character of
category~0\index{category!0}) followed
by a string of `letters' is
lumped together into a {\em control word}, which is a single token.
An escape character followed by a single character that is not of
category~11\index{category!11}, letter, is made into a
{\em control symbol}.
If the distinction between control word and control symbol is
irrelevant, both are called
{\em control sequence}\index{control!sequence}.

The control symbol that results from an escape character followed
by a space character \csterm\char32\par is called
{\em control space}.\index{control!space}

\item Parameter tokens: a parameter character (that is, a
  character of category~6\index{category!6}, by default~\verb=#=)
  followed by a digit \n{1..9} is replaced by a parameter
  token. Parameter tokens are allowed only in the context of macros
  (see Chapter~\ref{macro}).

A macro parameter character followed by another macro parameter
character (not necessarily with the same character code)
is replaced by a single character token.
This token has category~6 (macro parameter), and the character
code of the second parameter character.
The most common instance of this is
replacing \n{\#\#} by~\n{\#$_6$}, where the subscript
denotes the category code.

\end{itemize}
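
The doubling rule matters when one macro defines another. In the
following sketch (the names \verb>\makewrap> and \verb>\wrap> are
hypothetical), each \verb>##> in the replacement text of the outer macro
is stored as a single parameter character, so \verb>##1> becomes the
parameter \verb>#1> of the inner definition when the outer macro is
executed:
\begin{verbatim}
\def\makewrap{\def\wrap##1{(##1)}}
\makewrap   % now \wrap is defined with one parameter
\wrap{x}    % typesets (x)
\end{verbatim}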

\section{The input processor as a finite state automaton}
\label{input:states}

\TeX's input processor can be considered to be a finite state
automaton with three \indextermbus{internal}{states},
that is, at any moment in time it is in one of three states,
and after transition to another state there is no memory of the
previous states.

\subsection{State {\italic N}: new line}

State {\italic N} is entered at the beginning of each new input line,
and that is the only time \TeX\ is in this state. In state~{\italic N}
all space tokens (that is, characters of
category~10\index{category!10}) are ignored; an end-of-line character
is converted into a \cs{par} token. All other tokens bring \TeX\ into
state~{\italic M}.

\subsection{State {\italic S}: skipping spaces}

State {\italic S} is entered in any state after a control word or
control space (but after no other control symbol),
or, when in state~{\italic M}, after a space.
In this state all subsequent spaces or end-of-line characters
in this input line are discarded.

\subsection{State {\italic M}: middle of line}

By far the most common state is~{\italic M}, `middle of line'.
It is entered after characters of categories
1--4, 6--8, and 11--13, and after control symbols
other than control space.
An end-of-line character encountered in this state
results in a space token.

%%% \input figflow \message{left align flow diagram}
%%% \vskip12pt plus 1pt minus 4pt\relax %before spoint skip
%%% \begin{tdisp}%\PopIndentLevel
%%% \leavevmode\relax
%%% %\figmouth
%%% \message{fig mouth missing}
%%% \end{tdisp}
%% \input figflow \message{left align flow diagram}
%% \vskip12pt plus 1pt minus 4pt\relax %before spoint skip
%% \begin{tdisp}%\PopIndentLevel
%% \leavevmode\relax
%% %\figmouth
%% \message{fig mouth missing}
%% \end{tdisp}

%\input figs1
%\begin{quotation}
%  \figmouth
%\end{quotation}
\input figs1
\begin{quotation}
  \figmouth
\end{quotation}
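
The three states can be traced on a small input. The following four
lines are only an illustration; the third line is empty.
\begin{verbatim}
   ab   c
\TeX   d

e
\end{verbatim}
On the first line the leading spaces are skipped in state~{\italic N};
after \n{ab} the first space yields a space token and brings \TeX\ into
state~{\italic S}, where the remaining spaces are discarded; the line end
after~\n c, met in state~{\italic M}, gives another space token.
On the second line the spaces after the control word are skipped in
state~{\italic S}. The empty line is ended in state~{\italic N} and so
produces a \cs{par} token. The resulting token list should therefore be
\n a, \n b, space, \n c, space, \cs{TeX}, \n d, space, \cs{par}, \n e,
space.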

\section{Accessing the full character set}
\label{hathat}

Strictly speaking, \TeX's input processor
is not a finite state automaton.
This is because during the scanning of the input line
all trios consisting of two {\sl equal\/} superscript characters
\index{\char94\char94\ replacement}%
(category code~7\index{category!7}) and a subsequent character
(with character code~$<128$)
are replaced by a single character with a character
code in the range 0--127,
differing by 64 from that of the original character.

This mechanism can be used, for instance, to access positions in a font
corresponding to character codes that cannot
be input, for instance because they are \ascii{} control characters.
The most obvious examples are the \ascii{} \gr{return}
and \gr{delete} characters; the corresponding
positions 13 and 127 in a font are
accessible as \verb>^^M> and~\verb>^^?>.
However, since the category of \verb>^^?> is 15\index{category!15}, invalid,
that has to be changed before character 127 can be accessed.

In \TeX3 this mechanism has been
modified and extended to access 256 characters:
any quadruplet \verb-^^xy- where both \n x and \n y are lowercase
hexadecimal digits \n0--\n9, \n a--\n f,
is replaced by a character in the
range 0--255, namely the character the number of which is
represented hexadecimally as~\n{xy}.
This imposes a slight restriction on the applicability
of the earlier mechanism: if, for instance, \verb>^^a>
is typed to produce character~33 (\verb>!>), then a following
\n0--\n9, \n{a}--\n{f} will be misunderstood.
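
The following sketch illustrates both notations, assuming \TeX3 with the
plain format; the comments state how each sequence is read.
\begin{verbatim}
\message{^^41}% hexadecimal 41 is character 65: A
\message{^^a}%  a is character 97, and 97-64=33: !
\message{^^ab}% read as hexadecimal ab, character 171, not as !b
\end{verbatim}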

While this process makes \TeX's input processor
somewhat more powerful
than a true finite state automaton,
it does not interfere with the rest of
the scanning. Therefore it is conceptually simpler to pretend that
such a replacement of triplets or quadruplets
of characters, starting with~\verb>^^>, is performed in advance.
In actual practice this is not possible,
because an
input line may assign category code~7\index{category!7} to some
character other than the circumflex, thereby
influencing its further processing.

\section{Transitions between internal states}

Let us now discuss the effects on the internal state
of \TeX's input processor when
certain category codes are encountered in the input.

\subsection{0: escape character}
\index{escape!character|see{character, escape}}

When an {\em escape character}\index{character!escape} is encountered,
\TeX\ starts forming a control sequence token.
Three different types of control sequence can result,
depending on the category code of the character that
follows the escape character.

\begin{itemize}
\item
If the character following the escape is of category~11\index{category!11},
letter, then \TeX\ combines the escape,
that character and all following
characters of category~11, into a control word.
After that \TeX\
goes into state~{\italic S}, skipping spaces.
\item
With a character of category~10\index{category!10}, space, a control
symbol called control space results, and \TeX\ goes into
state~{\italic S}.
\item
With a character of any other category code
a control symbol results, and \TeX\ goes into state~{\italic M},
middle of line.
\end{itemize}

The letters of a control sequence name have to be all on one line;
a control sequence name is not continued on the next line
if the current line ends with a comment sign, or if (by letting
\cs{endlinechar} be outside the range~0--255)
there is no terminating character.

\subsection{1--4, 7--8, 11--13: non-blank characters}

Characters of category codes 1--4, 7--8, and 11--13 are made
into tokens, and \TeX\ goes into state~{\italic M}.

\subsection{5: end of line}

Upon encountering an end-of-line character,
\TeX\ discards the rest of the
line, and starts processing the next line,
in state~{\italic N}. If the current state was~{\italic N},
that is, if the
line so far contained at most spaces, a~\cs{par} token
is inserted; if the state was~{\italic M}, a~space token is inserted,
and in state~{\italic S} nothing is inserted.

Note that by `end-of-line character' a character with category
code~5 is meant. This is not necessarily the \cs{endlinechar},
nor need it appear at the end of the line.
See below for further remarks on line ends.

\subsection{6: parameter}

A \indextermbus{parameter}{character} (usually~\verb=#=) can be
followed either by a digit \n{1..9},
in the context of macro definitions,
or by another parameter character.
In the first case a `parameter token' results,
in the second case only a single parameter character
is passed on as a character token for further processing.
In either case \TeX\ goes into state~{\italic M}.

A parameter character can also appear on its own in an
alignment preamble (see Chapter~\ref{align}).

\subsection{7: superscript}

A superscript character is handled like most non-blank
characters, except in the case where it is followed
by a superscript character of the same character code.
The process
that replaces these two characters plus the following character
(possibly two characters in \TeX3) by another character
was described above.

\subsection{9: ignored character}

Characters of category 9 are ignored; \TeX\ remains in the same state.

\subsection{10: space}

A token with category code 10 (this is called a \gr{space token},
irrespective of the character code)
is ignored in states {\italic N} and~{\italic S}
(and the state does not change);
in state~{\italic M} \TeX\ goes into state~{\italic S}, inserting
a token that has category~10 and character code~32
(\ascii{} space).
This implies that the character code of the space token may differ
from that of the character that was actually input.

\subsection{14: comment}

A comment character causes \TeX\ to discard
the rest of the line, including the comment character.
In particular, the end-of-line character is not seen,
so even if the comment was encountered in state~{\italic M}, no space
token is inserted.
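
This behaviour is commonly exploited when a macro definition is spread
over several lines. In the following sketch the macro name
\verb>\pair> is hypothetical:
\begin{verbatim}
\def\pair#1#2{%
  (#1,#2)%
}
\end{verbatim}
Without the two comment characters, the line ends after \verb>{> and
after \verb>)> would each contribute a space token to the replacement
text; the leading spaces on the second line are skipped in any case,
since they are read in state~{\italic N}.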

\subsection{15: invalid}

Invalid characters cause an error message. \TeX\ remains in
the state it was in.
However, in the context of a control symbol an invalid character
is acceptable. Thus \verb>\^^?> does not cause any error messages.

\section{Letters and other characters}
\label{cat12}

In most programming languages identifiers can consist
of both letters and digits (and possibly some other
character such as the underscore), but control words in \TeX\
are only allowed to be formed out of characters of category~11,
letter. Ordinarily, the digits and punctuation symbols have
category~12, other character.
However, there are contexts where \TeX\ itself
generates a string of characters, all of which have
category code~12, even if that is not their usual
category code.

This happens when the operations
\cs{string},
\cs{number},
\cs{romannumeral},
\cs{jobname},
\cs{fontname},
\cs{meaning},
and \cs{the}
are used to generate a stream of character tokens.
If any of the characters delivered by such a command
is a space character (that is, character code~32),
it receives category code~10, space.
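
The change of category can be observed with \cs{ifcat}. In the
following sketch, which assumes the plain \TeX\ category codes,
\cs{string} turns the letter~\n a into a character of category~12,
which then matches the category of the digit:
\begin{verbatim}
\ifcat\string a1 same\else different\fi % typesets: same
\end{verbatim}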

For the extremely rare case where a hexadecimal digit has been
hidden in a control sequence, \TeX\ allows \n A$_{12}$--\n F$_{12}$
to be hexadecimal digits, in addition to the ordinary
\n A$_{11}$--\n F$_{11}$ (here
the subscripts denote the category codes).

For example,
\begin{disp}\verb>\string\end>\quad gives four character tokens\quad
\n{\char92$_{12}$e$_{12}$n$_{12}$d$_{12}$} \end{disp}
Note that the {\em escape character}\index{character!escape}~\texttt{\char`\\}$_{12}$\label{use:escape}
is used in the output only because the
value of \cs{escapechar} is the character code for the
backslash. Another value of \cs{escapechar} leads to another
character in the output of \cs{string}.
The \cs{string} command is treated further in Chapter~\ref{char}.
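
The dependence on \cs{escapechar} can be seen in the following sketch
(plain \TeX\ assumed; the comments give the text written to the
terminal). A value outside the range 0--255 suppresses the escape
character altogether.
\begin{verbatim}
\message{\string\end}%                   \end
{\escapechar=`\/ \message{\string\end}}% /end
{\escapechar=-1 \message{\string\end}}%  end
\end{verbatim}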

%Spaces can wind up in control sequences:
%\begin{disp}\verb>\csname a b\endcsname>\end{disp} gives a control sequence
%token in which one of the three characters is a space.
%Turning this control sequence token into a string of characters
%\begin{disp}\verb>\expandafter\string\csname a b\endcsname>\end{disp}
%gives \n{\char92$_{12}$a$_{12}$\char32$_{10}$b$_{12}$}.
空格是可以封到控制序列中的，例如
\begin{disp}\verb>\csname a b\endcsname>\end{disp}
给出的是一个控制序列记号，其中三个字符有一个是空格符。
将这个控制序列转化为字符串
\begin{disp}\verb>\expandafter\string\csname a b\endcsname>\end{disp}
可得 \n{\char92$_{12}$a$_{12}$\char32$_{10}$b$_{12}$}。

%As a more practical example, suppose there exists a sequence
%of input files \n{file1.tex}, \n{file2.tex}\label{ex:jobnumber},
%and we want to
%write a macro that finds the number of the input file
%that is being processed. One approach would be to write
%\begin{verbatim}
%\newcount\filenumber  \def\getfilenumber file#1.{\filenumber=#1 }
%\expandafter\getfilenumber\jobname.
%\end{verbatim}
%where the letters \n{file} in the parameter text of the
%macro (see Section~\ref{param:text}) absorb that part of the
%jobname, leaving the number as the sole parameter.
举个更实用一些的例子，假设有一系列输入文件 \n{file1.tex}、
\n{file2.tex}\label{ex:jobnumber}等。我们想写一个宏来获取当前正在处理的输入文件的编号，
一种方法是：
\begin{verbatim}
\newcount\filenumber  \def\getfilenumber file#1.{\filenumber=#1 }
\expandafter\getfilenumber\jobname.
\end{verbatim}
宏参数中的字符 \n{file}（见第~\ref{param:text}~节）会吸走
\cs{jobname} 中的 \n{file} 部分，
从而留下文件编号作为唯一的参数。

%However, this is slightly incorrect: the letters \n{file} resulting
%from the \cs{jobname} command have category code~12, instead of
%11 for the ones in the definition of \cs{getfilenumber}.
%This can be repaired as follows:
%\begin{verbatim}
%{\escapechar=-1
% \expandafter\gdef\expandafter\getfilenumber
%       \string\file#1.{\filenumber=#1 }
%}
%\end{verbatim}
%Now the sequence \verb>\string\file> gives the four
%letters \n{f$_{12}$i$_{12}$l$_{12}$e$_{12}$}; 
%the \cs{expandafter} commands let this be executed prior to
%the macro definition;
%the backslash is omitted because we put\handbreak \verb>\escapechar=-1>.
%Confining this value to a group makes it necessary to use~\cs{gdef}.
但是上述代码略有不妥：宏的参数文本中 \n{file} 这几个字母的类别码为 11，
而 \cs{jobname} 给出的 \n{file} 各字符的类别码为 12，二者无法匹配。
可以这样修正：
\begin{verbatim}
{\escapechar=-1
 \expandafter\gdef\expandafter\getfilenumber
       \string\file#1.{\filenumber=#1 }
}
\end{verbatim}
注意 \verb>\string\file> 得到 \n{f$_{12}$i$_{12}$l$_{12}$e$_{12}$} 这 4 个字符，
而 \cs{expandafter} 命令让 \verb>\string\file> 在宏定义之前先行展开，
而 \verb>\escapechar=-1> 使得 \cs{string} 不输出开头的反斜线。
由于 \cs{escapechar} 设定被限制在编组内部，我们需要使用 \cs{gdef} 进行宏定义。
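
把修正后的定义与调用放在一起，大致如下（仅为示意，假定在 Plain \TeX\ 下运行，
并且主文件名形如 \n{file2.tex}；若文件名不是这种形式，\TeX\ 会报告宏的用法与定义不匹配）：
\begin{verbatim}
\newcount\filenumber
{\escapechar=-1
 \expandafter\gdef\expandafter\getfilenumber
       \string\file#1.{\filenumber=#1 }
}
\expandafter\getfilenumber\jobname.
\message{file number: \the\filenumber}% for file2.tex this reports 2
\end{verbatim}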

%\section{The \lowercase{\n{\char92par}} token}
\section{\protect\cs{par} 记号}

%\TeX\ inserts a \csterm par\par\ token into the input after
%an \indextermbus{empty}{line}, that is, when 
%encountering a character with category code~5,
%end of line, in state~{\italic N}.
%It is good to realize when exactly this happens:
%since \TeX\ leaves state~{\italic N}
%when it encounters any token but a space,
%a~line giving a \cs{par} can only contain characters
%of category~10. In particular, it cannot end with a comment
%character. Quite often this fact is used the other way around:
%if an empty line is wanted for the layout of the input
%one can put a comment sign on that line.
\TeX\ 在遇到{\em 空白行}\index{行!空白行}之后，
即在状态 {\italic N} 时遇到类别码为 5 的字符（行结束符）之后，
就会向输入中插入一个 \csterm par\par 记号。最好是明白这是如何发生的：
因为 \TeX\ 在遇到空格之外的任何记号时都会离开状态 {\italic N}，
所以能够产生 \cs{par} 的输入行只能包含类别码为 10 的字符；
特别地，该行不能以注释符结尾。
此事实常常被反过来利用：如果只是为了源文件的版面需要一个空行，
而又不想产生 \cs{par}，可以在该行放一个注释符。
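
例如，在下面的输入中（仅为示意），只含注释符的那一行不会结束段落，
而真正的空行会产生 \cs{par}：
\begin{verbatim}
first paragraph
%
still the first paragraph

second paragraph
\end{verbatim}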

%Two consecutive empty lines generate two \cs{par} tokens.
%For all practical purposes this is equivalent to one \cs{par},
%because after the first one \TeX\ enters vertical mode, and
%in vertical mode a \cs{par} only
%exercises the page builder,
%and clears the paragraph shape parameters.
连续两个空行产生两个 \cs{par} 记号，实际上它们等同于一个 \cs{par} 记号，
这是因为在第一个 \cs{par} 之后，\TeX\ 进入竖直模式，而在竖直模式中的
\cs{par} 只会触发 \TeX\ 的页面构建器以及清除段落形状参数。

%A \cs{par} is also inserted into the input when \TeX\ sees a
%\gram{vertical command} in unrestricted horizontal mode.
%After the \cs{par} has been read and expanded, the
%vertical command is examined anew (see Chapters~\ref{hvmode}
%and~\ref{par:end}).
在非受限水平模式中遇到 \gram{vertical command}（竖直命令）时，
\TeX\ 也会向输入中插入一个 \cs{par} 记号，并对其读取和展开，
然后再重新处理竖直命令（见第~\ref{hvmode}~和~\ref{par:end}~章）。

%The \cs{par} token may also be inserted by the \cs{end}
%command that finishes off the run of \TeX; see Chapter~\ref{output}.
\cs{end} 命令也会插入 \cs{par} 记号，然后结束 \TeX\ 的运行；见第~\ref{output}~章。

%It is important to realize that \TeX\ does what it normally does
%when encountering an empty line
%(which is ending a paragraph)
%only because of the default definition of the \cs{par} token.
%By redefining \cs{par} the behaviour
%caused by empty lines and vertical commands can be changed completely,
%and  interesting special effects can be achieved.
%In order to continue to be able  to cause the actions normally
%associated with \cs{par}, the synonym \cs{endgraf} is
%available in the plain format. See further Chapter~\ref{par:end}.
要知道 \TeX\ 在遇到空白行时通常所作的事情（结束当前段落）取决于 \cs{par} 记号的默认定义。
如果重定义 \cs{par}，那么空白行和竖直命令的行为可能就完全不同了，
甚至可以藉此实现一些不同寻常的效果。为了在重定义之后仍能执行 \cs{par} 原本的动作，
Plain \TeX\ 提供了 \cs{par} 的“同义词” \cs{endgraf}。详见第~\ref{par:end}~章。
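
下面是重定义 \cs{par} 的一个简单示意（假定使用 Plain \TeX；
这里的 \cs{message} 只是为了让效果可见而加入的）：
\begin{verbatim}
\begingroup
% Sketch: every \par, including those generated by empty lines,
% now also writes a note to the terminal.
\def\par{\endgraf\message{<par>}}
One paragraph.

Another paragraph.

\endgroup
\end{verbatim}
每个空行产生的 \cs{par} 记号都会执行新的定义，而真正结束段落的工作交给了 \cs{endgraf}。
由于重定义放在编组之内，离开编组后 \cs{par} 即恢复原来的含义。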

%The \cs{par} token is not allowed to be part of a macro
%argument, unless the macro has been declared to be \cs{long}.
%A \cs{par} in the argument of a non-\cs{long} macro
%prompts \TeX\ to give a `runaway argument' message.
%Control sequences that have been \cs{let} to \cs{par}
%(such as \cs{endgraf}) are allowed, however.
\cs{par} 记号不可以出现在宏参量中，除非是使用 \cs{long} 定义的宏。
对于非 \cs{long} 定义的宏，如果 \cs{par} 出现在参量中，
\TeX\ 会给出“runaway argument”的错误信息。不过，使用 \cs{let}
定义的指向 \cs{par} 记号的控制序列（例如 \cs{endgraf}）则可以出现。
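
这几种情形可以用下面的例子对比（仅为示意，\verb>\short> 和 \verb>\tall>
是为说明而假设的宏名）：
\begin{verbatim}
\def\short#1{(#1)}
\long\def\tall#1{(#1)}
\tall{a\par b}      % accepted: \tall is \long
\short{a\endgraf b} % accepted: \endgraf is not the token \par
%\short{a\par b}    % error: Paragraph ended before \short was complete
\end{verbatim}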

%\section{Spaces}
\section{空格}

%This section treats some of the aspects of the
%\indextermbus{space}{character} and \indextermbus{space}{token} in the
%initial processing stages of \TeX. The topic of spacing in text
%typesetting is treated in Chapter~\ref{space}.
这一节讨论空格字符\index{字符!空格符}与空格记号\index{记号!空格记号}%
在 \TeX\ 初始处理阶段中的一些表现。
至于文本排版中的空白，将在第~\ref{space}~章中讨论。


%\subsection{Skipped spaces}
\subsection{被忽略的空格}

%From the discussion of the internal states of \TeX's 
%input processor
%it is clear that some spaces in the input never reach the
%output; in fact they never get past the input processor.
%These are for instance the spaces at the beginning
%of an input line, and the spaces following the one
%that lets \TeX\ switch to state~{\italic S}.
从对 \TeX\ 输入处理器的内部状态的讨论中，
容易知道有些空格是不可能被输出的；实际上它们甚至都无法通过输入处理器。
例如输入行开头的空格，还有放在让 \TeX\ 进入状态 {\italic S} 的字符之后的空格。

%On the other hand, line ends can generate spaces (which are not
%in the input) that may wind up in the output.
%There is a third kind of space: the spaces that get past the
%input processor,
%or are even generated there, but still do not wind up in the
%output. These are the \gram{optional spaces} that the 
%syntax of \TeX\ allows in various places.
另一方面，行结束符可以产生空格，并且可被输出。还有第三种空格：
它可以通过输入处理器，甚至可在输入处理器中生成，
但是依然没有机会被输出，它们便是 \gram{optional spaces}（可选空格），
\TeX\ 语法的多个地方都允许出现这种空格。

%%\spoint Optional spaces
%\subsection{Optional spaces}
%\spoint Optional spaces
\subsection{可选空格}

%The syntax of \TeX\ has the concepts of \indextermbus{optional}{spaces}
%and `one optional space':
%\begin{disp}\gr{one optional space} $\longrightarrow$
%\gr{space token} $|$ \gr{empty}\nl
%\gr{optional spaces} $\longrightarrow$
%\gr{empty} $|$ \gr{space token}\gr{optional spaces}\end{disp}
%In general, \gr{one optional space} is allowed after
%numbers and glue specifications, while \gr{optional spaces} are
%allowed whenever a space can occur inside a number
%(for example, between a minus sign and the digits of the number)
%or glue specification (for example, between \n{plus} and \n{1fil}).
%Also, the definition of \gr{equals} allows \gr{optional spaces}
%before the \n= sign.
\TeX\ 的语法中有“{\em 可选空格}”\index{空格!可选空格}与“{\em 单个可选空格}”的概念：
\begin{disp}\gr{one optional space} $\longrightarrow$
\gr{space token} $|$ \gr{empty}\nl
\gr{optional spaces} $\longrightarrow$
\gr{empty} $|$ \gr{space token}\gr{optional spaces}\end{disp}
通常，\gr{one optional space} 允许出现在数值以及粘连描述之后，
而 \gr{optional spaces} 允许出现在数值内部（比如在负号和数字之间）%
或者粘连描述内部（比如在 \n{plus} 和 \n{1fil} 之间）可以有空格的地方。
另外，在 \gr{equals} 的定义中也允许 \gr{optional spaces} 出现在 \n= 号之前。

%Here are some examples of optional spaces.
下面是可选空格的一些例子：

%\begin{itemize} 
%\item A number can be delimited by \gr{one optional space}. 
%This prevents accidents (see Chapter~\ref{number}), 
%and it speeds up processing, as \TeX\ can 
%detect more easily where the \gram{number} being read ends.
%Note, however, that not every `number' is a \gram{number}:
%for instance the {\tt 2} in \cs{magstep2} is not a number,
%but the  single token that is the parameter of the
%\cs{magstep} macro. Thus a space or line end after this
%is significant. Another example is a parameter number,
%for example~\n{\#1}: since at most nine parameters are allowed, scanning
%one digit after the parameter character suffices.
\begin{itemize} 
\item 数值可以用 \gr{one optional space} 定界。这样可以防止意外（见第~\ref{number}~章），%
并能加快处理速度，因为 \TeX\ 更容易检测 \gram{number} 在何处终止。不过，
要注意并非每个“数值”都是 \gram{number}：例如 \cs{magstep2} 中的 {\tt 2} 并非数值，
而是单个记号并且是 \cs{magstep} 宏的参量，因此在其之后的空格或行结束符是有效的。
另一个例子是宏参数中的数字，例如 \n{\#1}：因为一个宏最多允许有 9 个参数，
只需在参数字符之后扫描一位数字即可。

%\item From the grammar of \TeX\ 
%it follows that the
%keywords \n{fill} and \n{filll}
%consist of \n{fil} and
%separate {\tt l}$\,$s, each of which is a keyword
%(see page~\pageref{keywords} for a more elaborate discussion),
%and hence can be followed by optional spaces. 
%Therefore forms such as \hbox{\n{fil L l}} are also valid.
%This is a potential source of strange accidents.
%In most cases, appending a \cs{relax} token prevents
%such mishaps.
\item 根据 \TeX\ 的语法，关键字 \n{fill} 和 \n{filll}
是由 \n{fil} 关键字以及一两个单独的 {\tt l} 关键字构成的%
（见第~\pageref{keywords}~页的更详细讨论），
因此其中允许可选空格的存在；比如 \n{fil L l} 也是有效的关键字。
不过这也许会导致 \TeX\ 误解你的本意，
对于大多数情形，在这种关键字后添加一个 \cs{relax} 可以防止这种灾难。

%\item The primitive command \csterm ignorespaces\par\ 
%may come in handy as the final command in a macro definition.
%As it gobbles up
%optional spaces, it can be used to prevent spaces following the
%closing brace of an argument from winding up in the output
%inadvertently. For example, in
%\begin{verbatim}
%\def\item#1{\par\leavevmode
%    \llap{#1\enspace}\ignorespaces}
%\item{a/}one line \item{b/} another line \item{c/}
%yet another
%\end{verbatim} 
%the \cs{ignorespaces} prevents spurious
%spaces in the second and third item.
%An empty line
%after \cs{ignorespaces} will still insert a \cs{par}, however.
%\end{itemize}
\item 在宏定义末尾使用原始命令 \csterm ignorespaces\par\ 可能会比较方便。
由于它可以吞噬可选空格，使用它可避免把参量的右花括号后的空格无意中带入输出中。
例如下面这个例子：
\begin{verbatim}
\def\item#1{\par\leavevmode
    \llap{#1\enspace}\ignorespaces}
\item{a/}one line \item{b/} another line \item{c/}
yet another
\end{verbatim} 
其中 \cs{ignorespaces} 吞掉了第二、三项的那些不希望被输出的空格。
不过 \cs{ignorespaces} 之后的空行仍然会插入 \cs{par} 记号。
\end{itemize}
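
上面提到的关键字陷阱可以用下面的例子体会（仅为示意，假定在 Plain \TeX\ 的段落中使用）：
\begin{verbatim}
% The L of "Long" is taken as a keyword: the glue becomes plus 1fill
% and only "ong words follow." is typeset.
\hskip 0pt plus 1fil Long words follow.

% \relax ends the glue specification, so "Long" is typeset in full.
\hskip 0pt plus 1fil\relax Long words follow.
\end{verbatim}
在第一行中，\n{fil} 之后的空格是可选空格，而关键字不区分大小写，
所以 \n{L} 被当作了 \n{fil} 后面的 \n{l}。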

%%\spoint Ignored and obeyed spaces
%\subsection{Ignored and obeyed spaces}
%\spoint Ignored and obeyed spaces
\subsection{被忽略的和被保留的空格}

%After control words spaces are ignored. This is not an
%instance of optional spaces, but it is due to the fact that
%\TeX\ goes into state~{\italic S}, skipping spaces, after control
%words. Similarly an end-of-line character is skipped
%after a control word.
控制词之后的空格会被忽略。不过这个不是可选空格的例子，
只是因为 \TeX\ 在控制词之后会进入状态 {\italic S} 而已。
同样，控制词之后的行结束符也会被忽略。

%Numbers are delimited by only \gr{one optional space},
%but still
%\begin{disp}\n{a\char92 count0=3\char32\char32b}\quad gives\quad `ab',\end{disp}
%because \TeX\ goes into state~{\italic S} after the first
%space token. The second space is therefore skipped 
%in the input processor of \TeX; it never becomes a space token.
数值只能被 \gr{one optional space} 定界，但是
\begin{disp}\n{a\char92 count0=3\char32\char32b}\quad 仍然给出 \quad `ab',\end{disp}
这是因为 \TeX\ 在第一个空格记号之后会进入状态 {\italic S}，
因此第二个空格永远也不可能变成空格记号。

%Spaces are skipped furthermore when \TeX\ is in state~{\italic N},
%newline. When \TeX\ is processing in vertical mode
%space tokens (that is, spaces that were not skipped)
%are ignored. For example, the space inserted (because of the line end)
%after the first box in
%\begin{verbatim}
%\par
%\hbox{a}
%\hbox{b}
%\end{verbatim}
%has no effect.
当 \TeX\ 在状态 {\italic N} 中时，空格会被忽略。
当 \TeX\ 在竖直模式中时，空格记号（就是那些起初未被忽略的空格）会被忽略。
例如下面第一个盒子之后（由行结束符生成）的空格不起任何作用：
\begin{verbatim}
\par
\hbox{a}
\hbox{b}
\end{verbatim}

%Both plain \TeX\ and \LaTeX\ define a command \cs{obeyspaces}
%\altt
%that makes spaces significant: after one space other spaces are no
%longer ignored. In both cases the basis is
%\altt
%\begin{verbatim}
%\catcode`\ =13 \def {\space}
%\end{verbatim}
%However, there is a difference between the two cases:
%in plain \TeX\ \begin{verbatim}
%\def\space{ }
%\end{verbatim}
%while in \LaTeX\ \begin{verbatim}
%\def\space{\leavevmode{} }
%\end{verbatim}
%although the macros bear other names there.
Plain \TeX\ 和 \LaTeX\ 都定义了一个 \cs{obeyspaces} 宏，
这使得空格都是有效的，比如控制词后的空格以及空格后的空格都不会被忽略。
这个宏的基本实现方式为
\footnote{译注：原文最后两个段落的描述有些错乱，已经稍作修订。}
\begin{verbatim}
\def\space{ }
\catcode`\ =13 \def {\space}
\end{verbatim}

%The difference between the two macros becomes
%apparent in the context of \cs{obeylines}:
%each line end is then a \cs{par} command, implying that
%each next line is started in vertical mode.
%An active space is expanded by the plain macro to a space token, 
%which is ignored in vertical mode.
%The active spaces in \LaTeX\ will immediately switch to horizontal
%mode, so that each space is significant.
在实现多行抄录环境时，还需要另一个 \cs{obeylines} 宏：
它将每个行结束符定义为 \cs{par} 命令，使得下面各行都在竖直模式中开始。
此时活动空格展开的空格记号在竖直模式中将会被忽略，于是每行开头的空格都会丢失。
为此我们可以修改上述 \verb>\space> 宏的定义如下：
\begin{verbatim}
\def\space{\leavevmode{} }
\end{verbatim}
这样，活动空格将会让 \TeX\ 立即切换到水平模式，从而保留了每个空格。
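
把这几部分放在一起，可以得到下面的示意（假定使用 Plain \TeX\ 的 \cs{obeylines} 和 \cs{tt}；
所有定义都限制在编组之内）：
\begin{verbatim}
\begingroup
\def\space{\leavevmode{} }% local redefinition, as described above
\catcode`\ =13 \def {\space}%
\tt\obeylines%
if x
    then y
\endgroup
\end{verbatim}
第二行输入开头的四个空格都会出现在输出中。
如果去掉对 \verb>\space> 的重定义，这些行首空格就会在竖直模式中丢失。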

%\subsection{More ignored spaces}
\subsection{空格被忽略的其他情形}

%There are three further places where \TeX\ will ignore space tokens.
%\alt
%\begin{enumerate}
%\item When \TeX\ is looking for
%an undelimited macro argument it will accept the
%first token (or group) that is not a space. This is treated
%in Chapter~\ref{macro}.
还有三种情况会导致 \TeX\ 忽略空格记号：
\alt
\begin{enumerate}
\item 在寻找非定界的宏参量时，\TeX\ 会接受第一个非空格的记号（或编组）作为参量。
这将在第~\ref{macro}~章中介绍。

%\item In math mode space tokens are ignored (see Chapter~\ref{math}).
\item 在数学模式中，空格记号会被忽略（见第~\ref{math}~章）。

%\item After an alignment tab character spaces are ignored
%(see Chapter~\ref{align}).
%\end{enumerate}
\item 在阵列制表符之后，空格记号会被忽略（见第~\ref{align}~章）。
\end{enumerate}
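
这三种情况可以分别用下面的例子说明（仅为示意，假定使用 Plain \TeX；
\verb>\pair> 是为说明而假设的宏名）：
\begin{verbatim}
\def\pair#1#2{(#1,#2)}% \pair is a hypothetical macro
\pair x y              % gives (x,y): the space before y is skipped
$a + b$                % same output as $a+b$
\halign{#\hfil& #\hfil\cr
  one&   two\cr}       % the spaces after & do not reach the output
\end{verbatim}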

%\subsection{\gr{space token}}
\subsection{\gr{space token}}

%Spaces are anomalous in \TeX.
%For instance, the \cs{string} operation 
%assigns category code~12\index{category!12} to all
%characters except spaces; they receive category~10\index{category!10}.
%Also, as was said above, \TeX's input processor converts (when in
%state~{\italic M}) all tokens with category code~10 into real spaces:
%they get character code~32.
%Any character token with category~10 is called
%\gram{space token}\indexterm{space! token}.
%Space tokens with character
%code not equal to 32 are called \indextermbus{funny}{spaces}.
空格在 \TeX\ 中有些反常。
例如，\cs{string} 操作会对所有的字符赋以类别码 12\index{category!12}，
唯独空格例外，它得到的类别码是 10\index{category!10}。
还有在前文中提到过的，\TeX\ 的输入处理器（在状态 {\italic M} 中）%
会将所有类别码为 10 的记号转化为真正的空格：它们的字符编码为 32。
任何类别码为~10~的字符记号都称为 \gram{space token}\index{空格!空格记号}。
那些字符编码不是 32 的空格记号被称为{\em 滑稽空格}\index{空格!滑稽空格}。

%\begin{example} After giving the character \n Q 
%the category code of a space character, 
%and using it in a definition
%\begin{verbatim}
%\catcode`Q=10 \def\q{aQb}
%\end{verbatim}
%we get
%\begin{verbatim}
%\show\q
%macro:-> a b
%\end{verbatim}
%because the input processor
%changes the character code of the funny space
%in the definition.
%\end{example}
\begin{example}
将空格字符的类别码赋予字符 \n{Q}，并在宏定义中使用它：
\begin{verbatim}
\catcode`Q=10 \def\q{aQb}
\end{verbatim}
那么，我们可得到
\begin{verbatim}
\show\q
macro:-> a b
\end{verbatim}
这是因为输入处理器改变了宏定义中滑稽空格的字符编码。
\end{example}

%Space tokens with character codes other than 32 can be
%created using, for instance, \cs{uppercase}.
%However, `since the various forms of
%space tokens are almost identical in behaviour, there's no
%point dwelling on the details'; see~\cite{Knuth:TeXbook}~p.~377.
字符码不为 32 的空格记号可以用 \cs{uppercase} 等命令生成。
然而，`由于各种不同的空格记号的表现几乎是一致的，
纠缠于细节毫无意义'；见~\cite{Knuth:TeXbook} 第~377~页。
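
下面是用 \cs{uppercase} 生成滑稽空格的一个简单示意（假定使用 Plain \TeX；
\verb>\funny> 是为说明而假设的宏名）：
\begin{verbatim}
% \uppercase changes the character code of the space token to that
% of * but keeps its category code 10.
{\uccode`\ =`\* \uppercase{\gdef\funny{ }}}% \funny is a hypothetical name
\show\funny % macro:->*
\end{verbatim}
\cs{uccode} 的设置放在编组之内，而宏用 \cs{gdef} 定义，所以离开编组后宏仍然可用。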

%%\spoint Control space
%\subsection{Control space}
%\spoint Control space
\subsection{控制空格}

%The `control space' command \verb-\-\n{\char32}
%\cstoidx\char32\par\
%contributes the amount of space that a \gr{space token} would
%when the \verb=\spacefactor= is~1000.
%A~control space
%is not treated like a space token, or like a macro
%expanding to one (which is how \cs{space} is defined in plain \TeX).
%For instance, \TeX\ ignores spaces
%at the beginning of an input line, but
%control space is a \gr{horizontal command}, so it 
%makes \TeX\ switch from vertical to horizontal mode
%(and insert an indentation box).
%See  Chapter~\ref{space} for the space factor, and
%chapter~\ref{hvmode} for horizontal and vertical modes.
`控制空格'命令 \verb*-\ - 给出的空白的大小与
\verb=\spacefactor= 等于 1000 时 \gr{space token} 给出的一样。
控制空格的处理方式既不同于空格记号，也不同于展开为空格记号的宏%
（Plain \TeX\ 的 \cs{space} 就是这样定义的）。
例如，\TeX\ 会忽略输入行开头的空格，但是控制空格是一个 \gr{horizontal command}，
因此它使得 \TeX\ 从竖直模式切换到水平模式（并插入缩进盒子）。
见第~\ref{space}~章介绍的空白因子，
以及第~\ref{hvmode}~章介绍的水平和竖直模式。
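
两者的区别可以用下面的例子对比（仅为示意，假定使用 Plain \TeX）：
\begin{verbatim}
\par\ a      % control space: starts the paragraph, then glue, then a
\par\space a % \space gives a space token, ignored in vertical mode
\end{verbatim}
第一行的段落以缩进盒子和一个空白开头，第二行的段落则直接以字母 a 开头。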

%%\spoint `\n{\char32}'
%\subsection{`\n{\char32}'}
%\spoint `\n{\char32}'
\subsection{可见空格}

%The explicit symbol `\n{\char32}' for a space
%is character~32 in the Computer Modern typewriter typeface.
%However, switching to \cs{tt} is not sufficient to get
%spaces denoted this way, because spaces will still
%receive special treatment in the input processor.
显式的空格符号`\verb*- -'是计算机现代打字机字体中字符编码为 32 的字符，
但仅使用 \cs{tt} 是无法将其显现出来的，因为空格在输入处理器中受到了特别处理。

%One way to
%let spaces be typeset by \n{\char32}
%is to set
%\begin{verbatim}
%\catcode`\ =12
%\end{verbatim}
%\TeX\ will then take a space as the instruction to
%typeset character number~32. Moreover, subsequent spaces
%are not skipped, but also typeset this way: state~{\italic S}
%is only entered after a character with category code~10.
%Similarly, spaces after a control sequence are made
%visible by changing the category code of the space character.
使空格字符 \verb*- - 现形的一种方法是设置
\begin{verbatim}
\catcode`\ =12
\end{verbatim}
这样 \TeX\ 便会将空格字符作为编码为 32 的字符排印出来，
而且后续的空格也不再被忽略，同样会被排印出来：
状态 {\italic S} 只是在类别码为 10 的字符之后才会出现。
类似地，控制序列之后的空格也因为类别码改变而被显现出来。
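
下面是一个简单的示意（假定使用 Plain \TeX\ 以及计算机现代打字机字体）：
\begin{verbatim}
{\tt\catcode`\ =12 %
a  b\relax  c}
\end{verbatim}
其中 a 与 b 之间的两个空格，以及 \cs{relax} 之后的两个空格，
都会被排印为可见空格。类别码的改变限制在编组之内。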

%\section{More about line ends}
\section{行结束符的更多知识}

%\TeX\ accepts lines from an input file, excluding any line
%terminator that may be used.
%Because of this, \TeX's behaviour here is not dependent
%on the operating system and the \indextermsub{line}{terminator}
%it uses (\key{CR}-\key{LF},
%\key{LF}, or none at all for block storage).
%From the input line any trailing spaces are removed.
%The reason for this is historic; it has to do with 
%the block storage mode on \key{IBM} mainframe computers.
%For some computer-specific problems with end-of-line
%characters, see~\cite{B:ctrl-M}.
\TeX\ 从输入文件中获得文本行，并从中消除行终止符。正是这一行为，
使得 \TeX\ 不依赖于各个操作系统特定的\indextermsub{行}{终止符}%
（\key{CR}-\key{LF}，\key{LF}，或者在块存储系统中根本不存在）。
文本行末尾的空白字符也会被移除。这样处理是由于历史原因：
它与 \key{IBM} 大型计算机的块存储模式有关。
在 \cite{B:ctrl-M} 中介绍了行尾符造成的一些计算机问题。

%A~terminator character is then appended
%with a character code of \cs{endlinechar}, 
%unless this parameter has a value that
%is negative or more than~255. 
%Note that this terminator character
%need not have category code~5\index{category!5}, end of line.
完成上述处理后，\TeX\ 会将 \cs{endlinechar} 所表示的字符置于行尾，
除非 \cs{endlinechar} 的值为负数或者大于 255。
注意这个行结束符也可以不是类别码为 5 的字符\index{category!5}。

%\subsection{Obeylines}
\subsection{保持各行}

%Every once in a while it is desirable that the line ends in
%\message{Check spurious space obeylines+1}%
%\cstoidx obeylines\par\howto Change the meaning of the line end\par
%the input correspond to those in the output.
%The following piece of code does the trick:
%\begin{verbatim}
%\catcode`\^^M=13 %
%\def^^M{\par}% 
%\end{verbatim}
%The \cs{endlinechar} character is here made active,
%and its meaning becomes \cs{par}.
%The comment signs prevent \TeX\ from seeing the terminator of the
%\alt
%lines of this definition, and expanding it since it is active.
有时候会希望输入文本中的换行能够在排版输出中保持。
\message{Check spurious space obeylines+1}%
\cstoidx obeylines\par\howto Change the meaning of the line end\par
下面的代码可以解决这一问题：
\begin{verbatim}
\catcode`\^^M=13 %
\def^^M{\par}% 
\end{verbatim}
这里，\cs{endlinechar} 所指定的字符成为活动符，其含义变为 \cs{par}。
上述代码中的注释符用于阻止 \TeX\ 看到这两行各自末尾的行终止符，
以防它将其作为活动字符而展开。

%However, it takes some care to embed this code in a macro.
%The definition
%\begin{verbatim}
%\def\obeylines{\catcode`\^^M=13 \def^^M{\par}}
%\end{verbatim}
%will be misunderstood:
%\TeX\ will discard everything
%after the second \verb>^^M>, because this has category code~5.
%Effectively, this line is then
%\begin{verbatim}
%\def\obeylines{\catcode`\^^M=13 \def
%\end{verbatim}
%To remedy this,
%the definition itself has to be
%performed in a context where \verb>^^M> is an active
%character:
%\begin{verbatim}
%{\catcode`\^^M=13 %
% \gdef\obeylines{\catcode`\^^M=13 \def^^M{\par}}%
%}
%\end{verbatim}
%Empty lines in the  input are not taken into account
%in this definition: these disappear, because two consecutive \cs{par}
%tokens are (in this case) equivalent to one. 
%A slightly modified definition for the line end as
%\begin{verbatim}
%\def^^M{\par\leavevmode}
%\end{verbatim}
%remedies this:
%now every line end forces \TeX\ to start a paragraph. For empty
%lines this will then be an empty paragraph.
然而，将上述代码嵌入宏定义时要小心，比如
\begin{verbatim}
\def\obeylines{\catcode`\^^M=13 \def^^M{\par}}
\end{verbatim}
是会被 \TeX\ 误解的：\TeX\ 将丢弃第二个 \verb>^^M> 之后的所有字符，
因为此时 \verb>^^M> 类别码为 5，而非 13。
也就是说，这一行实际上变成
\begin{verbatim}
\def\obeylines{\catcode`\^^M=13 \def
\end{verbatim}
要修正上述问题，需要为 \verb>^^M> 营造一个可作为活动字符使用的环境：
\begin{verbatim}
{\catcode`\^^M=13 %
 \gdef\obeylines{\catcode`\^^M=13 \def^^M{\par}}%
}
\end{verbatim}
不过这个定义还是有缺陷，因为输入文本中的空行会被忽略。
这是因为连续两个 \cs{par} 记号会被当成一个。
对上述定义稍作改进即可解决这个问题，如下：
\begin{verbatim}
\def^^M{\par\leavevmode}
\end{verbatim}
这样，输入文本中的每一行都会开启一个新段落，空行则开启一个空段落。
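
把上述各步合在一起，可以写成下面的示意（\verb>\myobeylines> 是为说明而假设的宏名，
以免与格式中已有的 \cs{obeylines} 冲突）：
\begin{verbatim}
{\catcode`\^^M=13 %
 \gdef\myobeylines{\catcode`\^^M=13 \def^^M{\par\leavevmode}}%
}
\begingroup\myobeylines
line one

line three
\endgroup
\end{verbatim}
这里中间的空行会产生一个只含缩进盒子的空段落，因此在输出中保留为一个空行。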

%%\spoint Changing the \cs{\endlinechar}
%\subsection{Changing the \cs{endlinechar}}
%\spoint Changing the \cs{\endlinechar}
\subsection{改变 \cs{endlinechar}}

%Occasionally you may want to change the \cs{endlinechar}, or
%the \cs{catcode} of the ordinary line terminator \verb.^^M.,
%for instance to obtain special effects such as macros where 
%the argument is terminated by the line end.
%See page~\pageref{pick:eol} for a worked-out example.
有时，你可能想改变 \cs{endlinechar} 或者 \verb.^^M. 的类别码%
以获得一些特殊效果，例如让宏的参量用行结束符定界。
参考第~\pageref{pick:eol}~页给出的例子。

%There are  a couple of traps. Consider the following:
%\begin{verbatim}
%{\catcode`\^^M=12 \endlinechar=`\^^J \catcode`\^^J=5
%...
%... }
%\end{verbatim}
%This causes unintended output of both character~13 (\verb-^^M-)
%and~10 (\verb-^^J-), caused by the line terminators of the
%first and last line.
这里有几个陷阱。首先考虑下面的写法：
\begin{verbatim}
{\catcode`\^^M=12 \endlinechar=`\^^J \catcode`\^^J=5
...
... }
\end{verbatim}
由于第一行和最后一行的行终止符，
这将导致无意中输出第~13~号（\verb-^^M-）与第~10~号（\verb-^^J-）字符。

%Terminating the first and  last line with a comment works,
%but replacing the first line by the two lines
%\begin{verbatim}
%{\endlinechar=`\^^J \catcode`\^^J=5
%\catcode`\^^M=12
%\end{verbatim}
%is also a solution.
在第一行和最后一行末尾加上注释符可以解决此问题，
但还有另一种方法是将第一行拆成下面两行
\begin{verbatim}
{\endlinechar=`\^^J \catcode`\^^J=5
\catcode`\^^M=12
\end{verbatim}

%Of course, in many cases it is not necessary to substitute
%another end-of-line character; a~much simpler solution 
%is then to put
%\begin{verbatim}
%\endlinechar=-1 
%\end{verbatim}
%which treats all lines as if they end with a comment.
当然，在多数情况下没必要将行结束符替换为另一个字符；设置
\begin{verbatim}
\endlinechar=-1 
\end{verbatim}
就等同于各行都以注释符结尾。
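
下面的示意展示了这一设置的效果，以及恢复默认值时的一个小陷阱（假定使用 Plain \TeX；
\verb>\a> 只是示例用的宏名）：
\begin{verbatim}
\endlinechar=-1 % takes effect from the next line on
\def\a{
  x
}
\endlinechar=13\relax % \relax ends the number; no line end does it now
\show\a % macro:->x
\end{verbatim}
由于此时行尾不再产生任何字符，恢复 \cs{endlinechar} 的那一行需要用 \cs{relax}
明确地结束数值，否则 \TeX\ 会到下一行继续寻找数字。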

%%\spoint More remarks about the end-of-line character
%\subsection{More remarks about the end-of-line character}
%\spoint More remarks about the end-of-line character
\subsection{行结束符的更多注记}

%The character that \TeX\ appends at the end of an input line
%is treated like any other character. Usually one is not aware
%of this, as its category code is special, but there are a few
%ways to let it be processed in an unusual way.
\TeX\ 和其他字符一样对待添加到行尾的字符。通常我们不会注意到它，
因为它的类别码比较特殊，但是有一些方法可以特殊地处理它。

%\begin{example} Terminating an input line with \verb>^^> will
%(ordinarily, when \cs{endlinechar} is~13) give `M' in the output, 
%which is the 
%\ascii{} character with code~13+64.
%\end{example}
\begin{example}
把 \verb>^^> 置于文本行的末尾（假定 \cs{endlinechar} 保持默认值为 13），
将输出字符 `M'，它是编码为~13+64~的 \ascii\ 字符。
\end{example}

%\begin{example} If \verb>\^^M> has been defined,
%terminating an input line with a backslash will execute this command.
%The plain format defines
%\begin{verbatim}
%\def\^^M{\ }
%\end{verbatim}
%which makes a `control return' equivalent to a control space.
%\end{example}
\begin{example} 如果已经定义了 \verb>\^^M>，在输入行中用反斜线结尾将执行此命令。
在 Plain 格式中定义
\begin{verbatim}
\def\^^M{\ }
\end{verbatim}
这使得`控制换行'与控制空格等价。
\end{example}

%%\point More about the input processor
%\section{More about the input processor}
%\point More about the input processor
\section{输入处理器的更多知识}

%%\spoint The input processor as a separate process
%\subsection{The input processor as a separate process}
%\spoint The input processor as a separate process
\subsection{输入处理器作为独立进程}

%\TeX's levels of processing are all working at the
%same time and incrementally, but conceptually they can often be
%considered to be separate processes that each accept the
%completed output of the previous stage. The juggling with
%spaces provides a nice illustration for this.
\TeX\ 处理器的各个层面都是同时运行的，但是在概念上它们常被视为依次独立运行，
前者的输出是后者的输入。空格的花招可以展示出这一规律。

%Consider the definition
%\begin{verbatim}
%\def\DoAssign{\count42=800}
%\end{verbatim}
%and the call
%\begin{verbatim}
%\DoAssign 0
%\end{verbatim}
%The input processor, the part
%of \TeX\ that builds tokens, in scanning this call
%skips the space before the zero, so the expansion of this
%call is
%\begin{verbatim}
%\count42=8000
%\end{verbatim}
%It would be incorrect to reason
%`\cs{DoAssign} is read, then expanded, the space delimits the
%number 800, so 800 is assigned and the zero is printed'.
%Note that the same would happen if the zero appeared on the next line.
例如定义一个宏：
\begin{verbatim}
\def\DoAssign{\count42=800}
\end{verbatim}
然后调用它：
\begin{verbatim}
\DoAssign 0
\end{verbatim}
输入处理器作为 \TeX\ 构建记号列表的层面会忽略 0 之前的所有空格，
因此上述宏的展开的结果是：
\begin{verbatim}
\count42=8000
\end{verbatim}
如果认为“\cs{DoAssign} 先被读取然后展开，空格为数值 800 定界，
于是寄存器被赋值为 800，而后面的 0 被排印出来”，那就错了。
注意即使最后的 0 出现在下一行结果也一样。
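
如果希望 0 不被并入数值，可以在定义中明确地结束这个数值。下面是两种常见写法的示意：
\begin{verbatim}
\def\DoAssign{\count42=800 }% the space token in the macro ends the number
\DoAssign 0                 % assigns 800 and typesets the 0
\def\DoAssign{\count42=800\relax}% \relax works as well
\end{verbatim}
定义中的空格在宏定义时就已成为空格记号，因此不会被输入处理器跳过。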

%Another illustration shows that optional spaces appear in a different
%stage of processing from that for skipped spaces:
%\begin{disp}\verb>\def\c.{\relax}>\nl
%     \verb>a\c.>{\tt\char32 b}\end{disp}
%expands to
%\begin{disp}\n{a\cs{relax}\char32 b}\end{disp}
%which gives as output\begin{disp} `a b'\end{disp}
%because spaces after the \cs{relax} control sequence are only
%skipped when the line is first read, not when it is expanded.
%The fragment
%\begin{disp} \verb-\def\c.{\ignorespaces}-\nl \verb-a\c. b-\end{disp}
%on the other hand, expands to
%\begin{disp}\n{a\cs{ignorespaces}\char32 b}\end{disp}
%Executing the \cs{ignorespaces} command removes the subsequent
%space token, so the output is \begin{disp} `ab'.\end{disp}
%In both definitions
%the period after \cs{c} is a delimiting token; it is used here
%to prevent spaces from being skipped.
另一个例子表明，可选空格与被跳过的空格出现在不同的处理阶段：
\begin{disp}\verb>\def\c.{\relax}>\nl
     \verb>a\c.>{\tt\char32 b}\end{disp}
它的展开结果为
\begin{disp}\n{a\cs{relax}\char32 b}\end{disp}
输出结果为
\begin{disp} `a b'\end{disp}
这是因为 \cs{relax} 之后的空格仅仅在文本行被读取时才会被跳过，
在 \verb>\c.> 展开为 \cs{relax} 之后不会被跳过。另一方面，下面例子：
\begin{disp} \verb-\def\c.{\ignorespaces}-\nl \verb-a\c. b-\end{disp}
会被展开为
\begin{disp}\n{a\cs{ignorespaces}\char32 b}\end{disp}
在执行处理器中 \cs{ignorespaces} 会移除它后面的空格，所以输出结果会是
\begin{disp} `ab'.\end{disp}
在上述两个例子中，\cs{c} 之后的西文句号是一个定界记号，
用于保护控制序列之后的空格不被输入处理器吃掉。

%%\spoint The input processor not as a separate process
%\subsection{The input processor not as a separate process}
%\spoint The input processor not as a separate process
\subsection{输入处理器不作为单独进程}

%Considering the tokenizing of \TeX\ to be a separate process
%is a convenient view, but sometimes it leads to confusion.
%The line
%\begin{verbatim}
%\catcode`\^^M=13{}
%\end{verbatim}
%makes the line end active,
%and subsequently gives an `undefined control sequence' error
%for the line end of this line itself. Execution of the commands
%on the line thus influences the scanning process of that
%same line.
将 \TeX\ 对输入文本的记号化过程视为一个独立进程是比较方便的看法，
但是有时会造成混淆。例如
\begin{verbatim}
\catcode`\^^M=13{}
\end{verbatim}
使得行结束符变成活动符，随后 \TeX\ 便会因为这一行自身的行结束符而报错“未定义的控制序列”，
即对文本行中命令的执行影响到了 \TeX\ 输入处理器对同一行文本的扫描过程。

%By contrast,
%\begin{verbatim}
%\catcode`\^^M=13
%\end{verbatim}
%does not give an error.
%The reason for this is that \TeX\ reads the line end while it is still
%scanning the number~13; that is, at a time when the assignment
%has not been performed yet.
%The line end is then converted to the optional space character
%delimiting the number to be assigned.
与此相反，
\begin{verbatim}
\catcode`\^^M=13
\end{verbatim}
却不会出错。这是因为 \TeX\ 输入处理器是在扫描数值 13 时读到行结束符，
也就是说在那时赋值还未完成，因此行结束符会被视为数值的定界符，即可选空格。

%%\spoint Recursive invocation of the input processor
%\subsection{Recursive invocation of the input processor}
%\spoint Recursive invocation of the input processor
\subsection{输入处理器的递归调用}

%Above, the activity of replacing a parameter
%character plus a digit by a parameter token was described
%as something similar to the lumping together of letters
%into  a control sequence token. Reality is somewhat more
%complicated than this. \TeX's token scanning mechanism
%is invoked both for input from file and for input from
%lists of tokens such as the macro definition. Only in the
%first case is the terminology of internal states applicable.
前文中谈到，参数符加数字会被替换为一个参数记号，
这种替换行为类似于将一些字符捆绑为控制序列记号的行为。
实际上情况比这要复杂一些。从文件输入和从记号列（比如宏定义）输入都会调用
\TeX\ 的记号扫描机制，但内部状态的变化只适用于前者。

%Macro parameter characters are treated the same in both
%cases, however. If this were not the case it would
%not be possible to write things such as
%\begin{verbatim}
%\def\a{\def\b{\def\c####1{####1}}}
%\end{verbatim}
%See page \pageref{nest:def} for an explanation of such
%nested definitions.
但是，宏参数符在两种情况下会被以相同方式处理，
否则 \TeX\ 便无法处理下面这样的宏定义
\begin{verbatim}
\def\a{\def\b{\def\c####1{####1}}}
\end{verbatim}
见第~\pageref{nest:def}~页对这种嵌套定义的解释。
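
这个嵌套定义逐层展开的过程可以示意如下（注释中给出每一步之后的效果）：
\begin{verbatim}
\def\a{\def\b{\def\c####1{####1}}}
\a     % now \b is defined with body \def\c##1{##1}
\b     % now \c is defined with one parameter and body #1
\c{x}  % gives x
\end{verbatim}
每经过一层宏定义，连续两个参数符就合并为一个，所以最内层的参数需要写四个参数符。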

%%\point The \verb@- convention
%\section{The \n{@} convention}
%\point The \verb@- convention
\section{\n{@} 约定}

%Anyone who has ever browsed through either the plain format or
%the \LaTeX\ format will have noticed that a lot of control sequences
%contain an `at' sign:~\verb-@-. These are control sequences that
%are meant to be inaccessible to the ordinary user.
如果读过 Plain 或 \LaTeX\ 格式的源代码，
就会注意到许多控制序列都包含`at'符号 \verb-@-，
这意味着这些控制序列不可被普通用户直接使用。

%Near the beginning of the format files the instruction
%\begin{verbatim}
%\catcode`@=11
%\end{verbatim}
%occurs, making the at sign into a letter,
%meaning that it can be used in control sequences. Somewhere near the
%end of the format definition the at sign is made `other' again:
%\begin{verbatim}
%\catcode`@=12
%\end{verbatim}
在靠近格式文件的起始处有
\begin{verbatim}
\catcode`@=11
\end{verbatim}
它将 \verb-@- 变为字母字符，从而可以用于组成控制序列。
而在靠近格式文件的结尾处有
\begin{verbatim}
\catcode`@=12
\end{verbatim}
它将 \verb-@- 恢复为其他字符。

%Now why is it that users cannot
%call a control sequence with an at sign
%directly, although they can call macros that contain lots of those
%`at-definitions'? The reason is that the control sequences
%containing an \n@ are internalized by \TeX\ at definition time,
%after which they are a token, not a string of characters. 
%Macro expansion then
%just inserts such tokens, and at that time the category codes
%of the constituent characters do not matter any more.
为何我们可以调用那些由带有 \verb-@- 字符的控制序列所构成的宏，
而不能直接调用带有 \verb-@- 字符的控制序列呢？
原因是带有 \verb-@- 字符的控制序列在定义时已经被转换为记号，
不再是字符串，而宏展开时直接将这些控制序列替换为那些记号即可，
这个过程与控制序列字符的类别码无关。
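
下面用一个小例子说明这一点（仅为示意，\verb>\my@helper> 和 \verb>\public>
都是为说明而假设的宏名）：
\begin{verbatim}
\catcode`@=11
\def\my@helper{internal}% hypothetical internal macro
\def\public{\my@helper}%  \my@helper is stored as a single token here
\catcode`@=12
\public      % works: gives "internal"
%\my@helper  % would now be read as \my followed by @helper
\end{verbatim}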

%\endofchapter
%%%%% end of input file [mouth]

\end{document}