GENERAL PROBLEM OF DESCRIBING SYNTAX AND SEMANTICS

  • Language, whether natural or artificial, is a set of strings of characters from some alphabet. The strings of a language are called sentences or statements.
  • The syntax rules of a language specify which strings of characters from the language’s alphabet are in the language.
  • For example, a large and complex collection of rules for specifying the syntax of its sentences.
  • The syntax of programming languages, for simplicity’s sake, often do not include descriptions of the lowest-level syntactic units. These small units are called lexemes.
  • lexemes can be given by a lexical specification, which is usually separate from the syntactic description of the language.
  • The lexemes of a programming language include its numeric literals, operators, and special words, among others.
  • One can think of programs as strings of lexemes rather than of characters.
  • Lexemes are partitioned into groups: for example, the names of variables, methods, classes, and so forth in a programming language form a group called identifiers.
  • Each lexeme group is represented by a name, or token. A token of a language is a category of its lexemes.
  • For example, an identifier is a token that can have lexemes, or instances, such as sum and total.
  • A token has only a single possible lexeme. For example, the token for the arithmetic operator symbol + has just one possible lexeme. Consider the following statement:
  • index = 2 * count + 17;

  • The lexemes and tokens of this statement are
  • Lexemes Tokens
    index identifier
    = equal_sign
    2 int_literal
    * Mult_op
    count identifier
    + plus_op
    17 int_literal
    ; semicolon
  • Languages can be formally defined in two distinct ways: by recognition and by generation.

Language Recognizers:

  • We have a language L that uses an alphabet ∑ of characters. To define L formally using the recognition method, we would need to construct a mechanism R, called a recognition device, capable of reading strings of characters from the alphabet ∑.
  • R would indicate whether a given input string was or was not in L. R would either accept or reject the given string.
  • Such devices are like filters, separating legal sentences from those that are incorrectly formed. If R, when fed any string of characters over ∑, accepts it only if it is in L, then R is a description of L.
  • Recognition devices, however, are not used to enumerate all of the sentences of a language—they have a different purpose.
  • The syntax analysis part of a compiler is a recognizer for the language the compiler translates.
  • The recognizer need not test all possible strings of characters from some set to determine whether each is in the language. it need only determine whether given programs are in the language.
  • The syntax analyzer determines whether the given programs are syntactically correct.

Language Generators:

  • A language generator is a device that can be used to generate the sentences of a language. The generator as having a button that produces a sentence of the language every time it is pushed.
  • Because the particular sentence that is produced by a generator when its button is pushed is unpredictable, a generator seems to be a device of limited usefulness as a language descriptor.
  • The syntax-checking portion of a compiler (a language recognizer) is not as useful a language description for a programmer because it can be used only in trial-and-error mode.
  • For example, to determine the correct syntax of a particular statement using a compiler, the programmer can only submit a speculated version and note whether the compiler accepts it. On the other hand, it is often possible to determine whether the syntax of a particular statement is correct by comparing it with the structure of the generator.The syntax-checking portion of a compiler (a language recognizer) is not as useful a language description for a programmer because it can be used only in trial-and-error mode.
  • There is a close connection between formal generation and recognition devices for the same language.

Comments