FORMAL METHODS OF DESCRIBING SYNTAX

The formal language-generation mechanisms, usually called grammars, that are commonly used to describe the syntax of programming languages.

Backus-Naur Form and Context-Free Grammars

In the middle to late 1950s, two men, Noam Chomsky and John Backus, in unrelated research efforts, developed the same syntax description formalism, which subsequently became the most widely used method for programming language syntax.

Context-Free Grammars

  • Two of these grammar classes, named context-free and regular, turned out to be useful for describing the syntax of programming languages.
  • The forms of the tokens of programming languages can be described by regular grammars.
  • The syntax of whole programming languages, with minor exceptions, can be described by context-free grammars.

Origins of Backus-Naur Form

  • ALGOL 58 was presented by John Backus, a prominent member of the ACM-GAMM group, at an international conference in 1959 (Backus, 1959).
  • This revised method of syntax description became known as Backus-Naur Form, or simply BNF.
  • BNF is a natural notation for describing syntax.
  • In fact, something similar to BNF was used by Panini to describe the syntax of Sanskrit several hundred years before Christ.
  • The use of BNF in the ALGOL 60 report was not immediately accepted by computer users, it soon became and is still the most popular method of concisely describing programming language syntax.
  • BNF is nearly identical to Chomsky’s generative devices for context-free languages, called context-free grammars.

BNF Fundamentals

  • A metalanguage is a language that is used to describe another language.
  • BNF is a metalanguage for programming languages.
  • BNF uses abstractions for syntactic structures. A simple assignment statement, for example, might be represented by the abstraction. The actual definition can be given by
  • The text on the left side of the arrow, which is aptly called the left-hand side (LHS), is the abstraction being defined.
       <assign> → <var> = <expression>
  • The text to the right of the arrow is the definition of the LHS. It is called the right-hand side (RHS) and consists of some mixture of tokens, lexemes, and references to other abstractions.
  • This particular rule specifies that the abstraction is defined as an instance of the abstraction, followed by the lexeme =, followed by an instance of the abstraction.
  • One example sentence whose syntactic structure is described by the rule is
       total = subtotal1 + subtotal2
  • The abstractions in a BNF description, or grammar, are often called nonterminal symbols, or simply non-terminals, and the lexemes and tokens of the rules are called terminal symbols, or simply terminals.
  • Nonterminal symbols can have two or more distinct definitions, representing two or more possible syntactic forms in the language.
  • Multiple definitions can be written as a single rule, with the different definitions separated by the symbol |, meaning logical OR.
  • For example, an if statement can be described with the rules
    <if_stmt> → if (<logic_expr>) <stmt>
    <if_stmt> → if (<logic_expr>) <stmt> else <stmt>
    or with the rule
    <if_stmt> → if (<logic_expr>) <stmt>
          | if (<logic_expr>) <stmt> else <stmt>
  • These rules, represents either a single statement or a compound statement.

Describing Lists

  • Variable-length lists in mathematics are often written using an ellipsis (. . .); 1, 2, . . . is an example.
  • BNF does not include the ellipsis, so an alternative method is required for describing lists of syntactic elements in programming languages (for example, a list of identifiers appearing on a data declaration statement).
  • For BNF, the alternative is recursion. A rule is recursive if its LHS appears in its RHS. The following rules illustrate how recursion is used to describe lists:
    <ident_list> → identifier
           | identifier, <ident_list>
  • This defines as either a single token (identifier) or an identifier followed by a comma and another instance of.

Comments