- Noam Chomsky proposed syntactic models of natural languages.
- The Chomsky hierarchy is a containment hierarchy of classes of formal grammars.
- Four families of grammars make up the chomsky hierarchy
A formal grammar G = (N; Σ ; P; S) consists of:
A finite set N of nonterminal symbols;
A finite set Σ of terminal symbols not in N;
A finite set P of production rules of the form u→v, where u and v are strings in (Σ U N) and u contains at least one non-terminal symbol;
a start symbol S in N.
- A formal grammar defines (or generates) a formal language, which is a set of sequences of symbols that may be constructed by applying production rules to a sequence of symbols which initially contains just the start symbol.
- A rule may be applied to a sequence of symbols by replacing an occurrence of the symbols on the left-hand side of the rule with those that appear on the right-hand side
For example, the grammar with terminals {a,b}, nonterminals {S,A,B}, production rules
S → ABS
S → ε (where ε is the empty string)
BA → AB
BS → b
Bb → bb
Ab → ab
Aa → aa
and start symbol S, defines the language of all words of the form a^nb^n.
The Chomsky hierarchy consists of the following levels:
Type – 0 grammars (unrestricted grammars): They include all formal grammars.They generate exactly all languages that can be recognized by a turing machine.
An unrestricted grammar is a quadruple (V; Σ ; P; S) where V is finite set of variables, Σ (the alphabet) is a finite set of terminal symbols, P is a set of productions, and S is a distinguished element of V.
Type – 1 grammars (Context-Sensitive Grammars): The Generate the context sensitive languages.
No restrictions are placed on the left-hand side of the production, but the length of the right-hand is required to be at least that of the left.
Type-2 grammars (context-free grammars): They generate the context-free languages.
Type-3 grammars (regular grammars): They generate the regular languages. Every regular grammar is a context-free grammar.
Such a grammar restricts its rules to a single nonterminal on the left-hand side and a right-hand side consisting of a single terminal, possibly followed by a single nonterminal.