Friday, September 25, 2009

Grammars

The purpose of this entry is to describe the 4 type of grammars that can be used to classify a language, and the means used to classify a language as one of the four types of grammrs.

From a linguistics and NLP standpoint, languages can be classified by 4 possible grammar types. From the most expressive description to the least expressive description, a language can be described by a grammar known as type 0, type 1, type 2, or type 3. Each of the different grammars has a common name. A given grammar can sometimes describe other grammars. A type 0 grammar can describe type 1, type2, and type 3 grammars. A type 1 grammar can describe type 2 and type 3. A type 2 grammar can describe a type 3 grammar. A type 3 grammar cannot describe any other type of grammar.

For the four types of grammars, each are composed of rules known as productions, which have the general form of w1 -> w2 . Each production rule generates a sequence of terminal tokens and non-terminals. Non-terminals are production rules that go by the lhs symbol, w1 .

A recursively enumerable grammar is also known as a type 0 grammar. A type 0 grammar has no restrictions on its production rules. Context-sensitive grammar is a type 1 grammar. A type 1 grammar is restricted to productions where the number of symbols on the rhs is equal to or greater than the number of symbols on the lhs. Context-free grammar is a type 2 grammar. A non-terminal in a type 2 grammar can be replaced by its rhs. In comparison, a non-terminal in a type 1 grammar can only be replaced if there is a production that matches the symbols on the rhs with an equivalent lhs. A regular grammar is a type 3 grammar. A regular grammar is also known as a regular expression, which is used by Perl, Python, and grep when searching on strings. A production of a regular grammar has a restricted expression. The lhs is a non-terminal. The rhs is a terminal, which is optionally followed by a non-terminal.

Grammars are also more formally known as phrase structure grammars.

G = phrase structure grammar as a set

G = (V,T,S,P)

V is a vocabulary, a set of tokens and non-tokens(non-terminals)
T is a subset of V. T is the set of terminal tokens
S is a start symbol/token that is a member of V
P is a set of production rules
N is V - T, set of non-terminal symbols/tokens

Types of grammars and restrictions on their productions, w1 -> w2
0 no restrictions
1 length(w1) <= length(w2), w2=lambda
2 w1 = A, where A is non-terminal symbol
3 w1=A and w2=aB or w2=a, where A is an element of N, B is an element of N, and a is an element of T, or S->lambda

No comments: