Saturday, 19 January 2013

Set-theoretic foundations for formal language theory?

Hi Anthony,



The question you're asking in your text isn't quite the same as the question in your title, but the answer to the title subsumes the one you ask in the text.



To define a grammar, we start with a an alphabet $Sigma$, and a set of nonterminal variable $N$, and then define a grammatical expression as an element of the formal semiring $G$ over $Sigma + N$. (It's a good exercise to see how the semiring axioms let you convert grammatical expressions into their Backus normal form.) Next, a grammar is a map from $N to G$ -- that is, it assigns a grammatical expression to each variable.



Now, consider the free monoid over $Sigma$. Concretely, these are sequences of characters in $Sigma$. A language is a set of these strings, and our goal is to give an interpretion sending nonterminals to languages. If we had a map $f$ sending nonterminals to languages, we could interpret grammatical expressions inductively, by interpreting the formal multiplication in the ring as concatenation (using the monoid operation) of elements in each language, and interpreting the formal sum as set union of languages.



Now, consider the free monoid over $Sigma$. Concretely, these are sequences of characters in $Sigma$. A language is a set of these strings (ie, an element of $mathcal{P}(Sigma^{*})$), and our goal is to give an interpretion sending nonterminals to languages. If we had a map $f$ sending nonterminals to languages, we could interpret grammatical expressions inductively, by interpreting the formal multiplication in the ring as concatenation (using the monoid operation) of elements in each language, and interpreting the formal sum as set union of languages.



If we have such an interpretation (a function $N to mathcal{P}(Sigma^{*})$) and a grammar (a map $N to G$), we can lift the grammar to an endofunction on interpretations -- that is, to a function $(N to L) to (N to L)$. (I'm using $L$ for a set of strings due to jsMath flakiness -- it should be powerset-sigma-star.)



The language for each nonterminal will be the least fixed point of this functional. (As a technical detail, to make this functional monotone, so that the Knaster-Tarski theorem applies, you need to ensure that each interpretation also unions the new language with the old argument language -- ie, you take an interpretation $i$ sketched above and change it to $lambda f.;lambda n.; f(n) cup i(n)$.)



I should also add that this is all standard material, which should appear in every textbook on formal language theory. (I'm pretty sure it's in Sipser.)

No comments:

Post a Comment