I have just released version 0.4.0 of P2X, the lightweight parser with XML output. The main improvement is complete support for the UTF-8 encoding. In particular, identifiers can now contain arbitrary characters. For example, if you execute
echo "øłð ñ€W" | p2x
you now get two identifiers in the output, nice and clean:
<op line='1' col='6' code='71' type='JUXTA'>
  <id line='1' col='0' code='31' repr='øłð' type='IDENTIFIER'>
    <ca:text>øłð</ca:text>
  </id>
  <op line='1' col='6' code='70' type='SPACE'>
    <ca:text> </ca:text>
  </op>
</op>
<id line='1' col='7' code='31' repr='ñ€W' type='IDENTIFIER'>
  <ca:text>ñ€W</ca:text>
</id>
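A side note on the col attributes: the values in the output above (6 for the space, 7 for the second identifier) seem consistent with UTF-8 byte offsets rather than character offsets. This is only my reading of this one example, not a statement about how P2X defines columns. The small Python sketch below, which is independent of P2X and uses a helper name, byte_offset, that I made up for illustration, prints both offsets for each character of the sample input so you can compare:

```python
# Compare character offsets with UTF-8 byte offsets for the sample input.
# Assumption: this only illustrates the arithmetic; it does not call P2X.
text = "øłð ñ€W"


def byte_offset(s: str, char_index: int) -> int:
    """Return the UTF-8 byte offset of the character at char_index."""
    return len(s[:char_index].encode("utf-8"))


# Print: character index, byte offset, character
for char_index, ch in enumerate(text):
    print(char_index, byte_offset(text, char_index), repr(ch))
```

The space is the fourth character (index 3) but sits at byte offset 6, because each of the three preceding letters takes two bytes in UTF-8.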
The other major change is the license: P2X is now distributed under the GPLv3.