I have just released version 0.4.0 of P2X, the light-weight parser with XML output. The main improvement is that UTF-8 encoding is now fully supported. In particular, identifiers can now contain arbitrary non-ASCII characters. For example, if you execute

echo "øłð ñ€W" | p2x

you now get two identifiers in the output, nice and clean:

<op line='1' col='6' code='71' type='JUXTA'>
 <id line='1' col='0' code='31' repr='øłð' type='IDENTIFIER'>
  <ca:text>øłð</ca:text>
 </id>
 <op line='1' col='6' code='70' type='SPACE'>
  <ca:text> </ca:text>
 </op>
</op>
<id line='1' col='7' code='31' repr='ñ€W' type='IDENTIFIER'>
 <ca:text>ñ€W</ca:text>
</id>
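
One detail worth noting in this output: the col attributes look like 0-based UTF-8 byte offsets rather than character counts. That is an inference from the listing above, not documented behaviour, and it assumes the shell passes the input to p2x as UTF-8. The small Python sketch below (the helper name offsets is only illustrative and not part of P2X) computes both kinds of offset for the same input, so they can be compared with the col values 6 and 7:

```python
# Sketch: compare the 'col' attributes from the output above with
# code-point offsets and UTF-8 byte offsets of the same input line.
text = "øłð ñ€W"

def offsets(s, needle):
    """Return (code_point_offset, utf8_byte_offset) of needle in s."""
    cp = s.index(needle)                      # offset counted in code points
    return cp, len(s[:cp].encode("utf-8"))    # offset counted in UTF-8 bytes

space_cp, space_byte = offsets(text, " ")
second_cp, second_byte = offsets(text, "ñ€W")

print(space_cp, space_byte)    # 3 6
print(second_cp, second_byte)  # 4 7
```

The byte offsets (6 for the space, 7 for the second identifier) match the col values in the XML, while the code-point offsets (3 and 4) do not. Tools that post-process the output may therefore need to convert columns before mapping them back to characters in an editor.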

The other major change is the license: P2X is now distributed under the GPLv3.
