[Gtk-sharp-list] Re: Encoding problems
Gaute B Strokkenes
Fri, 16 Apr 2004 20:23:33 +0200
On 16 apr 2004, email@example.com wrote:
>>> Does it work if you add -codepage:utf8 to the mcs compile line?
>> Yes, it works, thanks :)
>> But shouldn't that be taken care of by MonoDevelop?
>> And also - is UTF-16 a standard for Gtk# applications?
> Not having used MonoDevelop yet (yes, I'm evil!), I can only
> I suspect the problem is the lack of a BOM (Byte Order Mark), which
> would let the compiler know the byte order of the file.
Really, the problem is the lack of autodetection. It's easy enough to
recognise UTF-8, because the byte sequences that form valid UTF-8 are
very distinctive. While it's technically possible for, say, a (highly
contrived) ISO-8859-1 encoded file to consist only of valid UTF-8
sequences, that just doesn't happen in practice.
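
To illustrate how distinctive UTF-8 is, here is a minimal Python
sketch. It is not what mcs does; the function name looks_like_utf8 and
the sample strings are made up for this example. It relies on the
strict UTF-8 decoder rejecting any malformed sequence.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")  # "strict" error handling is the default
    except UnicodeDecodeError:
        return False
    return True


text = "blåbærsyltetøy"

# The same text in two encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# A contrived ISO-8859-1 file: the two characters "Ã©" encode to the
# bytes C3 A9, which happen to be the valid UTF-8 encoding of "é".
contrived_latin1 = "Ã©".encode("iso-8859-1")

print(looks_like_utf8(utf8_bytes))        # well-formed UTF-8
print(looks_like_utf8(latin1_bytes))      # E5 is followed by a non-continuation byte
print(looks_like_utf8(contrived_latin1))  # the false positive described above
```

Pure ASCII input also passes the check, which is harmless: ASCII text
is byte-for-byte identical in UTF-8 and ISO-8859-1.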
> UTF-16 requires the presence of a BOM (0xFFFE or 0xFEFF, depending
> on big-endian or little-endian, not necessarily in that order), so
> if the BOM is present the compiler will know what codepage to use.
> UTF-8 doesn't require it. Which means it is impossible to
> distinguish between a UTF-8 encoded file and a file encoded in the
> local codepage. Consequently, mcs assumes that the local codepage
> is used.
I would recommend scanning the file to check for UTF-8-ness first (in
the absence of any explicit declaration).
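
The order of checks suggested here could look roughly like the
following Python sketch. This is an assumption about how a tool might
do it, not a description of mcs; guess_source_encoding and its
"explicit" parameter are hypothetical names.

```python
import codecs
import locale
from typing import Optional


def guess_source_encoding(data: bytes, explicit: Optional[str] = None) -> str:
    """Pick an encoding name for a source file's raw bytes."""
    # 1. An explicit declaration (e.g. a -codepage style option) always wins.
    if explicit:
        return explicit
    # 2. A BOM, if present, is unambiguous enough for this sketch
    #    (UTF-32 BOMs are not handled here).
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # 3. No BOM: scan the whole file for UTF-8 validity.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 4. Otherwise fall back to the local codepage.
        return locale.getpreferredencoding(False)
```

For example, guess_source_encoding(open("Main.cs", "rb").read()) would
return "utf-8" for a BOM-less UTF-8 file (Main.cs is a placeholder file
name), and the local codepage only when the bytes are not valid UTF-8.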
> The solution is to either tell mcs the correct codepage, which is
> what -codepage:UTF-8 does,
It's always a good idea to be explicit.
> or to insert a UTF-8 encoded BOM at the beginning of the file.
I strongly advise against that; the UTF-8 BOM will break a lot of other
stuff on a Unix system.
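
One commonly cited example of such breakage is the "#!" line of a
script: the interpreter line is only recognised if the file begins with
exactly those two bytes. The Python sketch below only inspects bytes;
has_shebang is a made-up helper, not a real system call.

```python
import codecs

script = "#!/bin/sh\necho hello\n"

plain = script.encode("utf-8")         # no BOM
with_bom = script.encode("utf-8-sig")  # same text, prefixed with EF BB BF


def has_shebang(data: bytes) -> bool:
    """Mimic the check that a file starts with the two bytes '#!'."""
    return data.startswith(b"#!")


print(has_shebang(plain))     # the file starts with "#!"
print(has_shebang(with_bom))  # the BOM pushes "#!" to offset 3
print(with_bom[:3] == codecs.BOM_UTF8)
```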
Have a look at:
Gaute Strokkenes http://www.srcf.ucam.org/~gs234/