[Gtk-sharp-list] File name encodings

Federico Mena Quintero federico@ximian.com
Thu, 17 Feb 2005 13:56:11 -0600


There's a problem in the way file names are extracted from
FileChooserDialog and then represented internally:

1. The generator spits out something like

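	public string Filename {
		get {
			// Returns a g_malloc()ed char* in the GLib filename encoding
			IntPtr raw_ret = gtk_file_chooser_get_filename(Handle);
			// Copies it to a managed string and g_free()s the original
			string ret = GLib.Marshaller.PtrToStringGFree(raw_ret);
			return ret;
		}
	}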

2. In turn, GLib.Marshaller.PtrToStringGFree() uses
Marshal.PtrToStringAnsi() internally.

3. PtrToStringAnsi() is implemented with mono_string_new().

4. mono_string_new() uses g_utf8_to_utf16().  If the conversion results
in an error (for example, if the source string is not valid UTF-8), no
string is created.
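
To make step 4 concrete, here is a small sketch. It is in Python rather
than C or C#, only so that it stays self-contained. The helper
to_managed_string() is hypothetical: it merely stands in for the
PtrToStringAnsi() -> mono_string_new() -> g_utf8_to_utf16() chain, on the
assumption that the chain behaves like a strict UTF-8 decoder that
produces nothing on invalid input.

```python
def to_managed_string(raw):
    """Hypothetical stand-in for the marshalling chain described above.

    Assumption: the chain acts as a strict UTF-8 decoder and yields
    no string (None here) when the bytes are not valid UTF-8.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# The same file name as it might be stored on two different disks.
utf8_name = "caf\u00e9.txt".encode("utf-8")         # b'caf\xc3\xa9.txt'
latin1_name = "caf\u00e9.txt".encode("iso-8859-1")  # b'caf\xe9.txt'

print(ascii(to_managed_string(utf8_name)))    # a usable string
print(ascii(to_managed_string(latin1_name)))  # no string at all
```

The second call is the case that bites: the bytes are a perfectly good
file name on disk, but nothing comes out at the managed level.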

The problem is that gtk_file_chooser_get_filename() returns filenames in
the "GLib filename encoding" [1].  This is the same as the on-disk
representation, whose encoding is hopefully listed in the
G_FILENAME_ENCODING environment variable [2].

If a filename is not UTF-8 on disk, then mono_string_new() will fail and
no filename will be returned at the gtk-sharp level.
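
For comparison, GLib itself offers g_filename_to_utf8() for converting
from the filename encoding to UTF-8. The following is only a rough
Python sketch of that idea, not GLib's actual logic. The function names
are hypothetical, and it assumes that the first entry of
G_FILENAME_ENCODING names the on-disk charset, that "@locale" means the
current locale's charset, and that UTF-8 is the fallback when the
variable is unset. Other GLib settings are ignored.

```python
import locale
import os


def filename_charset(env=None):
    """Pick the charset used for on-disk names (simplified assumption)."""
    env = os.environ if env is None else env
    value = env.get("G_FILENAME_ENCODING")
    if not value:
        return "utf-8"
    first = value.split(",")[0].strip()
    if first == "@locale":
        return locale.getpreferredencoding(False)
    return first


def filename_to_text(raw, env=None):
    """Decode raw file name bytes; return None if they do not fit."""
    try:
        return raw.decode(filename_charset(env))
    except (UnicodeDecodeError, LookupError):
        return None


latin1_name = b"caf\xe9.txt"

# With the variable set, the Latin-1 name converts cleanly.
print(ascii(filename_to_text(latin1_name, {"G_FILENAME_ENCODING": "ISO-8859-1"})))

# Without it, UTF-8 is assumed and the conversion fails.
print(ascii(filename_to_text(latin1_name, {})))
```

Even then the conversion can fail when the variable is wrong or unset,
so a caller still has to be prepared to get no string back.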

This is a general problem in Unix with respect to representing
filenames; they are really raw chunks of bytes rather than strings in a
known encoding.
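
A tiny illustration of the "raw chunks of bytes" point, again as a
Python sketch with made-up names (readings() and the sample file name
are purely illustrative): the very same bytes decode to different names
depending on which charset one guesses, and nothing in the bytes says
which guess is right.

```python
def readings(raw, charsets):
    """Decode the same raw bytes under each candidate charset."""
    return {cs: raw.decode(cs) for cs in charsets}


# One on-disk name, stored without any record of its encoding.
raw_name = b"caf\xe9.txt"

names = readings(raw_name, ["iso-8859-1", "iso-8859-7"])
for charset, text in names.items():
    # ascii() keeps the output printable on any terminal.
    print(charset, ascii(text))
```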

(I only have an old copy of Mono, which may not reflect the current
state of things, but it had the same trouble in
mono/io-layer/io.c:FindNextFile(): if it can't convert the on-disk
filename to UTF-8, it ignores the filename.)

[1] http://developer.gnome.org/doc/API/2.0/gtk/GtkFileChooser.html#gtkfilechooser-encodings

[2] http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html#file-name-encodings