[Mono-dev] [Dotnet-runtime-dev] ASCII Strings Proposal
Miguel de Icaza
miguel.de.icaza at gmail.com
Thu Jul 28 14:07:23 UTC 2016
While this is indeed possible, we would not be able to leverage the fact
that 7-bit encoded strings could be copied without conversions when going
out on a P/Invoke with "Ansi" settings (which in Mono, we have overloaded
to mean "utf-8").
And Unix is predominantly a utf-8 friendly world. Hence, ASCII is the
better encoding for our purposes.
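
To make the copy-without-conversion point concrete, here is a minimal C sketch. The function names are hypothetical and are not taken from the Mono runtime. It assumes the P/Invoke target expects UTF-8: a 7-bit string is already valid UTF-8 and can be copied byte for byte, while a Latin-1 string has to be re-encoded because every byte at or above 0x80 becomes two bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every byte is 7-bit ASCII. */
static int is_ascii(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] & 0x80)
            return 0;
    return 1;
}

/* ASCII -> UTF-8: the encodings are byte-identical, so this is a plain copy.
 * dst must have room for len bytes. */
static size_t ascii_to_utf8(const unsigned char *src, size_t len,
                            unsigned char *dst)
{
    assert(is_ascii(src, len));
    memcpy(dst, src, len);
    return len;
}

/* Latin-1 -> UTF-8: bytes >= 0x80 expand to a two-byte sequence.
 * dst must have room for 2 * len bytes in the worst case. */
static size_t latin1_to_utf8(const unsigned char *src, size_t len,
                             unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            dst[out++] = c;
        } else {
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

With Latin-1 storage, the marshaller would have to scan or convert every string on the way out; with ASCII storage, the bytes can be handed over as they are.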
On Thu, Jul 28, 2016 at 3:33 AM, Jonathan Gilbert <logic at deltaq.org> wrote:
> Another thought: It would make more sense for the single-byte encoding to
> be ISO-8859-1 (Latin-1) than ASCII. ASCII is either constrained to 128
> code points or, as most typically extended by code page 437 on North
> American computers (and, of course, the local encoding cannot be assumed
> to be code page 437), requires a look-up table to convert to/from Unicode,
> whereas Latin-1 simply is the first 256 code points of Unicode, making the
> conversion a simple cast between System.Char/wchar_t and byte.
> Jonathan Gilbert
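
As an illustration of the cast described in the quoted message, here is a minimal C sketch. The function names are hypothetical, and uint16_t stands in for a 16-bit System.Char. Widening is a plain cast because Latin-1 byte values equal the first 256 Unicode code points; narrowing only works when every code unit is at most U+00FF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Latin-1 -> UTF-16: each byte value is already the Unicode code point. */
static void latin1_to_utf16(const uint8_t *src, size_t len, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++)
        dst[i] = (uint16_t)src[i];
}

/* UTF-16 -> Latin-1: returns 1 on success, or 0 if any code unit is
 * above U+00FF and so cannot be stored in a single byte. */
static int utf16_to_latin1(const uint16_t *src, size_t len, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        if (src[i] > 0xFF)
            return 0;
        dst[i] = (uint8_t)src[i];
    }
    return 1;
}
```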
> On Thu, Jul 28, 2016 at 2:15 AM, Jonathan Gilbert <logic at deltaq.org> wrote:
>> Phew :-) I must have gotten the wrong idea from this:
>> Jonathan Gilbert
>> On Thu, Jul 28, 2016 at 12:06 AM, Miguel de Icaza <
>> miguel.de.icaza at gmail.com> wrote:
>>> Hello Jonathan,
>>> I personally think it is a terrible idea to make Mono completely unable
>>> to run code that compiles and runs just fine on Microsoft's .NET framework.
>>> Could get_OffsetToStringData be made to convert the ASCII
>>> representation back to UCS-2 on-the-fly for that edge case where the code
>>> actually uses the fixed (char *ptr = str) pattern? It's not a very
>>> common pattern, so the overhead of the conversion, while defeating the
>>> purpose of using that pattern in the first place, would affect only the
>>> tiniest minority of code.
>>> If this were to become a standard part of Mono, that would have to be
>>> The reason it is not done in the current patch is that we needed to
>>> identify all the spots with issues so they could be adjusted to deal with the
>>> two encodings, purely a bootstrapping side effect.
>>> And we need the spots adjusted, so we do not needlessly create duplicate
>>> strings on demand, otherwise one of the benefits of this work (reduce
>>> memory pressure) would go out the window.
>>> If this were the direction taken, it might be nice also to provide a way
>>> to force an ASCII-capable string to be UCS-2 anyway, in case there are
>>> people who want the fixed (char *ptr = str) pattern to remain
>>> performant -- perhaps an environment variable?? Obviously we wouldn't want
>>> the Mono runtime to scan the environment block every time it allocates a
>>> string, so perhaps it could do the check & cache the result once on
>>> startup, and then allow some innocuous method that's already doing a lot of
>>> work, such as string.IsInterned, to re-check it. This avoids adding
>>> Mono-specific API, so that code written to be aware of Mono's peculiarity
>>> still runs just fine on other frameworks.
>>> Something like that.
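
A minimal C sketch of the check-once-and-cache idea from the quoted suggestion. The environment variable name MONO_FORCE_UCS2_STRINGS and all function names are hypothetical; none of this is existing Mono API. The sketch is not thread-safe. The environment is read at startup and on an explicit re-check, never on the string allocation path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Cached setting: -1 means the environment has not been read yet. */
static int force_ucs2_cached = -1;

/* Hypothetical rule: only the exact value "1" turns the option on. */
static int parse_force_ucs2(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Call once during runtime startup. */
static void string_encoding_init(void)
{
    /* MONO_FORCE_UCS2_STRINGS is a made-up name for illustration. */
    force_ucs2_cached = parse_force_ucs2(getenv("MONO_FORCE_UCS2_STRINGS"));
}

/* Call from some already-expensive method to pick up a changed value. */
static void string_encoding_recheck(void)
{
    string_encoding_init();
}

/* Cheap query for the string allocator: no environment scan here. */
static int string_should_use_ucs2(void)
{
    assert(force_ucs2_cached != -1);
    return force_ucs2_cached;
}
```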