String performance boost (Re: [Mono-dev] Patch to boost speedofUnicodeEncoding)
ClassDevelopment at A-SoftTech.com
Fri Mar 17 07:23:11 EST 2006
Basically this also were my findings back then (you can even see some
functions where I did check if e.g. the length was > 10000 and then did a
fallback to native). (By the way: I believe in a huge number of real-world
cases strings will be rather short)
Did you test using the current memcpy function? I think that it is possible
to do something notably faster than this method. If you are using it just
for copying chars I don't even think that these could possibly be 1 or 3
byte aligned (Well at least I can't think of any situation where they would
But as I wrote the optimum would likely be to replace this method with
assembly on the most important platforms in the JIT.
Another thing that I found back then is that running the managed
implementations on MS .Net usually gave another significant speedup, so
there is still potential for there as the mono runtime gets faster (Maybe
better Register Allocator?) while the unmanaged versions are not likely to
Also most of the implementations I wrote rely on inlining of the used Char
Methodcalls or they will be notably slower than native.
----- Original Message -----
From: "Zac Bowling" <zac at zacbowling.com>
To: <mono-devel-list at lists.ximian.com>
Sent: Friday, March 17, 2006 12:11 PM
Subject: Re: String performance boost (Re: [Mono-dev] Patch to boost
I dispeared for a few days to work on this and came back to find I
wasn't alone. :-P
I was testing some similar concepts myself. I've also been testing
moving things back and forth to unamanged internal calls to see if I
could speed it up and so far I've got mixed results. Some things
faster; some things slower. Mostly slower for most of the functions,
but it really depended on the size of the string I was working with
which was the deciding factor if it was faster to be unmanged or not.
One thing I found is that NUnit itself can't be trusted 100% with
testing some of this :-P It calls corelib use these same functions we
are tweaking are slowing it down or speeding it up in some cases. It's
fine because the test suite is a test in of itself in a way. Just
something interesting I noticed when I was testing, when I managed to
slow down strings so much in one test (lovely printf on each char) that
it took NUnit about 30 seconds to initialize before my test even ran :-)
I'll post all my benchmarks and patches in the morning when I can
assembly them in some readable format.
----- Message from atsushi at ximian.com ---------
Date: Fri, 17 Mar 2006 19:02:50 +0900
From: Atsushi Eno <atsushi at ximian.com>
Reply-To: Atsushi Eno <atsushi at ximian.com>
Subject: Re: String performance boost (Re: [Mono-dev] Patch to boost
To: Andreas Nahr <ClassDevelopment at A-SoftTech.com>
> Your patch is based on pretty old code(!) ;-) I tried it for
> each functionality:
> - CopyTo():
> The patch makes it faster. But I guess CopyChar() functionality
> overlaps Paolo's memcpy() implementation in the latest code,
> though it was not simply replaceable.
> - ToCharArray():
> ditto. The improvement rate is like you showed, about 5%.
> - Trim():
> The patch resulted much better, as you showed. I think it is
> the best improvement.
> - GetHashCode(), Insert(), Remove(),
> ToUpperInvariant(), ToLowerInvariant():
> We already have managed implementation like yours.
> - ToUpper(), ToLower():
> Based on your patch, I rather made TextInfo.ToUpper() and
> ToLower() based on pointer implementation, so the mind is already
> checked in svn :-)
> - Replace():
> There was a one-liner bug in the implementation. Even after
> fixing it, the result became much worse than existing one, about
> 1.5x slower :-(
> - IndexOf():
> It didn't make improvement - I think it is because we already
> have managed implementation by now.
> - IndexOfAny():
> The patch made results much worse :-( It doubled execution time.
> - Join():
> The patch was a bit worse, about 7-8% loss.
> - PadLeft(), PadRight():
> Almost no difference - and seems like it still has bugs (some
> NUnit tests fail).
> - Split():
> It looks pretty fast but it is too buggy to fix right now :(
> I'm attaching a patch based on the original source, so anyone can
> try them handy. Those which is buggy or makes performance worse
> are commented out.
> I think Trim() can be checked in, with a few changes
> (changing CharCopy() to InternalStrcpy()). For CharCopy(),
> I'd wait for further review. ToCharArray() is also nicer
> to have, but it depends on CharCopy().
> Thanks a bunch, Andreas.
> Atsushi Eno
> Andreas Nahr wrote:
>> It's just bad naming ;)
>> The String.cs.old contains the managed implementations. Please note that
>> there are usually multiple implementation-tries for each method and all
>> but one is just commented out. Its extremely unlikely that the ones
>> currently "active" are the best ones.
>> Also I can remember that at least one of the stats is completely wrong
>> because there was a bug in the implemenation that copied only half the
>> If looking at the numbers I guess it is Replace because the difference is
>> really huge
>> I'll also attach the file I used for the microbenchmarks. But it is a
>> complete mess and I doubt anyone could use anything from it.
>>> Wow, the numbers are quite impressive. Can you please attach your
>>> String.cs(.new) ? :-)
>>> Atsushi Eno
>>> Andreas Nahr wrote:
>>>> Hi Zac, Hi Kornél,
>>>> I've been working quite some time on improving the existing String
>>>> class a
>>>> long time ago (about 2-3 years), but as I got a new job back then never
>>>> the time to finish anything. My findings back then were that purely
>>>> implementations basically always outperformed the internalcalls (And I
>>>> the JIT is now even more evolved than it was 3 years ago).
>>>> However as I sayed it was never finished and contains bugs. Moreover it
>>>> doesn't care at all for alignment issues.
>>>> If anyone want's to look at it - I attach my String-Testing class.
>>>> find lots of different tries to optimize the Methods. But beware, the
>>>> is in horrible shape, far from being usable.
>>>> Some optimizations use specific string-domain knowlege (like "equals"
>>>> testing first for the first char and after that from the end of the
>>>> My conclusion was: We should have a few managed functions to do the
>>>> (MemoryCopy, MemoryCompare, possibly for Byte* and Char*). They should
>>>> managed so that optimizers and optimizing compilers are able to do
>>>> optimizations even on IL level. Whenever possible the JIT should
>>>> these at runtime (provided they aren't optimized away) with
>>>> architecture-specific assembly versions with the managed version as
>>>> Here are some findings from microbenchmarks made back then (the first
>>>> is always the time in milliseconds for the existing unmanaged
>>>> (internalcall), the second for a tested managed implementation, the
>>>> in () is the length of the string tested):
>>>> Some Methods still need internalcalls to create new Strings, but were
>>>> faster than the native implementations (The optimum would be to
>>>> InternalAllocateStr calls).
>>>> CopyTo (000): 810 -> 381
>>>> CopyTo (010): 832 -> 441
>>>> CopyTo (100): 1352 -> 881
>>>> CopyTo (512): 3395 -> 3014
>>>> ToCharArray (000): 5067 -> 4466
>>>> ToCharArray (002): 5317 -> 4857
>>>> ToCharArray (015): 8041 -> 7691
>>>> ToCharArray (960): 2915 -> 2894
>>>> ToCharArray (with parameters): Similar to above
>>>> Trim (): 6930 -> 6760
>>>> Trim (custom search Chars): 10596 -> 9413
>>>> Trim (default search Chars): 10455 -> 7210
>>>> Trim (no trimmed chars, long string): 1893 -> 661
>>>> Trim (no trimmed chars, short string): 1893 -> 631
>>>> Replace (004 - one replace): 37264 -> 3135
>>>> Replace (004 - nothing to replace): 3735 -> 310
>>>> Replace (961 - everything replaced): 2584 -> 501
>>>> Replace (961 - only last char replaced): 2463 -> 481
>>>> Split (default split Chars, long string, lots splitted): 42421 -> 8523
>>>> Split (custom split Chars, long string, non found): 2944 -> 2263
>>>> Split (custom split Chars, long string, lots found): 22062 -> 7330
>>>> Split (default split Chars, short string, non found): 2093 -> 761
>>>> Split (default split Chars, short string, nearly only splitChars):
>>>> 8002 ->
>>>> IndexOf (17): 1132 -> 791
>>>> IndexOf (2162): 10576 -> 7862
>>>> LastIndexOf (similar to above)
>>>> IndexOfAny (long string, nothing found): 25867 -> 2984 (Break even bei
>>>> LastIndexOfAny (similar to above)
>>>> PadLeft/ PadRight: slightly slower that current (Should get faster once
>>>> optimized CharCopy is available): 1012 -> 1031
>>>> Remove: slightly slower that current (Should get faster once optimized
>>>> CharCopy is available): 2153 -> 2283
>>>> If somebody is interesting in picking this up I might be able to help a
>>> Mono-devel-list mailing list
>>> Mono-devel-list at lists.ximian.com
----- End message from atsushi at ximian.com -----
Mono-devel-list mailing list
Mono-devel-list at lists.ximian.com
More information about the Mono-devel-list