[Mono-dev] Mono.Simd - slower than the normal implementation
Alan McGovern
alan.mcgovern at gmail.com
Sat Nov 15 21:37:01 EST 2008
Hey,
On Sat, Nov 15, 2008 at 3:50 PM, Rodrigo Kumpera <kumpera at gmail.com> wrote:
> Hi Alan,
> -Getters and setter are a hint of ill vectorized code.
In this particular scenario, I'm not sure how i can get rid of the use
of getters/setters unless I use even more unsafe code. I don't know
whether it's feasible or not, but it'd be great to be able to use this
API without having to use unsafe code. At the moment, I don't think
it's really possible to use this API without getters and setters.
> The last part of your unsafe code should use temps for the intermediate results.
Do you mean that I should copy the vector 'e', which i got from
XOR'ing my values, into another Vector4ui using the store operation?
Then I should do my bitshifting/storing into uint[] from that one?
> -For the safe case we still miss proper integration with arrays, in the form
> of methods to
> extract and store vectors from them.
I was thinking that the API could expose something like:
Vector4ui.Create (uint[] array, int offset, ref Vector4ui result)
which could be changed into:
result = *((Vector4ui*)&array [offset]);
Though I'm sure you have ideas already on this ;) A similar method for
storing the result into a uint[] would be great too.
>
> Your code looks a bit strange, the Vector4ui constructor indexes in
> particular. Have you checked that
> the output of the 3 methods are the same?
Yes, there is a bug in my implementation there, I left out a bracket
when setting the value of buff[t+3]. There should be an additional
bracket around (e.W << 1) | (e.W >> 31). Other than that, the
implementation is correct. I've pasted the correct implementation of
the unsafe and safe SIMD versions below. Just for reference purposes.
>
> I'll work on the Mono.Simd issues next week, getting setters to be
> accelerated, some methods
> to better integrate with arrays and other things like element extractors.
Great stuff. Give me a shout when you've done that and I'll try to
improve the above implementation. Though if you have time to spare
while writing the SIMD code, you could take a look at it yourself ;)
Thanks,
Alan.
Reference implementations (non buggy ;) ):
public static void FillBuffSafe(uint[] buff)
{
for (int t = 16; t < buff.Length; t += 4)
{
Vector4ui e = new Vector4ui(buff[t - 3], buff[t - 2],
buff[t - 1], buff[t - 0]) ^
new Vector4ui(buff[t - 8], buff[t - 7],
buff[t - 6], buff[t - 5]) ^
new Vector4ui(buff[t - 14], buff[t -
13], buff[t - 12], buff[t - 11]) ^
new Vector4ui(buff[t - 16], buff[t -
15], buff[t - 14], buff[t - 13]);
e.W ^= buff[t];
buff[t] = (e.X << 1) | (e.X >> 31);
buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
buff[t + 3] = ((e.W << 1) | (e.W >> 31)) ^ ((e.X << 2)
| (e.X >> 30));
}
}
public unsafe static void FillBuffUnsafe(uint[] buffb)
{
fixed (uint* buff = buffb)
{
for (int t = 16; t < buffb.Length; t += 4)
{
Vector4ui e = *((Vector4ui*)&buff[t - 3]) ^
*((Vector4ui*)&buff[t - 8]) ^
*((Vector4ui*)&buff[t - 14]) ^
*((Vector4ui*)&buff[t - 16]);
e.W ^= buff[t];
buff[t] = (e.X << 1) | (e.X >> 31);
buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
buff[t + 3] = ((e.W << 1) | (e.W >> 31)) ^ ((e.X
<< 2) | (e.X >> 30));
}
}
}
>
> Rodrigo
>
> On Sat, Nov 15, 2008 at 12:13 AM, Alan McGovern <alan.mcgovern at gmail.com>
> wrote:
>>
>> I found a bit of code in the SHA1 implementation which i thought was
>> ideal for SIMD optimisations. However, unless i resort to unsafe code,
>> it's actually substantially slower! I've attached three
>> implementations of the method here. The original, the safe SIMD and
>> the unsafe SIMD. The runtimes are as follows:
>>
>> Original: 600ms
>> Unsafe Simd: 450ms
>> Safe Simd: 1700ms
>>
>> Also, the method is always called with a uint[] of length 80.
>>
>> Is this just the wrong place to be using simd? It seemed ideal because
>> i need 75% less XOR's. If anyone has an ideas on whether SIMD could
>> actually be useful for this case or not, let me know.
>>
>> Thanks,
>> Alan.
>>
>>
>> The original code is:
>>
>> private static void FillBuff(uint[] buff)
>> {
>> uint val;
>> for (int i = 16; i < 80; i += 8)
>> {
>> val = buff[i - 3] ^ buff[i - 8] ^ buff[i - 14] ^ buff[i -
>> 16];
>> buff[i] = (val << 1) | (val >> 31);
>>
>> val = buff[i - 2] ^ buff[i - 7] ^ buff[i - 13] ^ buff[i -
>> 15];
>> buff[i + 1] = (val << 1) | (val >> 31);
>>
>> val = buff[i - 1] ^ buff[i - 6] ^ buff[i - 12] ^ buff[i -
>> 14];
>> buff[i + 2] = (val << 1) | (val >> 31);
>>
>> val = buff[i + 0] ^ buff[i - 5] ^ buff[i - 11] ^ buff[i -
>> 13];
>> buff[i + 3] = (val << 1) | (val >> 31);
>>
>> val = buff[i + 1] ^ buff[i - 4] ^ buff[i - 10] ^ buff[i -
>> 12];
>> buff[i + 4] = (val << 1) | (val >> 31);
>>
>> val = buff[i + 2] ^ buff[i - 3] ^ buff[i - 9] ^ buff[i -
>> 11];
>> buff[i + 5] = (val << 1) | (val >> 31);
>>
>> val = buff[i + 3] ^ buff[i - 2] ^ buff[i - 8] ^ buff[i -
>> 10];
>> buff[i + 6] = (val << 1) | (val >> 31);
>>
>> val = buff[i + 4] ^ buff[i - 1] ^ buff[i - 7] ^ buff[i -
>> 9];
>> buff[i + 7] = (val << 1) | (val >> 31);
>> }
>> }
>>
>> The unsafe SIMD code is:
>> public unsafe static void FillBuff(uint[] buffb)
>> {
>> fixed (uint* buff = buffb) {
>> Vector4ui e;
>> for (int t = 16; t < buffb.Length; t += 4)
>> {
>> e = *((Vector4ui*)&(buff [t-16])) ^
>> *((Vector4ui*)&(buff [t-14])) ^
>> *((Vector4ui*)&(buff [t- 8])) ^
>> *((Vector4ui*)&(buff [t- 3]));
>> e.W ^= buff[t];
>>
>> buff[t] = (e.X << 1) | (e.X >> 31);
>> buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
>> buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
>> buff[t + 3] = (e.W << 1) | (e.W >> 31) ^ ((e.X << 2) | (e.X >>
>> 30));
>> }
>> }
>> }
>>
>> The safe simd code is:
>> public static void FillBuff(uint[] buff)
>> {
>> Vector4ui e;
>> for (int t = 16; t < buff.Length; t += 4)
>> {
>> e = new Vector4ui (buff [t-16],buff [t-15],buff
>> [t-14],buff [t-13]) ^
>> new Vector4ui (buff [t-14],buff [t-13],buff
>> [t-12],buff [t-11]) ^
>> new Vector4ui (buff [t-8], buff [t-7], buff
>> [t-6], buff [t-5]) ^
>> new Vector4ui (buff [t-3], buff [t-2], buff
>> [t-1], buff [t-0]);
>>
>> e.W ^= buff[t];
>> buff[t] = (e.X << 1) | (e.X >> 31);
>> buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
>> buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
>> buff[t + 3] = (e.W << 1) | (e.W >> 31) ^ ((e.X << 2) |
>> (e.X >> 30));
>> }
>> }
>> _______________________________________________
>> Mono-devel-list mailing list
>> Mono-devel-list at lists.ximian.com
>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>
>
More information about the Mono-devel-list
mailing list