MMI: Fix comp. perf. issue w/ unaligned image rows
Using ldc1 with a non-64-bit-aligned memory location causes as much as a
10x slow-down in overall compression performance.
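
To illustrate how image rows end up unaligned in the first place, here is a minimal sketch. It assumes a tightly packed image (row stride equal to width times pixel size, no padding), which is an assumption made for illustration and not something this commit states. The helpers `is_aligned64` and `count_aligned_rows` are hypothetical names and are not part of libjpeg-turbo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if ptr sits on an 8-byte (64-bit) boundary. */
static int is_aligned64(const void *ptr)
{
  return ((uintptr_t)ptr & 7) == 0;
}

/*
 * Hypothetical helper: count how many rows of a tightly packed image start
 * on a 64-bit boundary.  ASSUMPTION: stride == width * pixel_size, i.e. the
 * caller did not pad the rows.
 */
static size_t count_aligned_rows(const unsigned char *base, size_t width,
                                 size_t pixel_size, size_t num_rows)
{
  size_t stride = width * pixel_size;
  size_t aligned = 0;
  size_t row;

  assert(base != NULL);
  for (row = 0; row < num_rows; row++) {
    /* Only the address is inspected; the pixel data is never read. */
    if (is_aligned64(base + row * stride))
      aligned++;
  }
  return aligned;
}
```

With a 3-pixel-wide RGB image (9-byte stride) starting at an aligned address, only every eighth row begins on a 64-bit boundary, so most rows would reach the load with an unaligned pointer.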
dcommander committed Jan 31, 2019
1 parent 2d0b675 commit 1c2d3cf
Showing 3 changed files with 34 additions and 7 deletions.
3 changes: 3 additions & 0 deletions ChangeLog.md
@@ -34,6 +34,9 @@ incorrect PPM images when used with the `-colors` option.
7. Fixed an issue whereby a static build of libjpeg-turbo (a build in which
`ENABLE_SHARED` is `0`) could not be installed using the Visual Studio IDE.

+8. Fixed a severe performance issue in the Loongson MMI SIMD extensions that
+occurred when compressing RGB images whose image rows were not 64-bit-aligned.


2.0.1
=====
17 changes: 12 additions & 5 deletions simd/loongson/jccolext-mmi.c
@@ -2,12 +2,13 @@
* Loongson MMI optimizations for libjpeg-turbo
*
* Copyright 2009 Pierre Ossman <[email protected]> for Cendio AB
- * Copyright (C) 2014-2015, D. R. Commander. All Rights Reserved.
- * Copyright (C) 2016-2017, Loongson Technology Corporation Limited, BeiJing.
+ * Copyright (C) 2014-2015, 2019, D. R. Commander. All Rights Reserved.
+ * Copyright (C) 2016-2018, Loongson Technology Corporation Limited, BeiJing.
* All Rights Reserved.
* Authors: ZhuChen <[email protected]>
* SunZhangzhi <[email protected]>
* CaiWanwei <[email protected]>
+ * ZhangLixia <[email protected]>
*
* Based on the x86 SIMD extension for IJG JPEG library
* Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -184,9 +185,15 @@ void jsimd_rgb_ycc_convert_mmi(JDIMENSION image_width, JSAMPARRAY input_buf,
"$14", "memory"
);
} else {
-mmA = _mm_load_si64((__m64 *)&inptr[0]);
-mmG = _mm_load_si64((__m64 *)&inptr[8]);
-mmF = _mm_load_si64((__m64 *)&inptr[16]);
+if (!(((long)inptr) & 7)) {
+  mmA = _mm_load_si64((__m64 *)&inptr[0]);
+  mmG = _mm_load_si64((__m64 *)&inptr[8]);
+  mmF = _mm_load_si64((__m64 *)&inptr[16]);
+} else {
+  mmA = _mm_loadu_si64((__m64 *)&inptr[0]);
+  mmG = _mm_loadu_si64((__m64 *)&inptr[8]);
+  mmF = _mm_loadu_si64((__m64 *)&inptr[16]);
+}
inptr += RGB_PIXELSIZE * 8;
}
mmD = mmA;
21 changes: 19 additions & 2 deletions simd/loongson/loongson-mmintrin.h
@@ -1,8 +1,9 @@
/*
* Loongson MMI optimizations for libjpeg-turbo
*
- * Copyright (C) 2016-2017, Loongson Technology Corporation Limited, BeiJing.
+ * Copyright (C) 2016-2018, Loongson Technology Corporation Limited, BeiJing.
  * All Rights Reserved.
+ * Copyright (C) 2019, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@@ -41,7 +42,7 @@ typedef float __m32;

/********** Set Operations **********/

-extern __inline __m64
+extern __inline __m64 FUNCTION_ATTRIBS
_mm_setzero_si64(void)
{
return 0.0;
@@ -1245,6 +1246,22 @@ _mm_load_si64(const __m64 *src)
 asm("ldc1 %0, %1\n\t"
 : "=f" (ret)
 : "m" (*src)
+: "memory"
+);
+
+return ret;
+}
+
+extern __inline __m64 FUNCTION_ATTRIBS
+_mm_loadu_si64(const __m64 *src)
+{
+  __m64 ret;
+
+  asm("gsldlc1 %0, 7(%1)\n\t"
+      "gsldrc1 %0, 0(%1)\n\t"
+      : "=f" (ret)
+      : "r" (src)
+      : "memory"
);

return ret;
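
The dispatch pattern used in `jccolext-mmi.c` (test the low three address bits, then pick the aligned or unaligned load) can be sketched in portable C as follows. This is only an illustration under stated assumptions: `load64` is a hypothetical name, `memcpy` stands in for both `_mm_load_si64` and `_mm_loadu_si64`, and the `used_aligned_path` flag exists solely to make the chosen branch observable. The diff casts the pointer to `long`; the sketch uses `uintptr_t`, which is the portable C99 type for this purpose.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical portable stand-in for the aligned/unaligned dispatch.
 * On Loongson, the aligned branch would correspond to a single ldc1 and the
 * unaligned branch to the gsldlc1/gsldrc1 pair; here both branches use
 * memcpy, which is valid for any address.
 */
static uint64_t load64(const unsigned char *src, int *used_aligned_path)
{
  uint64_t value;

  assert(src != NULL && used_aligned_path != NULL);

  if (!((uintptr_t)src & 7)) {
    /* Address is a multiple of 8: the fast aligned load is safe. */
    *used_aligned_path = 1;
  } else {
    /* Address is not 64-bit-aligned: take the unaligned load path. */
    *used_aligned_path = 0;
  }
  memcpy(&value, src, sizeof(value));
  return value;
}
```

The alignment test is a single AND and branch, which is cheap compared with the slowdown the commit message attributes to an unaligned `ldc1`.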
