JPEG XL

libjxl

intelfx
2025-04-01 04:06:30
Probably.
Demiurge
2025-04-01 04:06:40
What's wrong about it?
intelfx
2025-04-01 04:06:48
You don't need a separate process to run a WASM VM.
Demiurge
2025-04-01 04:07:48
You can link a wasm vm into your process but copying memory in and out of the vm is no different than serializing data between different processes. Just with even MORE overhead.
intelfx
2025-04-01 04:07:54
That's incorrect.
Demiurge
2025-04-01 04:08:19
Really?
2025-04-01 04:08:34
I don't understand then.
intelfx
2025-04-01 04:08:52
Yup, really. Context switches are expensive and are only getting more expensive (Meltdown/Spectre says hello).
2025-04-01 04:09:31
Besides, who said you need to copy in and out of the VM? If it's a few tens (or hundreds) of bytes it's easier to copy and virtually free, if it's more, you can absolutely do something to access WASM buffers directly.
2025-04-01 04:10:07
Anyway, embedding native code in JXL bitstream is an obvious non-starter for obvious reasons of ISA dependency. Even if you could sandbox it perfectly with zero cost (you can't).
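To illustrate the "access WASM buffers directly" point above: a guest's linear memory is a single contiguous host-side buffer, so the host can resolve a guest offset to a host pointer instead of copying data in and out. This is a hypothetical toy sketch, not any real runtime's API; the names `ToyWasmInstance` and `HostView` are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Toy stand-in for a WASM instance: the guest's whole address space is one
// contiguous host buffer, so "passing data" can be a pointer fix-up, not a copy.
struct ToyWasmInstance {
  std::vector<uint8_t> linear_memory;

  // Resolve a guest pointer (an offset into linear memory) to a host pointer,
  // with a bounds check so the guest can never hand out an out-of-range view.
  uint8_t* HostView(uint32_t guest_ptr, uint32_t len) {
    if (uint64_t{guest_ptr} + len > linear_memory.size()) return nullptr;
    return linear_memory.data() + guest_ptr;
  }
};

int main() {
  ToyWasmInstance vm;
  vm.linear_memory.resize(64 * 1024);  // one 64 KiB WASM page
  // Write into the guest's memory directly, zero copies on the host side.
  if (uint8_t* shared = vm.HostView(/*guest_ptr=*/1024, /*len=*/16)) {
    std::memset(shared, 0xAB, 16);
  }
  return 0;
}
```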
Demiurge
intelfx Anyway, embedding native code in JXL bitstream is an obvious non-starter for obvious reasons of ISA dependency. Even if you could sandbox it perfectly with zero cost (you can't).
2025-04-01 04:11:51
Oh, no one was arguing for that. Jon was just saying it might be worthwhile to make a very simple and basic JIT that generates native instructions to decode the MA trees.
intelfx
2025-04-01 04:12:04
Ah, then I misread that part.
2025-04-01 04:12:16
Then you have a JIT already, what's the problem with shipping WASM bitcode instead of that? :)
Demiurge
2025-04-01 04:12:31
It's a kind of risky idea but MAYBE it could be made guaranteed-safe if someone was very careful and clever?
intelfx
2025-04-01 04:12:42
They were, and they made wasm...
2025-04-01 04:12:52
Why reinvent the wheel?
Demiurge
2025-04-01 04:13:56
Well the MA trees have a lot simpler requirements than wasm. That would be like killing a mosquito with a grenade launcher
2025-04-01 04:14:43
wasm doesn't have good support for simd instructions yet either right?
intelfx
2025-04-01 04:18:22
There is support for simd, has been for quite some time. You can use SIMD from Rust compiled to wasm32, for instance. Is there any indication that it isn't good?
2025-04-01 04:18:53
Anyway. TL;DR of my position:
- Linux namespaces are totally irrelevant because they leave a GIANT attack surface, which is extremely excessive for running untrusted native code (Linux namespaces except userns aren't, and never were, a security boundary; userns are kinda trying to be, but a very shoddy one)
- Plan 9 mooning is totally irrelevant as well because we have it already and it's called seccomp mode 1, literally designed for running untrusted binary code with almost zero attack surface
- If the amount of untrusted binary code is tiny, the setup and communication overhead of ANY external process (be it seccomp 1 or whatever Plan 9 had) would MASSIVELY exceed the cost of the code itself
- If you want/need a JIT anyway, just use WASM (in whatever part of the pipeline) and don't reinvent the wheel
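For reference on what seccomp mode 1 looks like in practice, a minimal sketch (the surrounding worker setup is assumed, only the `prctl` call itself is the real, documented interface): after the call below, the kernel permits only `read`, `write`, `_exit` and `sigreturn`, and SIGKILLs the process on anything else.

```cpp
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <unistd.h>

int main() {
  // Enter seccomp strict mode (mode 1). No filter to write, nothing to configure.
  if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0) {
    _exit(1);  // seccomp unavailable on this kernel
  }
  // From here on, only read()/write() on already-open fds, _exit() and
  // sigreturn() are allowed; any other syscall kills the process.
  // ... run the untrusted payload here ...
  _exit(0);
}
```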
jonnyawsom3
2025-04-01 04:22:38
<:JXL:805850130203934781>
Demiurge
intelfx Anyway. TL;DR of my position. - Linux namespaces are totally irrelevant because they leave GIANT attack surface which is extremely excessive for running untrusted native code (Linux namespaces except userns aren't, and never were, a security boundary; userns are kinda trying to be, but a very shoddy one) - Plan 9 mooning is totally irrelevant as well because we have it already and it's called seccomp mode 1, literally designed for running untrusted binary code with almost zero attack surface - If the amount of untrusted binary code is tiny, the setup and communication overhead of ANY external process (be it seccomp 1 or whatever Plan 9 had) would be MASSIVELY exceeding the cost of the code itself - if you want/need a JIT anyway, just use WASM (in whatever part of the pipeline) and don't reinvent the wheel
2025-04-01 05:35:16
unveil is a filesystem namespace with no overhead, and unveil+pledge is a way for processes to revoke their own privileges and have the kernel enforce it. They're good ideas and something similar should be adopted everywhere to make security easier.
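For readers unfamiliar with the OpenBSD calls being referenced, a minimal sketch of the pledge/unveil pattern described above: `pledge` and `unveil` are real OpenBSD libc functions, but the path and promise strings below are only illustrative.

```cpp
#include <unistd.h>

int main() {
  // Restrict the filesystem view to a single readable path, then lock the list.
  if (unveil("/tmp/input.jxl", "r") == -1) _exit(1);
  if (unveil(nullptr, nullptr) == -1) _exit(1);
  // Drop everything except basic stdio and read-only filesystem access.
  if (pledge("stdio rpath", nullptr) == -1) _exit(1);
  // ... do the work here; a syscall outside the promises aborts the process ...
  return 0;
}
```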
intelfx
2025-04-01 05:36:01
both unveil and pledge combined are less powerful than seccomp mode 1, so again totally irrelevant
Demiurge
2025-04-01 05:36:12
I thought simd instructions were still in the planning stages for wasm
intelfx
2025-04-01 05:36:43
(seccomp mode 2 _is_ pledge-equivalent, btw, and yes, filesystem namespaces are unveil, but that's again not what we need here)
Demiurge
intelfx both unveil and pledge combined are less powerful than seccomp mode 1, so again totally irrelevant
2025-04-01 05:37:46
Less powerful? It's not about power. All of the power of seccomp is totally useless if it's incomprehensible to use.
2025-04-01 05:38:10
Writing secure software needs to be practical and obvious and convenient.
intelfx
2025-04-01 05:38:28
It's about relevance. I don't understand what's the point of bringing irrelevant sandboxing features from other OSes into the discussion, what point are you making?
Demiurge
2025-04-01 05:40:27
Because you mentioned seccomp, which is a uselessly incomprehensible version of what could be a useful security feature
2025-04-01 05:40:40
But it's only useful if people actually want to use it
2025-04-01 05:40:52
And no one wants to use it if it's not easy to use like pledge
intelfx
2025-04-01 05:43:12
originally I mentioned seccomp mode 1, which has NO configuration
2025-04-01 05:43:19
so it can't be incomprehensible by definition
Demiurge
2025-04-01 05:43:34
And that is what I mean by namespace, btw
intelfx
2025-04-01 05:43:43
then you are using the words wrong
Demiurge
2025-04-01 05:43:47
Different processes having different views of the filesystem
intelfx
2025-04-01 05:43:48
namespaces mean a very specific thing
Demiurge
2025-04-01 05:43:59
That is often called a filesystem namespace
2025-04-01 05:44:58
And if different processes have access to different kernel syscalls, then that isn't usually called a namespace but you can probably assume the intent or meaning still by the context
intelfx
2025-04-01 05:45:06
Okay, we are going in circles. Namespaces are irrelevant. To run untrusted native code, you don't need "different views of the filesystem": you need *NO* view of the filesystem, like no access to the syscalls at all. Anything else is already an infinitely larger attack surface than you want. So in context of this discussion, filesystem namespaces, or any other namespaces at all, have exactly 0% relevance.
2025-04-01 05:45:34
I said that like an hour ago.
Demiurge And if different processes have access to different kernel syscalls, then that isn't usually called a namespace but you can probably assume the intent or meaning still by the context
2025-04-01 05:46:09
I don't want to have to "assume the intent or meaning". Words have defined meanings, let's use them.
Demiurge
2025-04-01 05:56:08
I can understand your point of view and it's a good one.
intelfx originally I mentioned seccomp mode 1, which has NO configuration
2025-04-01 06:01:14
I don't know what mode 1 is off the top of my head. It's a Linux-specific API with very fine-grained control and not at all friendly to the typical programmer, with not even a wrapper in the C library for it.
jonnyawsom3
2025-04-01 06:23:00
<#806898911091753051>?
Demiurge
Lilli I could not make the chunked API work. Is there an example somewhere, where it is used? I could not find one after looking for quite a while. :/ I set up `JxlChunkedFrameInputSource` with callbacks, which I then feed to `JxlEncoderAddChunkedFrame(frame_settings, true, chunked)` This essentially replaces the call to `JxlEncoderAddImageFrame(frame_settings, &pixel_format, image_data, image_data_size)`
2025-04-01 06:41:22
Sorry no one has gotten back to you on this. Chunked encode API is pretty new and I'm not sure how it works.
2025-04-01 06:41:30
I'm just a lurker.
jonnyawsom3
2025-04-03 08:13:21
Seems like auto-merge for PRs is being blocked by a formatting error <https://github.com/libjxl/libjxl/actions/runs/14235466206/job/39943279145?pr=4178> Just added it to my changelog PR if you want to merge it quickly <@794205442175402004> https://github.com/libjxl/libjxl/pull/4169
Lucas Chollet
2025-04-03 08:35:17
Yikes, now they fail because of my recent changes, pls revert your typo fix lol
jonnyawsom3
Lucas Chollet Yikes, now they fail because of my recent changes, pls revert your typo fix lol
2025-04-03 08:40:27
The only failing required test is due to your [CMAKE fix not being merged](<https://github.com/libjxl/libjxl/actions/runs/14251775663/job/39946056483?pr=4169>), and yours won't merge because of my [typo fix not being merged](<https://github.com/libjxl/libjxl/actions/runs/14235466206/job/39943279145?pr=4178>), so now we're in a gridlock over whose PR gets manually merged first haha
Lucas Chollet
2025-04-03 08:42:59
I didn't realize that you were hitting that issue too, I was referring to that log in your PR: ``` tools/jxltran.cc:9:#include "lib/include/jxl/decode.h" tools/jxltran.cc:10:#include "lib/include/jxl/decode_cxx.h" Don't add "include/" to the include path of public headers. ``` Isn't it a required test?
jonnyawsom3
Lucas Chollet I didn't realize that you were hitting that issue too, I was referring to that log in your PR: ``` tools/jxltran.cc:9:#include "lib/include/jxl/decode.h" tools/jxltran.cc:10:#include "lib/include/jxl/decode_cxx.h" Don't add "include/" to the include path of public headers. ``` Isn't it a required test?
2025-04-03 08:44:07
Tests required for merging have the indicator on the right
Lucas Chollet
2025-04-03 08:45:20
Ah, didn't realize that 😅
jonnyawsom3
2025-04-03 08:45:42
Though, fixing it wouldn't hurt since we appear to have plenty of time
2025-04-03 08:46:50
I actually have permission to close PRs on the repo with my Triage role, but unfortunately I can't merge
Lucas Chollet
2025-04-03 08:47:16
I would like to fix the CMake first, but I would need another CI run for that. I'm kind of playing a guessing game here
Though, fixing it wouldn't hurt since we appear to have plenty of time
2025-04-03 08:47:55
I will do it on my next `jxltran` PR. But you can do it too if you want
jonnyawsom3
2025-04-03 08:49:35
It doesn't seem to break anything, so I'll let you do it as part of the jxltran work
Seems like auto-merge for PRs is being blocked by a formatting error <https://github.com/libjxl/libjxl/actions/runs/14235466206/job/39943279145?pr=4178> Just added it to my changelog PR if you want to merge it quickly <@794205442175402004> https://github.com/libjxl/libjxl/pull/4169
2025-04-03 08:52:37
<@794205442175402004> apologies for the double ping, but you'll need to force merge that due to the stalemate with the CMAKE sjpeg fix (which will require re-approval after too). Then Auto-Merge should start working again
intelfx
2025-04-05 04:52:40
OK, I'm playing with progressive encoding again... Last time y'all told me that progressive lossless encoding is basically broken with effort >=5 (although in the end I failed to understand why exactly). However, I'm also getting the same thing if I use `--progressive_dc` instead of `-p`:
```
$ cjxl path/to/png path/to/jxl -d 0
JPEG XL encoder v0.11.1 794a5dcf [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 161373.8 kB including container (10.997 bpp).
10672 x 11000, 7.569 MP/s [7.57, 7.57], , 1 reps, 32 threads.
cjxl path/to/png -d 0  365,93s user 16,98s system 2143% cpu 17,861 total

$ cjxl path/to/png path/to/jxl -d 0 --progressive_dc=1
JPEG XL encoder v0.11.1 794a5dcf [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 7]
^C
cjxl path/to/png -d 0 --progressive_dc=  392,90s user 233,60s system 104% cpu 9:58,64 total
```
— which, to my understanding, isn't supposed to actually do progressive encoding of the main image (i.e., it does not imply `--progressive_ac` or `--responsive`). Why is this?
A homosapien
2025-04-05 05:20:30
`progressive_dc` and `progressive_ac` only work for the lossy mode of libjxl. For lossless they don't actually make the image progressive; all they do is disable chunked encoding, which trades encoding time for more density.
intelfx
2025-04-05 05:25:08
Ah, it seems my understanding was incomplete. I did understand that `{q,}progressive_ac` was only applicable to the lossy mode (as it is fundamentally about encoding the AC/"HF" VarDCT coefficients), but I thought that `progressive_dc` was basically "prepending" a frame made of the LF coefficients onto the main image, regardless of whether the main image was lossy or lossless.
A homosapien
2025-04-05 05:29:51
Progressive lossless uses a different technique called squeeze.
2025-04-05 05:32:37
All of JPEG XL's features are explained really well in this technical report. Section 5.1.3 explains what squeeze does alongside some images a few pages down.
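As a rough intuition for squeeze, here is a simplified Haar-style average/difference pair. This is not the exact transform from the spec (which adds a "tendency" correction so the low-resolution preview looks smoother); it only shows how a downscaled average plus a residual can round-trip exactly in integers.

```cpp
#include <cassert>

// Forward: one squeeze step on a pair of neighbouring samples.
void SqueezePair(int a, int b, int* avg, int* residual) {
  *avg = (a + b) >> 1;  // floor((a+b)/2), the "low-pass" sample
  *residual = a - b;    // the "high-pass" sample
}

// Inverse: reconstruct the original pair exactly (a+b and a-b share parity,
// so the bit lost by the floor division is recovered from the residual).
void UnsqueezePair(int avg, int residual, int* a, int* b) {
  int sum = 2 * avg + (residual & 1);
  *a = (sum + residual) / 2;
  *b = *a - residual;
}

int main() {
  for (int a = -5; a <= 5; ++a) {
    for (int b = -5; b <= 5; ++b) {
      int avg, res, ra, rb;
      SqueezePair(a, b, &avg, &res);
      UnsqueezePair(avg, res, &ra, &rb);
      assert(ra == a && rb == b);  // lossless round-trip
    }
  }
  return 0;
}
```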
jonnyawsom3
intelfx OK, I'm playing with progressive encoding again... Last time y'all told me that progressive lossless encoding is basically broken with effort >=5 (although in the end I failed to understand why exactly). However, I'm also getting the same thing if I use `--progressive_dc` instead of `-p`: ``` $ cjxl path/to/png path/to/jxl -d 0 JPEG XL encoder v0.11.1 794a5dcf [AVX2,SSE4,SSE2] Encoding [Modular, lossless, effort: 7] Compressed to 161373.8 kB including container (10.997 bpp). 10672 x 11000, 7.569 MP/s [7.57, 7.57], , 1 reps, 32 threads. cjxl path/to/png -d 0 365,93s user 16,98s system 2143% cpu 17,861 total $ cjxl path/to/png path/to/jxl -d 0 --progressive_dc=1 JPEG XL encoder v0.11.1 794a5dcf [AVX2,SSE4,SSE2] Encoding [Modular, lossless, effort: 7] ^C cjxl path/to/png -d 0 --progressive_dc= 392,90s user 233,60s system 104% cpu 9:58,64 total ``` — which, to my understanding, isn't supposed to actually do progressive encoding of the main image (i.e., it does not imply `--progressive_ac` or `--responsive`). Why is this?
2025-04-05 07:21:49
We're actively working on progressive lossless, if not almost done with it already https://discord.com/channels/794206087879852103/803645746661425173/1357850229247840401
Right now, progressive_dc disabling chunked works around an [issue](<https://github.com/libjxl/libjxl/issues/3823#issuecomment-2351120650>) with the TOC, where the 1:8 LF frame can't be rendered until the end of the file. Downside being the entire image has to be processed as a whole, instead of threading individual groups.
Squeeze, used for progressive lossless, also disables chunked due to downsampling the image as part of the transform
intelfx
2025-04-05 12:24:02
We're actively working on progressive
Crite Spranberry
2025-04-05 04:46:40
So I'm trying to compile libjxl and I followed this guide https://github.com/libjxl/libjxl/blob/main/doc/developing_in_windows_vcpkg.md Visual Studio doesn't show any errors, but I just have no binaries at all in the /out/build/x64-Clang-Release/tools folder
A homosapien
2025-04-05 05:53:09
I recommend using msys2, it's an easier process and it generates faster binaries
Crite Spranberry
2025-04-06 11:29:13
I got further by setting BUILD_TESTING to OFF in CMakeSettings.json
2025-04-06 11:31:30
Why does this always happen
2025-04-06 11:33:57
What am I doing where everything always fails with the most obscure errors that nobody else has?
2025-04-06 11:56:49
huh msys2 just worked
2025-04-06 11:56:55
No errors, no bs
2025-04-06 11:57:34
Or well ig one error I had to work around (my cmake version is 4.0.0, idk wtf it's on about)
2025-04-06 11:58:46
Why are these like quadruple the size they should be
2025-04-06 11:59:10
Actually more like 20-30x the size
2025-04-06 12:04:23
and they don't even work wtf
Demiurge
2025-04-06 12:36:11
Did it download and compile dependencies like brotli, hwy, skia/skcms and whatever?
2025-04-06 12:36:46
It's kind of a pain to compile libjxl
2025-04-06 12:37:10
But totally doable
jonnyawsom3
2025-04-06 12:54:54
Homosapien had the same issues with it not being static and massively larger. We fixed them (mostly) and he mentioned rewriting the build docs
Quackdoc
Crite Spranberry and they don't even work wtf
2025-04-06 05:00:46
you compiled dynamic, you need to copy DLLs to the folder the exec is in
Homosapien had the same issues with it not being static and massively larger. We fixed thrm (mostly) and he mentioned rewriting the build docs
2025-04-06 05:01:20
in isolation static will always be smaller
spider-mario
Quackdoc you compiled dynamic, you need to copy DLLs to the folder the exec is in
2025-04-06 05:29:10
or run them from the mingw shell you built them in, as it will have the `PATH` set appropriately
Crite Spranberry Why are these like quadruple the size they should be
2025-04-06 05:29:53
maybe a debug (or at least unoptimised) build? you can run `cmake -DCMAKE_BUILD_TYPE=Release .` from the build directory (or edit `CMAKE_BUILD_TYPE` in `CMakeCache.txt` directly) and rebuild
jonnyawsom3
Quackdoc in isolation static will always be smaller
2025-04-06 06:37:13
Our static builds are still 2-4x larger than the github releases. 10mb per binary
Quackdoc
2025-04-06 06:37:28
[av1_woag](https://cdn.discordapp.com/emojis/852007419474608208.webp?size=48&name=av1_woag)
jonnyawsom3
2025-04-06 06:37:59
It was 50, but that was debug as Mario said
spider-mario
2025-04-06 07:41:02
note that by default, a CMake build won’t be stripped
2025-04-06 07:41:22
so you can do that right after building, or add `-s` (if I recall correctly) to the linker flags
Crite Spranberry
spider-mario maybe a debug (or at least unoptimised) build? you can run `cmake -DCMAKE_BUILD_TYPE=Release .` from the build directory (or edit `CMAKE_BUILD_TYPE` in `CMakeCache.txt` directly) and rebuild
2025-04-06 09:10:37
I did that, which changed my command to this
```
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=OFF -DJPEGXL_ENABLE_BENCHMARK=OFF -DJPEGXL_ENABLE_PLUGINS=ON -DJPEGXL_ENABLE_MANPAGES=OFF -DJPEGXL_FORCE_SYSTEM_BROTLI=ON -DJPEGXL_FORCE_SYSTEM_GTEST=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 ..
```
The exes are still slightly too large, but much closer to a normal release. From here, how would I make a stripped static build?
jonnyawsom3
2025-04-06 09:42:02
<@207980494892040194> didn't you say two static flags are required?
spider-mario
2025-04-06 10:31:03
`-DCMAKE_EXE_LINKER_FLAGS=-s` might help
2025-04-06 10:31:45
(that, or running `strip` on all executables yourself)
A homosapien
Crite Spranberry I did that, which changed my command to this ``` cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=OFF -DJPEGXL_ENABLE_BENCHMARK=OFF -DJPEGXL_ENABLE_PLUGINS=ON -DJPEGXL_ENABLE_MANPAGES=OFF -DJPEGXL_FORCE_SYSTEM_BROTLI=ON -DJPEGXL_FORCE_SYSTEM_GTEST=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 .. ``` The exes are still slightly too large, but much closer to normal release From here how would I make a stripped static build?
2025-04-06 11:05:02
You need to specify two flags for a truly static libjxl. `-DBUILD_SHARED_LIBS=OFF` and `-DJPEGXL_STATIC=ON`. Also I recommend removing `-DJPEGXL_FORCE_SYSTEM_BROTLI=ON`, lots of errors pop up and the build fails with it on.
2025-04-06 11:10:58
Also I recommend using clang, according to my testing it's around 5-10% faster than GCC
Crite Spranberry
spider-mario `-DCMAKE_EXE_LINKER_FLAGS=-s` might help
2025-04-06 11:41:44
This does decrease the size, but static builds are still 1-2MB bigger than official release
2025-04-06 11:41:49
I'll see about trying GCC
2025-04-06 11:42:03
Also my command currently ``` cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=OFF -DJPEGXL_ENABLE_BENCHMARK=OFF -DJPEGXL_ENABLE_PLUGINS=ON -DJPEGXL_ENABLE_MANPAGES=OFF -DJPEGXL_STATIC=ON -DCMAKE_EXE_LINKER_FLAGS=-s -DJPEGXL_FORCE_SYSTEM_GTEST=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 .. ```
2025-04-06 11:44:35
pacman -S can't find clang-compiler-rt so well idk if it will work but ig I'll try
Crite Spranberry What am I doing where everything always fails with the most obscure errors that nobody else has?
2025-04-06 11:46:40
aaaaaaaa
2025-04-06 11:46:55
welp I got what I could time to try build
2025-04-06 11:47:53
Same Cmake error even though I have 4.0 so ig I will add the minimum shit
2025-04-06 11:49:37
<:Think2:826218556453945364>
```
[344/379] Error: Can't generate doc since Doxygen not installed.
FAILED: CMakeFiles/doc C:/Users/Admin/Documents/GitHub/libjxl-mingw/build/CMakeFiles/doc
C:\Windows\system32\cmd.exe /C "cd /D C:\Users\Admin\Documents\GitHub\libjxl-mingw\build && false"
[353/379] Building CXX object tools/CMakeFiles/enc_fast_lossless.dir/__/lib/jxl/enc_fast_lossless.cc.obj
ninja: build stopped: subcommand failed.
+ retcode=1
```
A homosapien
2025-04-06 11:50:04
~~I think you don't need clang-rt to compile.~~ Also I think the inflated binary sizes are intrinsically tied to libc or pthreads or something like that. Not much you can do about it I think.
Crite Spranberry
2025-04-06 11:52:13
How do I disable doc generation with clang?
A homosapien
2025-04-06 11:52:48
strange, I got it to compile on my machine even though I don't have doxygen installed
2025-04-06 11:52:53
try adding `-DJPEGXL_ENABLE_DOXYGEN=OFF`
Crite Spranberry
2025-04-06 11:58:00
Now I get this
```
+ cmake --build /c/Users/Admin/Documents/GitHub/libjxl-mingw/build -- all doc
ninja: error: unknown target 'doc', did you mean 'jxl'?
```
2025-04-07 12:00:42
Oh I can just do this
A homosapien
2025-04-07 12:01:31
`Pacman -S --needed mingw-w64-clang-x86_64-compiler-rt mingw-w64-x86_64-doxygen `
Crite Spranberry
2025-04-07 12:02:45
wtf why do I get that error now
Crite Spranberry Now I get this ``` + cmake --build /c/Users/Admin/Documents/GitHub/libjxl-mingw/build -- all doc ninja: error: unknown target 'doc', did you mean 'jxl'? ```
2025-04-07 12:03:31
I went back to the original command and I get that error and I have no idea what I did
2025-04-07 12:04:04
temp storing this here ``` ./ci.sh opt -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=OFF -DJPEGXL_ENABLE_BENCHMARK=OFF -DJPEGXL_ENABLE_MANPAGES=OFF -DJPEGXL_STATIC=ON -DJPEGXL_FORCE_SYSTEM_GTEST=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_BUILD_TYPE=Release -DJPEGXL_ENABLE_DOXYGEN=OFF ```
2025-04-07 12:06:15
I took a snapshot before I went to gcc because I knew some shit would happen so ig I'll try again
A homosapien
2025-04-07 12:07:50
wait you are using the script? I got it to compile like so, `export CC=clang && export CXX=clang++` and then running the regular cmake command
2025-04-07 12:08:12
I don't really trust that script I'll be honest
Crite Spranberry
2025-04-07 12:08:33
Oh I'm just following this https://github.com/libjxl/libjxl/blob/main/doc/developing_in_windows_msys.md
2025-04-07 12:12:08
So `-DJPEGXL_ENABLE_DOXYGEN=OFF` just permanently breaks the build command even if I remove it and nuke the build folder
2025-04-07 12:12:57
So I get an error with doxygen so ig I'll try your method
A homosapien
spider-mario note that by default, a CMake build won’t be stripped
2025-04-07 12:13:14
Does compiling libjxl with LTO optimizations work? It could be another way of reducing binary sizes but it always seems to fail for me.
Crite Spranberry So I get an error with doxygen so ig I'll try your method
2025-04-07 12:14:07
yeah regular ol' cmake works just fine, and it builds faster too. Just use this command for it to use all of your cores `cmake --build . -- -j$(nproc)`
Crite Spranberry
2025-04-07 12:16:49
Well it's using clang now and idk if I notice any difference. The executables are still a bit bigger than the official release
A homosapien
2025-04-07 12:17:36
It's 20% faster than the official Windows releases so I would say it's a worthwhile trade off
Crite Spranberry
2025-04-07 12:17:52
cool
2025-04-07 12:20:14
So, interesting findings: gcc-compiled libjxl is theoretically compatible with XP. Except I get this error, and it works fine on Vista+, so idk what it's on about
2025-04-07 12:20:53
Or well all the executables I'm interested in (cjpegli, cjxl, djxl, and jxlinfo)
2025-04-07 12:21:32
They check out in Dependency Walker, but I get this with all
2025-04-07 12:27:31
The UTF8 manifest breaks XP. Ig I'll just compile without it
2025-04-07 12:29:39
Nevermind it can have a UTF8 manifest, it just is picky
2025-04-07 12:32:07
So adding a compatibility entry makes it work
```
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
    <application>
      <windowsSettings>
        <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
      </windowsSettings>
    </application>
  </compatibility>
</assembly>
```
Crite Spranberry So interesting findings, gcc compiled libjxl theoretically compatible with XP Except I get this error and it works fine on Vista+ so idk what it's on about
2025-04-07 12:33:05
clang*
2025-04-07 12:33:07
i already forgor
2025-04-07 12:43:15
Well time to see if libavif fares better now with clang as well
2025-04-07 12:50:01
Had to install yasm but it just builds as well holy shit
2025-04-07 12:50:20
oop need to make static build
2025-04-07 12:57:11
<:Think2:826218556453945364>
2025-04-07 01:19:20
I apparently can't do 32 bit build of libavif
2025-04-07 01:19:39
eh jxl better anyways
2025-04-07 02:13:29
Does clang not support utf8?
2025-04-07 02:13:38
Wait nvm
2025-04-07 02:13:41
I noticed the issue
A homosapien
2025-04-07 02:32:29
It would be funny to have a building doc for Windows XP <:KekDog:805390049033191445>
jonnyawsom3
Crite Spranberry So interesting findings, gcc compiled libjxl theoretically compatible with XP Except I get this error and it works fine on Vista+ so idk what it's on about
2025-04-07 06:41:27
Not sure what hardware you're running, but if it's period correct, you could be a good benchmark for our faster_decoding tweaks xD
Lucas Chollet
2025-04-07 02:52:02
Can I get someone to run the pipeline on [4165](<https://github.com/libjxl/libjxl/pull/4165>)
Crite Spranberry
Not sure what hardware you're running, but if it's period correct, you could be a good benchmark for our faster_decoding tweaks xD
2025-04-07 03:03:57
It's a VM, but I do have some period hardware
2025-04-07 03:04:14
but period hardware boring
2025-04-07 03:08:46
For older pre-Sandy Bridge or AVX, I have
462 Athlon XP (unlikely to run due to no SSE2)
478 Pentium 4 something or another i forgor
754 or 939 Athlon 64 I forgor again
775 Pentium 4 3.06GHz
775 C2D 1.86GHz
775 C2Q Q6700
AM3 Athlon II x4 something or another
AM3 Phenom II x4 955
AM3 Phenom II x6 1090t
1366 Xeon 5080
jonnyawsom3
2025-04-08 04:48:28
Interesting, I wonder why this wasn't done for jpegli, seeing as it's usually YCbCr (Could this have avoided RGB JPEGs?) <https://github.com/libjxl/libjxl/blob/main/lib/jxl/enc_modular.cc#L763>
damian101
2025-04-09 12:42:44
encoding in RGB is very inefficient
2025-04-09 12:44:18
jpegli can encode in XYB using an ICC v4 profile
Demiurge
2025-04-09 12:50:43
Yup, and it would be a lot more compatible with existing hardware and software if it would add a JPEG APP14 tag. Otherwise some decoders will treat it like a normal YCbCr JPEG and mess up the colors.
2025-04-09 12:55:26
Also jpegli STILL uses chroma subsampling by default for XYB JPEG, which is another hint for certain decoders to treat it like a normal YCbCr JPEG
2025-04-09 12:55:42
Since it makes no sense to use chroma subsampling for an RGB JPEG
2025-04-09 12:57:04
Pretty sure that would be a one-line fix too, and that alone would have a big impact. The APP14 header fix might be a 2 line fix.
jonnyawsom3
jpegli can encode in XYB using an ICC v4 profile
2025-04-09 01:20:39
That still uses RGB JPEG internally
damian101
That still uses RGB JPEG internally
2025-04-09 01:21:03
but it's not RGB
2025-04-09 01:21:23
I see what you mean
jonnyawsom3
2025-04-09 01:22:05
Yeah, I mean it could have used the usual YCoCg with YXB
Demiurge
2025-04-09 01:27:27
The ICC profile is applied AFTER reverse-YCoCg back to RGB
2025-04-09 01:30:42
And Adobe (APP14) JPEG is well supported by existing decoders
2025-04-09 01:36:54
Just like how CMYK JPEG is commonly supported, using the same exact Adobe tag
Yeah, I mean it could have used the usual YCoCg with YXB
2025-04-09 01:42:18
If you were to do this, then the decoder would still do the reverse YCbCr transform, but maybe it's possible to make an ICC profile that takes that into account and changes it back by applying another YCbCr transformation on top of the inverse XYB transform. Ugh, confusing. But it's probably doable.
username
2025-04-09 01:45:42
isn't the reason jpegli does XYB as an RGB JPEG because YCbCr in JPEG is not and cannot be lossless meaning you would get a double lossy color transform?
2025-04-09 01:46:40
also jpegli does chroma subsampling **on purpose** for XYB JPEGs it's not some random accident
2025-04-09 01:46:59
although it's not compatible with JPEG XL sadly :(
jonnyawsom3
username isn't the reason jpegli does XYB as an RGB JPEG because YCbCr in JPEG is not and cannot be lossless meaning you would get a double lossy color transform?
2025-04-09 01:48:00
Ah yeah, I recall something along those lines
2025-04-09 01:49:15
Regardless, swapping the Y and X channels would give a more graceful image degradation than hot pink and green
Demiurge
2025-04-09 02:34:47
Graceful? I hear that it might actually be better for it to be ungraceful, so it's more obvious when something goes wrong.
2025-04-09 02:35:21
Also I thought the channels are already arranged "YXB"
jonnyawsom3
2025-04-09 02:37:59
I tried creating my own YXB JPEG by channel swapping and ICC editing, but it ended up still having a tint, though with the image still recognisable. So it might be worth it
Demiurge
2025-04-09 03:26:53
YBX would probably be kinda close to YCbCr right?
2025-04-09 03:27:34
If the goal was "graceful degradation"
2025-04-09 03:28:03
Which is arguably undesirable if it makes it harder to tell that a problem is present
2025-04-09 03:30:42
Maybe there should be some "tell" like intentionally making it look super bright and washed out without color management
2025-04-09 03:31:34
But that's not as obvious and cool as just making the whole thing look green
2025-04-09 03:31:42
😎
jonnyawsom3
2025-04-09 03:32:23
The B channel caused an overexposed Cr result, but I could have done something wrong
2025-04-09 03:32:45
Just a more graceful tint than eye searing pink and green
Demiurge
2025-04-09 07:16:09
Pink and green are cool tho
Meow
2025-04-10 02:47:51
Upgrading jpeg-xl 0.11.1 -> 0.11.1_1 👀
Demiurge
2025-04-10 08:29:51
Is it normal for effort=10 to give a larger file size for "JPEG lossless transcode" mode?
2025-04-10 08:30:02
Larger than effort=9 I mean
HCrikki
2025-04-10 08:46:56
i recall it performs worse than e9 for reversible jpeg transcode
Demiurge
2025-04-10 09:39:52
🧐 why
A homosapien
2025-04-10 11:00:14
Local MA trees are sometimes better than Global ones
2025-04-10 11:00:16
I think
Melirius
A homosapien Local MA trees are sometimes better than Global ones
2025-04-11 02:35:47
Exactly, as separation along group number is limited to 256 splits in the global tree, but not limited in local ones (each group has its own tree). This effect is more pronounced in large JPEGs with high variability
jonnyawsom3
2025-04-11 02:46:14
To be a bit more specific, an MA tree can have 255 contexts at most, so using a tree per group drastically increases the possible options
CrushedAsian255
To be a bit more specific, an MA tree can have 255 contexts at most, so using a tree per group drastically increases the possible options
2025-04-11 03:12:04
so if using local trees there can be at most 255 * group_count contexts?
jonnyawsom3
2025-04-11 03:20:29
As I understand it, yes
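To put numbers on that (illustrative arithmetic only, assuming the default 256×256 modular groups mentioned later in the thread): a 4096×4096 image tiles into 16×16 = 256 groups, so per-group trees could in principle address up to 256 × 255 = 65,280 contexts versus 255 for one global tree.

```cpp
#include <cstdio>

int main() {
  const long xsize = 4096, ysize = 4096;  // example image, not from the chat
  const long group_dim = 256;             // assumed default modular group size (-g 1)
  const long max_contexts_per_tree = 255;
  const long groups = ((xsize + group_dim - 1) / group_dim) *
                      ((ysize + group_dim - 1) / group_dim);
  std::printf("groups: %ld\n", groups);                                            // 256
  std::printf("global tree contexts: up to %ld\n", max_contexts_per_tree);         // 255
  std::printf("local tree contexts:  up to %ld\n", groups * max_contexts_per_tree); // 65280
  return 0;
}
```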
2025-04-11 03:25:48
I was actually contemplating changing group size dependent on image size and thread count, using the largest group size that fully utilises the encoding threads. So for me, 1080p would be the current default of `-g 1`, 4K would be `-g 2` and 8K would be `-g 3` (based on megapixels/16 threads, compared to pixels per group size). Depends how much of an impact on encode/decode speed it has though. Yet more testing to be done!
2025-04-11 09:36:15
I'm sure I've said it before, but I would've expected a memory reduction between 16 threads and 1 thread, since it's using local MA trees per group in lossless. Accounting for the image buffer being 32f instead of 8int, it's still using twice as much memory on something
Melirius
2025-04-12 03:53:58
Could somebody run CI on my PR? I think I fixed all the problems, but I cannot check it locally, thanks
2025-04-12 03:54:02
https://github.com/libjxl/libjxl/pull/4185
jonnyawsom3
2025-04-13 08:23:38
https://discord.com/channels/794206087879852103/1256302117379903498/1360893139233280092
Demiurge
2025-04-13 08:23:44
I can think of a good group size heuristic.
2025-04-13 08:26:18
For each possible group size, calculate the area in px^2 of "unused space" and use the largest one?
2025-04-13 08:26:34
The largest one with the least unused space I mean
2025-04-13 08:27:44
Afaik there's no reason for small group sizes to be better than larger group sizes.
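A minimal sketch of that "least unused space" idea, assuming the modular group dimension is 128 << shift (which matches the size checks in the snippet further down, but treat it as an assumption; the 1920×1080 image is just an example): compute how many padding pixels each candidate group size adds, then prefer the largest size with little or no waste.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t xsize = 1920, ysize = 1080;  // example image
  for (int shift = 3; shift >= 0; --shift) {
    const uint64_t dim = 128u << shift;       // 1024, 512, 256, 128
    const uint64_t gx = (xsize + dim - 1) / dim;
    const uint64_t gy = (ysize + dim - 1) / dim;
    const uint64_t unused = gx * gy * dim * dim - xsize * ysize;  // padding pixels
    std::printf("shift %d: %llu x %llu groups, %llu unused px\n", shift,
                (unsigned long long)gx, (unsigned long long)gy,
                (unsigned long long)unused);
  }
  return 0;
}
```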
jonnyawsom3
2025-04-13 08:28:56
That's what I was thinking, but ended up just using a minimum "pixels per thread" to make sure every block was full before picking a higher group size. So all threads are always saturated in the first pass, then whatever's left runs after
Demiurge Afaik there's no reason for small group sizes to be better than larger group size.
2025-04-13 08:29:08
Encoding speed, decoding speed, memory and density
Demiurge
2025-04-13 08:29:10
And I don't know how much cost "wasted space" actually has, if any at all
Encoding speed, decoding speed, memory and density
2025-04-13 08:30:09
Well in terms of density I mean, larger group sizes should not have any reason to be at a disadvantage.
jonnyawsom3
Demiurge And I don't know how much cost "wasted space" actually has, if any at all
2025-04-13 08:30:55
From my testing, very little. At most, a memory hit from allocating the extra space, and a small speed penalty from the bigger groups. Hence why I thought I'd keep it simple with the 'minimum saturation'
Demiurge Well in terms of density I mean, larger group sizes should not have any reason to be at a disadvantage.
2025-04-13 08:31:08
Better local MA trees in smaller groups
2025-04-13 08:33:20
This is what I had cooked up, split for legibility here
```C
uint64_t pixels_per_thread;
pixels_per_thread = (xsize * ysize) / num_threads;
if (cparams.modular_group_size_shift == -1) {
  if (cparams.speed_tier <= SpeedTier::kKitten && xsize >= 1024 && ysize >= 1024 &&
      pixels_per_thread >= 1048576) {
    frame_header->group_size_shift = 3;
  } else if (xsize >= 512 && ysize >= 512 && pixels_per_thread >= 262144) {
    frame_header->group_size_shift = 2;
  } else if (xsize >= 256 && ysize >= 256 && pixels_per_thread >= 65536) {
    frame_header->group_size_shift = 1;
  } else {
    frame_header->group_size_shift = 0;
  }
} else {
  frame_header->group_size_shift = cparams.modular_group_size_shift;
}
```
Demiurge
Better local MA trees in smaller groups
2025-04-13 08:33:49
Oh... that makes sense.
jonnyawsom3
2025-04-13 08:34:13
Checks the image meets the dimensions of the group size, then checks if there's enough pixels at that group size to fill all threads, if not, try the next lower size
Demiurge
2025-04-13 08:35:17
If a bunch of similar dct blocks are all in the same group...
jonnyawsom3
2025-04-13 08:35:28
There are no DCT blocks, this is modular
Tirr
2025-04-13 08:36:21
also vardct has a fixed group size of 256
Demiurge
2025-04-13 08:37:06
Oh
jonnyawsom3
2025-04-13 08:37:46
I went for hardware dependent settings since if you want the old behaviour, all you need to do is add `g 1`, and it's dependent on image resolution too so it would already vary based on input
Demiurge
2025-04-13 08:38:49
I think it makes more sense for it to vary based on image than based on hardware
2025-04-13 08:39:08
But it's not that big of a deal. Especially if it increases speed too
jonnyawsom3
2025-04-13 08:39:11
Gives an encode speed boost, sometimes a big decode speed boost (5x) and at worst a 0.1 bpp increase, or at best a 0.1 bpp decrease so far. I am gonna test it more though
Demiurge
2025-04-13 08:39:36
The density difference I would imagine is too small to matter also
jonnyawsom3
2025-04-13 08:40:34
Yeah, I *was* worrying about it, but you get more image-to-image variance than this ever seems to cause, and bumping up an effort level with the speed increase naturally obliterates it
Demiurge
2025-04-13 08:50:51
The best way to improve libjxl right now is to make the source tree more logically organized into folders, making it easier to find and build only exactly what you want/need/expect/specify. While wasting the minimum amount of time figuring out the build system or fetching dependencies you don't even need. If I only need libjxl and not cjxl, I should be able to do that easily without being a cmake genius.
2025-04-13 08:52:43
Hopefully that also will improve programmer productivity in the long term too
jonnyawsom3
This is what I had cooked up, split for legibility here ```C uint64 pixels_per_thread; pixels_per_thread = (xsize * ysize) / num_threads; if (cparams.modular_group_size_shift == -1) { if (cparams.speed_tier <= SpeedTier::kKitten && xsize >= 1024 && ysize >= 1024 && pixels_per_thread >= 1048576){ frame_header->group_size_shift = 3; } } else if (xsize >= 512 && ysize >= 512 && pixels_per_thread >= 262144){ frame_header->group_size_shift = 2; } } else if (xsize >= 256 && ysize >= 256 && pixels_per_thread >= 65536){ frame_header->group_size_shift = 1; } } else { frame_header->group_size_shift = 0; } } else { frame_header->group_size_shift = cparams.modular_group_size_shift; } ```
2025-04-13 08:53:52
Tried it on a few more images, the results are 'it varies'. Anything from a 1% size increase to a 20% reduction, 25% faster encoding or 5% slower. I'll probably stick it in a PR down the road so I can test it independently and more thoroughly
Demiurge
2025-04-13 08:56:43
Same thing with jpegli too. If I want to build that it should be easy to specify what I want, if I want a libjpeg-compatible static or dynamic library for linking, what version, or if I want a jpegli-specific library with jpegli symbols, and if I want to install header files for libjpeg and/or libjpegli
Tried it on a few more images, the results are 'it varies' Anything from a 1% size increase, to a 20% reduction. 25% faster encoding, or 5% slower. I'll probably stick it in a PR down the road so I can test it independently and more thoroughly
2025-04-13 08:58:35
20% bitrate reduction? For lossy modular?
jonnyawsom3
Demiurge 20% bitrate reduction? For lossy modular?
2025-04-13 09:10:29
Lossless, and 12% because I forgot the .6 thanks to being up all night, but yeah
A homosapien
2025-04-13 09:31:25
All of the improvements jonny and I are working on are purely for lossless. It's somewhat easy to benchmark and gauge improvements since all we have to worry about is encode/decode speeds and density.
Demiurge
2025-04-13 09:33:49
12% is still pretty massive
2025-04-13 09:34:10
Smaller group size = 12% bitrate improvement??
2025-04-13 09:34:29
For progressive lossless only?
A homosapien
2025-04-13 09:53:53
Progressive lossless has some strange behavior, a lot of the settings which benefit regular lossless actually hurt progressive lossless. An example would be small group sizes, on average it's bad for normal lossless but good for progressive lossless. So we had to completely retune the codec for progressive.
jonnyawsom3
Demiurge For progressive lossless only?
2025-04-13 09:54:03
Normal lossless, you can see the command in the image
2025-04-13 09:54:38
The group size and threading is an idea I had while we wrap up the progressive and faster decoding tweaks, but I think it's better suited as a separate PR
2025-04-13 09:56:29
For progressive lossless only, it's around 20% bitrate improvement and 600% encode speed improvement, 20% decode speed improvement Faster Decoding lossless, 75% bitrate improvement and 40% decode speed improvement
A homosapien
2025-04-13 09:59:33
I thought progressive was more like 35 - 40% bitrate improvement?
2025-04-13 09:59:42
Ever since we fixed that RCT bug
jonnyawsom3
2025-04-13 09:59:54
Oh right yeah, with that fixed it's around 35%
2025-04-13 10:00:29
Some images don't like it, some love it. Heuristics are broken so we can't try both easily
A homosapien
2025-04-13 10:02:35
Yeah, some of the heuristics seem to actively hurt progressive. So just disabling them and setting a global flag is better in most cases.
jonnyawsom3
2025-04-13 10:08:05
YCoCg *tends* to help more than hurt, so we're enabling it for progressive. Though I have seen a few images get bigger with it instead
A homosapien Progressive lossless has some strange behavior, a lot of the settings which benefit regular lossless actually hurt progressive lossless. An example would be small group sizes, on average it's bad for normal lossless but good for progressive lossless. So we had to completely retune the codec for progressive.
2025-04-13 10:09:40
I only just thought, but it's the per channel (or global?) palette that breaks things. Smaller groups might be allowing it to use local palette still... Yet more testing to be done!
2025-04-13 10:10:40
Okay nevermind, I got them mixed up
A homosapien
2025-04-13 10:10:52
Yeah, we still might be able to eke out a little more bitrate saving with progressive. <:FeelsReadingMan:808827102278451241> <:jxl:1300131149867126814>
Demiurge
2025-04-13 10:11:13
<:JXL:805850130203934781>
jonnyawsom3
2025-04-13 10:11:18
```
-X PERCENT, --pre-compact=PERCENT
    Use global channel palette if the number of sample values is smaller than this percentage of the nominal range.
-Y PERCENT, --post-compact=PERCENT
    Use local (per-group) channel palette if the number of sample values is smaller than this percentage of the nominal range.
```
It's `-Y 0` that fixes progressive even on main cjxl
A homosapien Yeah, we still might be able to eek out a little more bitrate saving with progressive. <:FeelsReadingMan:808827102278451241> <:jxl:1300131149867126814>
2025-04-13 10:17:16
Username found this, so we could try adding a few predictors for progressive to try <https://github.com/libjxl/libjxl/blob/c496c521f99c13b8205c4fc4ff3eb3d652a1d1c3/lib/jxl/modular/encoding/enc_ma.cc#L535>
A homosapien
2025-04-13 10:18:47
Right, like a special heuristic predictor just for progressive
2025-04-13 10:19:29
Probably for efforts 8+
jonnyawsom3
2025-04-13 10:22:45
Nah, I think it could run lower. It'll only be maybe 4 predictors instead of the full 12 that P15 does. Maybe even just gradient and none for the decode speed
monad
I went for hardware dependant settings since if you want the old behaviour, all you need to do is add `g 1`, and it's dependant on image resolution too so it would already vary based on input
2025-04-13 09:36:22
not that it matters, but current default selects g2 for small enough images
This is what I had cooked up, split for legibility here ```C uint64 pixels_per_thread; pixels_per_thread = (xsize * ysize) / num_threads; if (cparams.modular_group_size_shift == -1) { if (cparams.speed_tier <= SpeedTier::kKitten && xsize >= 1024 && ysize >= 1024 && pixels_per_thread >= 1048576){ frame_header->group_size_shift = 3; } } else if (xsize >= 512 && ysize >= 512 && pixels_per_thread >= 262144){ frame_header->group_size_shift = 2; } } else if (xsize >= 256 && ysize >= 256 && pixels_per_thread >= 65536){ frame_header->group_size_shift = 1; } } else { frame_header->group_size_shift = 0; } } else { frame_header->group_size_shift = cparams.modular_group_size_shift; } ```
2025-04-13 09:51:20
interesting concept for encoding, but wouldn't larger group sizes negatively affect decode speed on platforms with more threads than the origin? I'm also curious if it's really a net benefit to use the smallest group size in general, I recall it especially slowing down decode in some cases
jonnyawsom3
monad not that it matters, but current default selects g2 for small enough images
2025-04-13 11:27:19
Yeah, images 400 x 400 or less, citing no multithreaded speedup and wasted space in half-full groups. In practice though, there's not much overhead from empty space in groups. Otherwise it's always g1, apart from e11 which overrides any given parameters anyway
monad interesting concept for encoding, but wouldn't larger group sizes negatively affect decode speed on platforms with more threads than the origin? I'm also curious if it's really a net benefit to use the smallest group size in general, I recall it especially slowing down decode in some cases
2025-04-13 11:31:44
From our testing in <#1358733203619319858>, g0 is always faster, to the point we made faster decoding level 2 and up force g0. And with the overhauled decoding levels, I'm hoping people will be more likely to use them if they want to make sure decoding is fast. The odds are highly unlikely the image exactly matches group dimensions, so there'll be extra groups that more threads can still take advantage of too. Surprisingly, g0 even helps when threads are already saturated at higher levels. Best guess is the shallower MA trees allow quicker traversal per group
2025-04-13 11:34:17
As always with image compression, results vary based on image content, but it seems to be a slight density improvement while making sure the threads you specify are actually being used. I think it's because the higher resolution the image, the more likely it is to have slow gradients instead of sharp edges, making larger groups more effective
itszn
2025-04-14 12:03:50
not quite sure what channel to post this, but I have something kinda cool to show off :) I write security puzzles for hacking competitions (CTFs) and this year modified libjxl to include some extra predictor opcodes which had vulnerabilities. Teams had to craft a jxl image which could exploit these vulnerabilities and get code execution when the image is rendered to png. https://github.com/Nautilus-Institute/quals-2025/tree/main/jxl4fun Attached is what my final exploit image looked like, all of the grey parts are memory address leaks propagating through the predictor operators. It calculates ASLR (Address Space Layout Randomization, an exploit mitigation) bypass offsets using various operators. Finally the red pixels are from a new operator I added which had a out-of-bounds vulnerability. This is where the actual exploit triggers :) Anyway I had a lot of fun learning libjxl internals so that I could modify it in this way for the puzzle. I hope some of you can appreciate the exploit and the puzzle
jonnyawsom3
2025-04-14 12:15:06
Ooh, intriguing
itszn not quite sure what channel to post this, but I have something kinda cool to show off :) I write security puzzles for hacking competitions (CTFs) and this year modified libjxl to include some extra predictor opcodes which had vulnerabilities. Teams had to craft a jxl image which could exploit these vulnerabilities and get code execution when the image is rendered to png. https://github.com/Nautilus-Institute/quals-2025/tree/main/jxl4fun Attached is what my final exploit image looked like, all of the grey parts are memory address leaks propagating through the predictor operators. It calculates ASLR (Address Space Layout Randomization, an exploit mitigation) bypass offsets using various operators. Finally the red pixels are from a new operator I added which had a out-of-bounds vulnerability. This is where the actual exploit triggers :) Anyway I had a lot of fun learning libjxl internals so that I could modify it in this way for the puzzle. I hope some of you can appreciate the exploit and the puzzle
2025-04-14 12:19:25
This may interest you too https://github.com/google/google-ctf/tree/main/2023/quals/rev-jxl/solution
itszn
This may interest you too https://github.com/google/google-ctf/tree/main/2023/quals/rev-jxl/solution
2025-04-14 12:21:45
Yup, I've seen that one thanks for sharing :) Cool that you know about it. For mine I wanted to take it all the way to code exec. I was inspired by this exploit: https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html Which does similar style things in the JBIG2 image library to calculate offsets
jonnyawsom3
2025-04-14 12:26:48
Ahhh, my favourite. I remember discovering that a while ago and thinking "They built a microprocessor inside a PDF? Why didn't I hear about this sooner!"
monad
From our testing in <#1358733203619319858>, g0 is always faster, to the point we made faster decoding level 2 and up force g0. And with the overhauled decoding levels, I'm hoping people will be more likely to use them if they want to make sure decoding is fast. The odds are highly unlikely the image exactly matches group dimensions, so there'll be extra groups that more threads can still take advantage of too. Surprisingly, g0 even helps when threads are already saturated at higher levels. Best guess is the shallower MA trees allow quicker traversal per group
2025-04-14 09:08:05
Maybe it's a high effort implication since that's mostly where I've permuted settings.
jonnyawsom3
monad Maybe it's a high effort implication since that's mostly where I've permuted settings.
2025-04-14 09:24:59
I was considering limiting g3 to effort 9+ or when chunked is disabled, due to the 4x memory increase compared to current
monad
2025-04-14 11:20:35
I will try the suggestion posted. I tried something similar before, but quickly discarded it due to decode. Btw, at a glance it seems the "pixels_per_thread > min_target_group_pixels" enforcement ensures threads cannot be minimized when images cleanly tile with full groups. Intended?
jonnyawsom3
2025-04-14 11:45:24
Ah, good catch, I'll edit that now. If decode speed is still an issue, I recommend trying `--faster_decoding` in our fork <https://github.com/jonnyawsom3/libjxl/tree/FastSqueezeFixes> It has a much cleaner scale of Density/Speed, with improvements exceeding main `--faster_decoding 4`
Quackdoc
2025-04-14 12:12:39
can't wait to test it in olive
Tirr
2025-04-14 12:15:32
in my testing fd4 got significantly faster with reasonable density tradeoff
jonnyawsom3
2025-04-14 12:36:55
25% faster and 25% smaller as a rule of thumb https://discord.com/channels/794206087879852103/1358733203619319858/1358735338817720330
monad
This is what I had cooked up, split for legibility here ```C uint64 pixels_per_thread; pixels_per_thread = (xsize * ysize) / num_threads; if (cparams.modular_group_size_shift == -1) { if (cparams.speed_tier <= SpeedTier::kKitten && xsize >= 1024 && ysize >= 1024 && pixels_per_thread >= 1048576){ frame_header->group_size_shift = 3; } } else if (xsize >= 512 && ysize >= 512 && pixels_per_thread >= 262144){ frame_header->group_size_shift = 2; } } else if (xsize >= 256 && ysize >= 256 && pixels_per_thread >= 65536){ frame_header->group_size_shift = 1; } } else { frame_header->group_size_shift = 0; } } else { frame_header->group_size_shift = cparams.modular_group_size_shift; } ```
2025-04-14 05:18:25
```
images per bucket    g3    g2    g1    g0
20t                   1    31   261   529
1t                  299   306   146    71

                  bpp   20t dec MP/s   1t dec MP/s
git head e7      4.59          66.575        10.559
modified 20t e7  4.57          81.226        10.366
modified 1t e7   4.56          40.009         9.389
git head e8      4.47          72.912        10.044
modified 20t e8  4.46          78.802         9.717
modified 1t e8   4.45          19.530         8.164
```
jonnyawsom3
2025-04-14 08:28:51
Intriguing
Demiurge
2025-04-17 09:26:29
If someone fixes the color ringing/desaturation issue, and the overzealous crushing of shadows, then libjxl will leap 2 generations ahead and surpass libaom...
CrushedAsian255
Demiurge If someone fixes the color ringing/desaturation issue, and the overzealous crushing of shadows, then libjxl will leap 2 generations ahead and surpass libaom...
2025-04-18 02:25:02
isn't it mainly just a tuning issue?
Demiurge
2025-04-18 03:38:34
You could call it that
jonnyawsom3
2025-04-18 12:30:21
Is there anything blocking turning the jpegli folder into a submodule of the Google repo? We've noticed some commits not being mirrored and it would remove any confusion or ambiguity
Demiurge
2025-04-18 12:47:00
So would deleting the separate repo 😂
jonnyawsom3
2025-04-18 12:56:50
If we use it as a submodule, then we can treat it as an actual library instead of a growth clinging onto libjxl. Ideally also using it instead of libjpeg
Demiurge
2025-04-18 03:32:06
Whether it's an actual library or not doesn't depend on it being in a separate repo. It depends on the build system making it easy to build static/dynamic libraries and whether it comes with header files to make it easier to use as a library.
2025-04-18 03:33:59
It uses a lot of code and files from libjxl. But libjxl needs to be easier to build as a library only, without setting up a bunch of dependencies that are only used for cjxl/djxl
2025-04-18 03:36:29
libjpegli builds itself as a single libjpeg compatible library but doesn't let you build different API versions of the library at the same time, and it doesn't come with any header files.
2025-04-18 03:37:27
Those seem like far more useful things to fix than separating redundant copies of the code into confusingly diverging repos
2025-04-18 03:40:03
It's cool that they share lots of code and that improvements and adoption of one helps both...
Torn
2025-04-19 09:12:34
Might be a bit much to ask, but is there an example of initializing an ImageBundle with my own data? (If this is the wrong channel, point me to the right one.)
spider-mario
2025-04-19 09:38:54
isn’t ImageBundle the internal API?
Torn
2025-04-19 09:39:30
It's a jxl class, yeah?
spider-mario
2025-04-19 10:07:13
one that is not meant to be exposed outside of libjxl
Torn
2025-04-19 10:11:15
I'm just trying to make a version of ssimulacra2 that accepts images as piped data, instead of command line arguments that are file names. It seems to be quite attached to jxl and operates on ImageBundles, as far as I can tell.
CrushedAsian255
spider-mario one that is not meant to be exposed outside of libjxl
2025-04-20 05:38:52
oops forgot to sign the ImageBundle NDA
monad
2025-04-22 08:28:38
guys, we can finally transcode our JPEGs https://github.com/libjxl/libjxl/pull/2704
jonnyawsom3
2025-04-22 08:33:51
> the fraction of JPEGs with empty DHT markers found in the wild seems to have grown recently and it is now a substantial amount
2025-04-22 08:34:01
Interesting, I wonder what changed
monad
2025-04-22 08:55:25
I think it's just change in visibility given sufficient time
_wb_
2025-04-22 10:26:14
I think WhatsApp for some reason started to produce jpegs with empty dht markers
Melirius
2025-04-22 02:06:30
OK, I've tried several approaches for DCT coefficient order determination other than the simple histogram (taking into account two-coef correlations growing from the beginning and end of the coefficients, remaking the histogram for zero-runs after each coef selection, etc.). All of them are much slower and produce at best a (2-3)*10^-5 relative size improvement on my JPEG test suite, so I think I'll stop here and try other improvements
2025-04-22 02:08:25
Maybe Glacier mode can benefit from the best of them, otherwise it is pointless
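For readers following along, a minimal sketch of what a "simple histogram" coefficient order might look like (illustrative only; the actual encoder's heuristic and tie-breaking may differ): count how often each of the 64 DCT positions is nonzero across all blocks, then emit positions in descending count so likely-nonzero coefficients come first.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <numeric>
#include <vector>

// Build a coefficient order from a per-position nonzero histogram.
std::array<int, 64> HistogramOrder(const std::vector<std::array<int16_t, 64>>& blocks) {
  std::array<uint64_t, 64> nonzero_count{};
  for (const auto& block : blocks)
    for (int k = 0; k < 64; ++k)
      if (block[k] != 0) ++nonzero_count[k];

  std::array<int, 64> order;
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&](int a, int b) { return nonzero_count[a] > nonzero_count[b]; });
  return order;
}
```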
_wb_
Melirius OK, I've tried several approaches for DCT coefficient order determination other than simple histogram (taking into account two-coef correlations growing from beginning and end of coefficients, remaking histogram for zero-runs after each coef selection, etc.), all of them are much slower and produce on my JPEG test suite at best (2-3)*10^-5 relative size improvement, so I think to stop here and try other improvements
2025-04-23 08:39:26
Thanks for trying, it's of course more satisfying if there's an improvement but it's also good to know when the simpler thing is as good as it gets.
MSLP
2025-04-24 05:43:15
Apart from everything, it's great that jxl-rs is getting more love recently!
Traneptora
Torn I'm just trying to make a version of ssimulacra2 that accepts images as piped data, instead of command line arguments that are file names. It seems to be quite attached to jxl and operates on ImageBundles, as far as I can tell.
2025-04-25 07:33:51
Can you pass "-" as the filename? this works with djxl
Torn
Traneptora Can you pass "-" as the filename? this works with djxl
2025-04-25 08:41:06
No. I already rewrote the main method. It had no logic previously to get the data in any other manner than reading it itself. I got it to run with my own data, but either I put it at slightly wrong addresses or in the wrong format, so I'll have to continue debugging it on the weekend.
Traneptora
Torn No. I already rewrote the main method. It had no logic previously to get the data in any other manner than reading it itself. I got it to run with my own data, but either I put it at slightly wrong addresses or in the wrong format, so I'll have to continue debugging it on the weekend.
2025-04-25 09:34:13
possibly relevant is there's a separate ssimulacra2 repo
2025-04-25 09:34:23
https://github.com/cloudinary/ssimulacra2
2025-04-25 09:34:30
dunno how much is actually just stuff pulled from upstream libjxl
Torn
2025-04-25 09:42:49
Pretty much all of it.
_wb_
2025-04-25 09:48:47
It's identical
jonnyawsom3
2025-04-27 09:38:12
Assuming I didn't break anything, this should help a few longstanding issues https://github.com/google/jpegli/pull/130
spider-mario
2025-04-27 09:41:31
how much benefit is gained from the less compatible blue subsampling with XYB?
username
spider-mario how much benefit is gained from the less compatible blue subsampling with XYB?
2025-04-27 09:45:35
this I guess? https://discord.com/channels/794206087879852103/803645746661425173/1362628965860380823 (not-specified = blue subsampling)
jonnyawsom3
spider-mario how much benefit is gained from the less compatible blue subsampling with XYB?
2025-04-27 09:46:20
Subsampling of RGB JPEG at all is a bit exotic, but generally there's not a whole lot of benefit... The channel is already heavily quantized, so subsampling has less impact. In the PR I made 444 the default for compatibility, and merely fixed the `--chroma_subsampling` parameter so that it doesn't subsample the Y channel for XYB JPEGs, if people choose to use subsampling anyway for the extra few percent (and sacrificing the 20% from JXL...)
2025-04-27 09:47:50
We did try subsampling X too, since if we're risking compatibility anyway, we might as well make the most of it. It ended up hurting image quality far too much though, so we stuck with only B subsampling instead
spider-mario
username this I guess? https://discord.com/channels/794206087879852103/803645746661425173/1362628965860380823 (not-specified = blue subsampling)
2025-04-27 09:53:16
oh, yes, that must be it, thanks (I assume those bits are not really “per pixel”, though)
jonnyawsom3
Assuming I didn't break anything, this should help a few longstanding issues https://github.com/google/jpegli/pull/130
2025-04-27 10:00:00
We don't know why, but we also discovered that Vertical and Horizontal subsampling are swapped between YCbCr and XYB, so we had to reverse the values to get the expected results...
CrushedAsian255
2025-04-27 10:31:08
What happens is you subsample luma and keep chroma at 1:1
jonnyawsom3
2025-04-27 10:40:33
You get a blurry image
Demiurge
2025-04-27 02:32:21
Honestly it should refuse to use subsampling for RGB JPEG...
2025-04-27 02:33:08
Also what about the jpegli files in the libjxl source, are they just diverging into two separate redundant branches and files now?
2025-04-27 03:07:46
Good start though 👀
2025-04-27 03:50:48
I think usually you would put those in 3 separate pull requests though
2025-04-27 03:51:51
Also does it actually write the correct adobe tag or does it write a ycck tag?
username
Demiurge Also does it actually write the correct adobe tag or does it write a ycck tag?
2025-04-27 08:57:53
seems like it should write the correct value for the APP14 marker: https://github.com/google/jpegli/blob/bc19ca2393f79bfe0a4a9518f77e4ad33ce1ab7a/lib/jpegli/bitstream.cc#L58
jonnyawsom3
Demiurge Honestly it should refuse to use subsampling for RGB JPEG...
2025-04-27 08:58:17
Probably... Maybe I could have it print a warning about compatibility issues. Can't just make it a submodule because it has duplicate JXL code and the folder structure is a level too high. Probably should be separate PRs. Only the second 'real' PR I've actually made along with <#1358733203619319858>, so still learning best practices. It was mostly one-line changes so it didn't seem worth splitting to me. Correct tags. The default value of APP14 is 0, which is RGB and CMYK JPEG. YCbCr gets assigned 1 (not that it's used anyway) and YCCK 2
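For reference, a rough sketch of the Adobe APP14 segment layout with that transform byte (0 = RGB/CMYK, 1 = YCbCr, 2 = YCCK); this only illustrates the marker structure, it is not jpegli's bitstream code, and `WriteAdobeApp14` is a made-up name:
```cpp
// Illustrative sketch of the marker layout only (not jpegli's bitstream code):
// emit an Adobe APP14 segment with the given color transform byte.
// transform: 0 = RGB/CMYK, 1 = YCbCr, 2 = YCCK.
#include <cstdint>
#include <vector>

void WriteAdobeApp14(std::vector<uint8_t>* out, uint8_t transform) {
  const uint8_t segment[] = {
      0xFF, 0xEE,               // APP14 marker
      0x00, 0x0E,               // segment length (14, including these two bytes)
      'A', 'd', 'o', 'b', 'e',  // identifier
      0x00, 0x64,               // DCTEncode version (100 here)
      0x00, 0x00,               // flags0
      0x00, 0x00,               // flags1
      transform,                // color transform code
  };
  out->insert(out->end(), segment, segment + sizeof(segment));
}
```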
2025-04-27 08:59:01
2025-04-27 09:00:18
<@245794734788837387> brought this up, sounds like a good idea. Using RGB at Quality 100/Distance 0 with a warning when subsampling is enabled
username
2025-04-27 09:24:30
it would also kinda line up with how libjxl does things since it also has special behavior for distance 0
2025-04-27 09:26:09
oh and if there is a warning added about JXL-transcoding for XYB and quality 100 it should probably be present around the API and CLI option(s) for subsampling
2025-04-27 09:32:40
maybe something like "**WARNING:** Using values other than 444 in conjunction with either XYB or Quality 100/Distance 0 will result in files that cannot be losslessly transcoded to JPEG XL!"
jonnyawsom3
2025-04-27 09:43:37
As previously said, subsampled RGB isn't common anyway, so it could just be `Note: Implicit-default for Quality 100/Distance 0 is RGB JPEG` `Warning: Subsampled RGB JPEG may cause compatibility issues`
2025-04-27 09:48:08
Also <@207980494892040194>, I'm gonna need a hand to fix... Everything xD
A homosapien
2025-04-27 09:48:58
https://tenor.com/view/meme-anime-gif-22734101
jonnyawsom3
A homosapien https://tenor.com/view/meme-anime-gif-22734101
2025-04-27 10:37:27
Think I fixed it, somehow a missing } wasn't breaking the windows builds, so we never caught it before
2025-04-27 10:38:38
Also made it only subsample *YCbCr* at low quality, since if we do want to expose RGB in cjpegli, we don't want that subsampled at all. Should probably make RGB a parameter so people can disable it if they only want q100 YCbCr
2025-04-27 10:39:19
Though, I wonder how well subsampling everything and effectively halving the resolution but at higher quality would look....
A homosapien
Think I fixed it, somehow a missing } wasn't breaking the windows builds, so we never caught it before
2025-04-27 10:55:47
I also fixed some incorrect chroma values for 440 and renamed `distance` to `qDistance` since it was already defined earlier.
jonnyawsom3
2025-04-28 12:36:36
Added APP14 to CMYK and fixed the issues... Hopefully. Tests soon™
runr855
2025-04-28 12:46:36
Is there a reason why the jxl-x86-windows-static.zip 0.11.1 release of libjxl has 19/64 detections as a trojan? That seems very high for a false positive
2025-04-28 12:46:55
On Virustotal
2025-04-28 12:47:20
And it has been detected as a trojan for months
Demiurge
<@245794734788837387> brought this up, sounds like a good idea. Using RGB at Quality 100/Distance 0 with a warning when subsampling is enabled
2025-04-28 01:12:33
Doesn't sound particularly appealing to me...
jonnyawsom3
Demiurge Doesn't sound particularly appealing to me...
2025-04-28 01:13:13
Scores 2 points higher in SSIMULACRA, so it's something :P
Demiurge
2025-04-28 01:15:15
Higher than default ycbcr?
Scores 2 points higher in SSIMULACRA, so it's something :P
2025-04-28 01:16:45
That's still kinda silly
2025-04-28 01:17:17
RGB JPEG is kind of uncommon
jonnyawsom3
Demiurge Higher than default ycbcr?
2025-04-28 01:23:45
`cjpeg -quality 100` vs `cjpeg -quality 100 -rgb`
Demiurge RGB JPEG is kind of uncommon
2025-04-28 01:25:30
And quality 100 generally shouldn't be used. It'd be a default, but I'd probably add the same `-rgb` flag to cjpegli, with 0 disabling the switch and 1 forcing RGB on... If it's feasible. Along with the printout about being RGB
Demiurge
2025-04-28 01:26:49
RGB JPEG shouldn't be used because it's not efficient and the quant tables are not even tuned for that.
jonnyawsom3
2025-04-28 01:27:16
There is no quant table at q 100, it's all 1...
Demiurge
2025-04-28 01:27:18
But for quality 100 that's a different story since that doesn't apply
2025-04-28 01:27:24
Yeah
2025-04-28 01:27:53
Still uncommon
jonnyawsom3
2025-04-28 01:28:31
Didn't stop XYB, and again, it'll only be a default for cjpegli. With a message about how to disable it, similar to JPEG transcoding with cjxl
2025-04-28 01:30:28
But it'll probably be a separate PR. This one was meant to fix the biggest issues: APP14, weird XYB defaults and broken subsampling, with a tweak to enable 420 at q 30
2025-04-28 01:37:33
Might address https://discord.com/channels/794206087879852103/1301682361502531594 at some point too, but we need to do wider testing around what quality threshold to disable it
username
`cjpeg -quality 100` vs `cjpeg -quality 100 -rgb`
2025-04-28 02:12:59
Do an XOR compare of both of them against the lossless source; there are a lot fewer colors shifted around for RGB
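A minimal illustration of that XOR-compare idea on raw interleaved pixel bytes; `XorDiff` is just a hypothetical helper, not an existing tool:
```cpp
// Every byte where the decoded encode differs from the lossless source
// becomes nonzero in the output buffer, which is what lights up in the images.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint8_t> XorDiff(const std::vector<uint8_t>& original,
                             const std::vector<uint8_t>& decoded) {
  const size_t n = std::min(original.size(), decoded.size());
  std::vector<uint8_t> diff(n, 0);
  for (size_t i = 0; i < n; ++i) {
    diff[i] = original[i] ^ decoded[i];  // 0 where the bytes are identical
  }
  return diff;
}
```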
Demiurge RGB JPEG shouldn't be used because it's not efficient and the quant tables are not even tuned for that.
2025-04-28 02:14:34
if someone is specifying "quality 100" then I don't think they care about size
2025-04-28 02:17:15
I would presume the number of people who care about size when specifying the maximum available quality value is way smaller than the number of people specifying it because they expressly don't care about size and want a reference exchange file
Demiurge
2025-04-28 03:09:21
What difference does it actually make though?
2025-04-28 03:09:47
Aside from compatibility possibly.
2025-04-28 03:11:00
Possibly larger file size for no actual increase in fidelity?
jonnyawsom3
2025-04-28 03:17:40
We *just* said that it scores 2 points higher...
username
Demiurge What difference does it actually make though?
2025-04-28 03:19:44
I have seen on multiple occasions both people and companies use "quality 100" JPEGs as original references or as an intermediate format or "master copy", such as for example an artist exporting a ref sheet with defined color areas you are supposed to use a color picker on **OR** a company serving and processing millions or more images a day.
Demiurge
We *just* said that it scores 2 points higher...
2025-04-28 03:21:27
That doesn't demonstrate anything though frankly.
2025-04-28 03:23:06
Comparisons with color gradients would be an excellent real demonstration though
username I have seen on multiple occasions both people and companies use "quality 100" JPEGs as original references or as an intermediate format or "master copy", such as for example an artist exporting a ref sheet with defined color areas you are supposed to use a color picker on **OR** a company serving and processing millions or more images a day.
2025-04-28 03:23:39
What are those white squares and why do the originals have such bad banding?
jonnyawsom3
Demiurge That doesn't demonstrate anything though frankly.
2025-04-28 03:24:42
YCbCr and RGB at q100 compared to the original with XOR
username
Demiurge What are those white squares and why do the originals have such bad banding?
2025-04-28 03:26:44
https://cloudinary.com/blog/why_jpeg_is_like_a_photocopier#why_does_this_happen_
Demiurge
YCbCr and RGB at q100 compared to the original with XOR
2025-04-28 03:26:44
This is a cool comparison too, but not as convincing as just showing some color gradients, side by side.
username
2025-04-28 03:26:55
~~trolling arc~~
jonnyawsom3
2025-04-28 03:27:16
At least they haven't brought up noise again
username
2025-04-28 03:29:08
maybe they are genuinely worried about the compatibility concern, although in my testing RGB JPEGs seem to work just fine in most software
Demiurge
2025-04-28 03:29:11
I'm sincerely asking questions and sincerely wondering how much of a difference it makes. It's unfortunate you assume I have bad intentions.
username
2025-04-28 03:30:11
with the context of your messages being in relation to a compatibility concern with software they make more sense
2025-04-28 03:30:48
otherwise they seem like you are either ignoring or don't understand what is being presented to you and why
Demiurge
2025-04-28 03:30:53
The xor comparison for example is a cool visualization but a better visualization would be a worst-case image like RGB color gradients and comparing the difference side by side.
jonnyawsom3
Demiurge This is a cool comparison too, but not as convincing as just showing some color gradients, side by side.
2025-04-28 03:32:55
Original, YCbCr, RGB
Demiurge
2025-04-28 03:33:41
Nice! See? That very effectively demonstrates that it makes a real and positive difference.
2025-04-28 03:34:01
That's all I was asking.
jonnyawsom3
2025-04-28 03:34:06
Gradients were actually a good shout, my fairly noisy test image was masking most of it
Demiurge
2025-04-28 03:34:18
Exactly.
2025-04-28 03:34:38
I'm happy now. You demonstrated exactly what I was asking about.
2025-04-28 03:35:16
It's not trolling to ask a sincere and fair question...
2025-04-28 03:36:24
I genuinely didn't know if the color transformation would actually make a difference in practice.
jonnyawsom3
2025-04-28 03:37:23
It's nearly 5am and I *really* didn't wanna go into Krita to try and make a comparison image... Then I realised I had that 10-bit test image and could just let Discord do the comparing for me
username
Demiurge It's not trolling to ask a sincere and fair question...
2025-04-28 03:42:17
I guess my reason for confusion was I couldn't gauge exactly *why* you kept seemingly fighting against a change that was presented as improving color sampling accuracy in a case where people treat the file as a giant reference image. Psychovisual tuning vs mathematical similarity or something, idk
jonnyawsom3
2025-04-28 03:42:43
Bonus YCbCr vs XYB
username
2025-04-28 03:43:30
XYB still has that issue for me where stuff becomes darker
jonnyawsom3
username XYB still has that issue for me where stuff becomes darker
2025-04-28 03:47:27
Importing to Krita the image is darker, but strangely converting the layer from XYB to sRGB fixes it, as if it's using the wrong transfer or something for rendering. In Irfanview it has a pink tint...
Demiurge
username I guess my reason for confusion was I couldn't gauge exactly *why* you kept seemingly fighting against a change that was presented as improving color sampling accuracy in a case where people treat the file as a giant reference image. Psychovisual tuning vs mathematical similarity or something, idk
2025-04-28 03:47:48
I was skeptical of what the actual difference was or whether it could actually be demonstrated.
2025-04-28 03:48:07
Or if it was just an assumption
jonnyawsom3
2025-04-28 03:48:38
If I'm honest I was skeptical it would show anything in a gradient, but then it clicked that an RGB gradient would be best *in* RGB, naturally
Demiurge
If I'm honest I was skeptical it would show anything in a gradient, but then it clicked that an RGB gradient would be best *in* RGB, naturally
2025-04-28 03:49:13
Yep, it's basically a worst case scenario and the best contrived example to demonstrate the difference
2025-04-28 03:50:15
But it's still real enough to matter
2025-04-28 03:51:06
RGB looks just like the original whereas XYB and YCbCr have uneven steps
jonnyawsom3
2025-04-28 03:52:17
....it's the god damn gamma again
Demiurge
2025-04-28 03:52:41
To be fair, the uneven steps are because of rounding errors that can theoretically be fixed in the decoder/cms, but that's a whole other can of worms. And you're not going to fix everyone's broken software.
jonnyawsom3
2025-04-28 03:53:42
Old XYB, New XYB (Stripped gAMA from the PNG)
Demiurge
2025-04-28 03:54:44
The gAMA tag was messing up the png xyb?
jonnyawsom3
2025-04-28 03:55:35
cjpegli uses the guts of libjxl, so it correctly handles gamma in PNGs... Everything else, doesn't. So the XYB looks wrong in comparison
runr855 Is there a reason why the jxl-x86-windows-static.zip 0.11.1 release of libjxl has 19/64 detections as a trojan? That seems very high for a false positive
2025-04-28 05:25:51
Interesting, seems jpegli is the main culprit, not sure why though
Demiurge
2025-04-28 06:10:31
Lots of virus engines classify very broad categories as "trojan" like for example anything with the curl dll
2025-04-28 06:10:49
what do the virus engines call the supposed trojan?
runr855
2025-04-28 12:24:04
I believe it would be worth investigating. Windows Defender reacts to it, so no Windows users can use it without Defender intervening
2025-04-28 12:24:21
There is also the risk of supply chain attacks, which I don't think should be forgotten completely
novomesk
runr855 I believe it would be worth investigating. Windows Defender reacts to it, so no Windows users can use it without Defender intervening
2025-04-28 04:16:42
https://www.virustotal.com/gui/file/aa950f4d37abc1e52a5dbca153479b7cba0303e35331deb7d5ee5b18adf7a23b It is necessary to contact those AV companies and report the case as a false positive. I recommend starting with Avast/AVG - same company, same detection. BitDefender's engine is used by many different products - so resolving it there has a big impact.
_wb_
2025-04-29 12:45:42
If someone feels like it, feel free to check https://app.codecov.io/gh/libjxl/libjxl?search=&displayType=list and try to figure out what's up with those ~15% lines of code currently not covered by tests. It could be various things:
- missing tests that should actually be there
- various rather trivial error conditions (e.g. invalid API usage) that we didn't bother to add tests for (though maybe we should?)
- dead code that can be removed
- dead code because of a bug
jonnyawsom3
2025-04-29 01:07:47
Seems to be errors or untested encode parameters like keeping invisible pixels
A homosapien
Seems to be errors or untested encode parameters like keeping invisible pixels
2025-04-29 01:52:59
Do you think that could explain the excessive ram usage? I remember you said the math wasn't adding up.
Melirius
_wb_ If someone feels like it, feel free to check https://app.codecov.io/gh/libjxl/libjxl?search=&displayType=list and try to figure out what's up with those ~15% lines of code currently not covered by tests. It could be various things: - missing tests that should actually be there - various rather trivial error conditions (e.g. invalid api usage) that we didn't bother to add tests for (though maybe we should?) - dead code that can be removed - dead code because of a bug
2025-04-29 02:04:05
Will try to check
jonnyawsom3
A homosapien Do you think that could explain the excessive ram usage? I remember you said the math wasn't adding up.
2025-04-29 02:06:54
You mean for progressive lossless? Because that was something else, I just mean the error conditions aren't being tested in the coverage
pshufb
2025-04-29 04:57:47
https://web.ist.utl.pt/nuno.lopes/pubs/ub-pldi25.pdf
2025-04-29 04:58:04
came across this in a paper and thought it may be relevant to devs here. I am slightly skeptical that there’s performance on the table here / that this will replicate, and it’s perhaps best dealt with by the LLVM developers, but <:shrugm:322486234142212107>
jonnyawsom3
2025-04-29 05:06:44
Clang alone is a 20% performance increase, 130% for fast lossless. It gets built as part of the tests on Github but discarded, with MSVC being uploaded to releases instead
pshufb
Clang alone is a 20% performance increase, 130% for fast lossless. It gets built as part of the tests on Github but discarded, with MSVC being uploaded to releases instead
2025-04-29 07:33:26
My message is less about the speedup from compiler choice, and more about how a loop in clang builds of libjxl may be responsible for a lot of lost, but potentially easily recovered, performance.
jonnyawsom3
2025-04-29 07:34:35
Oh I didn't even see they used Clang, I was just saying much more than 7% is on the table
pshufb
2025-04-29 07:36:35
Unfortunately the paper doesn’t provide a whole lot of detail, and the regression is _probably_ a weird quirk of Sandy Bridge. (Which is weird since the Ivy Bridge cores aren’t much different from Sandy Bridge.) It’s a shame they don’t test on a modern architecture.
Oh I didn't even see they used Clang, I was just saying much more than 7% is on the table
2025-04-29 07:36:40
Fair!
jonnyawsom3
2025-04-29 07:37:59
> jpegxl-1.5.0
Not sure where they found that version number...
2025-04-29 07:54:34
Ahh, it's a benchmarking suite version, not a library version https://openbenchmarking.org/test/pts/jpegxl
2025-04-29 07:55:12
So they ran the tests using 0.7 too... Not exactly representative for multithreading either then
A homosapien
You mean for progressive lossless? Because that was something else, I just mean the error conditions aren't being tested in the coverage
2025-04-30 01:21:35
No, not that; more like how RAM usage is double the size of the image as a raw bitmap. Even after accounting for 32-bit float, RAM was still 2x higher than it should be.
2025-04-30 01:25:50
Or maybe I'm misremembering
jonnyawsom3
A homosapien No, not that; more like how RAM usage is double the size of the image as a raw bitmap. Even after accounting for 32-bit float, RAM was still 2x higher than it should be.
2025-04-30 03:06:36
Oh, no. The code coverage is just code that doesn't run in the tests. So corrupted files or misconfigured settings
_wb_
2025-04-30 12:50:18
Finally this is all-green again
pshufb
So they ran the tests using 0.7 too... Not exactly representative for multithreading either then
2025-04-30 02:36:33
Great catch! Thanks for digging into it.
Demiurge
2025-04-30 09:07:56
This is your regularly scheduled reminder that <:JXL:805850130203934781> is awesome and cool.
jonnyawsom3
2025-05-02 04:51:24
Looking through some old PRs that never made it. I wasn't expecting the Game of Life as a heuristic
A homosapien
2025-05-02 04:53:55
https://tenor.com/view/game-of-life-glider-grid-pixels-repeat-gif-27605519
jonnyawsom3
2025-05-02 07:49:06
Ended up doing more than we expected, but I think it's ready now https://github.com/google/jpegli/pull/130
2025-05-02 07:49:16
We wanted to have cjpegli display the new defaults when they're triggered, but couldn't get it working. We also wanted to disable XYB when the RGB at distance 0 is triggered, since the color transform causes artifacts similar to YCbCr
2025-05-02 07:50:27
Should give better results by default now though, with multiple bugs/strange behaviours fixed and the `-d 0`/`-q 100` RGB mode improving quality by a few points more
veluca
2025-05-02 09:46:30
2025-05-02 09:46:40
first (?) jxl-rs decoded image 🙂
jonnyawsom3
2025-05-02 10:44:03
And a 40MP image no less, so much for starting small xD
Meow
2025-05-03 05:54:53
Curious about its performance
veluca
Meow Curious about its performance
2025-05-03 06:09:59
Slow, but not even *too* slow for not having any performance optimization whatsoever (5x slower than libjxl on this image)
CrushedAsian255
veluca Slow, but not even *too* slow for not having any performance optimization whatsoever (5x slower than libjxl on this image)
2025-05-03 06:10:44
Was it a VarDCT or Modular image?
veluca
2025-05-03 06:10:53
Modular
CrushedAsian255
veluca Modular
2025-05-03 06:11:19
Simple modular or with Squeeze/RCT/Delta?
veluca
2025-05-03 06:11:38
RCT, but no squeeze or other fun stuff
jonnyawsom3
veluca Slow, but not even *too* slow for not having any performance optimization whatsoever (5x slower than libjxl on this image)
2025-05-03 06:11:44
Is that 5x slower both singlethreaded? (Prepare yourself for the barrage of questions xD)
veluca
Is that 5x slower both singlethreaded? (Prepare yourself for the barrage of questions xD)
2025-05-03 06:12:23
yup
Tirr
2025-05-03 06:34:32
jxl-rs is currently single thread only and doesn't have any handwritten SIMD routines
2025-05-03 06:35:17
just focusing on working implementation
jonnyawsom3
2025-05-03 06:45:08
I was moreso checking if libjxl was set to singlethreaded, but yeah. Glad we've hit this milestone and I'm sure more aren't far off
Meow
2025-05-03 11:39:31
Reaching the usable status is already a milestone
veluca
2025-05-03 12:01:36
not there yet 😛
Tirr
2025-05-04 10:37:08
it seems that libjxl is creating a VarDCT image whose LF quant values exceed the signed 16-bit range, but it isn't marked as `modular_16bit_buffers = false` https://github.com/tirr-c/jxl-oxide/issues/456
2025-05-04 10:37:20
jxl-oxide decodes the image successfully when I turn off 16-bit buffer optimization
2025-05-04 10:39:33
(the problematic sample is at `c=0 y=25 x=39` in LF image which has value of `32894`)
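A rough sketch of the kind of range guard being discussed, assuming the quantized LF plane is available as 32-bit values; `LfFitsInInt16` is illustrative and not the actual libjxl check:
```cpp
// If any quantized LF value falls outside the int16 range, the stream should
// not claim that 16-bit modular buffers are sufficient.
#include <cstdint>
#include <limits>
#include <vector>

bool LfFitsInInt16(const std::vector<int32_t>& quantized_lf) {
  for (int32_t v : quantized_lf) {
    if (v < std::numeric_limits<int16_t>::min() ||
        v > std::numeric_limits<int16_t>::max()) {
      return false;  // e.g. the reported sample value 32894 exceeds 32767
    }
  }
  return true;
}
```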
_wb_
2025-05-04 10:43:18
We had something similar in libjxl-tiny in the implementation of a hw encoder. I suppose we should be more accurate in the range of quant factors to ensure the quantized lf stays within Level 5 constraints.
jonnyawsom3
2025-05-04 12:58:07
Realised the changelog had been neglected, so thought I'd try and catch it up https://github.com/libjxl/libjxl/pull/4224
2025-05-04 12:59:41
I'll have to make a mental note of adding to the changelog as part of my future PRs, if applicable, rather than trying to recall what's new since the last release
RaveSteel
2025-05-05 01:53:14
Is there an ETA or any milestone that needs to be met before 0.12 releases?
jonnyawsom3
2025-05-05 01:57:10
AFAIK no set goals/dates from the core devs, but I was hoping to get all the jpegli tweaks merged and copied over before the next release, since libjxl is still where most get it from https://github.com/google/jpegli/pull/130
_wb_
2025-05-05 08:30:18
Some slight density improvement for lossless, at the cost of some decode slowdown: https://github.com/libjxl/libjxl/pull/4228 (and at faster_decode=2, some slight decode speedup, at the cost of some density) Feel free to try it out on your favorite image/corpus.
2025-05-05 08:41:54
In general there is probably still some substantial room to improve MA tree learning heuristics. In particular we should implement some post-clustering tree pruning that (recursively) removes splits that go to two leaf nodes with identical predictor and context after clustering (and identical multiplier/offset). Such splits only cause some encode/decode slowdown (since the tree is unnecessarily deep) and some signaling overhead, without giving any compression benefit, so pruning them can only improve things — it just seems a bit tricky to do the code plumbing to do this pruning. <@179701849576833024> or <@1346460706345848868> do you want to give it a shot?
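A rough sketch of that post-clustering pruning pass on a simplified tree representation (not libjxl's actual node type); the bottom-up recursion also catches splits that only become prunable after their children were collapsed:
```cpp
// After context clustering, a split whose two children are leaves with
// identical context, predictor, multiplier and offset decodes the same either
// way, so it can be collapsed into a single leaf.
#include <memory>

struct Node {
  bool is_leaf = false;
  // Leaf payload (valid when is_leaf).
  int context = 0, predictor = 0, multiplier = 1, offset = 0;
  // Split payload (valid when !is_leaf).
  int property = 0, split_value = 0;
  std::unique_ptr<Node> left, right;
};

void PruneRedundantSplits(std::unique_ptr<Node>& node) {
  if (!node || node->is_leaf) return;
  // Prune bottom-up so newly created prunable splits are caught too.
  PruneRedundantSplits(node->left);
  PruneRedundantSplits(node->right);
  const Node* l = node->left.get();
  const Node* r = node->right.get();
  if (l && r && l->is_leaf && r->is_leaf &&
      l->context == r->context && l->predictor == r->predictor &&
      l->multiplier == r->multiplier && l->offset == r->offset) {
    node = std::move(node->left);  // replace the split by one of the leaves
  }
}
```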
veluca
2025-05-05 08:42:52
I think I should dedicate my jxl time to jxl-rs 😄 also I remember trying that out and it not being helpful, but I might misremember
Mine18
2025-05-05 08:44:13
~~what if you removed that regression so jxl's image quality goes back to 0.8, and then you can claim a MASSIVE quality improvement!~~
veluca
2025-05-05 08:49:30
I still feel like we should try non-greedy tree splitting, but who has the time...
_wb_
Mine18 ~~what if you removed that regression so jxl's image quality goes back to 0.8, and then you can claim a MASSIVE quality improvement!~~
2025-05-05 08:49:48
this is for lossless, where only speed and density matter. Lossy is a trickier thing
veluca I think I should dedicate my jxl time to jxl-rs 😄 also I remember trying that out and it not being helpful, but I might misremember
2025-05-05 08:51:25
yeah, jxl-rs is more important than slight encoder improvements
veluca
2025-05-05 08:51:25
As in, for each property do a DP to figure out the best way to split *more than 2-way* along that property, then repeat recursively in each subtree
veluca As in, for each property do a DP to figure out the best way to split *more than 2-way* along that property, then repeat recursively in each subtree
2025-05-05 08:52:18
The decoder could also optimize for things generated that way (especially if we limit this to using, say, two properties at most), and I imagine this would be massively faster to decode too
2025-05-05 08:53:13
(two properties makes this effectively be 3 lookups in a lookup table)
_wb_
2025-05-05 08:59:53
Why 3 lookups?
veluca
2025-05-05 09:00:21
2 1d lookups to reduce the range of the properties, and 1 2d lookup for the leaf
_wb_
2025-05-05 09:00:35
ah right
2025-05-05 09:01:47
if it's limited to _n_ properties you can do it with _n_ 1D lookups followed by 1 _n_ D lookups, right?
veluca
2025-05-05 09:02:05
yup
2025-05-05 09:02:34
I imagine as soon as n starts being more than 3 or 4 the n-D lookup becomes impractical
2025-05-05 09:02:57
(depending on the # of distinct values)
_wb_
2025-05-05 09:03:26
where the size of the _n_ D lookup table is equal to the product of the number of nodes per property (+1)
veluca
2025-05-05 09:03:34
yup
2025-05-05 09:03:41
well, number of distinct nodes
2025-05-05 09:03:56
there's already a specialized codepath for n = 1 and property = gradient/wp
_wb_
2025-05-05 09:04:44
yeah it might be lower than the number of nodes if there's repetition in the subtrees
veluca
2025-05-05 09:05:07
but tbh, even if we don't table it up, a tree which has a relatively small number of parts that all share the same property should be significantly faster to decode as is
2025-05-05 09:06:02
(basically by making a tree of 1d lookup tables)
2025-05-05 09:06:36
(or even not lookup tables, if the # of possible values is small -- just do a SIMD-fied linear search...)
_wb_
2025-05-05 09:09:21
something like 7 buckets per property (large negative, medium negative, small negative, zero, small positive, medium positive, large positive) could already be pretty effective, so I can imagine you could pick the 3 most informative properties and make a lookup table of size `7*7*7`
veluca
2025-05-05 09:10:24
yeah that would work, and the LUT would either be small or just fit in a single SIMD register (and effectively be 3 instructions or so)
2025-05-05 09:11:10
(fwiw you don't even need to decide those buckets, you can just let the DP figure out the best 7-way split :P)
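A sketch of the lookup-table decode path being described: each chosen property is first bucketed with its own 1D threshold list, then one flattened n-D table maps the bucket tuple to a leaf index. Fixed at 3 properties here, and all names and sizes are illustrative:
```cpp
#include <array>
#include <cstdint>
#include <vector>

struct BucketedTree {
  std::array<std::vector<int>, 3> thresholds;  // sorted; k thresholds = k+1 buckets
  std::array<int, 3> buckets;                  // buckets per property (e.g. 7 each)
  std::vector<uint16_t> leaf;                  // size buckets[0]*buckets[1]*buckets[2]

  int Bucket(int p, int value) const {
    const auto& t = thresholds[p];
    int b = 0;
    while (b < static_cast<int>(t.size()) && value > t[b]) ++b;  // 1D lookup
    return b;
  }

  uint16_t Leaf(int p0, int p1, int p2) const {
    const int b0 = Bucket(0, p0), b1 = Bucket(1, p1), b2 = Bucket(2, p2);
    return leaf[(b0 * buckets[1] + b1) * buckets[2] + b2];  // single n-D lookup
  }
};
```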
A homosapien
_wb_ Some slight density improvement for lossless, at the cost of some decode slowdown: https://github.com/libjxl/libjxl/pull/4228 (and at faster_decode=2, some slight decode speedup, at the cost of some density) Feel free to try it out on your favorite image/corpus.
2025-05-05 09:23:47
I'm getting mixed results, and the impact on decoding speed is relatively negligible (within 0.5-2%). It's hard to say if it benefits photographic images more than non-photo
2025-05-05 09:24:12
~~Also, I think I just found another huge regression with lossless.~~
2025-05-05 09:24:47
~~I'll post it in <#803645746661425173> when I'm done double checking my numbers~~
2025-05-05 09:32:29
Nevermind, got my numbers mixed up
2025-05-05 09:33:16
Was comparing two different images by accident lol 😅
2025-05-05 09:33:54
Welp, back to work addressing the smaller regression(s) with faster decoding 3
_wb_ Some slight density improvement for lossless, at the cost of some decode slowdown: https://github.com/libjxl/libjxl/pull/4228 (and at faster_decode=2, some slight decode speedup, at the cost of some density) Feel free to try it out on your favorite image/corpus.
2025-05-05 10:02:06
Speaking of which, can I use this multiplier for faster decoding to increase the number of buckets? It's hurting large photos for faster decoding 1-3. https://github.com/libjxl/libjxl/pull/4201#issuecomment-2849934762
_wb_
2025-05-05 10:43:05
No, that's a different parameter I think.
A homosapien
2025-05-05 10:54:02
I'm making some edits in a fork and it turns out making the histogram "less efficient" is increasing density somehow.
2025-05-05 10:56:13
Granted idk what an "efficient histogram" means. I'm just going off what the PR and ChatGPT say and changing variables around
jonnyawsom3
Mine18 ~~what if you removed that regression so jxl's image quality goes back to 0.8, and then you can claim a MASSIVE quality improvement!~~
2025-05-06 01:39:05
Me and Sapien have been discussing that. When we have time, we're going to try changing values to their previous states, to see if we can get the quality back without outright reverting the PR
_wb_ Some slight density improvement for lossless, at the cost of some decode slowdown: https://github.com/libjxl/libjxl/pull/4228 (and at faster_decode=2, some slight decode speedup, at the cost of some density) Feel free to try it out on your favorite image/corpus.
2025-05-06 01:44:24
The only difference is actually in non-static properties. Level 1 already disables WP entirely, giving a 2x speedup thanks to using the kNoWP tree type
Mine18
Me and Sapien have been discussing that. When we have time, we're going to try changing values to their previous states, to see if we can get the quality back without outright reverting the PR
2025-05-06 03:57:56
hopefully the solution to the regression gets found sooner or later
jonnyawsom3
2025-05-06 07:06:12
I'm struggling to understand it due to my lack of C++ experience, but assuming `gi` is Global Image and `sg` is Single Group, isn't this trying to select per-group RCTs by measuring the entire image instead? <https://github.com/libjxl/libjxl/blob/0855a037d7ac65249f0f4700995bbd9decb3b47d/lib/jxl/enc_modular.cc#L1462>
_wb_
2025-05-06 07:24:35
Nah, at that point in the code, `gi` is a group image 🙂
2025-05-06 07:27:28
https://github.com/libjxl/libjxl/blob/0855a037d7ac65249f0f4700995bbd9decb3b47d/lib/jxl/enc_modular.cc#L1363
jonnyawsom3
2025-05-06 07:35:48
Ahh, we're trying to figure out why the RCT selection is worse than explicitly setting YCoCg in our tests, since it should be trying each RCT and then using the best result
2025-05-06 07:41:20
Similar was happening with per-group palette and squeeze for progressive lossless, we think it must be basing decisions on the smallest step
_wb_ Nah, at that point in the code, `gi` is a group image 🙂
2025-05-06 08:42:18
EstimateCost is at the top of the file, so is that using the full image? https://github.com/libjxl/libjxl/blob/0855a037d7ac65249f0f4700995bbd9decb3b47d/lib/jxl/enc_modular.cc#L273 "at that point in the code" has me wanting to double check haha. Regardless, we're testing it now and globally setting a specific RCT is better than allowing local RCTs, so something is definitely wrong with the cost estimation
A homosapien
2025-05-06 08:44:10
```
cjxl smol.png smol.jxl -d 0
JPEG XL encoder v0.12.0 e87f2f87 [_AVX2_,SSE4,SSE2]
Compressed to 24252.3 kB (7.616 bpp).
5828 x 4371, 5.662 MP/s [5.66, 5.66], , 1 reps, 12 threads.

cjxl smol.png smol.jxl -d 0 -C 6
JPEG XL encoder v0.12.0 e87f2f87 [_AVX2_,SSE4,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 24093.9 kB (7.567 bpp).
5828 x 4371, 7.448 MP/s [7.45, 7.45], , 1 reps, 12 threads.
```
Choosing a global RCT `-C 6` or `-C 10` comes really close to the RCT heuristics, or in this case, even beats it.
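A hedged sketch of the "try each RCT, keep the cheapest" selection under discussion; the sum-of-absolute-values cost proxy and the YCoCg-R-like transform below are illustrative stand-ins for libjxl's EstimateCost and its RCT set, which (per the discussion) also need to account for the predictor that will actually be used:
```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Pixel { int32_t r, g, b; };

// Crude cost proxy: smaller residual magnitudes usually entropy-code cheaper.
static int64_t EstimateCostProxy(const std::vector<Pixel>& px) {
  int64_t cost = 0;
  for (const Pixel& p : px) cost += std::abs(p.r) + std::abs(p.g) + std::abs(p.b);
  return cost;
}

// Toy transform set: identity (0) and a YCoCg-R-like reversible transform (1).
static std::vector<Pixel> ApplyRct(std::vector<Pixel> px, int rct) {
  if (rct == 0) return px;
  for (Pixel& p : px) {
    const int32_t co = p.r - p.b;
    const int32_t tmp = p.b + (co >> 1);
    const int32_t cg = p.g - tmp;
    const int32_t y = tmp + (cg >> 1);
    p = {y, co, cg};
  }
  return px;
}

// Pick the transform with the lowest estimated cost for this group.
int PickBestRct(const std::vector<Pixel>& group) {
  int best_rct = 0;
  int64_t best_cost = EstimateCostProxy(ApplyRct(group, 0));
  for (int rct = 1; rct < 2; ++rct) {
    const int64_t cost = EstimateCostProxy(ApplyRct(group, rct));
    if (cost < best_cost) { best_cost = cost; best_rct = rct; }
  }
  return best_rct;
}
```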
Melirius
_wb_ In general there is probably still some substantial room to improve MA tree learning heuristics. In particular we should implement some post-clustering tree pruning that (recursively) removes splits that go to two leaf nodes with identical predictor and context after clustering (and identical multiplier/offset). Such splits only cause some encode/decode slowdown (since the tree is unnecessarily deep) and some signaling overhead, without giving any compression benefit, so pruning them can only improve things — it just seems a bit tricky to do the code plumbing to do this pruning. <@179701849576833024> or <@1346460706345848868> do you want to give it a shot?
2025-05-06 08:44:59
Yes, good idea
jonnyawsom3
EstimateCost is at the top of the file, so is that using the full image? https://github.com/libjxl/libjxl/blob/0855a037d7ac65249f0f4700995bbd9decb3b47d/lib/jxl/enc_modular.cc#L273 "at that point in the code" has me wanting to double check haha. Regardless, we're testing it now and globally setting a specific RCT is better than allowing local RCTs, so something is definitely wrong with the cost estimation
2025-05-06 09:00:47
We added some debug printout and it seems like it *is* running per-group, but it *isn't* taking into account predictor selection. Still looking into it though