JPEG XL

libjxl

intelfx
2025-04-01 04:06:30
Probably.
Demiurge
2025-04-01 04:06:40
What's wrong about it?
intelfx
2025-04-01 04:06:48
You don't need a separate process to run a WASM VM.
Demiurge
2025-04-01 04:07:48
You can link a wasm vm into your process but copying memory in and out of the vm is no different than serializing data between different processes. Just with even MORE overhead.
intelfx
2025-04-01 04:07:54
That's incorrect.
Demiurge
2025-04-01 04:08:19
Really?
2025-04-01 04:08:34
I don't understand then.
intelfx
2025-04-01 04:08:52
Yup, really. Context switches are expensive and are only getting more expensive (Meltdown/Spectre says hello).
2025-04-01 04:09:31
Besides, who said you need to copy in and out of the VM? If it's a few tens (or hundreds) of bytes it's easier to copy and virtually free, if it's more, you can absolutely do something to access WASM buffers directly.
2025-04-01 04:10:07
Anyway, embedding native code in JXL bitstream is an obvious non-starter for obvious reasons of ISA dependency. Even if you could sandbox it perfectly with zero cost (you can't).
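To illustrate the "access WASM buffers directly" point above: a guest's linear memory is a single contiguous host-side buffer, so the host can resolve a guest offset to a host pointer instead of copying data in and out. This is a hypothetical toy sketch, not any real runtime's API; the names `ToyWasmInstance` and `HostView` are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Toy stand-in for a WASM instance: the guest's whole address space is one
// contiguous host buffer, so "passing data" can be a pointer fix-up, not a copy.
struct ToyWasmInstance {
  std::vector<uint8_t> linear_memory;

  // Resolve a guest pointer (an offset into linear memory) to a host pointer,
  // with a bounds check so the guest can never hand out an out-of-range view.
  uint8_t* HostView(uint32_t guest_ptr, uint32_t len) {
    if (uint64_t{guest_ptr} + len > linear_memory.size()) return nullptr;
    return linear_memory.data() + guest_ptr;
  }
};

int main() {
  ToyWasmInstance vm;
  vm.linear_memory.resize(64 * 1024);  // one 64 KiB WASM page
  // Write into the guest's memory directly, zero copies on the host side.
  if (uint8_t* shared = vm.HostView(/*guest_ptr=*/1024, /*len=*/16)) {
    std::memset(shared, 0xAB, 16);
  }
  return 0;
}
```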
Demiurge
intelfx Anyway, embedding native code in JXL bitstream is an obvious non-starter for obvious reasons of ISA dependency. Even if you could sandbox it perfectly with zero cost (you can't).
2025-04-01 04:11:51
Oh, no one was arguing for that. Jon was just saying it might be worthwhile to make a very simple and basic JIT that generates native instructions to decode the MA trees.
intelfx
2025-04-01 04:12:04
Ah, then I misread that part.
2025-04-01 04:12:16
Then you have a JIT already, what's the problem with shipping WASM bitcode instead of that? :)
Demiurge
2025-04-01 04:12:31
It's a kind of risky idea but MAYBE it could be made guaranteed-safe if someone was very careful and clever?
intelfx
2025-04-01 04:12:42
They were, and they made wasm...
2025-04-01 04:12:52
Why reinvent the wheel?
Demiurge
2025-04-01 04:13:56
Well the MA trees have a lot simpler requirements than wasm. That would be like killing a mosquito with a grenade launcher
2025-04-01 04:14:43
wasm doesn't have good support for simd instructions yet either right?
intelfx
2025-04-01 04:18:22
There is support for simd, has been for quite some time. You can use SIMD from Rust compiled to wasm32, for instance. Is there any indication that it isn't good?
2025-04-01 04:18:53
Anyway. TL;DR of my position:
- Linux namespaces are totally irrelevant because they leave a GIANT attack surface, which is extremely excessive for running untrusted native code (Linux namespaces except userns aren't, and never were, a security boundary; userns are kinda trying to be, but a very shoddy one)
- Plan 9 mooning is totally irrelevant as well because we have it already and it's called seccomp mode 1, literally designed for running untrusted binary code with almost zero attack surface
- If the amount of untrusted binary code is tiny, the setup and communication overhead of ANY external process (be it seccomp 1 or whatever Plan 9 had) would MASSIVELY exceed the cost of the code itself
- If you want/need a JIT anyway, just use WASM (in whatever part of the pipeline) and don't reinvent the wheel
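For reference on what seccomp mode 1 looks like in practice, a minimal sketch (the surrounding worker setup is assumed, only the `prctl` call itself is the real, documented interface): after the call below, the kernel permits only `read`, `write`, `_exit` and `sigreturn`, and SIGKILLs the process on anything else.

```cpp
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <unistd.h>

int main() {
  // Enter seccomp strict mode (mode 1). No filter to write, nothing to configure.
  if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0) {
    _exit(1);  // seccomp unavailable on this kernel
  }
  // From here on, only read()/write() on already-open fds, _exit() and
  // sigreturn() are allowed; any other syscall kills the process.
  // ... run the untrusted payload here ...
  _exit(0);
}
```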
jonnyawsom3
2025-04-01 04:22:38
<:JXL:805850130203934781>
Demiurge
intelfx Anyway. TL;DR of my position. - Linux namespaces are totally irrelevant because they leave GIANT attack surface which is extremely excessive for running untrusted native code (Linux namespaces except userns aren't, and never were, a security boundary; userns are kinda trying to be, but a very shoddy one) - Plan 9 mooning is totally irrelevant as well because we have it already and it's called seccomp mode 1, literally designed for running untrusted binary code with almost zero attack surface - If the amount of untrusted binary code is tiny, the setup and communication overhead of ANY external process (be it seccomp 1 or whatever Plan 9 had) would be MASSIVELY exceeding the cost of the code itself - if you want/need a JIT anyway, just use WASM (in whatever part of the pipeline) and don't reinvent the wheel
2025-04-01 05:35:16
unveil is a filesystem namespace with no overhead, and unveil+pledge is a way for processes to revoke their own privileges and have the kernel enforce it. They're good ideas and something similar should be adopted everywhere to make security easier.
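For readers unfamiliar with the OpenBSD calls being referenced, a minimal sketch of the pledge/unveil pattern described above: `pledge` and `unveil` are real OpenBSD libc functions, but the path and promise strings below are only illustrative.

```cpp
#include <unistd.h>

int main() {
  // Restrict the filesystem view to a single readable path, then lock the list.
  if (unveil("/tmp/input.jxl", "r") == -1) _exit(1);
  if (unveil(nullptr, nullptr) == -1) _exit(1);
  // Drop everything except basic stdio and read-only filesystem access.
  if (pledge("stdio rpath", nullptr) == -1) _exit(1);
  // ... do the work here; a syscall outside the promises aborts the process ...
  return 0;
}
```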
intelfx
2025-04-01 05:36:01
both unveil and pledge combined are less powerful than seccomp mode 1, so again totally irrelevant
Demiurge
2025-04-01 05:36:12
I thought simd instructions were still in the planning stages for wasm
intelfx
2025-04-01 05:36:43
(seccomp mode 2 _is_ pledge-equivalent, btw, and yes, filesystem namespaces are unveil, but that's again not what we need here)
Demiurge
intelfx both unveil and pledge combined are less powerful than seccomp mode 1, so again totally irrelevant
2025-04-01 05:37:46
Less powerful? It's not about power. All of the power of seccomp is totally useless if it's incomprehensible to use.
2025-04-01 05:38:10
Writing secure software needs to be practical and obvious and convenient.
intelfx
2025-04-01 05:38:28
It's about relevance. I don't understand what's the point of bringing irrelevant sandboxing features from other OSes into the discussion, what point are you making?
Demiurge
2025-04-01 05:40:27
Because you mentioned seccomp, which is a uselessly incomprehensible version of what could be a useful security feature
2025-04-01 05:40:40
But it's only useful if people actually want to use it
2025-04-01 05:40:52
And no one wants to use it if it's not easy to use like pledge
intelfx
2025-04-01 05:43:12
originally I mentioned seccomp mode 1, which has NO configuration
2025-04-01 05:43:19
so it can't be incomprehensible by definition
Demiurge
2025-04-01 05:43:34
And that is what I mean by namespace, btw
intelfx
2025-04-01 05:43:43
then you are using the words wrong
Demiurge
2025-04-01 05:43:47
Different processes having different views of the filesystem
intelfx
2025-04-01 05:43:48
namespaces mean a very specific thing
Demiurge
2025-04-01 05:43:59
That is often called a filesystem namespace
2025-04-01 05:44:58
And if different processes have access to different kernel syscalls, then that isn't usually called a namespace but you can probably assume the intent or meaning still by the context
intelfx
2025-04-01 05:45:06
Okay, we are going in circles. Namespaces are irrelevant. To run untrusted native code, you don't need "different views of the filesystem": you need *NO* view of the filesystem, like no access to the syscalls at all. Anything else is already an infinitely larger attack surface than you want. So in context of this discussion, filesystem namespaces, or any other namespaces at all, have exactly 0% relevance.
2025-04-01 05:45:34
I said that like an hour ago.
Demiurge And if different processes have access to different kernel syscalls, then that isn't usually called a namespace but you can probably assume the intent or meaning still by the context
2025-04-01 05:46:09
I don't want to have to "assume the intent or meaning". Words have defined meanings, let's use them.
Demiurge
2025-04-01 05:56:08
I can understand your point of view and it's a good one.
intelfx originally I mentioned seccomp mode 1, which has NO configuration
2025-04-01 06:01:14
I don't know what mode 1 is off the top of my head. It's a Linux-specific API with very fine-grained control and not at all friendly to the typical programmer, with not even a wrapper in the C library for it.
jonnyawsom3
2025-04-01 06:23:00
<#806898911091753051>?
Demiurge
Lilli I could not make the chunked API work. Is there an example somewhere, where it is used? I could not find one after looking for quite a while. :/ I set up `JxlChunkedFrameInputSource` with callbacks, which I then feed to `JxlEncoderAddChunkedFrame(frame_settings, true, chunked)` This essentially replaces the call to `JxlEncoderAddImageFrame(frame_settings, &pixel_format, image_data, image_data_size)`
2025-04-01 06:41:22
Sorry no one has gotten back to you on this. Chunked encode API is pretty new and I'm not sure how it works.
2025-04-01 06:41:30
I'm just a lurker.
jonnyawsom3
2025-04-03 08:13:21
Seems like auto-merge for PRs is being blocked by a formatting error <https://github.com/libjxl/libjxl/actions/runs/14235466206/job/39943279145?pr=4178> Just added it to my changelog PR if you want to merge it quickly <@794205442175402004> https://github.com/libjxl/libjxl/pull/4169
Lucas Chollet
2025-04-03 08:35:17
Yikes, now they fail because of my recent changes, pls revert your typo fix lol
jonnyawsom3
Lucas Chollet Yikes, now they fail because of my recent changes, pls revert your typo fix lol
2025-04-03 08:40:27
The only failing required test is due to your [CMAKE fix not being merged](<https://github.com/libjxl/libjxl/actions/runs/14251775663/job/39946056483?pr=4169>), and yours won't merge because of my [typo fix not being merged](<https://github.com/libjxl/libjxl/actions/runs/14235466206/job/39943279145?pr=4178>), so now we're in a gridlock over whose PR gets manually merged first haha
Lucas Chollet
2025-04-03 08:42:59
I didn't realize that you were hitting that issue too, I was referring to that log in your PR: ``` tools/jxltran.cc:9:#include "lib/include/jxl/decode.h" tools/jxltran.cc:10:#include "lib/include/jxl/decode_cxx.h" Don't add "include/" to the include path of public headers. ``` Isn't it a required test?
jonnyawsom3
Lucas Chollet I didn't realize that you were hitting that issue too, I was referring to that log in your PR: ``` tools/jxltran.cc:9:#include "lib/include/jxl/decode.h" tools/jxltran.cc:10:#include "lib/include/jxl/decode_cxx.h" Don't add "include/" to the include path of public headers. ``` Isn't it a required test?
2025-04-03 08:44:07
Tests required for merging have the indicator on the right
Lucas Chollet
2025-04-03 08:45:20
Ah, didn't realize that 😅
jonnyawsom3
2025-04-03 08:45:42
Though, fixing it wouldn't hurt since we appear to have plenty of time
2025-04-03 08:46:50
I actually have permission to close PRs on the repo with my Triage role, but unfortunately I can't merge
Lucas Chollet
2025-04-03 08:47:16
I would like to fix the CMake first, but I would need another CI run for that. I'm kind of playing a guessing game here
Though, fixing it wouldn't hurt since we appear to have plenty of time
2025-04-03 08:47:55
I will do it on my next `jxltran` PR. But you can do it too if you want
jonnyawsom3
2025-04-03 08:49:35
It doesn't seem to break anything, so I'll let you do it as part of the jxltran work
Seems like auto-merge for PRs is being blocked by a formatting error <https://github.com/libjxl/libjxl/actions/runs/14235466206/job/39943279145?pr=4178> Just added it to my changelog PR if you want to merge it quickly <@794205442175402004> https://github.com/libjxl/libjxl/pull/4169
2025-04-03 08:52:37
<@794205442175402004> apologies for the double ping, but you'll need to force merge that due to the stalemate with the CMAKE sjpeg fix (which will require re-approval after too). Then Auto-Merge should start working again
intelfx
2025-04-05 04:52:40
OK, I'm playing with progressive encoding again... Last time y'all told me that progressive lossless encoding is basically broken with effort >=5 (although in the end I failed to understand why exactly). However, I'm also getting the same thing if I use `--progressive_dc` instead of `-p`:
```
$ cjxl path/to/png path/to/jxl -d 0
JPEG XL encoder v0.11.1 794a5dcf [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 161373.8 kB including container (10.997 bpp).
10672 x 11000, 7.569 MP/s [7.57, 7.57], , 1 reps, 32 threads.
cjxl path/to/png -d 0  365,93s user 16,98s system 2143% cpu 17,861 total

$ cjxl path/to/png path/to/jxl -d 0 --progressive_dc=1
JPEG XL encoder v0.11.1 794a5dcf [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 7]
^C
cjxl path/to/png -d 0 --progressive_dc=  392,90s user 233,60s system 104% cpu 9:58,64 total
```
— which, to my understanding, isn't supposed to actually do progressive encoding of the main image (i.e., it does not imply `--progressive_ac` or `--responsive`). Why is this?
A homosapien
2025-04-05 05:20:30
`progressive_dc` and `progressive_ac` only work for the lossy mode of libjxl. For lossless they don't actually make the image progressive; all they do is disable chunked encoding, which trades encoding time for more density.
intelfx
2025-04-05 05:25:08
Ah, it seems my understanding was incomplete. I did understand that `{q,}progressive_ac` was only applicable to the lossy mode (as it is fundamentally about encoding the AC/"HF" VarDCT coefficients), but I thought that `progressive_dc` was basically "prepending" a frame made of the LF coefficients onto the main image, regardless of whether the main image was lossy or lossless.
A homosapien
2025-04-05 05:29:51
Progressive lossless uses a different technique called squeeze.
2025-04-05 05:32:37
All of JPEG XL's features are explained really well in this technical report. Section 5.1.3 explains what squeeze does alongside some images a few pages down.
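As a rough intuition for squeeze, here is a simplified Haar-style average/difference pair. This is not the exact transform from the spec (which adds a "tendency" correction so the low-resolution preview looks smoother); it only shows how a downscaled average plus a residual can round-trip exactly in integers.

```cpp
#include <cassert>

// Forward: one squeeze step on a pair of neighbouring samples.
void SqueezePair(int a, int b, int* avg, int* residual) {
  *avg = (a + b) >> 1;  // floor((a+b)/2), the "low-pass" sample
  *residual = a - b;    // the "high-pass" sample
}

// Inverse: reconstruct the original pair exactly (a+b and a-b share parity,
// so the bit lost by the floor division is recovered from the residual).
void UnsqueezePair(int avg, int residual, int* a, int* b) {
  int sum = 2 * avg + (residual & 1);
  *a = (sum + residual) / 2;
  *b = *a - residual;
}

int main() {
  for (int a = -5; a <= 5; ++a) {
    for (int b = -5; b <= 5; ++b) {
      int avg, res, ra, rb;
      SqueezePair(a, b, &avg, &res);
      UnsqueezePair(avg, res, &ra, &rb);
      assert(ra == a && rb == b);  // lossless round-trip
    }
  }
  return 0;
}
```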
jonnyawsom3
intelfx OK, I'm playing with progressive encoding again... Last time y'all told me that progressive lossless encoding is basically broken with effort >=5 (although in the end I failed to understand why exactly). However, I'm also getting the same thing if I use `--progressive_dc` instead of `-p`: ``` $ cjxl path/to/png path/to/jxl -d 0 JPEG XL encoder v0.11.1 794a5dcf [AVX2,SSE4,SSE2] Encoding [Modular, lossless, effort: 7] Compressed to 161373.8 kB including container (10.997 bpp). 10672 x 11000, 7.569 MP/s [7.57, 7.57], , 1 reps, 32 threads. cjxl path/to/png -d 0 365,93s user 16,98s system 2143% cpu 17,861 total $ cjxl path/to/png path/to/jxl -d 0 --progressive_dc=1 JPEG XL encoder v0.11.1 794a5dcf [AVX2,SSE4,SSE2] Encoding [Modular, lossless, effort: 7] ^C cjxl path/to/png -d 0 --progressive_dc= 392,90s user 233,60s system 104% cpu 9:58,64 total ``` — which, to my understanding, isn't supposed to actually do progressive encoding of the main image (i.e., it does not imply `--progressive_ac` or `--responsive`). Why is this?
2025-04-05 07:21:49
We're actively working on progressive lossless, if not almost done with it already https://discord.com/channels/794206087879852103/803645746661425173/1357850229247840401
Right now, progressive_dc disabling chunked works around an [issue](<https://github.com/libjxl/libjxl/issues/3823#issuecomment-2351120650>) with the TOC, where the 1:8 LF frame can't be rendered until the end of the file. Downside being the entire image has to be processed as a whole, instead of threading individual groups.
Squeeze, used for progressive lossless, also disables chunked due to downsampling the image as part of the transform
intelfx
2025-04-05 12:24:02
We're actively working on progressive
Crite Spranberry
2025-04-05 04:46:40
So I'm trying to compile libjxl and I followed this guide https://github.com/libjxl/libjxl/blob/main/doc/developing_in_windows_vcpkg.md Visual Studio doesn't show any errors, but I just have no binaries at all in the /out/build/x64-Clang-Release/tools folder
A homosapien
2025-04-05 05:53:09
I recommend using msys2, it's an easier process and it generates faster binaries
Crite Spranberry
2025-04-06 11:29:13
I got further by setting BUILD_TESTING to OFF in CMakeSettings.json
2025-04-06 11:31:30
Why does this always happen
2025-04-06 11:33:57
What am I doing where everything always fails with the most obscure errors that nobody else has?
2025-04-06 11:56:49
huh msys2 just worked
2025-04-06 11:56:55
No errors, no bs
2025-04-06 11:57:34
Or well ig one error I had to work around (my cmake version is 4.0.0, idk wtf it's on about)
2025-04-06 11:58:46
Why are these like quadruple the size they should be
2025-04-06 11:59:10
Actually more like 20-30x the size
2025-04-06 12:04:23
and they don't even work wtf
Demiurge
2025-04-06 12:36:11
Did it download and compile dependencies like brotli, hwy, skia/skcms and whatever?
2025-04-06 12:36:46
It's kind of a pain to compile libjxl
2025-04-06 12:37:10
But totally doable
jonnyawsom3
2025-04-06 12:54:54
Homosapien had the same issues with it not being static and massively larger. We fixed them (mostly) and he mentioned rewriting the build docs
Quackdoc
Crite Spranberry and they don't even work wtf
2025-04-06 05:00:46
you compiled dynamic, you need to copy DLLs to the folder the exec is in
Homosapien had the same issues with it not being static and massively larger. We fixed thrm (mostly) and he mentioned rewriting the build docs
2025-04-06 05:01:20
in isolation static will always be smaller
spider-mario
Quackdoc you compiled dynamic, you need to copy DLLs to the folder the exec is in
2025-04-06 05:29:10
or run them from the mingw shell you built them in, as it will have the `PATH` set appropriately
Crite Spranberry Why are these like quadruple the size they should be
2025-04-06 05:29:53
maybe a debug (or at least unoptimised) build? you can run `cmake -DCMAKE_BUILD_TYPE=Release .` from the build directory (or edit `CMAKE_BUILD_TYPE` in `CMakeCache.txt` directly) and rebuild
jonnyawsom3
Quackdoc in isolation static will always be smaller
2025-04-06 06:37:13
Our static builds are still 2-4x larger than the github releases. 10mb per binary
Quackdoc
2025-04-06 06:37:28
[av1_woag](https://cdn.discordapp.com/emojis/852007419474608208.webp?size=48&name=av1_woag)
jonnyawsom3
2025-04-06 06:37:59
It was 50, but that was debug as Mario said
spider-mario
2025-04-06 07:41:02
note that by default, a CMake build won’t be stripped
2025-04-06 07:41:22
so you can do that right after building, or add `-s` (if I recall correctly) to the linker flags
Crite Spranberry
spider-mario maybe a debug (or at least unoptimised) build? you can run `cmake -DCMAKE_BUILD_TYPE=Release .` from the build directory (or edit `CMAKE_BUILD_TYPE` in `CMakeCache.txt` directly) and rebuild
2025-04-06 09:10:37
I did that, which changed my command to this
```
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=OFF -DJPEGXL_ENABLE_BENCHMARK=OFF -DJPEGXL_ENABLE_PLUGINS=ON -DJPEGXL_ENABLE_MANPAGES=OFF -DJPEGXL_FORCE_SYSTEM_BROTLI=ON -DJPEGXL_FORCE_SYSTEM_GTEST=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 ..
```
The exes are still slightly too large, but much closer to a normal release. From here, how would I make a stripped static build?
jonnyawsom3
2025-04-06 09:42:02
<@207980494892040194> didn't you say two static flags are required?
spider-mario
2025-04-06 10:31:03
`-DCMAKE_EXE_LINKER_FLAGS=-s` might help
2025-04-06 10:31:45
(that, or running `strip` on all executables yourself)
A homosapien
Crite Spranberry I did that, which changed my command to this ``` cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=OFF -DJPEGXL_ENABLE_BENCHMARK=OFF -DJPEGXL_ENABLE_PLUGINS=ON -DJPEGXL_ENABLE_MANPAGES=OFF -DJPEGXL_FORCE_SYSTEM_BROTLI=ON -DJPEGXL_FORCE_SYSTEM_GTEST=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 .. ``` The exes are still slightly too large, but much closer to normal release From here how would I make a stripped static build?
2025-04-06 11:05:02
You need to specify two flags for a truly static libjxl. `-DBUILD_SHARED_LIBS=OFF` and `-DJPEGXL_STATIC=ON`. Also I recommend removing `-DJPEGXL_FORCE_SYSTEM_BROTLI=ON`, lots of errors pop up and the build fails with it on.
2025-04-06 11:10:58
Also I recommend using clang, according to my testing it's around 5-10% faster than GCC
Crite Spranberry
spider-mario `-DCMAKE_EXE_LINKER_FLAGS=-s` might help
2025-04-06 11:41:44
This does decrease the size, but static builds are still 1-2MB bigger than official release
2025-04-06 11:41:49
I'll see about trying GCC
2025-04-06 11:42:03
Also my command currently ``` cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=OFF -DJPEGXL_ENABLE_BENCHMARK=OFF -DJPEGXL_ENABLE_PLUGINS=ON -DJPEGXL_ENABLE_MANPAGES=OFF -DJPEGXL_STATIC=ON -DCMAKE_EXE_LINKER_FLAGS=-s -DJPEGXL_FORCE_SYSTEM_GTEST=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 .. ```
2025-04-06 11:44:35
pacman -S can't find clang-compiler-rt so well idk if it will work but ig I'll try
Crite Spranberry What am I doing where everything always fails with the most obscure errors that nobody else has?
2025-04-06 11:46:40
aaaaaaaa
2025-04-06 11:46:55
welp I got what I could time to try build
2025-04-06 11:47:53
Same Cmake error even though I have 4.0 so ig I will add the minimum shit
2025-04-06 11:49:37
<:Think2:826218556453945364>
```
[344/379] Error: Can't generate doc since Doxygen not installed.
FAILED: CMakeFiles/doc C:/Users/Admin/Documents/GitHub/libjxl-mingw/build/CMakeFiles/doc
C:\Windows\system32\cmd.exe /C "cd /D C:\Users\Admin\Documents\GitHub\libjxl-mingw\build && false"
[353/379] Building CXX object tools/CMakeFiles/enc_fast_lossless.dir/__/lib/jxl/enc_fast_lossless.cc.obj
ninja: build stopped: subcommand failed.
+ retcode=1
```
A homosapien
2025-04-06 11:50:04
~~I think you don't need clang-rt to compile.~~ Also I think the inflated binary sizes are intrinsically tied to libc or pthreads or something like that. Not much you can do about it I think.
Crite Spranberry
2025-04-06 11:52:13
How do I disable doc generation with clang?
A homosapien
2025-04-06 11:52:48
strange, I got it to compile on my machine even though I don't have doxygen installed
2025-04-06 11:52:53
try adding `-DJPEGXL_ENABLE_DOXYGEN=OFF`
Crite Spranberry
2025-04-06 11:58:00
Now I get this
```
+ cmake --build /c/Users/Admin/Documents/GitHub/libjxl-mingw/build -- all doc
ninja: error: unknown target 'doc', did you mean 'jxl'?
```
2025-04-07 12:00:42
Oh I can just do this
A homosapien
2025-04-07 12:01:31
`Pacman -S --needed mingw-w64-clang-x86_64-compiler-rt mingw-w64-x86_64-doxygen `
Crite Spranberry
2025-04-07 12:02:45
wtf why do I get that error now
Crite Spranberry Now I get this ``` + cmake --build /c/Users/Admin/Documents/GitHub/libjxl-mingw/build -- all doc ninja: error: unknown target 'doc', did you mean 'jxl'? ```
2025-04-07 12:03:31
I went back to the original command and I get that error and I have no idea what I did
2025-04-07 12:04:04
temp storing this here ``` ./ci.sh opt -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=OFF -DJPEGXL_ENABLE_BENCHMARK=OFF -DJPEGXL_ENABLE_MANPAGES=OFF -DJPEGXL_STATIC=ON -DJPEGXL_FORCE_SYSTEM_GTEST=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_BUILD_TYPE=Release -DJPEGXL_ENABLE_DOXYGEN=OFF ```
2025-04-07 12:06:15
I took a snapshot before I went to gcc because I knew some shit would happen so ig I'll try again
A homosapien
2025-04-07 12:07:50
wait you are using the script? I got it to compile like so, `export CC=clang && export CXX=clang++` and then running the regular cmake command
2025-04-07 12:08:12
I don't really trust that script I'll be honest
Crite Spranberry
2025-04-07 12:08:33
Oh I'm just following this https://github.com/libjxl/libjxl/blob/main/doc/developing_in_windows_msys.md
2025-04-07 12:12:08
So `-DJPEGXL_ENABLE_DOXYGEN=OFF` just permanently breaks the build command even if I remove it and nuke the build folder
2025-04-07 12:12:57
So I get an error with doxygen so ig I'll try your method
A homosapien
spider-mario note that by default, a CMake build won’t be stripped
2025-04-07 12:13:14
Does compiling libjxl with LTO optimizations work? It could be another way of reducing binary sizes but it always seems to fail for me.
Crite Spranberry So I get an error with doxygen so ig I'll try your method
2025-04-07 12:14:07
yeah regular ol' cmake works just fine, and it builds faster too. Just use this command for it to use all of your cores `cmake --build . -- -j$(nproc)`
Crite Spranberry
2025-04-07 12:16:49
Well it's using clang now and idk if I notice any difference. The executables are still a bit bigger than the official release
A homosapien
2025-04-07 12:17:36
It's 20% faster than the official Windows releases so I would say it's a worthwhile trade off
Crite Spranberry
2025-04-07 12:17:52
cool
2025-04-07 12:20:14
So, interesting findings: gcc-compiled libjxl is theoretically compatible with XP. Except I get this error, and it works fine on Vista+, so idk what it's on about
2025-04-07 12:20:53
Or well all the executables I'm interested in (cjpegli, cjxl, djxl, and jxlinfo)
2025-04-07 12:21:32
They check out in Dependency Walker, but I get this with all
2025-04-07 12:27:31
The UTF8 manifest breaks XP. Ig I'll just compile without it
2025-04-07 12:29:39
Nevermind it can have a UTF8 manifest, it just is picky
2025-04-07 12:32:07
So adding a compatibility entry makes it work
```
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
    <application>
      <windowsSettings>
        <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
      </windowsSettings>
    </application>
  </compatibility>
</assembly>
```
Crite Spranberry So interesting findings, gcc compiled libjxl theoretically compatible with XP Except I get this error and it works fine on Vista+ so idk what it's on about
2025-04-07 12:33:05
clang*
2025-04-07 12:33:07
i already forgor
2025-04-07 12:43:15
Well time to see if libavif fares better now with clang as well
2025-04-07 12:50:01
Had to install yasm but it just builds as well holy shit
2025-04-07 12:50:20
oop need to make static build
2025-04-07 12:57:11
<:Think2:826218556453945364>
2025-04-07 01:19:20
I apparently can't do 32 bit build of libavif
2025-04-07 01:19:39
eh jxl better anyways
2025-04-07 02:13:29
Does clang not support utf8?
2025-04-07 02:13:38
Wait nvm
2025-04-07 02:13:41
I noticed the issue
A homosapien
2025-04-07 02:32:29
It would be funny to have a building doc for Windows XP <:KekDog:805390049033191445>
jonnyawsom3
Crite Spranberry So interesting findings, gcc compiled libjxl theoretically compatible with XP Except I get this error and it works fine on Vista+ so idk what it's on about
2025-04-07 06:41:27
Not sure what hardware you're running, but if it's period correct, you could be a good benchmark for our faster_decoding tweaks xD
Lucas Chollet
2025-04-07 02:52:02
Can I get someone to run the pipeline on [4165](<https://github.com/libjxl/libjxl/pull/4165>)
Crite Spranberry
Not sure what hardware you're running, but if it's period correct, you could be a good benchmark for our faster_decoding tweaks xD
2025-04-07 03:03:57
It's a VM, but I do have some period hardware
2025-04-07 03:04:14
but period hardware boring
2025-04-07 03:08:46
For older pre-Sandy Bridge or AVX, I have
462 Athlon XP (unlikely to run due to no SSE2)
478 Pentium 4 something or another i forgor
754 or 939 Athlon 64 I forgor again
775 Pentium 4 3.06GHz
775 C2D 1.86GHz
775 C2Q Q6700
AM3 Athlon II x4 something or another
AM3 Phenom II x4 955
AM3 Phenom II x6 1090t
1366 Xeon 5080
jonnyawsom3
2025-04-08 04:48:28
Interesting, I wonder why this wasn't done for jpegli, seeing as it's usually YCbCr (Could this have avoided RGB JPEGs?) <https://github.com/libjxl/libjxl/blob/main/lib/jxl/enc_modular.cc#L763>
damian101
2025-04-09 12:42:44
encoding in RGB is very inefficient
2025-04-09 12:44:18
jpegli can encode in XYB using an ICC v4 profile
Demiurge
2025-04-09 12:50:43
Yup, and it would be a lot more compatible with existing hardware and software if it would add a JPEG APP14 tag. Otherwise some decoders will treat it like a normal YCbCr JPEG and mess up the colors.
2025-04-09 12:55:26
Also jpegli STILL uses chroma subsampling by default for XYB JPEG, which is another hint for certain decoders to treat it like a normal YCbCr JPEG
2025-04-09 12:55:42
Since it makes no sense to use chroma subsampling for an RGB JPEG
2025-04-09 12:57:04
Pretty sure that would be a one-line fix too, and that alone would have a big impact. The APP14 header fix might be a 2 line fix.
jonnyawsom3
jpegli can encode in XYB using an ICC v4 profile
2025-04-09 01:20:39
That still uses RGB JPEG internally
damian101
That still uses RGB JPEG internally
2025-04-09 01:21:03
but it's not RGB
2025-04-09 01:21:23
I see what you mean
jonnyawsom3
2025-04-09 01:22:05
Yeah, I mean it could have used the usual YCoCg with YXB
Demiurge
2025-04-09 01:27:27
The ICC profile is applied AFTER reverse-YCoCg back to RGB
2025-04-09 01:30:42
And Adobe (APP14) JPEG is well supported by existing decoders
2025-04-09 01:36:54
Just like how CMYK JPEG is commonly supported, using the same exact Adobe tag
Yeah, I mean it could have used the usual YCoCg with YXB
2025-04-09 01:42:18
If you were to do this, then the decoder would still do the reverse YCbCr transform, but maybe it's possible to make an ICC profile that takes that into account and changes it back by applying another YCbCr transformation on top of the inverse XYB transform. Ugh, confusing. But it's probably doable.
username
2025-04-09 01:45:42
isn't the reason jpegli does XYB as an RGB JPEG because YCbCr in JPEG is not and cannot be lossless meaning you would get a double lossy color transform?
2025-04-09 01:46:40
also jpegli does chroma subsampling **on purpose** for XYB JPEGs it's not some random accident
2025-04-09 01:46:59
although it's not compatible with JPEG XL sadly :(
jonnyawsom3
username isn't the reason jpegli does XYB as an RGB JPEG because YCbCr in JPEG is not and cannot be lossless meaning you would get a double lossy color transform?
2025-04-09 01:48:00
Ah yeah, I recall something along those lines
2025-04-09 01:49:15
Regardless, swapping the Y and X channels would give a more graceful image degradation than hot pink and green
Demiurge
2025-04-09 02:34:47
Graceful? I hear that it might actually be better for it to be ungraceful, so it's more obvious when something goes wrong.
2025-04-09 02:35:21
Also I thought the channels are already arranged "YXB"
jonnyawsom3
2025-04-09 02:37:59
I tried creating my own YXB JPEG by channel swapping and ICC editing, but it ended up still having a tint, though with the image still recognisable. So it might be worth it
Demiurge
2025-04-09 03:26:53
YBX would probably be kinda close to YCbCr right?
2025-04-09 03:27:34
If the goal was "graceful degradation"
2025-04-09 03:28:03
Which is arguably undesirable if it makes it harder to tell that a problem is present
2025-04-09 03:30:42
Maybe there should be some "tell" like intentionally making it look super bright and washed out without color management
2025-04-09 03:31:34
But that's not as obvious and cool as just making the whole thing look green
2025-04-09 03:31:42
😎
jonnyawsom3
2025-04-09 03:32:23
The B channel caused an overexposed Cr result, but I could have done something wrong
2025-04-09 03:32:45
Just a more graceful tint than eye searing pink and green
Demiurge
2025-04-09 07:16:09
Pink and green are cool tho
Meow
2025-04-10 02:47:51
Upgrading jpeg-xl 0.11.1 -> 0.11.1_1 👀
Demiurge
2025-04-10 08:29:51
Is it normal for effort=10 to give a larger file size for "JPEG lossless transcode" mode?
2025-04-10 08:30:02
Larger than effort=9 I mean
HCrikki
2025-04-10 08:46:56
i recall it performs worse than e9 for reversible jpeg transcode
Demiurge
2025-04-10 09:39:52
🧐 why
A homosapien
2025-04-10 11:00:14
Local MA trees are sometimes better than Global ones
2025-04-10 11:00:16
I think
Melirius
A homosapien Local MA trees are sometimes better than Global ones
2025-04-11 02:35:47
Exactly, as separation along group number is limited to 256 splits in the global tree, but not limited in local ones (each group has its own tree). This effect is more pronounced in large JPEGs with high variability
jonnyawsom3
2025-04-11 02:46:14
To be a bit more specific, an MA tree can have 255 contexts at most, so using a tree per group drastically increases the possible options
CrushedAsian255
To be a bit more specific, an MA tree can have 255 contexts at most, so using a tree per group drastically increases the possible options
2025-04-11 03:12:04
so if using local trees there can be at most 255 * group_count contexts?
jonnyawsom3
2025-04-11 03:20:29
As I understand it, yes
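To put numbers on that (illustrative arithmetic only, assuming the default 256×256 modular groups mentioned later in the thread): a 4096×4096 image tiles into 16×16 = 256 groups, so per-group trees could in principle address up to 256 × 255 = 65,280 contexts versus 255 for one global tree.

```cpp
#include <cstdio>

int main() {
  const long xsize = 4096, ysize = 4096;  // example image, not from the chat
  const long group_dim = 256;             // assumed default modular group size (-g 1)
  const long max_contexts_per_tree = 255;
  const long groups = ((xsize + group_dim - 1) / group_dim) *
                      ((ysize + group_dim - 1) / group_dim);
  std::printf("groups: %ld\n", groups);                                            // 256
  std::printf("global tree contexts: up to %ld\n", max_contexts_per_tree);         // 255
  std::printf("local tree contexts:  up to %ld\n", groups * max_contexts_per_tree); // 65280
  return 0;
}
```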
2025-04-11 03:25:48
I was actually contemplating changing group size dependent on image size and thread count, using the largest group size that fully utilises the encoding threads. So for me, 1080p would be the current default of `-g 1`, 4K would be `-g 2` and 8K would be `-g 3` (based on megapixels/16 threads, compared to pixels per group size). Depends how much of an impact on encode/decode speed it has though. Yet more testing to be done!
2025-04-11 09:36:15
I'm sure I've said it before, but I would've expected a memory reduction between 16 threads and 1 thread, since it's using local MA trees per group in lossless. Accounting for the image buffer being 32f instead of 8int, it's still using twice as much memory on something
Melirius
2025-04-12 03:53:58
Could somebody run CI on my PR? I think I fixed all the problems, but I cannot check it locally, thanks
2025-04-12 03:54:02
https://github.com/libjxl/libjxl/pull/4185
jonnyawsom3
2025-04-13 08:23:38
https://discord.com/channels/794206087879852103/1256302117379903498/1360893139233280092
Demiurge
2025-04-13 08:23:44
I can think of a good group size heuristic.
2025-04-13 08:26:18
For each possible group size, calculate the area in px^2 of "unused space" and use the largest one?
2025-04-13 08:26:34
The largest one with the least unused space I mean
2025-04-13 08:27:44
Afaik there's no reason for small group sizes to be better than larger group sizes.
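A minimal sketch of that "least unused space" idea, assuming the modular group dimension is 128 << shift (which matches the size checks in the snippet further down, but treat it as an assumption; the 1920×1080 image is just an example): compute how many padding pixels each candidate group size adds, then prefer the largest size with little or no waste.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t xsize = 1920, ysize = 1080;  // example image
  for (int shift = 3; shift >= 0; --shift) {
    const uint64_t dim = 128u << shift;       // 1024, 512, 256, 128
    const uint64_t gx = (xsize + dim - 1) / dim;
    const uint64_t gy = (ysize + dim - 1) / dim;
    const uint64_t unused = gx * gy * dim * dim - xsize * ysize;  // padding pixels
    std::printf("shift %d: %llu x %llu groups, %llu unused px\n", shift,
                (unsigned long long)gx, (unsigned long long)gy,
                (unsigned long long)unused);
  }
  return 0;
}
```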
jonnyawsom3
2025-04-13 08:28:56
That's what I was thinking, but ended up just using a minimum "pixels per thread" to make sure every block was full before picking a higher group size. So all threads are always saturated in the first pass, then whatever's left runs after
Demiurge Afaik there's no reason for small group sizes to be better than larger group size.
2025-04-13 08:29:08
Encoding speed, decoding speed, memory and density
Demiurge
2025-04-13 08:29:10
And I don't know how much cost "wasted space" actually has, if any at all
Encoding speed, decoding speed, memory and density
2025-04-13 08:30:09
Well in terms of density I mean, larger group sizes should not have any reason to be at a disadvantage.
jonnyawsom3
Demiurge And I don't know how much cost "wasted space" actually has, if any at all
2025-04-13 08:30:55
From my testing, very little. At most, a memory hit from allocating the extra space, and a small speed penalty from the bigger groups. Hence why I thought I'd keep it simple with the 'minimum saturation'
Demiurge Well in terms of density I mean, larger group sizes should not have any reason to be at a disadvantage.
2025-04-13 08:31:08
Better local MA trees in smaller groups
2025-04-13 08:33:20
This is what I had cooked up, split for legibility here
```C
uint64_t pixels_per_thread;
pixels_per_thread = (xsize * ysize) / num_threads;
if (cparams.modular_group_size_shift == -1) {
  if (cparams.speed_tier <= SpeedTier::kKitten && xsize >= 1024 && ysize >= 1024 &&
      pixels_per_thread >= 1048576) {
    frame_header->group_size_shift = 3;
  } else if (xsize >= 512 && ysize >= 512 && pixels_per_thread >= 262144) {
    frame_header->group_size_shift = 2;
  } else if (xsize >= 256 && ysize >= 256 && pixels_per_thread >= 65536) {
    frame_header->group_size_shift = 1;
  } else {
    frame_header->group_size_shift = 0;
  }
} else {
  frame_header->group_size_shift = cparams.modular_group_size_shift;
}
```
Demiurge
Better local MA trees in smaller groups
2025-04-13 08:33:49
Oh... that makes sense.
jonnyawsom3
2025-04-13 08:34:13
Checks the image meets the dimensions of the group size, then checks if there's enough pixels at that group size to fill all threads, if not, try the next lower size
Demiurge
2025-04-13 08:35:17
If a bunch of similar dct blocks are all in the same group...
jonnyawsom3
2025-04-13 08:35:28
There are no DCT blocks, this is modular
Tirr
2025-04-13 08:36:21
also vardct has a fixed group size of 256
Demiurge
2025-04-13 08:37:06
Oh
jonnyawsom3
2025-04-13 08:37:46
I went for hardware dependent settings since if you want the old behaviour, all you need to do is add `g 1`, and it's dependent on image resolution too so it would already vary based on input
Demiurge
2025-04-13 08:38:49
I think it makes more sense for it to vary based on image than based on hardware
2025-04-13 08:39:08
But it's not that big of a deal. Especially if it increases speed too
jonnyawsom3
2025-04-13 08:39:11
Gives an encode speed boost, sometimes a big decode speed boost (5x) and at worst a 0.1 bpp increase, or at best a 0.1 bpp decrease so far. I am gonna test it more though
Demiurge
2025-04-13 08:39:36
The density difference I would imagine is too small to matter also
jonnyawsom3
2025-04-13 08:40:34
Yeah, I *was* worrying about it, but you get more image-to-image variance than this ever seems to cause, and bumping up an effort level with the speed increase naturally obliterates it
Demiurge
2025-04-13 08:50:51
The best way to improve libjxl right now is to make the source tree more logically organized into folders, making it easier to find and build only exactly what you want/need/expect/specify. While wasting the minimum amount of time figuring out the build system or fetching dependencies you don't even need. If I only need libjxl and not cjxl, I should be able to do that easily without being a cmake genius.
2025-04-13 08:52:43
Hopefully that also will improve programmer productivity in the long term too
jonnyawsom3
This is what I had cooked up, split for legibility here ```C uint64 pixels_per_thread; pixels_per_thread = (xsize * ysize) / num_threads; if (cparams.modular_group_size_shift == -1) { if (cparams.speed_tier <= SpeedTier::kKitten && xsize >= 1024 && ysize >= 1024 && pixels_per_thread >= 1048576){ frame_header->group_size_shift = 3; } } else if (xsize >= 512 && ysize >= 512 && pixels_per_thread >= 262144){ frame_header->group_size_shift = 2; } } else if (xsize >= 256 && ysize >= 256 && pixels_per_thread >= 65536){ frame_header->group_size_shift = 1; } } else { frame_header->group_size_shift = 0; } } else { frame_header->group_size_shift = cparams.modular_group_size_shift; } ```
2025-04-13 08:53:52
Tried it on a few more images, the results are 'it varies'. Anything from a 1% size increase to a 20% reduction, 25% faster encoding or 5% slower. I'll probably stick it in a PR down the road so I can test it independently and more thoroughly
Demiurge
2025-04-13 08:56:43
Same thing with jpegli too. If I want to build that it should be easy to specify what I want, if I want a libjpeg-compatible static or dynamic library for linking, what version, or if I want a jpegli-specific library with jpegli symbols, and if I want to install header files for libjpeg and/or libjpegli
Tried it on a few more images, the results are 'it varies' Anything from a 1% size increase, to a 20% reduction. 25% faster encoding, or 5% slower. I'll probably stick it in a PR down the road so I can test it independently and more thoroughly
2025-04-13 08:58:35
20% bitrate reduction? For lossy modular?
jonnyawsom3
Demiurge 20% bitrate reduction? For lossy modular?
2025-04-13 09:10:29
Lossless, and 12% because I forgot the .6 thanks to being up all night, but yeah
A homosapien
2025-04-13 09:31:25
All of the improvements jonny and I are working on are purely for lossless. It's somewhat easy to benchmark and gauge improvements since all we have to worry about is encode/decode speeds and density.
Demiurge
2025-04-13 09:33:49
12% is still pretty massive
2025-04-13 09:34:10
Smaller group size = 12% bitrate improvement??
2025-04-13 09:34:29
For progressive lossless only?
A homosapien
2025-04-13 09:53:53
Progressive lossless has some strange behavior, a lot of the settings which benefit regular lossless actually hurt progressive lossless. An example would be small group sizes, on average it's bad for normal lossless but good for progressive lossless. So we had to completely retune the codec for progressive.
jonnyawsom3
Demiurge For progressive lossless only?
2025-04-13 09:54:03
Normal lossless, you can see the command in the image
2025-04-13 09:54:38
The group size and threading is an idea I had while we wrap up the progressive and faster decoding tweaks, but I think it's better suited as a separate PR
2025-04-13 09:56:29
For progressive lossless only, it's around 20% bitrate improvement and 600% encode speed improvement, 20% decode speed improvement Faster Decoding lossless, 75% bitrate improvement and 40% decode speed improvement
A homosapien
2025-04-13 09:59:33
I thought progressive was more like 35 - 40% bitrate improvement?
2025-04-13 09:59:42
Ever since we fixed that RCT bug
jonnyawsom3
2025-04-13 09:59:54
Oh right yeah, with that fixed it's around 35%
2025-04-13 10:00:29
Some images don't like it, some love it. Heuristics are broken so we can't try both easily
A homosapien
2025-04-13 10:02:35
Yeah, some of the heuristics seem to actively hurt progressive. So just disabling them and setting a global flag is better in most cases.
jonnyawsom3
2025-04-13 10:08:05
YCoCg *tends* to help more than hurt, so we're enabling it for progressive. Though I have seen a few images get bigger with it instead
A homosapien Progressive lossless has some strange behavior, a lot of the settings which benefit regular lossless actually hurt progressive lossless. An example would be small group sizes, on average it's bad for normal lossless but good for progressive lossless. So we had to completely retune the codec for progressive.
2025-04-13 10:09:40
I only just thought, but it's the per channel (or global?) palette that breaks things. Smaller groups might be allowing it to use local palette still... Yet more testing to be done!
2025-04-13 10:10:40
Okay nevermind, I got them mixed up
A homosapien
2025-04-13 10:10:52
Yeah, we still might be able to eke out a little more bitrate saving with progressive. <:FeelsReadingMan:808827102278451241> <:jxl:1300131149867126814>
Demiurge
2025-04-13 10:11:13
<:JXL:805850130203934781>
jonnyawsom3
2025-04-13 10:11:18
```
-X PERCENT, --pre-compact=PERCENT
    Use global channel palette if the number of sample values is smaller than this percentage of the nominal range.
-Y PERCENT, --post-compact=PERCENT
    Use local (per-group) channel palette if the number of sample values is smaller than this percentage of the nominal range.
```
It's `-Y 0` that fixes progressive even on main cjxl
A homosapien Yeah, we still might be able to eek out a little more bitrate saving with progressive. <:FeelsReadingMan:808827102278451241> <:jxl:1300131149867126814>
2025-04-13 10:17:16
Username found this, so we could try adding a few predictors for progressive to try <https://github.com/libjxl/libjxl/blob/c496c521f99c13b8205c4fc4ff3eb3d652a1d1c3/lib/jxl/modular/encoding/enc_ma.cc#L535>
A homosapien
2025-04-13 10:18:47
Right, like a special heuristic predictor just for progressive
2025-04-13 10:19:29
Probably for efforts 8+
jonnyawsom3
2025-04-13 10:22:45
Nah, I think it could run lower. It'll only be maybe 4 predictors instead of the full 12 that P15 does. Maybe even just gradient and none for the decode speed
monad
I went for hardware dependant settings since if you want the old behaviour, all you need to do is add `g 1`, and it's dependant on image resolution too so it would already vary based on input
2025-04-13 09:36:22
not that it matters, but current default selects g2 for small enough images
This is what I had cooked up, split for legibility here ```C uint64 pixels_per_thread; pixels_per_thread = (xsize * ysize) / num_threads; if (cparams.modular_group_size_shift == -1) { if (cparams.speed_tier <= SpeedTier::kKitten && xsize >= 1024 && ysize >= 1024 && pixels_per_thread >= 1048576){ frame_header->group_size_shift = 3; } } else if (xsize >= 512 && ysize >= 512 && pixels_per_thread >= 262144){ frame_header->group_size_shift = 2; } } else if (xsize >= 256 && ysize >= 256 && pixels_per_thread >= 65536){ frame_header->group_size_shift = 1; } } else { frame_header->group_size_shift = 0; } } else { frame_header->group_size_shift = cparams.modular_group_size_shift; } ```
2025-04-13 09:51:20
interesting concept for encoding, but wouldn't larger group sizes negatively affect decode speed on platforms with more threads than the origin? I'm also curious if it's really a net benefit to use the smallest group size in general, I recall it especially slowing down decode in some cases
jonnyawsom3
monad not that it matters, but current default selects g2 for small enough images
2025-04-13 11:27:19
Yeah, images 400 x 400 or less, citing no multithreaded speedup and wasted space in half-full groups. In practice though, there's not much overhead from empty space in groups. Otherwise it's always g1, apart from e11 which overrides any given parameters anyway
monad interesting concept for encoding, but wouldn't larger group sizes negatively affect decode speed on platforms with more threads than the origin? I'm also curious if it's really a net benefit to use the smallest group size in general, I recall it especially slowing down decode in some cases
2025-04-13 11:31:44
From our testing in <#1358733203619319858>, g0 is always faster, to the point we made faster decoding level 2 and up force g0. And with the overhauled decoding levels, I'm hoping people will be more likely to use them if they want to make sure decoding is fast. The odds are highly unlikely the image exactly matches group dimensions, so there'll be extra groups that more threads can still take advantage of too. Surprisingly, g0 even helps when threads are already saturated at higher levels. Best guess is the shallower MA trees allow quicker traversal per group
2025-04-13 11:34:17
As always with image compression, results vary based on image content, but it seems to be a slight density improvement while making sure the threads you specify are actually being used. I think it's because the higher resolution the image, the more likely it is to have slow gradients instead of sharp edges, making larger groups more effective
itszn
2025-04-14 12:03:50
not quite sure what channel to post this, but I have something kinda cool to show off :) I write security puzzles for hacking competitions (CTFs) and this year modified libjxl to include some extra predictor opcodes which had vulnerabilities. Teams had to craft a jxl image which could exploit these vulnerabilities and get code execution when the image is rendered to png. https://github.com/Nautilus-Institute/quals-2025/tree/main/jxl4fun Attached is what my final exploit image looked like, all of the grey parts are memory address leaks propagating through the predictor operators. It calculates ASLR (Address Space Layout Randomization, an exploit mitigation) bypass offsets using various operators. Finally the red pixels are from a new operator I added which had a out-of-bounds vulnerability. This is where the actual exploit triggers :) Anyway I had a lot of fun learning libjxl internals so that I could modify it in this way for the puzzle. I hope some of you can appreciate the exploit and the puzzle
jonnyawsom3
2025-04-14 12:15:06
Ooh, intriguing
itszn not quite sure what channel to post this, but I have something kinda cool to show off :) I write security puzzles for hacking competitions (CTFs) and this year modified libjxl to include some extra predictor opcodes which had vulnerabilities. Teams had to craft a jxl image which could exploit these vulnerabilities and get code execution when the image is rendered to png. https://github.com/Nautilus-Institute/quals-2025/tree/main/jxl4fun Attached is what my final exploit image looked like, all of the grey parts are memory address leaks propagating through the predictor operators. It calculates ASLR (Address Space Layout Randomization, an exploit mitigation) bypass offsets using various operators. Finally the red pixels are from a new operator I added which had a out-of-bounds vulnerability. This is where the actual exploit triggers :) Anyway I had a lot of fun learning libjxl internals so that I could modify it in this way for the puzzle. I hope some of you can appreciate the exploit and the puzzle
2025-04-14 12:19:25
This may interest you too https://github.com/google/google-ctf/tree/main/2023/quals/rev-jxl/solution
itszn
This may interest you too https://github.com/google/google-ctf/tree/main/2023/quals/rev-jxl/solution
2025-04-14 12:21:45
Yup, I've seen that one thanks for sharing :) Cool that you know about it. For mine I wanted to take it all the way to code exec. I was inspired by this exploit: https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html Which does similar style things in the JBIG2 image library to calculate offsets
jonnyawsom3
2025-04-14 12:26:48
Ahhh, my favourite. I remember discovering that a while ago and thinking "They built a microprocessor inside a PDF? Why didn't I hear about this sooner!"
monad
From our testing in <#1358733203619319858>, g0 is always faster, to the point we made faster decoding level 2 and up force g0. And with the overhauled decoding levels, I'm hoping people will be more likely to use them if they want to make sure decoding is fast. The odds are highly unlikely the image exactly matches group dimensions, so there'll be extra groups that more threads can still take advantage of too. Surprisingly, g0 even helps when threads are already saturated at higher levels. Best guess is the shallower MA trees allow quicker traversal per group
2025-04-14 09:08:05
Maybe it's a high effort implication since that's mostly where I've permuted settings.
jonnyawsom3
monad Maybe it's a high effort implication since that's mostly where I've permuted settings.
2025-04-14 09:24:59
I was considering limiting g3 to effort 9+ or when chunked is disabled, due to the 4x memory increase compared to current
monad
2025-04-14 11:20:35
I will try the suggestion posted. I tried something similar before, but quickly discarded it due to decode. Btw, at a glance it seems the "pixels_per_thread > min_target_group_pixels" enforcement ensures threads cannot be minimized when images cleanly tile with full groups. Intended?
jonnyawsom3
2025-04-14 11:45:24
Ah, good catch, I'll edit that now. If decode speed is still an issue, I recommend trying `--faster_decoding` in our fork <https://github.com/jonnyawsom3/libjxl/tree/FastSqueezeFixes> It has a much cleaner scale of Density/Speed, with improvements exceeding main `--faster_decoding 4`
Quackdoc
2025-04-14 12:12:39
can't wait to test it in olive
Tirr
2025-04-14 12:15:32
in my testing fd4 got significantly faster with reasonable density tradeoff
jonnyawsom3
2025-04-14 12:36:55
25% faster and 25% smaller as a rule of thumb https://discord.com/channels/794206087879852103/1358733203619319858/1358735338817720330
monad
This is what I had cooked up, split for legibility here ```C uint64 pixels_per_thread; pixels_per_thread = (xsize * ysize) / num_threads; if (cparams.modular_group_size_shift == -1) { if (cparams.speed_tier <= SpeedTier::kKitten && xsize >= 1024 && ysize >= 1024 && pixels_per_thread >= 1048576){ frame_header->group_size_shift = 3; } } else if (xsize >= 512 && ysize >= 512 && pixels_per_thread >= 262144){ frame_header->group_size_shift = 2; } } else if (xsize >= 256 && ysize >= 256 && pixels_per_thread >= 65536){ frame_header->group_size_shift = 1; } } else { frame_header->group_size_shift = 0; } } else { frame_header->group_size_shift = cparams.modular_group_size_shift; } ```
2025-04-14 05:18:25
```
images per bucket    g3    g2    g1    g0
20t                   1    31   261   529
1t                  299   306   146    71

                  bpp   20t dec MP/s   1t dec MP/s
git head e7      4.59          66.575        10.559
modified 20t e7  4.57          81.226        10.366
modified 1t e7   4.56          40.009         9.389
git head e8      4.47          72.912        10.044
modified 20t e8  4.46          78.802         9.717
modified 1t e8   4.45          19.530         8.164
```
jonnyawsom3
2025-04-14 08:28:51
Intriguing
Demiurge
2025-04-17 09:26:29
If someone fixes the color ringing/desaturation issue, and the overzealous crushing of shadows, then libjxl will leap 2 generations ahead and surpass libaom...
CrushedAsian255
Demiurge If someone fixes the color ringing/desaturation issue, and the overzealous crushing of shadows, then libjxl will leap 2 generations ahead and surpass libaom...
2025-04-18 02:25:02
isn't it mainly just a tuning issue?
Demiurge
2025-04-18 03:38:34
You could call it that
jonnyawsom3
2025-04-18 12:30:21
Is there anything blocking turning the jpegli folder into a submodule of the Google repo? We've noticed some commits not being mirrored and it would remove any confusion or ambiguity
Demiurge
2025-04-18 12:47:00
So would deleting the separate repo 😂
jonnyawsom3
2025-04-18 12:56:50
If we use it as a submodule, then we can treat it as an actual library instead of a growth clinging onto libjxl. Ideally also using it instead of libjpeg
Demiurge
2025-04-18 03:32:06
Whether it's an actual library or not doesn't depend on it being in a separate repo. It depends on the build system making it easy to build static/dynamic libraries and whether it comes with header files to make it easier to use as a library.
2025-04-18 03:33:59
It uses a lot of code and files from libjxl. But libjxl needs to be easier to build as a library only, without setting up a bunch of dependencies that are only used for cjxl/djxl
2025-04-18 03:36:29
libjpegli builds itself as a single libjpeg compatible library but doesn't let you build different API versions of the library at the same time, and it doesn't come with any header files.
2025-04-18 03:37:27
Those seem like far more useful things to fix than separating redundant copies of the code into confusingly diverging repos
2025-04-18 03:40:03
It's cool that they share lots of code and that improvements and adoption of one helps both...
Torn
2025-04-19 09:12:34
Might be a bit much to ask, but is there an example of initializing an ImageBundle with my own data? (If this is the wrong channel, point me to the right one.)
spider-mario
2025-04-19 09:38:54
isn’t ImageBundle the internal API?
Torn
2025-04-19 09:39:30
It's a jxl class, yeah?
spider-mario
2025-04-19 10:07:13
one that is not meant to be exposed outside of libjxl
Torn
2025-04-19 10:11:15
I'm just trying to make a version of ssimulacra2 that accepts images as piped data, instead of command line arguments that are file names. It seems to be quite attached to jxl and operates on ImageBundles, as far as I can tell.
CrushedAsian255
spider-mario one that is not meant to be exposed outside of libjxl
2025-04-20 05:38:52
oops forgot to sign the ImageBundle NDA
monad
2025-04-22 08:28:38
guys, we can finally transcode our JPEGs https://github.com/libjxl/libjxl/pull/2704
jonnyawsom3
2025-04-22 08:33:51
> the fraction of JPEGs with empty DHT markers found in the wild seems to have grown recently and it is now a substantial amount
2025-04-22 08:34:01
Interesting, I wonder what changed
monad
2025-04-22 08:55:25
I think it's just change in visibility given sufficient time
_wb_
2025-04-22 10:26:14
I think WhatsApp for some reason started to produce jpegs with empty dht markers
Melirius
2025-04-22 02:06:30
OK, I've tried several approaches for DCT coefficient order determination other than the simple histogram (taking into account two-coef correlations growing from the beginning and end of the coefficients, remaking the histogram for zero-runs after each coef selection, etc.). All of them are much slower and produce at best a (2-3)*10^-5 relative size improvement on my JPEG test suite, so I think I'll stop here and try other improvements
2025-04-22 02:08:25
Maybe Glacier mode can benefit from the best of them, otherwise it is pointless
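For readers following along, a minimal sketch of what a "simple histogram" coefficient order might look like (illustrative only; the actual encoder's heuristic and tie-breaking may differ): count how often each of the 64 DCT positions is nonzero across all blocks, then emit positions in descending count so likely-nonzero coefficients come first.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <numeric>
#include <vector>

// Build a coefficient order from a per-position nonzero histogram.
std::array<int, 64> HistogramOrder(const std::vector<std::array<int16_t, 64>>& blocks) {
  std::array<uint64_t, 64> nonzero_count{};
  for (const auto& block : blocks)
    for (int k = 0; k < 64; ++k)
      if (block[k] != 0) ++nonzero_count[k];

  std::array<int, 64> order;
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&](int a, int b) { return nonzero_count[a] > nonzero_count[b]; });
  return order;
}
```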
_wb_
Melirius OK, I've tried several approaches for DCT coefficient order determination other than simple histogram (taking into account two-coef correlations growing from beginning and end of coefficients, remaking histogram for zero-runs after each coef selection, etc.), all of them are much slower and produce on my JPEG test suite at best (2-3)*10^-5 relative size improvement, so I think to stop here and try other improvements
2025-04-23 08:39:26
Thanks for trying, it's of course more satisfying if there's an improvement but it's also good to know when the simpler thing is as good as it gets.
MSLP
2025-04-24 05:43:15
Apart from everything, it's great that jxl-rs is getting more love recently!
Traneptora
Torn I'm just trying to make a version of ssimulacra2 that accepts images as piped data, instead of command line arguments that are file names. It seems to be quite attached to jxl and operates on ImageBundles, as far as I can tell.
2025-04-25 07:33:51
Can you pass "-" as the filename? this works with djxl
Torn
Traneptora Can you pass "-" as the filename? this works with djxl
2025-04-25 08:41:06
No. I already rewrote the main method. It had no logic previously to get the data in any other manner than reading it itself. I got it to run with my own data, but either I put it at slightly wrong addresses or in the wrong format, so I'll have to continue debugging it on the weekend.
Traneptora
Torn No. I already rewrote the main method. It had no logic previously to get the data in any other manner than reading it itself. I got it to run with my own data, but either I put it at slightly wrong addresses or in the wrong format, so I'll have to continue debugging it on the weekend.
2025-04-25 09:34:13
possibly relevant is there's a separate ssimulacra2 repo
2025-04-25 09:34:23
https://github.com/cloudinary/ssimulacra2
2025-04-25 09:34:30
dunno how much is actually just stuff pulled from upstream libjxl
Torn
2025-04-25 09:42:49
Pretty much all of it.
_wb_
2025-04-25 09:48:47
It's identical
jonnyawsom3
2025-04-27 09:38:12
Assuming I didn't break anything, this should help a few longstanding issues https://github.com/google/jpegli/pull/130
spider-mario
2025-04-27 09:41:31
how much benefit is gained from the less compatible blue subsampling with XYB?
username
spider-mario how much benefit is gained from the less compatible blue subsampling with XYB?
2025-04-27 09:45:35
this I guess? https://discord.com/channels/794206087879852103/803645746661425173/1362628965860380823 (not-specified = blue subsampling)
jonnyawsom3
spider-mario how much benefit is gained from the less compatible blue subsampling with XYB?
2025-04-27 09:46:20
Subsampling of RGB JPEG at all is a bit exotic, but generally there's not a whole lot of benefit... The channel is already heavily quantized, so subsampling has less impact. In the PR I made 444 the default for compatibility, and merely fixed the `--chroma_subsampling` parameter so that it doesn't subsample the Y channel for XYB JPEGs, if people choose to use subsampling anyway for the extra few percent (and sacrificing the 20% from JXL...)
2025-04-27 09:47:50
We did try subsampling X too, since if we're risking compatibility anyway, we might as well make the most of it. It ended up hurting image quality far too much though, so we stuck with only B subsampling instead
spider-mario
username this I guess? https://discord.com/channels/794206087879852103/803645746661425173/1362628965860380823 (not-specified = blue subsampling)
2025-04-27 09:53:16
oh, yes, that must be it, thanks (I assume those bits are not really “per pixel”, though)
jonnyawsom3
Assuming I didn't break anything, this should help a few longstanding issues https://github.com/google/jpegli/pull/130
2025-04-27 10:00:00
We don't know why, but we also discovered that Vertical and Horizontal subsampling are swapped between YCbCr and XYB, so we had to reverse the values to get the expected results...
CrushedAsian255
2025-04-27 10:31:08
What happens is you subsample luma and keep chroma at 1:1
jonnyawsom3
2025-04-27 10:40:33
You get a blurry image
Demiurge
2025-04-27 02:32:21
Honestly it should refuse to use subsampling for RGB JPEG...
2025-04-27 02:33:08
Also what about the jpegli files in the libjxl source, are they just diverging into two separate redundant branches and files now?
2025-04-27 03:07:46
Good start though 👀
2025-04-27 03:50:48
I think usually you would put those in 3 separate pull requests though
2025-04-27 03:51:51
Also does it actually write the correct adobe tag or does it write a ycck tag?
username
Demiurge Also does it actually write the correct adobe tag or does it write a ycck tag?
2025-04-27 08:57:53
seems like it should write the correct value for the APP14 marker: https://github.com/google/jpegli/blob/bc19ca2393f79bfe0a4a9518f77e4ad33ce1ab7a/lib/jpegli/bitstream.cc#L58
jonnyawsom3
Demiurge Honestly it should refuse to use subsampling for RGB JPEG...
2025-04-27 08:58:17
Probably... Maybe I could have it print a warning about compatibility issues. Can't just make it a submodule because it has duplicate JXL code and the folder structure is a level too high. Probably should be separate PRs. Only the second 'real' PR I've actually made along with <#1358733203619319858>, so still learning best practices. It was mostly one-line changes so it didn't seem worth splitting to me. Correct tags. The default value of APP14 is 0, which is RGB and CMYK JPEG. YCbCr gets assigned 1 (not that it's used anyway) and YCCK 2
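For reference, a rough sketch of the Adobe APP14 segment layout with that transform byte (0 = RGB/CMYK, 1 = YCbCr, 2 = YCCK); this only illustrates the marker structure, it is not jpegli's bitstream code, and `WriteAdobeApp14` is a made-up name:
```cpp
// Illustrative sketch of the marker layout only (not jpegli's bitstream code):
// emit an Adobe APP14 segment with the given color transform byte.
// transform: 0 = RGB/CMYK, 1 = YCbCr, 2 = YCCK.
#include <cstdint>
#include <vector>

void WriteAdobeApp14(std::vector<uint8_t>* out, uint8_t transform) {
  const uint8_t segment[] = {
      0xFF, 0xEE,               // APP14 marker
      0x00, 0x0E,               // segment length (14, including these two bytes)
      'A', 'd', 'o', 'b', 'e',  // identifier
      0x00, 0x64,               // DCTEncode version (100 here)
      0x00, 0x00,               // flags0
      0x00, 0x00,               // flags1
      transform,                // color transform code
  };
  out->insert(out->end(), segment, segment + sizeof(segment));
}
```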
2025-04-27 08:59:01
2025-04-27 09:00:18
<@245794734788837387> brought this up, sounds like a good idea. Using RGB at Quality 100/Distance 0 with a warning when subsampling is enabled
username
2025-04-27 09:24:30
it would also kinda line up with how libjxl does things since it also has special behavior for distance 0
2025-04-27 09:26:09
oh and if there is a warning added about JXL-transcoding for XYB and quality 100 it should probably be present around the API and CLI option(s) for subsampling
2025-04-27 09:32:40
maybe something like "**WARNING:** Using values other than 444 in conjunction with either XYB or Quality 100/Distance 0 will result in files that cannot be losslessly transcoded to JPEG XL!"
jonnyawsom3
2025-04-27 09:43:37
As previously said, subsampled RGB isn't common anyway, so it could just be `Note: Implicit-default for Quality 100/Distance 0 is RGB JPEG` `Warning: Subsampled RGB JPEG may cause compatibility issues`
2025-04-27 09:48:08
Also <@207980494892040194>, I'm gonna need a hand to fix... Everything xD
A homosapien
2025-04-27 09:48:58
https://tenor.com/view/meme-anime-gif-22734101
jonnyawsom3
A homosapien https://tenor.com/view/meme-anime-gif-22734101
2025-04-27 10:37:27
Think I fixed it, somehow a missing } wasn't breaking the windows builds, so we never caught it before
2025-04-27 10:38:38
Also made it only subsample *YCbCr* at low quality, since if we do want to expose RGB in cjpegli, we don't want that subsampled at all. Should probably make RGB a parameter so people can disable it if they only want q100 YCbCr
2025-04-27 10:39:19
Though, I wonder how well subsampling everything and effectively halving the resolution but at higher quality would look....
A homosapien
Think I fixed it, somehow a missing } wasn't breaking the windows builds, so we never caught it before
2025-04-27 10:55:47
I also fixed some incorrect chroma values for 440 and renamed `distance` to `qDistance` since it was already defined earlier.
jonnyawsom3
2025-04-28 12:36:36
Added APP14 to CMYK and fixed the issues... Hopefully. Tests soon™
runr855
2025-04-28 12:46:36
Is there a reason why the jxl-x86-windows-static.zip 0.11.1 release of libjxl has 19/64 detections as a trojan? That seems very high for a false positive
2025-04-28 12:46:55
On Virustotal
2025-04-28 12:47:20
And it has been detected as a trojan for months
Demiurge
<@245794734788837387> brought this up, sounds like a good idea. Using RGB at Quality 100/Distance 0 with a warning when subsampling is enabled
2025-04-28 01:12:33
Doesn't sound particularly appealing to me...
jonnyawsom3
Demiurge Doesn't sound particularly appealing to me...
2025-04-28 01:13:13
Scores 2 points higher in SSIMULACRA, so it's something :P
Demiurge
2025-04-28 01:15:15
Higher than default ycbcr?
Scores 2 points higher in SSIMULACRA, so it's something :P
2025-04-28 01:16:45
That's still kinda silly
2025-04-28 01:17:17
RGB JPEG is kind of uncommon
jonnyawsom3
Demiurge Higher than default ycbcr?
2025-04-28 01:23:45
`cjpeg -quality 100` vs `cjpeg -quality 100 -rgb`
Demiurge RGB JPEG is kind of uncommon
2025-04-28 01:25:30
And quality 100 generally shouldn't be used. It'd be a default, but I'd probably add the same `-rgb` flag to cjpegli, with 0 disabling the switch and 1 forcing RGB on... If it's feasible. Along with the printout about being RGB
Demiurge
2025-04-28 01:26:49
RGB JPEG shouldn't be used because it's not efficient and the quant tables are not even tuned for that.
jonnyawsom3
2025-04-28 01:27:16
There is no quant table at q 100, it's all 1...
Demiurge
2025-04-28 01:27:18
But for quality 100 that's a different story since that doesn't apply
2025-04-28 01:27:24
Yeah
2025-04-28 01:27:53
Still uncommon
jonnyawsom3
2025-04-28 01:28:31
Didn't stop XYB, and again, it'll only be a default for cjpegli. With a message about how to disable it, similar to JPEG transcoding with cjxl
2025-04-28 01:30:28
But it'll probably be a separate PR. This one was meant to fix the biggest issues: APP14, weird XYB defaults and broken subsampling, with a tweak to enable 420 at q 30
2025-04-28 01:37:33
Might address https://discord.com/channels/794206087879852103/1301682361502531594 at some point too, but we need to do wider testing around what quality threshold to disable it
username
`cjpeg -quality 100` vs `cjpeg -quality 100 -rgb`
2025-04-28 02:12:59
Do an XOR compare of both of them against the lossless source; there are a lot fewer colors shifted around for RGB
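A minimal illustration of that XOR-compare idea on raw interleaved pixel bytes; `XorDiff` is just a hypothetical helper, not an existing tool:
```cpp
// Every byte where the decoded encode differs from the lossless source
// becomes nonzero in the output buffer, which is what lights up in the images.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint8_t> XorDiff(const std::vector<uint8_t>& original,
                             const std::vector<uint8_t>& decoded) {
  const size_t n = std::min(original.size(), decoded.size());
  std::vector<uint8_t> diff(n, 0);
  for (size_t i = 0; i < n; ++i) {
    diff[i] = original[i] ^ decoded[i];  // 0 where the bytes are identical
  }
  return diff;
}
```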
Demiurge RGB JPEG shouldn't be used because it's not efficient and the quant tables are not even tuned for that.
2025-04-28 02:14:34
if someone is specifying "quality 100" then I don't think they care about size
2025-04-28 02:17:15
I would presume the number of people who care about size when specifying the maximum available quality value is way smaller than the number of people specifying it because they expressly don't care about size and want a reference exchange file
Demiurge
2025-04-28 03:09:21
What difference does it actually make though?
2025-04-28 03:09:47
Aside from compatibility possibly.
2025-04-28 03:11:00
Possibly larger file size for no actual increase in fidelity?
jonnyawsom3
2025-04-28 03:17:40
We *just* said that it scores 2 points higher...
username
Demiurge What difference does it actually make though?
2025-04-28 03:19:44
I have seen on multiple occasions both people and companies use "quality 100" JPEGs as original references or as an intermediate format or "master copy", such as for example an artist exporting a ref sheet with defined color areas you are supposed to use a color picker on **OR** a company serving and processing millions or more images a day.
Demiurge
We *just* said that it scores 2 points higher...
2025-04-28 03:21:27
That doesn't demonstrate anything though frankly.
2025-04-28 03:23:06
Comparisons with color gradients would be an excellent real demonstration though
username I have seen on multiple occasions both people and companies use "quality 100" JPEGs as original references or as an intermediate format or "master copy", such as for example an artist exporting a ref sheet with defined color areas you are supposed to use a color picker on **OR** a company serving and processing millions or more images a day.
2025-04-28 03:23:39
What are those white squares and why do the originals have such bad banding?
jonnyawsom3
Demiurge That doesn't demonstrate anything though frankly.
2025-04-28 03:24:42
YCbCr and RGB at q100 compared to the original with XOR
username
Demiurge What are those white squares and why do the originals have such bad banding?
2025-04-28 03:26:44
https://cloudinary.com/blog/why_jpeg_is_like_a_photocopier#why_does_this_happen_
Demiurge
YCbCr and RGB at q100 compared to the original with XOR
2025-04-28 03:26:44
This is a cool comparison too, but not as convincing as just showing some color gradients, side by side.
username
2025-04-28 03:26:55
~~trolling arc~~
jonnyawsom3
2025-04-28 03:27:16
At least they haven't brought up noise again
username
2025-04-28 03:29:08
maybe they are genuinely worried about the compatibility concern, although in my testing RGB JPEGs seem to work just fine in most software
Demiurge
2025-04-28 03:29:11
I'm sincerely asking questions and sincerely wondering how much of a difference it makes. It's unfortunate you assume I have bad intentions.
username
2025-04-28 03:30:11
with the context of your messages being in relation to a compatibility concern with software they make more sense
2025-04-28 03:30:48
otherwise they seem like you are either ignoring or don't understand what is being presented to you and why
Demiurge
2025-04-28 03:30:53
The xor comparison for example is a cool visualization but a better visualization would be a worst-case image like RGB color gradients and comparing the difference side by side.
jonnyawsom3
Demiurge This is a cool comparison too, but not as convincing as just showing some color gradients, side by side.
2025-04-28 03:32:55
Original, YCbCr, RGB
Demiurge
2025-04-28 03:33:41
Nice! See? That very effectively demonstrates that it makes a real and positive difference.
2025-04-28 03:34:01
That's all I was asking.
jonnyawsom3
2025-04-28 03:34:06
Gradients were actually a good shout, my fairly noisy test image was masking most of it
Demiurge
2025-04-28 03:34:18
Exactly.
2025-04-28 03:34:38
I'm happy now. You demonstrated exactly what I was asking about.
2025-04-28 03:35:16
It's not trolling to ask a sincere and fair question...
2025-04-28 03:36:24
I genuinely didn't know if the color transformation would actually make a difference in practice.
jonnyawsom3
2025-04-28 03:37:23
It's nearly 5am and I *really* didn't wanna go into Krita to try and make a comparison image... Then I realised I had that 10-bit test image and could just let Discord do the comparing for me
username
Demiurge It's not trolling to ask a sincere and fair question...
2025-04-28 03:42:17
I guess my reason for confusion was I couldn't gauge exactly *why* you kept seemingly fighting against a change that was presented as improving color sampling accuracy in a case where people treat the file as a giant reference image. Psychovisual tuning vs mathematical similarity or something, idk
jonnyawsom3
2025-04-28 03:42:43
Bonus YCbCr vs XYB
username
2025-04-28 03:43:30
XYB still has that issue for me where stuff becomes darker
jonnyawsom3
username XYB still has that issue for me where stuff becomes darker
2025-04-28 03:47:27
Importing to Krita the image is darker, but strangely converting the layer from XYB to sRGB fixes it, as if it's using the wrong transfer or something for rendering. In Irfanview it has a pink tint...
Demiurge
username I guess my reason for confusion was I couldn't gauge exactly *why* you kept seemingly fighting against a change that was presented as improving color sampling accuracy in a case where people treat the file as a giant reference image. Psychovisual tuning vs mathematical similarity or something, idk
2025-04-28 03:47:48
I was skeptical of what the actual difference was or whether it could actually be demonstrated.
2025-04-28 03:48:07
Or if it was just an assumption
jonnyawsom3
2025-04-28 03:48:38
If I'm honest I was skeptical it would show anything in a gradient, but then it clicked that an RGB gradient would be best *in* RGB, naturally
Demiurge
If I'm honest I was skeptical it would show anything in a gradient, but then it clicked that an RGB gradient would be best *in* RGB, naturally
2025-04-28 03:49:13
Yep, it's basically a worst case scenario and the best contrived example to demonstrate the difference
2025-04-28 03:50:15
But it's still real enough to matter
2025-04-28 03:51:06
RGB looks just like the original whereas XYB and YCbCr have uneven steps
jonnyawsom3
2025-04-28 03:52:17
....it's the god damn gamma again
Demiurge
2025-04-28 03:52:41
To be fair, the uneven steps are because of rounding errors that can theoretically be fixed in the decoder/cms, but that's a whole other can of worms. And you're not going to fix everyone's broken software.
jonnyawsom3
2025-04-28 03:53:42
Old XYB, New XYB (Stripped gAMA from the PNG)
Demiurge
2025-04-28 03:54:44
The gAMA tag was messing up the png xyb?
jonnyawsom3
2025-04-28 03:55:35
cjpegli uses the guts of libjxl, so it correctly handles gamma in PNGs... Everything else, doesn't. So the XYB looks wrong in comparison
runr855 Is there a reason why the jxl-x86-windows-static.zip 0.11.1 release of libjxl has 19/64 detections as a trojan? That seems very high for a false positive
2025-04-28 05:25:51
Interesting, seems jpegli is the main culprit, not sure why though
Demiurge
2025-04-28 06:10:31
Lots of virus engines classify very broad categories as "trojan" like for example anything with the curl dll
2025-04-28 06:10:49
what do the virus engines call the supposed trojan?
runr855
2025-04-28 12:24:04
I believe it would be worth investigating. Windows Defender reacts to it, so no Windows users can use it without Defender intervening
2025-04-28 12:24:21
There is also the risk of supply chain attacks, which I don't think should be forgotten completely
novomesk
runr855 I believe it would be worth investigating. Windows Defender reacts to it, so no Windows users can use it without Defender intervening
2025-04-28 04:16:42
https://www.virustotal.com/gui/file/aa950f4d37abc1e52a5dbca153479b7cba0303e35331deb7d5ee5b18adf7a23b It is necessary to contact those AV companies and report the case as a false positive. I recommend starting with Avast/AVG - same company, same detection. BitDefender's engine is used by many different products - so resolving it there has a big impact.
_wb_
2025-04-29 12:45:42
If someone feels like it, feel free to check https://app.codecov.io/gh/libjxl/libjxl?search=&displayType=list and try to figure out what's up with those ~15% lines of code currently not covered by tests. It could be various things:
- missing tests that should actually be there
- various rather trivial error conditions (e.g. invalid API usage) that we didn't bother to add tests for (though maybe we should?)
- dead code that can be removed
- dead code because of a bug
jonnyawsom3
2025-04-29 01:07:47
Seems to be errors or untested encode parameters like keeping invisible pixels
A homosapien
Seems to be errors or untested encode parameters like keeping invisible pixels
2025-04-29 01:52:59
Do you think that could explain the excessive ram usage? I remember you said the math wasn't adding up.
Melirius
_wb_ If someone feels like it, feel free to check https://app.codecov.io/gh/libjxl/libjxl?search=&displayType=list and try to figure out what's up with those ~15% lines of code currently not covered by tests. It could be various things: - missing tests that should actually be there - various rather trivial error conditions (e.g. invalid api usage) that we didn't bother to add tests for (though maybe we should?) - dead code that can be removed - dead code because of a bug
2025-04-29 02:04:05
Will try to check
jonnyawsom3
A homosapien Do you think that could explain the excessive ram usage? I remember you said the math wasn't adding up.
2025-04-29 02:06:54
You mean for progressive lossless? Because that was something else, I just mean the error conditions aren't being tested in the coverage
pshufb
2025-04-29 04:57:47
https://web.ist.utl.pt/nuno.lopes/pubs/ub-pldi25.pdf
2025-04-29 04:58:04
came across this in a paper and thought it may be relevant to devs here. I am slightly skeptical that there’s performance on the table here / that this will replicate, and it’s perhaps best dealt with by the LLVM developers, but <:shrugm:322486234142212107>
jonnyawsom3
2025-04-29 05:06:44
Clang alone is a 20% performance increase, 130% for fast lossless. It gets built as part of the tests on Github but discarded, with MSVC being uploaded to releases instead
pshufb
Clang alone is a 20% performance increase, 130% for fast lossless. It gets built as part of the tests on Github but discarded, with MSVC being uploaded to releases instead
2025-04-29 07:33:26
My message is less about the speedup from compiler choice, and more about how a loop in clang builds of libjxl may be responsible for a lot of lost, but potentially easily recovered, performance.
jonnyawsom3
2025-04-29 07:34:35
Oh I didn't even see they used Clang, I was just saying much more than 7% is on the table
pshufb
2025-04-29 07:36:35
Unfortunately the paper doesn’t provide a whole lot of detail, and the regression is _probably_ a weird quirk of Sandy Bridge. (Which is weird since the Ivy Bridge cores aren’t much different from Sandy Bridge.) It’s a shame they don’t test on a modern architecture.
Oh I didn't even see they used Clang, I was just saying much more than 7% is on the table
2025-04-29 07:36:40
Fair!
jonnyawsom3
2025-04-29 07:37:59
> jpegxl-1.5.0
Not sure where they found that version number...
2025-04-29 07:54:34
Ahh, it's a benchmarking suite version, not a library version https://openbenchmarking.org/test/pts/jpegxl
2025-04-29 07:55:12
So they ran the tests using 0.7 too... Not exactly representative for multithreading either then
A homosapien
You mean for progressive lossless? Because that was something else, I just mean the error conditions aren't being tested in the coverage
2025-04-30 01:21:35
No, not that; more like how RAM usage is double the size of the image as a raw bitmap. Even after accounting for 32-bit float, RAM was still 2x higher than it should be.
2025-04-30 01:25:50
Or maybe I'm misremembering
jonnyawsom3
A homosapien No, not that; more like how RAM usage is double the size of the image as a raw bitmap. Even after accounting for 32-bit float, RAM was still 2x higher than it should be.
2025-04-30 03:06:36
Oh, no. The code coverage is just code that doesn't run in the tests. So corrupted files or misconfigured settings
_wb_
2025-04-30 12:50:18
Finally this is all-green again
pshufb
So they ran the tests using 0.7 too... Not exactly representative for multithreading either then
2025-04-30 02:36:33
Great catch! Thanks for digging into it.
Demiurge
2025-04-30 09:07:56
This is your regularly scheduled reminder that <:JXL:805850130203934781> is awesome and cool.
jonnyawsom3
2025-05-02 04:51:24
Looking through some old PRs that never made it. I wasn't expecting the Game of Life as a heuristic
A homosapien
2025-05-02 04:53:55
https://tenor.com/view/game-of-life-glider-grid-pixels-repeat-gif-27605519
jonnyawsom3
2025-05-02 07:49:06
Ended up doing more than we expected, but I think it's ready now https://github.com/google/jpegli/pull/130
2025-05-02 07:49:16
We wanted to have cjpegli display the new defaults when they're triggered, but couldn't get it working. We also wanted to disable XYB when the RGB at distance 0 is triggered, since the color transform causes artifacts similar to YCbCr
2025-05-02 07:50:27
Should give better results by default now though, with multiple bugs/strange behaviours fixed and the `-d 0`/`-q 100` RGB mode improving quality by a few points more
veluca
2025-05-02 09:46:30
2025-05-02 09:46:40
first (?) jxl-rs decoded image 🙂
jonnyawsom3
2025-05-02 10:44:03
And a 40MP image no less, so much for starting small xD
Meow
2025-05-03 05:54:53
Curious about its performance
veluca
Meow Curious about its performance
2025-05-03 06:09:59
Slow, but not even *too* slow for not having any performance optimization whatsoever (5x slower than libjxl on this image)
CrushedAsian255
veluca Slow, but not even *too* slow for not having any performance optimization whatsoever (5x slower than libjxl on this image)
2025-05-03 06:10:44
Was it a VarDCT or Modular image?
veluca
2025-05-03 06:10:53
Modular
CrushedAsian255
veluca Modular
2025-05-03 06:11:19
Simple modular or with Squeeze/RCT/Delta?
veluca
2025-05-03 06:11:38
RCT, but no squeeze or other fun stuff
jonnyawsom3
veluca Slow, but not even *too* slow for not having any performance optimization whatsoever (5x slower than libjxl on this image)
2025-05-03 06:11:44
Is that 5x slower both singlethreaded? (Prepare yourself for the barrage of questions xD)
veluca
Is that 5x slower both singlethreaded? (Prepare yourself for the barrage of questions xD)
2025-05-03 06:12:23
yup
Tirr
2025-05-03 06:34:32
jxl-rs is currently single thread only and doesn't have any handwritten SIMD routines
2025-05-03 06:35:17
just focusing on working implementation
jonnyawsom3
2025-05-03 06:45:08
I was moreso checking if libjxl was set to singlethreaded, but yeah. Glad we've hit this milestone and I'm sure more aren't far off
Meow
2025-05-03 11:39:31
Reaching the usable status is already a milestone
veluca
2025-05-03 12:01:36
not there yet 😛
Tirr
2025-05-04 10:37:08
it seems that libjxl is creating a VarDCT image whose LF quant values exceed the signed 16-bit range, but it isn't marked as `modular_16bit_buffers = false` https://github.com/tirr-c/jxl-oxide/issues/456
2025-05-04 10:37:20
jxl-oxide decodes the image successfully when I turn off 16-bit buffer optimization
2025-05-04 10:39:33
(the problematic sample is at `c=0 y=25 x=39` in LF image which has value of `32894`)
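A rough sketch of the kind of range guard being discussed, assuming the quantized LF plane is available as 32-bit values; `LfFitsInInt16` is illustrative and not the actual libjxl check:
```cpp
// If any quantized LF value falls outside the int16 range, the stream should
// not claim that 16-bit modular buffers are sufficient.
#include <cstdint>
#include <limits>
#include <vector>

bool LfFitsInInt16(const std::vector<int32_t>& quantized_lf) {
  for (int32_t v : quantized_lf) {
    if (v < std::numeric_limits<int16_t>::min() ||
        v > std::numeric_limits<int16_t>::max()) {
      return false;  // e.g. the reported sample value 32894 exceeds 32767
    }
  }
  return true;
}
```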
_wb_
2025-05-04 10:43:18
We had something similar in libjxl-tiny in the implementation of a hw encoder. I suppose we should be more accurate in the range of quant factors to ensure the quantized lf stays within Level 5 constraints.
jonnyawsom3
2025-05-04 12:58:07
Realised the changelog had been neglected, so thought I'd try and catch it up https://github.com/libjxl/libjxl/pull/4224
2025-05-04 12:59:41
I'll have to make a mental note of adding to the changelog as part of my future PRs, if applicable, rather than trying to recall what's new since the last release
RaveSteel
2025-05-05 01:53:14
Is there an ETA or any milestone that needs to be met before 0.12 releases?
jonnyawsom3
2025-05-05 01:57:10
AFAIK no set goals/dates from the core devs, but I was hoping to get all the jpegli tweaks merged and copied over before the next release, since libjxl is still where most get it from https://github.com/google/jpegli/pull/130
_wb_
2025-05-05 08:30:18
Some slight density improvement for lossless, at the cost of some decode slowdown: https://github.com/libjxl/libjxl/pull/4228 (and at faster_decode=2, some slight decode speedup, at the cost of some density) Feel free to try it out on your favorite image/corpus.
2025-05-05 08:41:54
In general there is probably still some substantial room to improve MA tree learning heuristics. In particular we should implement some post-clustering tree pruning that (recursively) removes splits that go to two leaf nodes with identical predictor and context after clustering (and identical multiplier/offset). Such splits only cause some encode/decode slowdown (since the tree is unnecessarily deep) and some signaling overhead, without giving any compression benefit, so pruning them can only improve things — it just seems a bit tricky to do the code plumbing to do this pruning. <@179701849576833024> or <@1346460706345848868> do you want to give it a shot?
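A rough sketch of that post-clustering pruning pass on a simplified tree representation (not libjxl's actual node type); the bottom-up recursion also catches splits that only become prunable after their children were collapsed:
```cpp
// After context clustering, a split whose two children are leaves with
// identical context, predictor, multiplier and offset decodes the same either
// way, so it can be collapsed into a single leaf.
#include <memory>

struct Node {
  bool is_leaf = false;
  // Leaf payload (valid when is_leaf).
  int context = 0, predictor = 0, multiplier = 1, offset = 0;
  // Split payload (valid when !is_leaf).
  int property = 0, split_value = 0;
  std::unique_ptr<Node> left, right;
};

void PruneRedundantSplits(std::unique_ptr<Node>& node) {
  if (!node || node->is_leaf) return;
  // Prune bottom-up so newly created prunable splits are caught too.
  PruneRedundantSplits(node->left);
  PruneRedundantSplits(node->right);
  const Node* l = node->left.get();
  const Node* r = node->right.get();
  if (l && r && l->is_leaf && r->is_leaf &&
      l->context == r->context && l->predictor == r->predictor &&
      l->multiplier == r->multiplier && l->offset == r->offset) {
    node = std::move(node->left);  // replace the split by one of the leaves
  }
}
```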
veluca
2025-05-05 08:42:52
I think I should dedicate my jxl time to jxl-rs 😄 also I remember trying that out and it not being helpful, but I might misremember
Mine18
2025-05-05 08:44:13
~~what if you removed that regression so jxl's image quality goes back to 0.8, and then you can claim a MASSIVE quality improvement!~~
veluca
2025-05-05 08:49:30
I still feel like we should try non-greedy tree splitting, but who has the time...
_wb_
Mine18 ~~what if you removed that regression so jxl's image quality goes back to 0.8, and then you can claim a MASSIVE quality improvement!~~
2025-05-05 08:49:48
this is for lossless, where only speed and density matter. Lossy is a trickier thing
veluca I think I should dedicate my jxl time to jxl-rs 😄 also I remember trying that out and it not being helpful, but I might misremember
2025-05-05 08:51:25
yeah, jxl-rs is more important than slight encoder improvements
veluca
2025-05-05 08:51:25
As in, for each property do a DP to figure out the best way to split *more than 2-way* along that property, then repeat recursively in each subtree
veluca As in, for each property do a DP to figure out the best way to split *more than 2-way* along that property, then repeat recursively in each subtree
2025-05-05 08:52:18
The decoder could also optimize for things generated that way (especially if we limit this to using, say, two properties at most), and I imagine this would be massively faster to decode too
2025-05-05 08:53:13
(two properties makes this effectively be 3 lookups in a lookup table)
_wb_
2025-05-05 08:59:53
Why 3 lookups?
veluca
2025-05-05 09:00:21
2 1d lookups to reduce the range of the properties, and 1 2d lookup for the leaf
_wb_
2025-05-05 09:00:35
ah right
2025-05-05 09:01:47
if it's limited to _n_ properties you can do it with _n_ 1D lookups followed by 1 _n_ D lookups, right?
veluca
2025-05-05 09:02:05
yup
2025-05-05 09:02:34
I imagine as soon as n starts being more than 3 or 4 the n-D lookup becomes impractical
2025-05-05 09:02:57
(depending on the # of distinct values)
_wb_
2025-05-05 09:03:26
where the size of the _n_ D lookup table is equal to the product of the number of nodes per property (+1)
veluca
2025-05-05 09:03:34
yup
2025-05-05 09:03:41
well, number of distinct nodes
2025-05-05 09:03:56
there's already a specialized codepath for n = 1 and property = gradient/wp
_wb_
2025-05-05 09:04:44
yeah it might be lower than the number of nodes if there's repetition in the subtrees
veluca
2025-05-05 09:05:07
but tbh, even if we don't table it up, a tree which has a relatively small number of parts that all share the same property should be significantly faster to decode as is
2025-05-05 09:06:02
(basically by making a tree of 1d lookup tables)
2025-05-05 09:06:36
(or even not lookup tables, if the # of possible values is small -- just do a SIMD-fied linear search...)
_wb_
2025-05-05 09:09:21
something like 7 buckets per property (large negative, medium negative, small negative, zero, small positive, medium positive, large positive) could already be pretty effective, so I can imagine you could pick the 3 most informative properties and make a lookup table of size `7*7*7`
veluca
2025-05-05 09:10:24
yeah that would work, and the LUT would either be small or just fit in a single SIMD register (and effectively be 3 instructions or so)
2025-05-05 09:11:10
(fwiw you don't even need to decide those buckets, you can just let the DP figure out the best 7-way split :P)
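A sketch of the lookup-table decode path being described: each chosen property is first bucketed with its own 1D threshold list, then one flattened n-D table maps the bucket tuple to a leaf index. Fixed at 3 properties here, and all names and sizes are illustrative:
```cpp
#include <array>
#include <cstdint>
#include <vector>

struct BucketedTree {
  std::array<std::vector<int>, 3> thresholds;  // sorted; k thresholds = k+1 buckets
  std::array<int, 3> buckets;                  // buckets per property (e.g. 7 each)
  std::vector<uint16_t> leaf;                  // size buckets[0]*buckets[1]*buckets[2]

  int Bucket(int p, int value) const {
    const auto& t = thresholds[p];
    int b = 0;
    while (b < static_cast<int>(t.size()) && value > t[b]) ++b;  // 1D lookup
    return b;
  }

  uint16_t Leaf(int p0, int p1, int p2) const {
    const int b0 = Bucket(0, p0), b1 = Bucket(1, p1), b2 = Bucket(2, p2);
    return leaf[(b0 * buckets[1] + b1) * buckets[2] + b2];  // single n-D lookup
  }
};
```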
A homosapien
_wb_ Some slight density improvement for lossless, at the cost of some decode slowdown: https://github.com/libjxl/libjxl/pull/4228 (and at faster_decode=2, some slight decode speedup, at the cost of some density) Feel free to try it out on your favorite image/corpus.
2025-05-05 09:23:47
I'm getting mixed results, and the impact on decoding speed is relatively negligible (within 0.5-2%). It's hard to say if it benefits photographic images more than non-photo
2025-05-05 09:24:12
~~Also, I think I just found another huge regression with lossless.~~
2025-05-05 09:24:47
~~I'll post it in <#803645746661425173> when I'm done double checking my numbers~~
2025-05-05 09:32:29
Nevermind, got my numbers mixed up
2025-05-05 09:33:16
Was comparing two different images by accident lol 😅
2025-05-05 09:33:54
Welp, back to work addressing the smaller regression(s) with faster decoding 3
_wb_ Some slight density improvement for lossless, at the cost of some decode slowdown: https://github.com/libjxl/libjxl/pull/4228 (and at faster_decode=2, some slight decode speedup, at the cost of some density) Feel free to try it out on your favorite image/corpus.
2025-05-05 10:02:06
Speaking of which, can I use this multiplier for faster decoding to increase the number of buckets? It's hurting large photos for faster decoding 1-3. https://github.com/libjxl/libjxl/pull/4201#issuecomment-2849934762
_wb_
2025-05-05 10:43:05
No, that's a different parameter I think.
A homosapien
2025-05-05 10:54:02
I'm making some edits in a fork and it turns out making the histogram "less efficient" is increasing density somehow.
2025-05-05 10:56:13
Granted idk what an "efficient histogram" means. I'm just going off what the PR and ChatGPT say and changing variables around
jonnyawsom3
Mine18 ~~what if you removed that regression so jxl's image quality goes back to 0.8, and then you can claim a MASSIVE quality improvement!~~
2025-05-06 01:39:05
Me and Sapien have been discussing that. When we have time, we're going to try changing values to their previous states, to see if we can get the quality back without outright reverting the PR
_wb_ Some slight density improvement for lossless, at the cost of some decode slowdown: https://github.com/libjxl/libjxl/pull/4228 (and at faster_decode=2, some slight decode speedup, at the cost of some density) Feel free to try it out on your favorite image/corpus.
2025-05-06 01:44:24
The only difference is actually in non-static properties. Level 1 already disables WP entirely, giving a 2x speedup thanks to using the kNoWP tree type
Mine18
Me and Sapien have been discussing that. When we have time, we're going to try changing values to their previous states, to see if we can get the quality back without outright reverting the PR
2025-05-06 03:57:56
hopefully the solution to the regression gets found sooner or later
jonnyawsom3
2025-05-06 07:06:12
I'm struggling to understand it due to my lack of C++ experience, but assuming `gi` is Global Image and `sg` is Single Group, isn't this trying to select per-group RCTs by measuring the entire image instead? <https://github.com/libjxl/libjxl/blob/0855a037d7ac65249f0f4700995bbd9decb3b47d/lib/jxl/enc_modular.cc#L1462>
_wb_
2025-05-06 07:24:35
Nah, at that point in the code, `gi` is a group image 🙂
2025-05-06 07:27:28
https://github.com/libjxl/libjxl/blob/0855a037d7ac65249f0f4700995bbd9decb3b47d/lib/jxl/enc_modular.cc#L1363
jonnyawsom3
2025-05-06 07:35:48
Ahh, we're trying to figure out why the RCT selection is worse than explicitly setting YCoCg in our tests, since it should be trying each RCT and then using the best result
2025-05-06 07:41:20
Similar was happening with per-group palette and squeeze for progressive lossless, we think it must be basing decisions on the smallest step
_wb_ Nah, at that point in the code, `gi` is a group image 🙂
2025-05-06 08:42:18
EstimateCost is at the top of the file, so is that using the full image? https://github.com/libjxl/libjxl/blob/0855a037d7ac65249f0f4700995bbd9decb3b47d/lib/jxl/enc_modular.cc#L273 "at that point in the code" has me wanting to double check haha. Regardless, we're testing it now and globally setting a specific RCT is better than allowing local RCTs, so something is definitely wrong with the cost estimation
A homosapien
2025-05-06 08:44:10
```
cjxl smol.png smol.jxl -d 0
JPEG XL encoder v0.12.0 e87f2f87 [_AVX2_,SSE4,SSE2]
Compressed to 24252.3 kB (7.616 bpp).
5828 x 4371, 5.662 MP/s [5.66, 5.66], , 1 reps, 12 threads.

cjxl smol.png smol.jxl -d 0 -C 6
JPEG XL encoder v0.12.0 e87f2f87 [_AVX2_,SSE4,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 24093.9 kB (7.567 bpp).
5828 x 4371, 7.448 MP/s [7.45, 7.45], , 1 reps, 12 threads.
```
Choosing a global RCT `-C 6` or `-C 10` comes really close to the RCT heuristics, or in this case, even beats it.
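A hedged sketch of the "try each RCT, keep the cheapest" selection under discussion; the sum-of-absolute-values cost proxy and the YCoCg-R-like transform below are illustrative stand-ins for libjxl's EstimateCost and its RCT set, which (per the discussion) also need to account for the predictor that will actually be used:
```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Pixel { int32_t r, g, b; };

// Crude cost proxy: smaller residual magnitudes usually entropy-code cheaper.
static int64_t EstimateCostProxy(const std::vector<Pixel>& px) {
  int64_t cost = 0;
  for (const Pixel& p : px) cost += std::abs(p.r) + std::abs(p.g) + std::abs(p.b);
  return cost;
}

// Toy transform set: identity (0) and a YCoCg-R-like reversible transform (1).
static std::vector<Pixel> ApplyRct(std::vector<Pixel> px, int rct) {
  if (rct == 0) return px;
  for (Pixel& p : px) {
    const int32_t co = p.r - p.b;
    const int32_t tmp = p.b + (co >> 1);
    const int32_t cg = p.g - tmp;
    const int32_t y = tmp + (cg >> 1);
    p = {y, co, cg};
  }
  return px;
}

// Pick the transform with the lowest estimated cost for this group.
int PickBestRct(const std::vector<Pixel>& group) {
  int best_rct = 0;
  int64_t best_cost = EstimateCostProxy(ApplyRct(group, 0));
  for (int rct = 1; rct < 2; ++rct) {
    const int64_t cost = EstimateCostProxy(ApplyRct(group, rct));
    if (cost < best_cost) { best_cost = cost; best_rct = rct; }
  }
  return best_rct;
}
```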
Melirius
_wb_ In general there is probably still some substantial room to improve MA tree learning heuristics. In particular we should implement some post-clustering tree pruning that (recursively) removes splits that go to two leaf nodes with identical predictor and context after clustering (and identical multiplier/offset). Such splits only cause some encode/decode slowdown (since the tree is unnecessarily deep) and some signaling overhead, without giving any compression benefit, so pruning them can only improve things — it just seems a bit tricky to do the code plumbing to do this pruning. <@179701849576833024> or <@1346460706345848868> do you want to give it a shot?
2025-05-06 08:44:59
Yes, good idea
jonnyawsom3
EstimateCost is at the top of the file, so is that using the full image? https://github.com/libjxl/libjxl/blob/0855a037d7ac65249f0f4700995bbd9decb3b47d/lib/jxl/enc_modular.cc#L273 "at that point in the code" has me wanting to double check haha. Regardless, we're testing it now and globally setting a specific RCT is better than allowing local RCTs, so something is definitely wrong with the cost estimation
2025-05-06 09:00:47
We added some debug printout and it seems like it *is* running per-group, but it *isn't* taking into account predictor selection. Still looking into it though