|
intelfx
|
2025-04-01 04:06:30
|
Probably.
|
|
|
Demiurge
|
2025-04-01 04:06:40
|
What's wrong about it?
|
|
|
intelfx
|
2025-04-01 04:06:48
|
You don't need a separate process to run a WASM VM.
|
|
|
Demiurge
|
2025-04-01 04:07:48
|
You can link a wasm vm into your process but copying memory in and out of the vm is no different than serializing data between different processes. Just with even MORE overhead.
|
|
|
intelfx
|
2025-04-01 04:07:54
|
That's incorrect.
|
|
|
Demiurge
|
2025-04-01 04:08:19
|
Really?
|
|
2025-04-01 04:08:34
|
I don't understand then.
|
|
|
intelfx
|
2025-04-01 04:08:52
|
Yup, really. Context switches are expensive and are only getting more expensive (Meltdown/Spectre says hello).
|
|
2025-04-01 04:09:31
|
Besides, who said you need to copy in and out of the VM? If it's a few tens (or hundreds) of bytes it's easier to copy and virtually free, if it's more, you can absolutely do something to access WASM buffers directly.
|
|
2025-04-01 04:10:07
|
Anyway, embedding native code in JXL bitstream is an obvious non-starter for obvious reasons of ISA dependency. Even if you could sandbox it perfectly with zero cost (you can't).
|
|
|
Demiurge
|
|
intelfx
Anyway, embedding native code in JXL bitstream is an obvious non-starter for obvious reasons of ISA dependency. Even if you could sandbox it perfectly with zero cost (you can't).
|
|
2025-04-01 04:11:51
|
Oh, no one was arguing for that. Jon was just saying it might be worthwhile to make a very simple and basic JIT that generates native instructions to decode the MA trees.
|
|
|
intelfx
|
2025-04-01 04:12:04
|
Ah, then I misread that part.
|
|
2025-04-01 04:12:16
|
Then you have a JIT already, what's the problem with shipping WASM bitcode instead of that? :)
|
|
|
Demiurge
|
2025-04-01 04:12:31
|
It's a kind of risky idea but MAYBE it could be made guaranteed-safe if someone was very careful and clever?
|
|
|
intelfx
|
2025-04-01 04:12:42
|
They were, and they made wasm...
|
|
2025-04-01 04:12:52
|
Why reinvent the wheel?
|
|
|
Demiurge
|
2025-04-01 04:13:56
|
Well the MA trees have a lot simpler requirements than wasm. That would be like killing a mosquito with a grenade launcher
|
|
2025-04-01 04:14:43
|
wasm doesn't have good support for simd instructions yet either right?
|
|
|
intelfx
|
2025-04-01 04:18:22
|
There is support for simd, has been for quite some time. You can use SIMD from Rust compiled to wasm32, for instance. Is there any indication that it isn't good?
|
|
2025-04-01 04:18:53
|
Anyway. TL;DR of my position.
- Linux namespaces are totally irrelevant because they leave GIANT attack surface which is extremely excessive for running untrusted native code (Linux namespaces except userns aren't, and never were, a security boundary; userns are kinda trying to be, but a very shoddy one)
- Plan 9 mooning is totally irrelevant as well because we have it already and it's called seccomp mode 1, literally designed for running untrusted binary code with almost zero attack surface
- If the amount of untrusted binary code is tiny, the setup and communication overhead of ANY external process (be it seccomp 1 or whatever Plan 9 had) would be MASSIVELY exceeding the cost of the code itself
- if you want/need a JIT anyway, just use WASM (in whatever part of the pipeline) and don't reinvent the wheel
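(For reference, since seccomp mode 1 keeps coming up: a minimal sketch of what it means in practice for a hypothetical helper process, not anything from libjxl. After the prctl() call, any syscall other than read/write/_exit/sigreturn kills the process with SIGKILL, so the helper can only shuttle bytes over fds it inherited.)
```cpp
// Hypothetical sandboxed helper (illustration only): enter seccomp mode 1
// ("strict") before touching untrusted input.
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <unistd.h>
#include <cstdlib>

int main() {
  if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0) {
    return EXIT_FAILURE;  // old kernel or seccomp disabled
  }
  // Only read/write/_exit/sigreturn are allowed from here on: read a work
  // unit on stdin, write the result to stdout, then exit.
  char buf[4096];
  ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
  if (n > 0) write(STDOUT_FILENO, buf, static_cast<size_t>(n));
  _exit(0);
}
```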
|
|
|
jonnyawsom3
|
2025-04-01 04:22:38
|
<:JXL:805850130203934781>
|
|
|
Demiurge
|
|
intelfx
Anyway. TL;DR of my position.
- Linux namespaces are totally irrelevant because they leave GIANT attack surface which is extremely excessive for running untrusted native code (Linux namespaces except userns aren't, and never were, a security boundary; userns are kinda trying to be, but a very shoddy one)
- Plan 9 mooning is totally irrelevant as well because we have it already and it's called seccomp mode 1, literally designed for running untrusted binary code with almost zero attack surface
- If the amount of untrusted binary code is tiny, the setup and communication overhead of ANY external process (be it seccomp 1 or whatever Plan 9 had) would be MASSIVELY exceeding the cost of the code itself
- if you want/need a JIT anyway, just use WASM (in whatever part of the pipeline) and don't reinvent the wheel
|
|
2025-04-01 05:35:16
|
unveil is a filesystem namespace with no overhead, and unveil+pledge is a way for processes to revoke their own privileges and have the kernel enforce it. They're good ideas and something similar should be adopted everywhere to make security easier.
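(A sketch of how that looks on OpenBSD, with a made-up input path just to illustrate the API; not libjxl code:)
```cpp
// OpenBSD-only illustration: a decoder process dropping privileges before
// parsing untrusted input. The path is hypothetical.
#include <err.h>
#include <unistd.h>

int main() {
  // Only the input file stays visible to this process, read-only.
  if (unveil("/tmp/input.jxl", "r") == -1) err(1, "unveil");
  if (unveil(NULL, NULL) == -1) err(1, "unveil lock");  // no further unveils
  // Restrict syscalls to stdio-ish operations plus read-only path access.
  if (pledge("stdio rpath", NULL) == -1) err(1, "pledge");
  // ... decode here ...
  return 0;
}
```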
|
|
|
intelfx
|
2025-04-01 05:36:01
|
both unveil and pledge combined are less powerful than seccomp mode 1, so again totally irrelevant
|
|
|
Demiurge
|
2025-04-01 05:36:12
|
I thought simd instructions were still in the planning stages for wasm
|
|
|
intelfx
|
2025-04-01 05:36:43
|
(seccomp mode 2 _is_ pledge-equivalent, btw, and yes, filesystem namespaces are unveil, but that's again not what we need here)
|
|
|
Demiurge
|
|
intelfx
both unveil and pledge combined are less powerful than seccomp mode 1, so again totally irrelevant
|
|
2025-04-01 05:37:46
|
Less powerful? It's not about power. All of the power of seccomp is totally useless if it's incomprehensible to use.
|
|
2025-04-01 05:38:10
|
Writing secure software needs to be practical and obvious and convenient.
|
|
|
intelfx
|
2025-04-01 05:38:28
|
It's about relevance. I don't understand what's the point of bringing irrelevant sandboxing features from other OSes into the discussion, what point are you making?
|
|
|
Demiurge
|
2025-04-01 05:40:27
|
Because you mentioned seccomp, which is a uselessly incomprehensible version of what could be a useful security feature
|
|
2025-04-01 05:40:40
|
But it's only useful if people actually want to use it
|
|
2025-04-01 05:40:52
|
And no one wants to use it if it's not easy to use like pledge
|
|
|
intelfx
|
2025-04-01 05:43:12
|
originally I mentioned seccomp mode 1, which has NO configuration
|
|
2025-04-01 05:43:19
|
so it can't be incomprehensible by definition
|
|
|
Demiurge
|
2025-04-01 05:43:34
|
And that is what I mean by namespace, btw
|
|
|
intelfx
|
2025-04-01 05:43:43
|
then you are using the words wrong
|
|
|
Demiurge
|
2025-04-01 05:43:47
|
Different processes having different views of the filesystem
|
|
|
intelfx
|
2025-04-01 05:43:48
|
namespaces mean a very specific thing
|
|
|
Demiurge
|
2025-04-01 05:43:59
|
That is often called a filesystem namespace
|
|
2025-04-01 05:44:58
|
And if different processes have access to different kernel syscalls, then that isn't usually called a namespace but you can probably assume the intent or meaning still by the context
|
|
|
intelfx
|
2025-04-01 05:45:06
|
Okay, we are going in circles. Namespaces are irrelevant. To run untrusted native code, you don't need "different views of the filesystem": you need *NO* view of the filesystem, like no access to the syscalls at all. Anything else is already an infinitely larger attack surface than you want. So in context of this discussion, filesystem namespaces, or any other namespaces at all, have exactly 0% relevance.
|
|
2025-04-01 05:45:34
|
I said that like an hour ago.
|
|
|
Demiurge
And if different processes have access to different kernel syscalls, then that isn't usually called a namespace but you can probably assume the intent or meaning still by the context
|
|
2025-04-01 05:46:09
|
I don't want to have to "assume the intent or meaning". Words have defined meanings, let's use them.
|
|
|
Demiurge
|
2025-04-01 05:56:08
|
I can understand your point of view and it's a good one.
|
|
|
intelfx
originally I mentioned seccomp mode 1, which has NO configuration
|
|
2025-04-01 06:01:14
|
I don't know what mode 1 is off the top of my head. It's a linux specific API with very fine grained control and not at all friendly to the typical programmer, with not even a wrapper in the C library for it.
|
|
|
jonnyawsom3
|
2025-04-01 06:23:00
|
<#806898911091753051>?
|
|
|
Demiurge
|
|
Lilli
I could not make the chunked API work. Is there an example somewhere, where it is used? I could not find one after looking for quite a while. :/
I set up `JxlChunkedFrameInputSource` with callbacks, which I then feed to
`JxlEncoderAddChunkedFrame(frame_settings, true, chunked)`
This essentially replaces the call to `JxlEncoderAddImageFrame(frame_settings, &pixel_format, image_data, image_data_size)`
|
|
2025-04-01 06:41:22
|
Sorry no one has gotten back to you on this. Chunked encode API is pretty new and I'm not sure how it works.
|
|
2025-04-01 06:41:30
|
I'm just a lurker.
|
|
|
jonnyawsom3
|
2025-04-03 08:13:21
|
Seems like auto-merge for PRs is being blocked by a formatting error <https://github.com/libjxl/libjxl/actions/runs/14235466206/job/39943279145?pr=4178>
Just added it to my changelog PR if you want to merge it quickly <@794205442175402004> https://github.com/libjxl/libjxl/pull/4169
|
|
|
|
Lucas Chollet
|
2025-04-03 08:35:17
|
Yikes, now they fail because of my recent changes, pls revert your typo fix lol
|
|
|
jonnyawsom3
|
|
Lucas Chollet
Yikes, now they fail because of my recent changes, pls revert your typo fix lol
|
|
2025-04-03 08:40:27
|
The only failing required test is due to your [CMAKE fix not being merged](<https://github.com/libjxl/libjxl/actions/runs/14251775663/job/39946056483?pr=4169>), and yours won't merge because of my [typo fix not being merged](<https://github.com/libjxl/libjxl/actions/runs/14235466206/job/39943279145?pr=4178>), so now we're in a gridlock over which gets manually merged first haha
|
|
|
|
Lucas Chollet
|
2025-04-03 08:42:59
|
I didn't realize that you were hitting that issue too, I was referring to that log in your PR:
```
tools/jxltran.cc:9:#include "lib/include/jxl/decode.h"
tools/jxltran.cc:10:#include "lib/include/jxl/decode_cxx.h"
Don't add "include/" to the include path of public headers.
```
Isn't it a required test?
|
|
|
jonnyawsom3
|
|
Lucas Chollet
I didn't realize that you were hitting that issue too, I was referring to that log in your PR:
```
tools/jxltran.cc:9:#include "lib/include/jxl/decode.h"
tools/jxltran.cc:10:#include "lib/include/jxl/decode_cxx.h"
Don't add "include/" to the include path of public headers.
```
Isn't it a required test?
|
|
2025-04-03 08:44:07
|
Checks required for merging have the indicator on the right
|
|
|
|
Lucas Chollet
|
2025-04-03 08:45:20
|
Ah, didn't realize that 😅
|
|
|
jonnyawsom3
|
2025-04-03 08:45:42
|
Though, fixing it wouldn't hurt since we appear to have plenty of time
|
|
2025-04-03 08:46:50
|
I actually have permission to close PRs on the repo with my Triage role, but unfortunately I can't merge
|
|
|
|
Lucas Chollet
|
2025-04-03 08:47:16
|
I would like to fix the CMake first, but I would need another CI run for that. I'm playing a bit of a guessing game here
|
|
|
Though, fixing it wouldn't hurt since we appear to have plenty of time
|
|
2025-04-03 08:47:55
|
I will do it on my next `jxltran` PR. But you can do it too if you want
|
|
|
jonnyawsom3
|
2025-04-03 08:49:35
|
It doesn't seem to break anything, so I'll let you do it as part of the jxltran work
|
|
|
Seems like auto-merge for PRs is being blocked by a formatting error <https://github.com/libjxl/libjxl/actions/runs/14235466206/job/39943279145?pr=4178>
Just added it to my changelog PR if you want to merge it quickly <@794205442175402004> https://github.com/libjxl/libjxl/pull/4169
|
|
2025-04-03 08:52:37
|
<@794205442175402004> apologies for the double ping, but you'll need to force merge that due to the stalemate with the CMAKE sjpeg fix (which will require re-approval after too). Then Auto-Merge should start working again
|
|
|
intelfx
|
2025-04-05 04:52:40
|
OK, I'm playing with progressive encoding again... Last time y'all told me that progressive lossless encoding is basically broken with effort >=5 (although in the end I failed to understand why exactly).
However, I'm also getting the same thing if I use `--progressive_dc` instead of `-p`:
```
$ cjxl path/to/png path/to/jxl -d 0
JPEG XL encoder v0.11.1 794a5dcf [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 161373.8 kB including container (10.997 bpp).
10672 x 11000, 7.569 MP/s [7.57, 7.57], , 1 reps, 32 threads.
cjxl path/to/png -d 0 365,93s user 16,98s system 2143% cpu 17,861 total
$ cjxl path/to/png path/to/jxl -d 0 --progressive_dc=1
JPEG XL encoder v0.11.1 794a5dcf [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 7]
^C
cjxl path/to/png -d 0 --progressive_dc= 392,90s user 233,60s system 104% cpu 9:58,64 total
```
— which, to my understanding, isn't supposed to actually do progressive encoding of the main image (i.e., it does not imply `--progressive_ac` or `--responsive`). Why is this?
|
|
|
A homosapien
|
2025-04-05 05:20:30
|
`progressive_dc` and `progressive_ac` only work for the lossy mode of libjxl. For lossless they don't actually make the image progressive; all they do is disable chunked encoding, which trades encoding time for more density.
|
|
|
intelfx
|
2025-04-05 05:25:08
|
Ah, it seems my understanding was incomplete. I did understand that `{q,}progressive_ac` was only applicable to the lossy mode (as it is fundamentally about encoding the AC/"HF" VarDCT coefficients), but I thought that `progressive_dc` was basically "prepending" a frame made of the LF coefficients onto the main image, regardless of whether the main image was lossy or lossless.
|
|
|
A homosapien
|
2025-04-05 05:29:51
|
Progressive lossless uses a different technique called squeeze.
|
|
2025-04-05 05:32:37
|
|
|
2025-04-05 05:32:37
|
All of JPEG XL's features are explained really well in this technical report. Section 5.1.3 explains what squeeze does alongside some images a few pages down.
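(Rough intuition, in case it helps before reading the report: each squeeze pass splits a channel into half-resolution averages plus the residuals needed to undo it exactly, so the averages alone already form a viewable preview. A simplified Haar-like sketch of one 1D pass; the real libjxl transform additionally applies a smoothing "tendency" to the residuals:)
```cpp
#include <cstdint>
#include <vector>

// Forward step: halve a row into averages (sent first) and residuals.
void SqueezeRow(const std::vector<int32_t>& in,
                std::vector<int32_t>* avg, std::vector<int32_t>* res) {
  for (size_t i = 0; i + 1 < in.size(); i += 2) {
    int32_t a = in[i], b = in[i + 1];
    avg->push_back((a + b) >> 1);  // floor((a+b)/2): the low-res preview
    res->push_back(a - b);         // detail needed for exact reconstruction
  }
  if (in.size() % 2) avg->push_back(in.back());  // odd tail passes through
}

// Inverse step for one pair: (avg, res) -> (a, b), exactly.
void UnsqueezePair(int32_t avg, int32_t res, int32_t* a, int32_t* b) {
  *a = avg + ((res + (res & 1)) / 2);
  *b = *a - res;
}
```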
|
|
|
jonnyawsom3
|
|
intelfx
OK, I'm playing with progressive encoding again... Last time y'all told me that progressive lossless encoding is basically broken with effort >=5 (although in the end I failed to understand why exactly).
However, I'm also getting the same thing if I use `--progressive_dc` instead of `-p`:
```
$ cjxl path/to/png path/to/jxl -d 0
JPEG XL encoder v0.11.1 794a5dcf [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 161373.8 kB including container (10.997 bpp).
10672 x 11000, 7.569 MP/s [7.57, 7.57], , 1 reps, 32 threads.
cjxl path/to/png -d 0 365,93s user 16,98s system 2143% cpu 17,861 total
$ cjxl path/to/png path/to/jxl -d 0 --progressive_dc=1
JPEG XL encoder v0.11.1 794a5dcf [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 7]
^C
cjxl path/to/png -d 0 --progressive_dc= 392,90s user 233,60s system 104% cpu 9:58,64 total
```
— which, to my understanding, isn't supposed to actually do progressive encoding of the main image (i.e., it does not imply `--progressive_ac` or `--responsive`). Why is this?
|
|
2025-04-05 07:21:49
|
We're actively working on progressive lossless, if not almost done with it already https://discord.com/channels/794206087879852103/803645746661425173/1357850229247840401
Right now, progressive_dc disabling chunked works around an [issue](<https://github.com/libjxl/libjxl/issues/3823#issuecomment-2351120650>) with the TOC, where the 1:8 LF frame can't be rendered until the end of the file. Downside being the entire image has to be processed as a whole, instead of threading individual groups.
Squeeze, used for progressive lossless, also disables chunked due to downsampling the image as part of the transform
|
|
|
intelfx
|
2025-04-05 12:24:02
|
We're actively working on progressive
|
|
|
Crite Spranberry
|
2025-04-05 04:46:40
|
So I'm trying to compile libjxl and I followed this guide
https://github.com/libjxl/libjxl/blob/main/doc/developing_in_windows_vcpkg.md
Visual Studio doesn't show any errors, but I just have no binaries at all in the /out/build/x64-Clang-Release/tools folder
|
|
|
A homosapien
|
2025-04-05 05:53:09
|
I recommend using msys2, it's an easier process and it generates faster binaries
|
|
|
Crite Spranberry
|
2025-04-06 11:29:13
|
I got further by setting BUILD_TESTING to OFF in CMakeSettings.json
|
|
2025-04-06 11:31:30
|
Why does this always happen
|
|
2025-04-06 11:31:40
|
|
|
2025-04-06 11:33:57
|
What am I doing where everything always fails with the most obscure errors that nobody else has?
|
|
2025-04-06 11:56:49
|
huh msys2 just worked
|
|
2025-04-06 11:56:55
|
No errors, no bs
|
|
2025-04-06 11:57:34
|
Or well ig one error I had to work around (my cmake version is 4.0.0, idk wtf it's on about)
|
|
2025-04-06 11:58:46
|
Why are these like quadruple the size they should be
|
|
2025-04-06 11:59:10
|
Actually more like 20-30x the size
|
|
2025-04-06 12:04:23
|
and they don't even work wtf
|
|
|
Demiurge
|
2025-04-06 12:36:11
|
Did it download and compile dependencies like brotli, hwy, skia/skcms and whatever?
|
|
2025-04-06 12:36:46
|
It's kind of a pain to compile libjxl
|
|
2025-04-06 12:37:10
|
But totally doable
|
|
|
jonnyawsom3
|
2025-04-06 12:54:54
|
Homosapien had the same issues with it not being static and massively larger. We fixed them (mostly) and he mentioned rewriting the build docs
|
|
|
Quackdoc
|
|
Crite Spranberry
and they don't even work wtf
|
|
2025-04-06 05:00:46
|
you compiled dynamic, you need to copy DLLs to the folder the exec is in
|
|
|
Homosapien had the same issues with it not being static and massively larger. We fixed thrm (mostly) and he mentioned rewriting the build docs
|
|
2025-04-06 05:01:20
|
in isolation static will always be smaller
|
|
|
spider-mario
|
|
Quackdoc
you compiled dynamic, you need to copy DLLs to the folder the exec is in
|
|
2025-04-06 05:29:10
|
or run them from the mingw shell you built them in, as it will have the `PATH` set appropriately
|
|
|
Crite Spranberry
Why are these like quadruple the size they should be
|
|
2025-04-06 05:29:53
|
maybe a debug (or at least unoptimised) build? you can run `cmake -DCMAKE_BUILD_TYPE=Release .` from the build directory (or edit `CMAKE_BUILD_TYPE` in `CMakeCache.txt` directly) and rebuild
|
|
|
jonnyawsom3
|
|
Quackdoc
in isolation static will always be smaller
|
|
2025-04-06 06:37:13
|
Our static builds are still 2-4x larger than the github releases. 10mb per binary
|
|
|
Quackdoc
|
2025-04-06 06:37:28
|
[av1_woag](https://cdn.discordapp.com/emojis/852007419474608208.webp?size=48&name=av1_woag)
|
|
|
jonnyawsom3
|
2025-04-06 06:37:59
|
It was 50, but that was debug as Mario said
|
|
|
spider-mario
|
2025-04-06 07:41:02
|
note that by default, a CMake build won’t be stripped
|
|
2025-04-06 07:41:22
|
so you can do that right after building, or add `-s` (if I recall correctly) to the linker flags
|
|
|
Crite Spranberry
|
|
spider-mario
maybe a debug (or at least unoptimised) build? you can run `cmake -DCMAKE_BUILD_TYPE=Release .` from the build directory (or edit `CMAKE_BUILD_TYPE` in `CMakeCache.txt` directly) and rebuild
|
|
2025-04-06 09:10:37
|
I did that, which changed my command to this
```
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=OFF -DJPEGXL_ENABLE_BENCHMARK=OFF -DJPEGXL_ENABLE_PLUGINS=ON -DJPEGXL_ENABLE_MANPAGES=OFF -DJPEGXL_FORCE_SYSTEM_BROTLI=ON -DJPEGXL_FORCE_SYSTEM_GTEST=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 ..
```
The exes are still slightly too large, but much closer to normal release
From here how would I make a stripped static build?
|
|
|
jonnyawsom3
|
2025-04-06 09:42:02
|
<@207980494892040194> didn't you say two static flags are required?
|
|
|
spider-mario
|
2025-04-06 10:31:03
|
`-DCMAKE_EXE_LINKER_FLAGS=-s` might help
|
|
2025-04-06 10:31:45
|
(that, or running `strip` on all executables yourself)
|
|
|
A homosapien
|
|
Crite Spranberry
I did that, which changed my command to this
```
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=OFF -DJPEGXL_ENABLE_BENCHMARK=OFF -DJPEGXL_ENABLE_PLUGINS=ON -DJPEGXL_ENABLE_MANPAGES=OFF -DJPEGXL_FORCE_SYSTEM_BROTLI=ON -DJPEGXL_FORCE_SYSTEM_GTEST=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 ..
```
The exes are still slightly too large, but much closer to normal release
From here how would I make a stripped static build?
|
|
2025-04-06 11:05:02
|
You need to specify two flags for a truly static libjxl. `-DBUILD_SHARED_LIBS=OFF` and `-DJPEGXL_STATIC=ON`. Also I recommend removing `-DJPEGXL_FORCE_SYSTEM_BROTLI=ON`, lots of errors pop up and the build fails with it on.
|
|
2025-04-06 11:10:58
|
Also I recommend using clang, according to my testing it's around 5-10% faster than GCC
|
|
|
Crite Spranberry
|
|
spider-mario
`-DCMAKE_EXE_LINKER_FLAGS=-s` might help
|
|
2025-04-06 11:41:44
|
This does decrease the size, but static builds are still 1-2MB bigger than official release
|
|
2025-04-06 11:41:49
|
I'll see about trying GCC
|
|
2025-04-06 11:42:03
|
Also my command currently
```
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=OFF -DJPEGXL_ENABLE_BENCHMARK=OFF -DJPEGXL_ENABLE_PLUGINS=ON -DJPEGXL_ENABLE_MANPAGES=OFF -DJPEGXL_STATIC=ON -DCMAKE_EXE_LINKER_FLAGS=-s -DJPEGXL_FORCE_SYSTEM_GTEST=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 ..
```
|
|
2025-04-06 11:44:35
|
pacman -S can't find clang-compiler-rt so well idk if it will work but ig I'll try
|
|
|
Crite Spranberry
What am I doing where everything always fails with the most obscure errors that nobody else has?
|
|
2025-04-06 11:46:40
|
aaaaaaaa
|
|
2025-04-06 11:46:55
|
welp I got what I could time to try build
|
|
2025-04-06 11:47:53
|
Same Cmake error even though I have 4.0 so ig I will add the minimum shit
|
|
2025-04-06 11:49:37
|
<:Think2:826218556453945364>
```
[344/379] Error: Can't generate doc since Doxygen not installed.
FAILED: CMakeFiles/doc C:/Users/Admin/Documents/GitHub/libjxl-mingw/build/CMakeFiles/doc
C:\Windows\system32\cmd.exe /C "cd /D C:\Users\Admin\Documents\GitHub\libjxl-mingw\build && false"
[353/379] Building CXX object tools/CMakeFiles/enc_fast_lossless.dir/__/lib/jxl/enc_fast_lossless.cc.obj
ninja: build stopped: subcommand failed.
+ retcode=1
```
|
|
|
A homosapien
|
2025-04-06 11:50:04
|
~~I think you don't need clang-rt to compile.~~ Also I think the inflated binary sizes are intrinsically tied to libc or pthreads or something like that. Not much you can do about it I think.
|
|
|
Crite Spranberry
|
2025-04-06 11:52:13
|
How do I disable doc generation with clang?
|
|
|
A homosapien
|
2025-04-06 11:52:48
|
strange, I got it to compile on my machine even though I don't have doxygen installed
|
|
2025-04-06 11:52:53
|
try adding `-DJPEGXL_ENABLE_DOXYGEN=OFF`
|
|
|
Crite Spranberry
|
2025-04-06 11:58:00
|
Now I get this
```
+ cmake --build /c/Users/Admin/Documents/GitHub/libjxl-mingw/build -- all doc
ninja: error: unknown target 'doc', did you mean 'jxl'?
```
|
|
2025-04-07 12:00:42
|
Oh I can just do this
|
|
|
A homosapien
|
2025-04-07 12:01:31
|
`Pacman -S --needed mingw-w64-clang-x86_64-compiler-rt mingw-w64-x86_64-doxygen `
|
|
|
Crite Spranberry
|
2025-04-07 12:02:45
|
wtf why do I get that error now
|
|
|
Crite Spranberry
Now I get this
```
+ cmake --build /c/Users/Admin/Documents/GitHub/libjxl-mingw/build -- all doc
ninja: error: unknown target 'doc', did you mean 'jxl'?
```
|
|
2025-04-07 12:03:31
|
I went back to the original command and I get that error and I have no idea what I did
|
|
2025-04-07 12:04:04
|
temp storing this here
```
./ci.sh opt -DBUILD_TESTING=OFF -DBUILD_SHARED_LIBS=OFF -DJPEGXL_ENABLE_BENCHMARK=OFF -DJPEGXL_ENABLE_MANPAGES=OFF -DJPEGXL_STATIC=ON -DJPEGXL_FORCE_SYSTEM_GTEST=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_BUILD_TYPE=Release -DJPEGXL_ENABLE_DOXYGEN=OFF
```
|
|
2025-04-07 12:06:15
|
I took a snapshot before I went to gcc because I knew some shit would happen so ig I'll try again
|
|
|
A homosapien
|
2025-04-07 12:07:50
|
wait you are using the script? I got it to compile like so, `export CC=clang && export CXX=clang++` and then running the regular cmake command
|
|
2025-04-07 12:08:12
|
I don't really trust that script I'll be honest
|
|
|
Crite Spranberry
|
2025-04-07 12:08:33
|
Oh I'm just following this
https://github.com/libjxl/libjxl/blob/main/doc/developing_in_windows_msys.md
|
|
2025-04-07 12:12:08
|
So `-DJPEGXL_ENABLE_DOXYGEN=OFF` just permanently breaks the build command even if I remove it and nuke the build folder
|
|
2025-04-07 12:12:57
|
So I get an error with doxygen so ig I'll try your method
|
|
|
A homosapien
|
|
spider-mario
note that by default, a CMake build won’t be stripped
|
|
2025-04-07 12:13:14
|
Does compiling libjxl with LTO optimizations work? It could be another way of reducing binary sizes but it always seems to fail for me.
|
|
|
Crite Spranberry
So I get an error with doxygen so ig I'll try your method
|
|
2025-04-07 12:14:07
|
yeah regular ol' cmake works just fine, and it builds faster too. Just use this command for it to use all of your cores `cmake --build . -- -j$(nproc)`
|
|
|
Crite Spranberry
|
2025-04-07 12:16:49
|
Well it's using clang now and idk if I notice any difference
The executables are still a bit bigger than the official release
|
|
|
A homosapien
|
2025-04-07 12:17:36
|
It's 20% faster than the official Windows releases so I would say it's a worthwhile trade off
|
|
|
Crite Spranberry
|
2025-04-07 12:17:52
|
cool
|
|
2025-04-07 12:20:14
|
So interesting findings, gcc compiled libjxl theoretically compatible with XP
Except I get this error and it works fine on Vista+ so idk what it's on about
|
|
2025-04-07 12:20:53
|
Or well all the executables I'm interested in (cjpegli, cjxl, djxl, and jxlinfo)
|
|
2025-04-07 12:21:32
|
They check out in Dependency Walker, but I get this with all
|
|
2025-04-07 12:27:31
|
The UTF8 manifest breaks XP
Ig I'll just compile without it
|
|
2025-04-07 12:29:39
|
Nevermind it can have a UTF8 manifest, it just is picky
|
|
2025-04-07 12:32:07
|
So adding a compatibility entry makes it work
```
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
<compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
<application>
<windowsSettings>
<activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
</windowsSettings>
</application>
</compatibility>
</assembly>
```
|
|
|
Crite Spranberry
So interesting findings, gcc compiled libjxl theoretically compatible with XP
Except I get this error and it works fine on Vista+ so idk what it's on about
|
|
2025-04-07 12:33:05
|
clang*
|
|
2025-04-07 12:33:07
|
i already forgor
|
|
2025-04-07 12:43:15
|
Well time to see if libavif fares better now with clang as well
|
|
2025-04-07 12:50:01
|
Had to install yasm but it just builds as well holy shit
|
|
2025-04-07 12:50:20
|
oop need to make static build
|
|
2025-04-07 12:57:11
|
<:Think2:826218556453945364>
|
|
2025-04-07 01:19:20
|
I apparently can't do 32 bit build of libavif
|
|
2025-04-07 01:19:39
|
eh jxl better anyways
|
|
2025-04-07 02:13:29
|
Does clang not support utf8?
|
|
2025-04-07 02:13:38
|
Wait nvm
|
|
2025-04-07 02:13:41
|
I noticed the issue
|
|
|
A homosapien
|
2025-04-07 02:32:29
|
It would be funny to have a building doc for Windows XP <:KekDog:805390049033191445>
|
|
|
jonnyawsom3
|
|
Crite Spranberry
So interesting findings, gcc compiled libjxl theoretically compatible with XP
Except I get this error and it works fine on Vista+ so idk what it's on about
|
|
2025-04-07 06:41:27
|
Not sure what hardware you're running, but if it's period correct, you could be a good benchmark for our faster_decoding tweaks xD
|
|
|
|
Lucas Chollet
|
2025-04-07 02:52:02
|
Can I get someone to run the pipeline on [4165](<https://github.com/libjxl/libjxl/pull/4165>)
|
|
|
Crite Spranberry
|
|
Not sure what hardware you're running, but if it's period correct, you could be a good benchmark for our faster_decoding tweaks xD
|
|
2025-04-07 03:03:57
|
It's a VM, but I do have some period hardware
|
|
2025-04-07 03:04:14
|
but period hardware boring
|
|
2025-04-07 03:08:46
|
For older pre-Sandy Bridge or AVX, I have
462 Athlon XP (unlikely to run due to no SSE2)
478 Pentium 4 something or another i forgor
754 or 939 Athlon 64 I forgor again
775 Pentium 4 3.06GHz
775 C2D 1.86GHz
775 C2Q Q6700
AM3 Athlon II x4 something or another
AM3 Phenom II x4 955
AM3 Phenom II x6 1090t
1366 Xeon 5080
|
|
|
jonnyawsom3
|
2025-04-08 04:48:28
|
Interesting, I wonder why this wasn't done for jpegli, seeing as it's usually YCbCr (Could this have avoided RGB JPEGs?) <https://github.com/libjxl/libjxl/blob/main/lib/jxl/enc_modular.cc#L763>
|
|
|
damian101
|
2025-04-09 12:42:44
|
encoding in RGB is very inefficient
|
|
2025-04-09 12:44:18
|
jpegli can encode in XYB using an ICC v4 profile
|
|
|
Demiurge
|
2025-04-09 12:50:43
|
Yup, and it would be a lot more compatible with existing hardware and software if it would add a JPEG APP14 tag. Otherwise some decoders will treat it like a normal YCbCr JPEG and mess up the colors.
|
|
2025-04-09 12:55:26
|
Also jpegli STILL uses chroma subsampling by default for XYB JPEG, which is another hint for certain decoders to treat it like a normal YCbCr JPEG
|
|
2025-04-09 12:55:42
|
Since it makes no sense to use chroma subsampling for an RGB JPEG
|
|
2025-04-09 12:57:04
|
Pretty sure that would be a one-line fix too, and that alone would have a big impact. The APP14 header fix might be a 2 line fix.
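(For reference, this is roughly all it takes; a sketch of the bytes, not jpegli code. A ColorTransform byte of 0 tells Adobe-aware decoders the components are stored as-is instead of YCbCr:)
```cpp
#include <array>
#include <cstdint>

// The 16 bytes of an Adobe APP14 segment with ColorTransform = 0
// (illustration only, not taken from jpegli).
constexpr std::array<uint8_t, 16> kAdobeApp14NoTransform = {
    0xFF, 0xEE,               // APP14 marker
    0x00, 0x0E,               // segment length = 14 (includes these 2 bytes)
    'A', 'd', 'o', 'b', 'e',  // identifier
    0x00, 0x64,               // DCTEncodeVersion (100)
    0x00, 0x00,               // APP14Flags0
    0x00, 0x00,               // APP14Flags1
    0x00                      // ColorTransform: 0 = none (RGB/CMYK), 1 = YCbCr
};
```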
|
|
|
jonnyawsom3
|
|
jpegli can encode in XYB using an ICC v4 profile
|
|
2025-04-09 01:20:39
|
That still uses RGB JPEG internally
|
|
|
damian101
|
|
That still uses RGB JPEG internally
|
|
2025-04-09 01:21:03
|
but it's not RGB
|
|
2025-04-09 01:21:23
|
I see what you mean
|
|
|
jonnyawsom3
|
2025-04-09 01:22:05
|
Yeah, I mean it could have used the usual YCoCg with YXB
|
|
|
Demiurge
|
2025-04-09 01:27:27
|
The ICC profile is applied AFTER reverse-YCoCg back to RGB
|
|
2025-04-09 01:30:42
|
And Adobe (APP14) JPEG is well supported by existing decoders
|
|
2025-04-09 01:36:54
|
Just like how CMYK JPEG is commonly supported, using the same exact Adobe tag
|
|
|
Yeah, I mean it could have used the usual YCoCg with YXB
|
|
2025-04-09 01:42:18
|
If you were to do this, then the decoder would still do the reverse YCbCr transform, but maybe it's possible to make an ICC profile that takes that into account and changes it back by applying another YCbCr transformation on top of the inverse XYB transform. Ugh, confusing. But it's probably doable.
|
|
|
username
|
2025-04-09 01:45:42
|
isn't the reason jpegli does XYB as an RGB JPEG because YCbCr in JPEG is not and cannot be lossless meaning you would get a double lossy color transform?
|
|
2025-04-09 01:46:40
|
also jpegli does chroma subsampling **on purpose** for XYB JPEGs, it's not some random accident
|
|
2025-04-09 01:46:59
|
although it's not compatible with JPEG XL sadly :(
|
|
|
jonnyawsom3
|
|
username
isn't the reason jpegli does XYB as an RGB JPEG because YCbCr in JPEG is not and cannot be lossless meaning you would get a double lossy color transform?
|
|
2025-04-09 01:48:00
|
Ah yeah, I recall something along those lines
|
|
2025-04-09 01:49:15
|
Regardless, swapping the Y and X channels would give a more graceful image degradation than hot pink and green
|
|
|
Demiurge
|
2025-04-09 02:34:47
|
Graceful? I hear that it might actually be better for it to be ungraceful, so it's more obvious when something goes wrong.
|
|
2025-04-09 02:35:21
|
Also I thought the channels are already arranged "YXB"
|
|
|
jonnyawsom3
|
2025-04-09 02:37:59
|
I tried creating my own YXB JPEG by channel swapping and ICC editing, but it ended up still having a tint, though the image was still recognisable. So it might be worth it
|
|
|
Demiurge
|
2025-04-09 03:26:53
|
YBX would probably be kinda close to YCbCr right?
|
|
2025-04-09 03:27:34
|
If the goal was "graceful degradation"
|
|
2025-04-09 03:28:03
|
Which is arguably undesirable if it makes it harder to tell that a problem is present
|
|
2025-04-09 03:30:42
|
Maybe there should be some "tell" like intentionally making it look super bright and washed out without color management
|
|
2025-04-09 03:31:34
|
But that's not as obvious and cool as just making the whole thing look green
|
|
2025-04-09 03:31:42
|
😎
|
|
|
jonnyawsom3
|
2025-04-09 03:32:23
|
The B channel caused an overexposed Cr result, but I could have done something wrong
|
|
2025-04-09 03:32:45
|
Just a more graceful tint than eye searing pink and green
|
|
|
Demiurge
|
2025-04-09 07:16:09
|
Pink and green are cool tho
|
|
|
Meow
|
2025-04-10 02:47:51
|
Upgrading jpeg-xl
0.11.1 -> 0.11.1_1
👀
|
|
|
Demiurge
|
2025-04-10 08:29:51
|
Is it normal for effort=10 to give a larger file size for "JPEG lossless transcode" mode?
|
|
2025-04-10 08:30:02
|
Larger than effort=9 I mean
|
|
|
HCrikki
|
2025-04-10 08:46:56
|
i recall it performs worse than e9 for reversible jpeg transcode
|
|
|
Demiurge
|
2025-04-10 09:39:52
|
🧐 why
|
|
|
A homosapien
|
2025-04-10 11:00:14
|
Local MA trees are sometimes better than Global ones
|
|
2025-04-10 11:00:16
|
I think
|
|
|
Melirius
|
|
A homosapien
Local MA trees are sometimes better than Global ones
|
|
2025-04-11 02:35:47
|
Exactly: splitting along the group number is limited to 256 splits in the global tree, but not in local ones (each group has its own tree). This effect is therefore more pronounced in large JPEGs with high variability
|
|
|
jonnyawsom3
|
2025-04-11 02:46:14
|
To be a bit more specific, an MA tree can have 255 contexts at most, so using a tree per group drastically increases the possible options
|
|
|
CrushedAsian255
|
|
To be a bit more specific, an MA tree can have 255 contexts at most, so using a tree per group drastically increases the possible options
|
|
2025-04-11 03:12:04
|
so if using local trees there can be at most 255 * group_count contexts?
|
|
|
jonnyawsom3
|
2025-04-11 03:20:29
|
As I understand it, yes
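(To put numbers on it, using the 255-per-tree limit above and the usual modular group sizes of 128 << group_size_shift; my own back-of-the-envelope, not libjxl code:)
```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Worst-case local-tree contexts for a 4K image at each group size.
  const uint64_t xsize = 3840, ysize = 2160;
  for (int shift = 0; shift <= 3; ++shift) {
    const uint64_t dim = 128u << shift;  // -g 0..3 -> 128..1024 px groups
    const uint64_t groups =
        ((xsize + dim - 1) / dim) * ((ysize + dim - 1) / dim);
    std::printf("-g %d: %4llu groups -> up to %7llu contexts\n", shift,
                (unsigned long long)groups,
                (unsigned long long)(groups * 255));
  }
  return 0;
}
```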
|
|
2025-04-11 03:25:48
|
I was actually contemplating changing group size dependent on image size and thread count. Using the largest group size that fully utilises the encoding threads
So for me, 1080p would be the current default of `-g 1`, 4K would be `-g 2` and 8K would be `-g 3` (based on megapixels/16 threads, compared to pixels per group size)
Depends how much of an impact on encode/decode speed it has though. Yet more testing to be done!
|
|
2025-04-11 09:36:15
|
I'm sure I've said it before, but I would've expected a memory reduction between 16 threads and 1 thread, since it's using local MA trees per group in lossless. Accounting for the image buffer being 32f instead of 8int, it's still using twice as much memory on something
|
|
|
Melirius
|
2025-04-12 03:53:58
|
Could somebody run CI on my PR? I think I fixed all the problems, but cannot check it locally, thanks
|
|
2025-04-12 03:54:02
|
https://github.com/libjxl/libjxl/pull/4185
|
|
|
jonnyawsom3
|
2025-04-13 08:23:38
|
https://discord.com/channels/794206087879852103/1256302117379903498/1360893139233280092
|
|
|
Demiurge
|
2025-04-13 08:23:44
|
I can think of a good group size heuristic.
|
|
2025-04-13 08:26:18
|
For each possible group size, calculate the area in px^2 of "unused space" and use the largest one?
|
|
2025-04-13 08:26:34
|
The largest one with the least unused space I mean
|
|
2025-04-13 08:27:44
|
Afaik there's no reason for small group sizes to be better than larger group sizes.
|
|
|
jonnyawsom3
|
2025-04-13 08:28:56
|
That's what I was thinking, but ended up just using a minimum "pixels per thread" to make sure every block was full before picking a higher group size. So all threads are always saturated in the first pass, then whatever's left runs after
|
|
|
Demiurge
Afaik there's no reason for small group sizes to be better than larger group size.
|
|
2025-04-13 08:29:08
|
Encoding speed, decoding speed, memory and density
|
|
|
Demiurge
|
2025-04-13 08:29:10
|
And I don't know how much cost "wasted space" actually has, if any at all
|
|
|
Encoding speed, decoding speed, memory and density
|
|
2025-04-13 08:30:09
|
Well in terms of density I mean, larger group sizes should not have any reason to be at a disadvantage.
|
|
|
jonnyawsom3
|
|
Demiurge
And I don't know how much cost "wasted space" actually has, if any at all
|
|
2025-04-13 08:30:55
|
From my testing, very little. At most, a memory hit from allocating the extra space, and a small speed penalty from the bigger groups. Hence why I thought I'd keep it simple with the 'minimum saturation'
|
|
|
Demiurge
Well in terms of density I mean, larger group sizes should not have any reason to be at a disadvantage.
|
|
2025-04-13 08:31:08
|
Better local MA trees in smaller groups
|
|
2025-04-13 08:33:20
|
This is what I had cooked up, split for legibility here
```C
uint64_t pixels_per_thread = (xsize * ysize) / num_threads;
if (cparams.modular_group_size_shift == -1) {
  // Pick the largest group size that still keeps every thread busy.
  if (cparams.speed_tier <= SpeedTier::kKitten &&
      xsize >= 1024 && ysize >= 1024 && pixels_per_thread >= 1048576) {
    frame_header->group_size_shift = 3;
  } else if (xsize >= 512 && ysize >= 512 && pixels_per_thread >= 262144) {
    frame_header->group_size_shift = 2;
  } else if (xsize >= 256 && ysize >= 256 && pixels_per_thread >= 65536) {
    frame_header->group_size_shift = 1;
  } else {
    frame_header->group_size_shift = 0;
  }
} else {
  frame_header->group_size_shift = cparams.modular_group_size_shift;
}
```
|
|
|
Demiurge
|
|
Better local MA trees in smaller groups
|
|
2025-04-13 08:33:49
|
Oh... that makes sense.
|
|
|
jonnyawsom3
|
2025-04-13 08:34:13
|
Checks the image meets the dimensions of the group size, then checks if there's enough pixels at that group size to fill all threads, if not, try the next lower size
|
|
|
Demiurge
|
2025-04-13 08:35:17
|
If a bunch of similar dct blocks are all in the same group...
|
|
|
jonnyawsom3
|
2025-04-13 08:35:28
|
There are no DCT blocks, this is modular
|
|
|
Tirr
|
2025-04-13 08:36:21
|
also vardct has a fixed group size of 256
|
|
|
Demiurge
|
|
jonnyawsom3
|
2025-04-13 08:37:46
|
I went for hardware-dependent settings since if you want the old behaviour, all you need to do is add `g 1`, and it's dependent on image resolution too so it would already vary based on input
|
|
|
Demiurge
|
2025-04-13 08:38:49
|
I think it makes more sense for it to vary based on image than based on hardware
|
|
2025-04-13 08:39:08
|
But it's not that big of a deal. Especially if it increases speed too
|
|
|
jonnyawsom3
|
2025-04-13 08:39:11
|
Gives an encode speed boost, sometimes a big decode speed boost (5x) and at worst a 0.1 bpp increase, or at best a 0.1 bpp decrease so far. I am gonna test it more though
|
|
|
Demiurge
|
2025-04-13 08:39:36
|
The density difference I would imagine is too small to matter also
|
|
|
jonnyawsom3
|
2025-04-13 08:40:34
|
Yeah, I *was* worrying about it, but you get more image-to-image variance than this ever seems to cause, and bumping up an effort level with the speed increase naturally obliterates it
|
|
|
Demiurge
|
2025-04-13 08:50:51
|
The best way to improve libjxl right now is to make the source tree more logically organized into folders, making it easier to find and build only exactly what you want/need/expect/specify, while wasting the minimum amount of time figuring out the build system or fetching dependencies you don't even need. If I only need libjxl and not cjxl, I should be able to do that easily without being a cmake genius.
|
|
2025-04-13 08:52:43
|
Hopefully that also will improve programmer productivity in the long term too
|
|
|
jonnyawsom3
|
|
This is what I had cooked up, split for legibility here
```C
uint64_t pixels_per_thread = (xsize * ysize) / num_threads;
if (cparams.modular_group_size_shift == -1) {
  // Pick the largest group size that still keeps every thread busy.
  if (cparams.speed_tier <= SpeedTier::kKitten &&
      xsize >= 1024 && ysize >= 1024 && pixels_per_thread >= 1048576) {
    frame_header->group_size_shift = 3;
  } else if (xsize >= 512 && ysize >= 512 && pixels_per_thread >= 262144) {
    frame_header->group_size_shift = 2;
  } else if (xsize >= 256 && ysize >= 256 && pixels_per_thread >= 65536) {
    frame_header->group_size_shift = 1;
  } else {
    frame_header->group_size_shift = 0;
  }
} else {
  frame_header->group_size_shift = cparams.modular_group_size_shift;
}
```
|
|
2025-04-13 08:53:52
|
Tried it on a few more images, the results are 'it varies'
Anything from a 1% size increase, to a 20% reduction. 25% faster encoding, or 5% slower. I'll probably stick it in a PR down the road so I can test it independently and more thoroughly
|
|
|
Demiurge
|
2025-04-13 08:56:43
|
Same thing with jpegli too. If I want to build that, it should be easy to specify what I want: a libjpeg-compatible static or dynamic library for linking (and which API version), or a jpegli-specific library with jpegli symbols, and whether to install header files for libjpeg and/or libjpegli
|
|
|
Tried it on a few more images, the results are 'it varies'
Anything from a 1% size increase, to a 20% reduction. 25% faster encoding, or 5% slower. I'll probably stick it in a PR down the road so I can test it independently and more thoroughly
|
|
2025-04-13 08:58:35
|
20% bitrate reduction? For lossy modular?
|
|
|
jonnyawsom3
|
|
Demiurge
20% bitrate reduction? For lossy modular?
|
|
2025-04-13 09:10:29
|
Lossless, and 12% because I forgot the .6 thanks to being up all night, but yeah
|
|
|
A homosapien
|
2025-04-13 09:31:25
|
All of the improvements jonny and I are working on are purely for lossless. It's somewhat easy to benchmark and gauge improvements since all we have to worry about is encode/decode speeds and density.
|
|
|
Demiurge
|
2025-04-13 09:33:49
|
12% is still pretty massive
|
|
2025-04-13 09:34:10
|
Smaller group size = 12% bitrate improvement??
|
|
2025-04-13 09:34:29
|
For progressive lossless only?
|
|
|
A homosapien
|
2025-04-13 09:53:53
|
Progressive lossless has some strange behavior, a lot of the settings which benefit regular lossless actually hurt progressive lossless. An example would be small group sizes, on average it's bad for normal lossless but good for progressive lossless. So we had to completely retune the codec for progressive.
|
|
|
jonnyawsom3
|
|
Demiurge
For progressive lossless only?
|
|
2025-04-13 09:54:03
|
Normal lossless, you can see the command in the image
|
|
2025-04-13 09:54:38
|
The group size and threading is an idea I had while we wrap up the progressive and faster decoding tweaks, but I think it's better suited as a separate PR
|
|
2025-04-13 09:56:29
|
For progressive lossless only, it's around 20% bitrate improvement and 600% encode speed improvement, 20% decode speed improvement
Faster Decoding lossless, 75% bitrate improvement and 40% decode speed improvement
|
|
|
A homosapien
|
2025-04-13 09:59:33
|
I thought progressive was more like 35-40% bitrate improvement?
|
|
2025-04-13 09:59:42
|
Ever since we fixed that RCT bug
|
|
|
jonnyawsom3
|
2025-04-13 09:59:54
|
Oh right yeah, with that fixed it's around 35%
|
|
2025-04-13 10:00:29
|
Some images don't like it, some love it. Heuristics are broken so we can't try both easily
|
|
|
A homosapien
|
2025-04-13 10:02:35
|
Yeah, some of the heuristics seem to actively hurt progressive. So just disabling them and setting a global flag is better in most cases.
|
|
|
jonnyawsom3
|
2025-04-13 10:08:05
|
YCoCg *tends* to help more than hurt, so we're enabling it for progressive. Though I have seen a few images get bigger with it instead
|
|
|
A homosapien
Progressive lossless has some strange behavior, a lot of the settings which benefit regular lossless actually hurt progressive lossless. An example would be small group sizes, on average it's bad for normal lossless but good for progressive lossless. So we had to completely retune the codec for progressive.
|
|
2025-04-13 10:09:40
|
I only just thought, but it's the per channel (or global?) palette that breaks things. Smaller groups might be allowing it to use local palette still... Yet more testing to be done!
|
|
2025-04-13 10:10:40
|
Okay nevermind, I got them mixed up
|
|
|
A homosapien
|
2025-04-13 10:10:52
|
Yeah, we still might be able to eke out a little more bitrate saving with progressive. <:FeelsReadingMan:808827102278451241> <:jxl:1300131149867126814>
|
|
|
Demiurge
|
2025-04-13 10:11:13
|
<:JXL:805850130203934781>
|
|
|
jonnyawsom3
|
2025-04-13 10:11:18
|
```-X PERCENT, --pre-compact=PERCENT
Use global channel palette if the number of sample values is smaller
than this percentage of the nominal range.
-Y PERCENT, --post-compact=PERCENT
Use local (per-group) channel palette if the number of sample values is
smaller than this percentage of the nominal range.```
It's `-Y 0` that fixes progressive even on main cjxl
|
|
|
A homosapien
Yeah, we still might be able to eke out a little more bitrate saving with progressive. <:FeelsReadingMan:808827102278451241> <:jxl:1300131149867126814>
|
|
2025-04-13 10:17:16
|
Username found this, so we could try adding a few predictors for progressive to try
<https://github.com/libjxl/libjxl/blob/c496c521f99c13b8205c4fc4ff3eb3d652a1d1c3/lib/jxl/modular/encoding/enc_ma.cc#L535>
|
|
|
A homosapien
|
2025-04-13 10:18:47
|
Right, like a special heuristic predictor just for progressive
|
|
2025-04-13 10:19:29
|
Probably for efforts 8+
|
|
|
jonnyawsom3
|
2025-04-13 10:22:45
|
Nah, I think it could run lower. It'll only be maybe 4 predictors instead of the full 12 that P15 does. Maybe even just gradient and none for the decode speed
|
|
|
monad
|
|
I went for hardware-dependent settings since if you want the old behaviour, all you need to do is add `g 1`, and it's dependent on image resolution too so it would already vary based on input
|
|
2025-04-13 09:36:22
|
not that it matters, but current default selects g2 for small enough images
|
|
|
This is what I had cooked up, split for legibility here
```C
uint64_t pixels_per_thread = (xsize * ysize) / num_threads;
if (cparams.modular_group_size_shift == -1) {
  // Pick the largest group size that still keeps every thread busy.
  if (cparams.speed_tier <= SpeedTier::kKitten &&
      xsize >= 1024 && ysize >= 1024 && pixels_per_thread >= 1048576) {
    frame_header->group_size_shift = 3;
  } else if (xsize >= 512 && ysize >= 512 && pixels_per_thread >= 262144) {
    frame_header->group_size_shift = 2;
  } else if (xsize >= 256 && ysize >= 256 && pixels_per_thread >= 65536) {
    frame_header->group_size_shift = 1;
  } else {
    frame_header->group_size_shift = 0;
  }
} else {
  frame_header->group_size_shift = cparams.modular_group_size_shift;
}
```
|
|
2025-04-13 09:51:20
|
interesting concept for encoding, but wouldn't larger group sizes negatively affect decode speed on platforms with more threads than the origin? I'm also curious if it's really a net benefit to use the smallest group size in general, I recall it especially slowing down decode in some cases
|
|
|
jonnyawsom3
|
|
monad
not that it matters, but current default selects g2 for small enough images
|
|
2025-04-13 11:27:19
|
Yeah, images 400 x 400 or less, citing no multithreaded speedup and wasted space in half-full groups. In practice though, there's not much overhead from empty space in groups. Otherwise it's always g1, apart from e11 which overrides any given parameters anyway
|
|
|
monad
interesting concept for encoding, but wouldn't larger group sizes negatively affect decode speed on platforms with more threads than the origin? I'm also curious if it's really a net benefit to use the smallest group size in general, I recall it especially slowing down decode in some cases
|
|
2025-04-13 11:31:44
|
From our testing in <#1358733203619319858>, g0 is always faster, to the point we made faster decoding level 2 and up force g0. And with the overhauled decoding levels, I'm hoping people will be more likely to use them if they want to make sure decoding is fast.
It's highly unlikely that the image exactly matches group dimensions, so there'll be extra groups that more threads can still take advantage of too. Surprisingly, g0 even helps when threads are already saturated at higher levels. Best guess is the shallower MA trees allow quicker traversal per group
|
|
2025-04-13 11:34:17
|
As always with image compression, results vary based on image content, but it seems to be a slight density improvement while making sure the threads you specify are actually being used. I think it's because the higher resolution the image, the more likely it is to have slow gradients instead of sharp edges, making larger groups more effective
|
|
|
itszn
|
2025-04-14 12:03:50
|
not quite sure what channel to post this, but I have something kinda cool to show off :)
I write security puzzles for hacking competitions (CTFs) and this year modified libjxl to include some extra predictor opcodes which had vulnerabilities. Teams had to craft a jxl image which could exploit these vulnerabilities and get code execution when the image is rendered to png.
https://github.com/Nautilus-Institute/quals-2025/tree/main/jxl4fun
Attached is what my final exploit image looked like, all of the grey parts are memory address leaks propagating through the predictor operators. It calculates ASLR (Address Space Layout Randomization, an exploit mitigation) bypass offsets using various operators. Finally the red pixels are from a new operator I added which had an out-of-bounds vulnerability. This is where the actual exploit triggers :)
Anyway I had a lot of fun learning libjxl internals so that I could modify it in this way for the puzzle. I hope some of you can appreciate the exploit and the puzzle
|
|
|
jonnyawsom3
|
2025-04-14 12:15:06
|
Ooh, intriguing
|
|
|
itszn
not quite sure what channel to post this, but I have something kinda cool to show off :)
I write security puzzles for hacking competitions (CTFs) and this year modified libjxl to include some extra predictor opcodes which had vulnerabilities. Teams had to craft a jxl image which could exploit these vulnerabilities and get code execution when the image is rendered to png.
https://github.com/Nautilus-Institute/quals-2025/tree/main/jxl4fun
Attached is what my final exploit image looked like, all of the grey parts are memory address leaks propagating through the predictor operators. It calculates ASLR (Address Space Layout Randomization, an exploit mitigation) bypass offsets using various operators. Finally the red pixels are from a new operator I added which had an out-of-bounds vulnerability. This is where the actual exploit triggers :)
Anyway I had a lot of fun learning libjxl internals so that I could modify it in this way for the puzzle. I hope some of you can appreciate the exploit and the puzzle
|
|
2025-04-14 12:19:25
|
This may interest you too https://github.com/google/google-ctf/tree/main/2023/quals/rev-jxl/solution
|
|
|
itszn
|
|
This may interest you too https://github.com/google/google-ctf/tree/main/2023/quals/rev-jxl/solution
|
|
2025-04-14 12:21:45
|
Yup, I've seen that one thanks for sharing :) Cool that you know about it.
For mine I wanted to take it all the way to code exec. I was inspired by this exploit: https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html
Which does similar style things in the JBIG2 image library to calculate offsets
|
|
|
jonnyawsom3
|
2025-04-14 12:26:48
|
Ahhh, my favourite. I remember discovering that a while ago and thinking "They built a microprocessor inside a PDF? Why didn't I hear about this sooner!"
|
|
|
monad
|
|
From our testing in <#1358733203619319858>, g0 is always faster, to the point we made faster decoding level 2 and up force g0. And with the overhauled decoding levels, I'm hoping people will be more likely to use them if they want to make sure decoding is fast.
It's highly unlikely that the image exactly matches group dimensions, so there'll be extra groups that more threads can still take advantage of too. Surprisingly, g0 even helps when threads are already saturated at higher levels. Best guess is the shallower MA trees allow quicker traversal per group
|
|
2025-04-14 09:08:05
|
Maybe it's a high effort implication since that's mostly where I've permuted settings.
|
|
|
jonnyawsom3
|
|
monad
Maybe it's a high effort implication since that's mostly where I've permuted settings.
|
|
2025-04-14 09:24:59
|
I was considering limiting g3 to effort 9+ or when chunked is disabled, due to the 4x memory increase compared to current
|
|
|
monad
|
2025-04-14 11:20:35
|
I will try the suggestion posted. I tried something similar before, but quickly discarded it due to decode. Btw, at a glance it seems the "pixels_per_thread > min_target_group_pixels" enforcement ensures threads cannot be minimized when images cleanly tile with full groups. Intended?
|
|
|
jonnyawsom3
|
2025-04-14 11:45:24
|
Ah, good catch, I'll edit that now. If decode speed is still an issue, I recommend trying `--faster_decoding` in our fork <https://github.com/jonnyawsom3/libjxl/tree/FastSqueezeFixes>
It has a much cleaner scale of Density/Speed, with improvements exceeding main `--faster_decoding 4`
|
|
|
Quackdoc
|
2025-04-14 12:12:39
|
can't wait to test it in olive
|
|
|
Tirr
|
2025-04-14 12:15:32
|
in my testing fd4 got significantly faster with reasonable density tradeoff
|
|
|
jonnyawsom3
|
2025-04-14 12:36:55
|
25% faster and 25% smaller as a rule of thumb https://discord.com/channels/794206087879852103/1358733203619319858/1358735338817720330
|
|
|
monad
|
|
This is what I had cooked up, split for legibility here
```C
uint64_t pixels_per_thread = (xsize * ysize) / num_threads;
if (cparams.modular_group_size_shift == -1) {
  // Pick the largest group size that still keeps every thread busy.
  if (cparams.speed_tier <= SpeedTier::kKitten &&
      xsize >= 1024 && ysize >= 1024 && pixels_per_thread >= 1048576) {
    frame_header->group_size_shift = 3;
  } else if (xsize >= 512 && ysize >= 512 && pixels_per_thread >= 262144) {
    frame_header->group_size_shift = 2;
  } else if (xsize >= 256 && ysize >= 256 && pixels_per_thread >= 65536) {
    frame_header->group_size_shift = 1;
  } else {
    frame_header->group_size_shift = 0;
  }
} else {
  frame_header->group_size_shift = cparams.modular_group_size_shift;
}
```
|
|
2025-04-14 05:18:25
|
```
images per bucket
                   g3    g2    g1    g0
20t                 1    31   261   529
1t                299   306   146    71

                  bpp   20t dec MP/s   1t dec MP/s
git head e7      4.59         66.575        10.559
modified 20t e7  4.57         81.226        10.366
modified 1t e7   4.56         40.009         9.389
git head e8      4.47         72.912        10.044
modified 20t e8  4.46         78.802         9.717
modified 1t e8   4.45         19.530         8.164
```
|
|
|
jonnyawsom3
|
2025-04-14 08:28:51
|
Intriguing
|
|
|
Demiurge
|
2025-04-17 09:26:29
|
If someone fixes the color ringing/desaturation issue, and the overzealous crushing of shadows, then libjxl will leap 2 generations ahead and surpass libaom...
|
|
|
CrushedAsian255
|
|
Demiurge
If someone fixes the color ringing/desaturation issue, and the overzealous crushing of shadows, then libjxl will leap 2 generations ahead and surpass libaom...
|
|
2025-04-18 02:25:02
|
isn't it mainly just a tuning issue?
|
|
|
Demiurge
|
2025-04-18 03:38:34
|
You could call it that
|
|
|
jonnyawsom3
|
2025-04-18 12:30:21
|
Is there anything blocking turning the jpegli folder into a submodule of the Google repo? We've noticed some commits not being mirrored and it would remove any confusion or ambiguity
|
|
|
Demiurge
|
2025-04-18 12:47:00
|
So would deleting the separate repo 😂
|
|
|
jonnyawsom3
|
2025-04-18 12:56:50
|
If we use it as a submodule, then we can treat it as an actual library instead of a growth clinging onto libjxl. Ideally also using it instead of libjpeg
|
|
|
Demiurge
|
2025-04-18 03:32:06
|
Whether it's an actual library or not doesn't depend on it being in a separate repo. It depends on the build system making it easy to build static/dynamic libraries and whether it comes with header files to make it easier to use as a library.
|
|
2025-04-18 03:33:59
|
It uses a lot of code and files from libjxl. But libjxl needs to be easier to build as a library only, without setting up a bunch of dependencies that are only used for cjxl/djxl
|
|
2025-04-18 03:36:29
|
libjpegli builds itself as a single libjpeg compatible library but doesn't let you build different API versions of the library at the same time, and it doesn't come with any header files.
|
|
2025-04-18 03:37:27
|
Those seem like far more useful things to fix than separating redundant copies of the code into confusingly diverging repos
|
|
2025-04-18 03:40:03
|
It's cool that they share lots of code and that improvements and adoption of one helps both...
|
|
|
Torn
|
2025-04-19 09:12:34
|
Might be a bit much to ask, but is there an example of initializing an ImageBundle with my own data?
(If this is the wrong channel, point me to the right one.)
|
|
|
spider-mario
|
2025-04-19 09:38:54
|
isn’t ImageBundle the internal API?
|
|
|
Torn
|
2025-04-19 09:39:30
|
It's a jxl class, yeah?
|
|
|
spider-mario
|
2025-04-19 10:07:13
|
one that is not meant to be exposed outside of libjxl
|
|
|
Torn
|
2025-04-19 10:11:15
|
I'm just trying to make a version of ssimulacra2 that accepts images as piped data, instead of command line arguments that are file names.
It seems to be quite attached to jxl and operates on ImageBundles, as far as I can tell.
|
|
|
CrushedAsian255
|
|
spider-mario
one that is not meant to be exposed outside of libjxl
|
|
2025-04-20 05:38:52
|
oops forgot to sign the ImageBundle NDA
|
|
|
monad
|
2025-04-22 08:28:38
|
guys, we can finally transcode our JPEGs https://github.com/libjxl/libjxl/pull/2704
|
|
|
jonnyawsom3
|
2025-04-22 08:33:51
|
> the fraction of JPEGs with empty DHT markers found in the wild seems to have grown recently and it is now a substantial amount
|
|
2025-04-22 08:34:01
|
Interesting, I wonder what changed
|
|
|
monad
|
2025-04-22 08:55:25
|
I think it's just a change in visibility given sufficient time
|
|
|
_wb_
|
2025-04-22 10:26:14
|
I think WhatsApp for some reason started to produce jpegs with empty dht markers
|
|
|
Melirius
|
2025-04-22 02:06:30
|
OK, I've tried several approaches for DCT coefficient order determination other than the simple histogram (taking into account two-coef correlations growing from the beginning and end of the coefficients, remaking the histogram for zero-runs after each coefficient selection, etc.). All of them are much slower and produce at best a (2-3)*10^-5 relative size improvement on my JPEG test suite, so I think I'll stop here and try other improvements
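(For readers following along: the "simple histogram" baseline here is roughly "order the coefficient positions by how often they are nonzero". A minimal sketch of that idea, using hypothetical names rather than the actual encoder code:)
```
#include <algorithm>
#include <array>
#include <cstdint>
#include <numeric>
#include <vector>

// Sketch: choose a coefficient scan order by counting how often each of the
// 64 DCT positions is nonzero across all blocks, then sorting positions so
// the most frequently used ones come first (ties keep natural order).
std::array<uint8_t, 64> ChooseCoeffOrder(
    const std::vector<std::array<int16_t, 64>>& blocks) {
  std::array<uint64_t, 64> nonzero_count{};
  for (const auto& block : blocks) {
    for (int k = 0; k < 64; ++k) {
      if (block[k] != 0) ++nonzero_count[k];
    }
  }
  std::array<uint8_t, 64> order;
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(), [&](uint8_t a, uint8_t b) {
    return nonzero_count[a] > nonzero_count[b];
  });
  return order;
}
```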
|
|
2025-04-22 02:08:25
|
Maybe Glacier mode can benefit from the best of them, otherwise it is pointless
|
|
|
_wb_
|
|
Melirius
OK, I've tried several approaches for DCT coefficient order determination other than the simple histogram (taking into account two-coef correlations growing from the beginning and end of the coefficients, remaking the histogram for zero-runs after each coefficient selection, etc.). All of them are much slower and produce at best a (2-3)*10^-5 relative size improvement on my JPEG test suite, so I think I'll stop here and try other improvements
|
|
2025-04-23 08:39:26
|
Thanks for trying, it's of course more satisfying if there's an improvement but it's also good to know when the simpler thing is as good as it gets.
|
|
|
MSLP
|
2025-04-24 05:43:15
|
Apart from everything, it's great that jxl-rs is getting more love recently!
|
|
|
Traneptora
|
|
Torn
I'm just trying to make a version of ssimulacra2 that accepts images as piped data, instead of command line arguments that are file names.
It seems to be quite attached to jxl and operates on ImageBundles, as far as I can tell.
|
|
2025-04-25 07:33:51
|
Can you pass "-" as the filename? this works with djxl
|
|
|
Torn
|
|
Traneptora
Can you pass "-" as the filename? this works with djxl
|
|
2025-04-25 08:41:06
|
No. I already rewrote the main method. It had no logic previously to get the data in any other manner than reading it itself.
I got it to run with my own data, but either I put it at slightly wrong addresses or in the wrong format, so I'll have to continue debugging it on the weekend.
|
|
|
Traneptora
|
|
Torn
No. I already rewrote the main method. It had no logic previously to get the data in any other manner than reading it itself.
I got it to run with my own data, but either I put it at slightly wrong addresses or in the wrong format, so I'll have to continue debugging it on the weekend.
|
|
2025-04-25 09:34:13
|
possibly relevant is there's a separate ssimulacra2 repo
|
|
2025-04-25 09:34:23
|
https://github.com/cloudinary/ssimulacra2
|
|
2025-04-25 09:34:30
|
dunno how much is actually just stuff pulled from upstream libjxl
|
|
|
Torn
|
2025-04-25 09:42:49
|
Pretty much all of it.
|
|
|
_wb_
|
2025-04-25 09:48:47
|
It's identical
|
|
|
jonnyawsom3
|
2025-04-27 09:38:12
|
Assuming I didn't break anything, this should help a few longstanding issues https://github.com/google/jpegli/pull/130
|
|
|
spider-mario
|
2025-04-27 09:41:31
|
how much benefit is gained from the less compatible blue subsampling with XYB?
|
|
|
username
|
|
spider-mario
how much benefit is gained from the less compatible blue subsampling with XYB?
|
|
2025-04-27 09:45:35
|
this I guess? https://discord.com/channels/794206087879852103/803645746661425173/1362628965860380823 (not-specified = blue subsampling)
|
|
|
jonnyawsom3
|
|
spider-mario
how much benefit is gained from the less compatible blue subsampling with XYB?
|
|
2025-04-27 09:46:20
|
Subsampling of RGB JPEG at all is a bit exotic, but generally the benefit is not a whole lot... The channel is already heavily quantized, making the subsampling have less impact.
In the PR I made 444 the default for compatibility, and merely fixed the `--chroma_subsampling` parameter so that it doesn't subsample the Y channel for XYB JPEGs, if people choose to use subsampling anyway for the extra few percent (and sacrificing the 20% from JXL...)
|
|
2025-04-27 09:47:50
|
We did try subsampling X too, since if we're risking compatibility anyway, we might as well make the most of it. It ended up hurting image quality far too much though, so we stuck with only B subsampling instead
|
|
|
spider-mario
|
|
username
this I guess? https://discord.com/channels/794206087879852103/803645746661425173/1362628965860380823 (not-specified = blue subsampling)
|
|
2025-04-27 09:53:16
|
oh, yes, that must be it, thanks (I assume those bits are not really “per pixel”, though)
|
|
|
jonnyawsom3
|
|
Assuming I didn't break anything, this should help a few longstanding issues https://github.com/google/jpegli/pull/130
|
|
2025-04-27 10:00:00
|
We don't know why, but we also discovered that Vertical and Horizontal subsampling are swapped between YCbCr and XYB, so we had to reverse the values to get the expected results...
|
|
|
CrushedAsian255
|
2025-04-27 10:31:08
|
What happens if you subsample luma and keep chroma at 1:1?
|
|
|
jonnyawsom3
|
2025-04-27 10:40:33
|
You get a blurry image
|
|
|
Demiurge
|
2025-04-27 02:32:21
|
Honestly it should refuse to use subsampling for RGB JPEG...
|
|
2025-04-27 02:33:08
|
Also what about the jpegli files in the libjxl source, are they just diverging into two separate redundant branches and files now?
|
|
2025-04-27 03:07:46
|
Good start though 👀
|
|
2025-04-27 03:50:48
|
I think usually you would put those in 3 separate pull requests though
|
|
2025-04-27 03:51:51
|
Also does it actually write the correct adobe tag or does it write a ycck tag?
|
|
|
username
|
|
Demiurge
Also does it actually write the correct adobe tag or does it write a ycck tag?
|
|
2025-04-27 08:57:53
|
seems like it should write the correct value for the APP14 marker: https://github.com/google/jpegli/blob/bc19ca2393f79bfe0a4a9518f77e4ad33ce1ab7a/lib/jpegli/bitstream.cc#L58
|
|
|
jonnyawsom3
|
|
Demiurge
Honestly it should refuse to use subsampling for RGB JPEG...
|
|
2025-04-27 08:58:17
|
Probably... Maybe I could have it print a warning about compatibility issues.
Can't just make it a submodule because it has duplicate JXL code and the folder structure is a level too high.
Probably should be separate PRs. Only the second 'real' PR I've actually made along with <#1358733203619319858>, so still learning best practices. It was mostly one line changes so didn't seem worth splitting to me.
Correct tags. Default value of APP14 is 0, which is RGB and CMYK JPEG. YCbCr gets assigned 1 (Not that it's used anyway) and YCCK 2
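(A tiny sketch of that mapping, matching the values just listed; the enum and helper are hypothetical, not jpegli's actual code:)
```
#include <cstdint>

// APP14 "transform" byte as described above: 0 = no transform (RGB or CMYK
// stored directly), 1 = YCbCr, 2 = YCCK.
enum class App14Transform : uint8_t { kNone = 0, kYCbCr = 1, kYCCK = 2 };

App14Transform ChooseApp14Transform(int num_components, bool uses_ycc) {
  if (!uses_ycc) return App14Transform::kNone;
  return num_components == 4 ? App14Transform::kYCCK : App14Transform::kYCbCr;
}
```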
|
|
2025-04-27 08:59:01
|
|
|
2025-04-27 09:00:18
|
<@245794734788837387> brought this up, sounds like a good idea. Using RGB at Quality 100/Distance 0 with a warning when subsampling is enabled
|
|
|
username
|
2025-04-27 09:24:30
|
it would also kinda line up with how libjxl does things since it also has special behavior for distance 0
|
|
2025-04-27 09:26:09
|
oh and if there is a warning added about JXL-transcoding for XYB and quality 100 it should probably be present around the API and CLI option(s) for subsampling
|
|
2025-04-27 09:32:40
|
maybe something like "**WARNING:** Using values other than 444 in conjunction with either XYB or Quality 100/Distance 0 will result in files that cannot be losslessly transcoded to JPEG XL!"
|
|
|
jonnyawsom3
|
2025-04-27 09:43:37
|
As previously said, subsampled RGB isn't common anyway, so it could just be
`Note: Implicit-default for Quality 100/Distance 0 is RGB JPEG`
`Warning: Subsampled RGB JPEG may cause compatibility issues`
|
|
2025-04-27 09:48:08
|
Also <@207980494892040194>, I'm gonna need a hand to fix... Everything xD
|
|
|
A homosapien
|
2025-04-27 09:48:58
|
https://tenor.com/view/meme-anime-gif-22734101
|
|
|
jonnyawsom3
|
|
A homosapien
https://tenor.com/view/meme-anime-gif-22734101
|
|
2025-04-27 10:37:27
|
Think I fixed it, somehow a missing } wasn't breaking the windows builds, so we never caught it before
|
|
2025-04-27 10:38:38
|
Also made it only subsample *YCbCr* at low quality, since if we do want to expose RGB in cjpegli, we don't want that subsampled at all
Should probably make RGB a parameter so people can disable it if they only want q100 YCbCr
|
|
2025-04-27 10:39:19
|
Though, I wonder how well subsampling everything and effectively halving the resolution but at higher quality would look....
|
|
|
A homosapien
|
|
Think I fixed it, somehow a missing } wasn't breaking the windows builds, so we never caught it before
|
|
2025-04-27 10:55:47
|
I also fixed some incorrect chroma values for 440 and renamed `distance` to `qDistance` since it was already defined earlier.
|
|
|
jonnyawsom3
|
2025-04-28 12:36:36
|
Added APP14 to CMYK and fixed the issues... Hopefully. Tests soon™
|
|
|
|
runr855
|
2025-04-28 12:46:36
|
Is there a reason for why the jxl-x86-windows-static.zip 0.11.1 release of libjxl has 19/64 detections as a trojan? That seems very high for a false positive
|
|
2025-04-28 12:46:55
|
On Virustotal
|
|
2025-04-28 12:47:20
|
And it has been detected as a trojan for months
|
|
|
Demiurge
|
|
<@245794734788837387> brought this up, sounds like a good idea. Using RGB at Quality 100/Distance 0 with a warning when subsampling is enabled
|
|
2025-04-28 01:12:33
|
Doesn't sound particularly appealing to me...
|
|
|
jonnyawsom3
|
|
Demiurge
Doesn't sound particularly appealing to me...
|
|
2025-04-28 01:13:13
|
Scores 2 points higher in SSIMULACRA, so it's something :P
|
|
|
Demiurge
|
2025-04-28 01:15:15
|
Higher than default ycbcr?
|
|
|
Scores 2 points higher in SSIMULACRA, so it's something :P
|
|
2025-04-28 01:16:45
|
That's still kinda silly
|
|
2025-04-28 01:17:17
|
RGB JPEG is kind of uncommon
|
|
|
jonnyawsom3
|
|
Demiurge
Higher than default ycbcr?
|
|
2025-04-28 01:23:45
|
`cjpeg -quality 100` vs `cjpeg -quality 100 -rgb`
|
|
|
Demiurge
RGB JPEG is kind of uncommon
|
|
2025-04-28 01:25:30
|
And quality 100 generally shouldn't be used. It'd be a default, but I'd probably add the same `-rgb` flag to cjpegli, with 0 disabling the switch and 1 forcing RGB on... If it's feasible. Along with the printout about being RGB
|
|
|
Demiurge
|
2025-04-28 01:26:49
|
RGB JPEG shouldn't be used because it's not efficient and the quant tables are not even tuned for that.
|
|
|
jonnyawsom3
|
2025-04-28 01:27:16
|
There is no quant table at q 100, it's all 1...
|
|
|
Demiurge
|
2025-04-28 01:27:18
|
But for quality 100 that's a different story since that doesn't apply
|
|
2025-04-28 01:27:24
|
Yeah
|
|
2025-04-28 01:27:53
|
Still uncommon
|
|
|
jonnyawsom3
|
2025-04-28 01:28:31
|
Didn't stop XYB, and again, it'll only be a default for cjpegli. With a message about how to disable it, similar to JPEG transcoding with cjxl
|
|
2025-04-28 01:30:28
|
But it'll probably be a separate PR. This one was meant to fix the biggest issues: APP14, weird XYB defaults and broken subsampling, with a tweak to enable 420 at q 30
|
|
2025-04-28 01:37:33
|
Might address https://discord.com/channels/794206087879852103/1301682361502531594 at some point too, but we need to do wider testing around what quality threshold to disable it
|
|
|
username
|
|
`cjpeg -quality 100` vs `cjpeg -quality 100 -rgb`
|
|
2025-04-28 02:12:59
|
do an XOR compare of both of them against the lossless source; there are a lot fewer colors shifted around for RGB
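(A minimal sketch of that kind of XOR comparison, assuming both images are already decoded to same-sized interleaved 8-bit RGB buffers; names are hypothetical:)
```
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts pixels where the decoded JPEG differs from the lossless source.
std::size_t CountChangedPixels(const std::vector<uint8_t>& reference,
                               const std::vector<uint8_t>& decoded) {
  std::size_t changed = 0;
  const std::size_t n = std::min(reference.size(), decoded.size());
  for (std::size_t i = 0; i + 2 < n; i += 3) {
    const uint8_t diff = (reference[i] ^ decoded[i]) |
                         (reference[i + 1] ^ decoded[i + 1]) |
                         (reference[i + 2] ^ decoded[i + 2]);
    if (diff != 0) ++changed;
  }
  return changed;
}
```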
|
|
|
Demiurge
RGB JPEG shouldn't be used because it's not efficient and the quant tables are not even tuned for that.
|
|
2025-04-28 02:14:34
|
if someone is specifying "quality 100" then I don't think they care about size
|
|
2025-04-28 02:17:15
|
I would presume the number of people who care about size when specifying the maximum available quality value is wayyy smaller than the number of people specifying it because they expressly don't care about size and want a reference exchange file
|
|
|
Demiurge
|
2025-04-28 03:09:21
|
What difference does it actually make though?
|
|
2025-04-28 03:09:47
|
Aside from compatibility possibly.
|
|
2025-04-28 03:11:00
|
Possibly larger file size for no actual increase in fidelity?
|
|
|
jonnyawsom3
|
2025-04-28 03:17:40
|
We *just* said that it scores 2 points higher...
|
|
|
username
|
|
Demiurge
What difference does it actually make though?
|
|
2025-04-28 03:19:44
|
I have seen on multiple occasions both people and companies use "quality 100" JPEGs as original references or as an intermediate format or "master copy", such as for example an artist exporting a ref sheet with defined color areas you are supposed to use a color picker on **OR** a company serving and processing millions or more images a day.
|
|
|
Demiurge
|
|
We *just* said that it scores 2 points higher...
|
|
2025-04-28 03:21:27
|
That doesn't demonstrate anything though frankly.
|
|
2025-04-28 03:23:06
|
Comparisons with color gradients would be an excellent real demonstration though
|
|
|
username
I have seen on multiple occasions both people and companies use "quality 100" JPEGs as original references or as an intermediate format or "master copy", such as for example an artist exporting a ref sheet with defined color areas you are supposed to use a color picker on **OR** a company serving and processing millions or more images a day.
|
|
2025-04-28 03:23:39
|
What are those white squares and why do the originals have such bad banding?
|
|
|
jonnyawsom3
|
|
Demiurge
That doesn't demonstrate anything though frankly.
|
|
2025-04-28 03:24:42
|
YCbCr and RGB at q100 compared to the original with XOR
|
|
|
username
|
|
Demiurge
What are those white squares and why do the originals have such bad banding?
|
|
2025-04-28 03:26:44
|
https://cloudinary.com/blog/why_jpeg_is_like_a_photocopier#why_does_this_happen_
|
|
|
Demiurge
|
|
YCbCr and RGB at q100 compared to the original with XOR
|
|
2025-04-28 03:26:44
|
This is a cool comparison too, but not as convincing as just showing some color gradients, side by side.
|
|
|
username
|
2025-04-28 03:26:55
|
~~trolling arc~~
|
|
|
jonnyawsom3
|
2025-04-28 03:27:16
|
At least they haven't brought up noise again
|
|
|
username
|
2025-04-28 03:29:08
|
maybe they are genuinely worried about the compatibility concern, although in my testing RGB JPEGs seem to work just fine in most software
|
|
|
Demiurge
|
2025-04-28 03:29:11
|
I'm sincerely asking questions and sincerely wondering how much of a difference it makes. It's unfortunate you assume I have bad intentions.
|
|
|
username
|
2025-04-28 03:30:11
|
with the context of your messages being in relation to a compatibility concern with software they make more sense
|
|
2025-04-28 03:30:48
|
otherwise they seem like you are either ignoring or don't understand what is being presented to you and why
|
|
|
Demiurge
|
2025-04-28 03:30:53
|
The XOR comparison for example is a cool visualization, but a better demonstration would be a worst-case image like RGB color gradients, comparing the difference side by side.
|
|
|
jonnyawsom3
|
|
Demiurge
This is a cool comparison too, but not as convincing as just showing some color gradients, side by side.
|
|
2025-04-28 03:32:55
|
Original, YCbCr, RGB
|
|
|
Demiurge
|
2025-04-28 03:33:41
|
Nice! See? That very effectively demonstrates that it makes a real and positive difference.
|
|
2025-04-28 03:34:01
|
That's all I was asking.
|
|
|
jonnyawsom3
|
2025-04-28 03:34:06
|
Gradients were actually a good shout, my fairly noisy test image was masking most of it
|
|
|
Demiurge
|
2025-04-28 03:34:18
|
Exactly.
|
|
2025-04-28 03:34:38
|
I'm happy now. You demonstrated exactly what I was asking about.
|
|
2025-04-28 03:35:16
|
It's not trolling to ask a sincere and fair question...
|
|
2025-04-28 03:36:24
|
I genuinely didn't know if the color transformation would actually make a difference in practice.
|
|
|
jonnyawsom3
|
2025-04-28 03:37:23
|
It's nearly 5am and I *really* didn't wanna go into Krita to try and make a comparison image... Then I realised I had that 10-bit test image and could just let Discord do the comparing for me
|
|
|
username
|
|
Demiurge
It's not trolling to ask a sincere and fair question...
|
|
2025-04-28 03:42:17
|
I guess my reason for confusion was that I couldn't gauge exactly *why* you kept seemingly fighting against a change that was presented as improving color sampling accuracy in a case where people treat the file as a giant reference image. Psychovisual tuning vs mathematical similarity or something, idk
|
|
|
jonnyawsom3
|
2025-04-28 03:42:43
|
Bonus YCbCr vs XYB
|
|
|
username
|
2025-04-28 03:43:30
|
XYB still has that issue for me where stuff becomes darker
|
|
|
jonnyawsom3
|
|
username
XYB still has that issue for me where stuff becomes darker
|
|
2025-04-28 03:47:27
|
Importing to Krita the image is darker, but strangely converting the layer from XYB to sRGB fixes it, as if it's using the wrong transfer or something for rendering. In Irfanview it has a pink tint...
|
|
|
Demiurge
|
|
username
I guess my reason for confusion was that I couldn't gauge exactly *why* you kept seemingly fighting against a change that was presented as improving color sampling accuracy in a case where people treat the file as a giant reference image. Psychovisual tuning vs mathematical similarity or something, idk
|
|
2025-04-28 03:47:48
|
I was skeptical of what the actual difference was or whether it could actually be demonstrated.
|
|
2025-04-28 03:48:07
|
Or if it was just an assumption
|
|
|
jonnyawsom3
|
2025-04-28 03:48:38
|
If I'm honest I was skeptical it would show anything in a gradient, but then it clicked that an RGB gradient would be best *in* RGB, naturally
|
|
|
Demiurge
|
|
If I'm honest I was skeptical it would show anything in a gradient, but then it clicked that an RGB gradient would be best *in* RGB, naturally
|
|
2025-04-28 03:49:13
|
Yep, it's basically a worst case scenario and the best contrived example to demonstrate the difference
|
|
2025-04-28 03:50:15
|
But it's still real enough to matter
|
|
2025-04-28 03:51:06
|
RGB looks just like the original whereas xyb and ycbcr are uneven steps
|
|
|
jonnyawsom3
|
2025-04-28 03:52:17
|
....it's the god damn gamma again
|
|
|
Demiurge
|
2025-04-28 03:52:41
|
To be fair, the uneven steps are because of rounding errors that can theoretically be fixed in the decoder/cms, but that's a whole other can of worms. And you're not going to fix everyone's broken software.
|
|
|
jonnyawsom3
|
2025-04-28 03:53:42
|
Old XYB, New XYB (Stripped gAMA from the PNG)
|
|
|
Demiurge
|
2025-04-28 03:54:44
|
The gAMA tag was messing up the png xyb?
|
|
|
jonnyawsom3
|
2025-04-28 03:55:35
|
cjpegli uses the guts of libjxl, so it correctly handles gamma in PNGs... Everything else, doesn't. So the XYB looks wrong in comparison
|
|
|
runr855
Is there a reason for why the jxl-x86-windows-static.zip 0.11.1 release of libjxl has 19/64 detections as a trojan? That seems very high for a false positive
|
|
2025-04-28 05:25:51
|
Interesting, seems jpegli is the main culprit, not sure why though
|
|
|
Demiurge
|
2025-04-28 06:10:31
|
Lots of virus engines classify very broad categories as "trojan" like for example anything with the curl dll
|
|
2025-04-28 06:10:49
|
what do the virus engines call the supposed trojan?
|
|
|
|
runr855
|
2025-04-28 12:24:04
|
I believe it would be worth investigating. Windows Defender reacts to it, so no Windows users can use it without Defender intervening
|
|
2025-04-28 12:24:21
|
There is also the risk of supply chain attacks, which I don't think should be forgotten completely
|
|
|
novomesk
|
|
runr855
I believe it would be worth investigating. Windows Defender reacts to it, so no Windows users can use it without Defender intervening
|
|
2025-04-28 04:16:42
|
https://www.virustotal.com/gui/file/aa950f4d37abc1e52a5dbca153479b7cba0303e35331deb7d5ee5b18adf7a23b
It is necessary to contact those AV companies and report the case as a false positive.
I recommend starting with Avast/AVG (same company, same detection).
BitDefender's engine is used by many different products, so resolving it there has a big impact.
|
|
|
_wb_
|
2025-04-29 12:45:42
|
If someone feels like it, feel free to check https://app.codecov.io/gh/libjxl/libjxl?search=&displayType=list and try to figure out what's up with those ~15% lines of code currently not covered by tests.
It could be various things:
- missing tests that should actually be there
- various rather trivial error conditions (e.g. invalid api usage) that we didn't bother to add tests for (though maybe we should?)
- dead code that can be removed
- dead code because of a bug
|
|
|
jonnyawsom3
|
2025-04-29 01:07:47
|
Seems to be errors or untested encode parameters like keeping invisible pixels
|
|
|
A homosapien
|
|
Seems to be errors or untested encode parameters like keeping invisible pixels
|
|
2025-04-29 01:52:59
|
Do you think that could explain the excessive ram usage? I remember you said the math wasn't adding up.
|
|
|
Melirius
|
|
_wb_
If someone feels like it, feel free to check https://app.codecov.io/gh/libjxl/libjxl?search=&displayType=list and try to figure out what's up with those ~15% lines of code currently not covered by tests.
It could be various things:
- missing tests that should actually be there
- various rather trivial error conditions (e.g. invalid api usage) that we didn't bother to add tests for (though maybe we should?)
- dead code that can be removed
- dead code because of a bug
|
|
2025-04-29 02:04:05
|
Will try to check
|
|
|
jonnyawsom3
|
|
A homosapien
Do you think that could explain the excessive ram usage? I remember you said the math wasn't adding up.
|
|
2025-04-29 02:06:54
|
You mean for progressive lossless? Because that was something else, I just mean the error conditions aren't being tested in the coverage
|
|
|
pshufb
|
2025-04-29 04:57:47
|
https://web.ist.utl.pt/nuno.lopes/pubs/ub-pldi25.pdf
|
|
2025-04-29 04:58:04
|
came across this in a paper and thought it may be relevant to devs here. I am slightly skeptical that there’s performance on the table here / that this will replicate, and it’s perhaps best dealt with by the LLVM developers, but <:shrugm:322486234142212107>
|
|
|
jonnyawsom3
|
2025-04-29 05:06:44
|
Clang alone is a 20% performance increase, 130% for fast lossless. It gets built as part of the tests on Github but discarded, with MSVC being uploaded to releases instead
|
|
|
pshufb
|
|
Clang alone is a 20% performance increase, 130% for fast lossless. It gets built as part of the tests on Github but discarded, with MSVC being uploaded to releases instead
|
|
2025-04-29 07:33:26
|
My message is less about the speedup from compiler choice, and more about how a loop in clang builds of libjxl may be responsible for a lot of lost, but potentially easily recovered, performance.
|
|
|
jonnyawsom3
|
2025-04-29 07:34:35
|
Oh I didn't even see they used Clang, I was just saying much more than 7% is on the table
|
|
|
pshufb
|
2025-04-29 07:36:35
|
Unfortunately the paper doesn’t provide a whole lot of detail, and the regression is _probably_ a weird quirk of Sandy Bridge. (Which is weird since the Ivy Bridge cores aren’t much different from Sandy Bridge.) It’s a shame they don’t test on a modern architecture.
|
|
|
Oh I didn't even see they used Clang, I was just saying much more than 7% is on the table
|
|
2025-04-29 07:36:40
|
Fair!
|
|
|
jonnyawsom3
|
2025-04-29 07:37:59
|
> jpegxl-1.5.0
Not sure where they found that version number...
|
|
2025-04-29 07:54:34
|
Ahh, it's a benchmarking suite version, not a library version https://openbenchmarking.org/test/pts/jpegxl
|
|
2025-04-29 07:55:12
|
So they ran the tests using 0.7 too... Not exactly representative for multithreading either then
|
|
|
A homosapien
|
|
You mean for progressive lossless? Because that was something else, I just mean the error conditions aren't being tested in the coverage
|
|
2025-04-30 01:21:35
|
No, not that; more like how RAM usage is double the size of the image as a raw bitmap. Even after accounting for 32-bit float, RAM was still 2x higher than it should be.
|
|
2025-04-30 01:25:50
|
Or maybe I'm misremembering
|
|
|
jonnyawsom3
|
|
A homosapien
No, not that; more like how RAM usage is double the size of the image as a raw bitmap. Even after accounting for 32-bit float, RAM was still 2x higher than it should be.
|
|
2025-04-30 03:06:36
|
Oh, no. The code coverage is just code that doesn't run in the tests. So corrupted files or misconfigured settings
|
|
|
_wb_
|
2025-04-30 12:50:18
|
Finally this is all-green again
|
|
|
pshufb
|
|
So they ran the tests using 0.7 too... Not exactly representative for multithreading either then
|
|
2025-04-30 02:36:33
|
Great catch! Thanks for digging into it.
|
|
|
Demiurge
|
2025-04-30 09:07:56
|
This is your regularly scheduled reminder that <:JXL:805850130203934781> is awesome and cool.
|
|
|
jonnyawsom3
|
2025-05-02 04:51:24
|
Looking through some old PRs that never made it. I wasn't expecting the Game of Life as a heuristic
|
|
|
A homosapien
|
2025-05-02 04:53:55
|
https://tenor.com/view/game-of-life-glider-grid-pixels-repeat-gif-27605519
|
|
|
jonnyawsom3
|
2025-05-02 07:49:06
|
Ended up doing more than we expected, but I think it's ready now https://github.com/google/jpegli/pull/130
|
|
2025-05-02 07:49:16
|
We wanted to have cjpegli display the new defaults when they're triggered, but couldn't get it working. We also wanted to disable XYB when the RGB at distance 0 is triggered, since the color transform causes artifacts similar to YCbCr
|
|
2025-05-02 07:50:27
|
Should give better results by default now though, with multiple bugs/strange behaviours fixed and the `-d 0`/`-q 100` RGB mode improving quality by a few points more
|
|
|
|
veluca
|
2025-05-02 09:46:30
|
|
|
2025-05-02 09:46:40
|
first (?) jxl-rs decoded image 🙂
|
|
|
jonnyawsom3
|
2025-05-02 10:44:03
|
And a 40MP image no less, so much for starting small xD
|
|
|
Meow
|
2025-05-03 05:54:53
|
Curious about its performance
|
|
|
|
veluca
|
|
Meow
Curious about its performance
|
|
2025-05-03 06:09:59
|
Slow, but not even *too* slow for not having any performance optimization whatsoever (5x slower than libjxl on this image)
|
|
|
CrushedAsian255
|
|
veluca
Slow, but not even *too* slow for not having any performance optimization whatsoever (5x slower than libjxl on this image)
|
|
2025-05-03 06:10:44
|
Was it a VarDCT or Modular image?
|
|
|
|
veluca
|
2025-05-03 06:10:53
|
Modular
|
|
|
CrushedAsian255
|
|
veluca
Modular
|
|
2025-05-03 06:11:19
|
Simple modular or with Squeeze/RCT/Delta?
|
|
|
|
veluca
|
2025-05-03 06:11:38
|
RCT, but no squeeze or other fun stuff
|
|
|
jonnyawsom3
|
|
veluca
Slow, but not even *too* slow for not having any performance optimization whatsoever (5x slower than libjxl on this image)
|
|
2025-05-03 06:11:44
|
Is that 5x slower both singlethreaded? (Prepare yourself for the barrage of questions xD)
|
|
|
|
veluca
|
|
Is that 5x slower both singlethreaded? (Prepare yourself for the barrage of questions xD)
|
|
2025-05-03 06:12:23
|
yup
|
|
|
Tirr
|
2025-05-03 06:34:32
|
jxl-rs is currently single thread only and doesn't have any handwritten SIMD routines
|
|
2025-05-03 06:35:17
|
just focusing on working implementation
|
|
|
jonnyawsom3
|
2025-05-03 06:45:08
|
I was moreso checking if libjxl was set to singlethreaded, but yeah. Glad we've hit this milestone and I'm sure more aren't far off
|
|
|
Meow
|
2025-05-03 11:39:31
|
Reaching the usable status is already a milestone
|
|
|
|
veluca
|
2025-05-03 12:01:36
|
not there yet 😛
|
|
|
Tirr
|
2025-05-04 10:37:08
|
it seems that libjxl is creating a VarDCT image whose LF quant values exceed the signed 16-bit range, but it isn't marked as `modular_16bit_buffers = false` https://github.com/tirr-c/jxl-oxide/issues/456
|
|
2025-05-04 10:37:20
|
jxl-oxide decodes the image successfully when I turn off 16-bit buffer optimization
|
|
2025-05-04 10:39:33
|
(the problematic sample is at `c=0 y=25 x=39` in LF image which has value of `32894`)
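(A rough sketch of the kind of guard being discussed, i.e. only promising 16-bit buffers when every quantized LF sample actually fits; the function and its inputs are hypothetical, not the actual libjxl encoder code:)
```
#include <cstdint>
#include <limits>
#include <vector>

// If any quantized LF sample falls outside the int16 range, the bitstream
// should not claim that 16-bit buffers are sufficient for decoding; the
// encoder would then clear modular_16bit_buffers (or adjust the quant
// factors so the values stay in range).
bool FitsInModular16BitBuffers(const std::vector<int32_t>& quantized_lf) {
  for (int32_t v : quantized_lf) {
    if (v < std::numeric_limits<int16_t>::min() ||
        v > std::numeric_limits<int16_t>::max()) {
      return false;
    }
  }
  return true;
}
```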
|
|
|
_wb_
|
2025-05-04 10:43:18
|
We had something similar in libjxl-tiny in the implementation of a hw encoder. I suppose we should be more accurate in the range of quant factors to ensure the quantized lf stays within Level 5 constraints.
|
|
|
jonnyawsom3
|
2025-05-04 12:58:07
|
Realised the changelog had been neglected, so thought I'd try and catch it up https://github.com/libjxl/libjxl/pull/4224
|
|
2025-05-04 12:59:41
|
I'll have to make a mental note of adding to the changelog as part of my future PRs, if applicable, rather than trying to recall what's new since the last release
|
|
|
RaveSteel
|
2025-05-05 01:53:14
|
Is there an ETA or any milestone that needs to be met before 0.12 releases?
|
|
|
jonnyawsom3
|
2025-05-05 01:57:10
|
AFAIK no set goals/dates from the core devs, but I was hoping to get all the jpegli tweaks merged and copied over before the next release, since libjxl is still where most get it from https://github.com/google/jpegli/pull/130
|
|
|
_wb_
|
2025-05-05 08:30:18
|
Some slight density improvement for lossless, at the cost of some decode slowdown: https://github.com/libjxl/libjxl/pull/4228
(and at faster_decode=2, some slight decode speedup, at the cost of some density)
Feel free to try it out on your favorite image/corpus.
|
|
2025-05-05 08:41:54
|
In general there is probably still some substantial room to improve MA tree learning heuristics. In particular we should implement some post-clustering tree pruning that (recursively) removes splits that go to two leaf nodes with identical predictor and context after clustering (and identical multiplier/offset). Such splits only cause some encode/decode slowdown (since the tree is unnecessarily deep) and some signaling overhead, without giving any compression benefit, so pruning them can only improve things — it just seems a bit tricky to do the code plumbing to do this pruning. <@179701849576833024> or <@1346460706345848868> do you want to give it a shot?
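(A rough sketch of the pruning rule described above, using a simplified node type rather than libjxl's actual Tree representation:)
```
#include <memory>

// Simplified MA tree node: either a split on (property, splitval) or a leaf
// carrying its post-clustering context, predictor, multiplier and offset.
struct Node {
  bool is_leaf = false;
  int context = 0, predictor = 0, multiplier = 1, offset = 0;  // leaf payload
  int property = 0, splitval = 0;                              // split payload
  std::unique_ptr<Node> left, right;
};

static bool SameLeaf(const Node& a, const Node& b) {
  return a.is_leaf && b.is_leaf && a.context == b.context &&
         a.predictor == b.predictor && a.multiplier == b.multiplier &&
         a.offset == b.offset;
}

// Recursively removes splits whose two children ended up as identical leaves
// after clustering: such splits cost signaling and decode time without
// affecting compression, so collapsing them into a single leaf can only help.
void PruneRedundantSplits(Node* node) {
  if (node == nullptr || node->is_leaf) return;
  PruneRedundantSplits(node->left.get());
  PruneRedundantSplits(node->right.get());
  if (node->left && node->right && SameLeaf(*node->left, *node->right)) {
    node->is_leaf = true;
    node->context = node->left->context;
    node->predictor = node->left->predictor;
    node->multiplier = node->left->multiplier;
    node->offset = node->left->offset;
    node->left.reset();
    node->right.reset();
  }
}
```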
|
|
|
|
veluca
|
2025-05-05 08:42:52
|
I think I should dedicate my jxl time to jxl-rs 😄 also I remember trying that out and it not being helpful, but I might misremember
|
|
|
Mine18
|
2025-05-05 08:44:13
|
~~what if you removed that regression so jxl's image quality goes back to 0.8, and then you can claim a MASSIVE quality improvement!~~
|
|
|
|
veluca
|
2025-05-05 08:49:30
|
I still feel like we should try non-greedy tree splitting, but who has the time...
|
|
|
_wb_
|
|
Mine18
~~what if you removed that regression so jxl's image quality goes back to 0.8, and then you can claim a MASSIVE quality improvement!~~
|
|
2025-05-05 08:49:48
|
this is for lossless, where only speed and density matter. Lossy is a trickier thing
|
|
|
veluca
I think I should dedicate my jxl time to jxl-rs 😄 also I remember trying that out and it not being helpful, but I might misremember
|
|
2025-05-05 08:51:25
|
yeah, jxl-rs is more important than slight encoder improvements
|
|
|
|
veluca
|
2025-05-05 08:51:25
|
As in, for each property do a DP to figure out the best way to split *more than 2-way* along that property, then repeat recursively in each subtree
|
|
|
veluca
As in, for each property do a DP to figure out the best way to split *more than 2-way* along that property, then repeat recursively in each subtree
|
|
2025-05-05 08:52:18
|
The decoder could also optimize for things generated that way (especially if we limit this to using, say, two properties at most), and I imagine this would be massively faster to decode too
|
|
2025-05-05 08:53:13
|
(two properties makes this effectively be 3 lookups in a lookup table)
|
|
|
_wb_
|
2025-05-05 08:59:53
|
Why 3 lookups?
|
|
|
|
veluca
|
2025-05-05 09:00:21
|
2 1d lookups to reduce the range of the properties, and 1 2d lookup for the leaf
|
|
|
_wb_
|
2025-05-05 09:00:35
|
ah right
|
|
2025-05-05 09:01:47
|
if it's limited to _n_ properties you can do it with _n_ 1D lookups followed by 1 _n_ D lookups, right?
|
|
|
|
veluca
|
2025-05-05 09:02:05
|
yup
|
|
2025-05-05 09:02:34
|
I imagine as soon as n starts being more than 3 or 4 the n-D lookup becomes impractical
|
|
2025-05-05 09:02:57
|
(depending on the # of distinct values)
|
|
|
_wb_
|
2025-05-05 09:03:26
|
where the size of the _n_ D lookup table is equal to the product of the number of nodes per property (+1)
|
|
|
|
veluca
|
2025-05-05 09:03:34
|
yup
|
|
2025-05-05 09:03:41
|
well, number of distinct nodes
|
|
2025-05-05 09:03:56
|
there's already a specialized codepath for n = 1 and property = gradient/wp
|
|
|
_wb_
|
2025-05-05 09:04:44
|
yeah it might be lower than the number of nodes if there's repetition in the subtrees
|
|
|
|
veluca
|
2025-05-05 09:05:07
|
but tbh, even if we don't table it up, a tree which has a relatively small number of parts that all share the same property should be significantly faster to decode as is
|
|
2025-05-05 09:06:02
|
(basically by making a tree of 1d lookup tables)
|
|
2025-05-05 09:06:36
|
(or even not lookup tables, if the # of possible values is small -- just do a SIMD-fied linear search...)
|
|
|
_wb_
|
2025-05-05 09:09:21
|
something like 7 buckets per property (large negative, medium negative, small negative, zero, small positive, medium positive, large positive) could already be pretty effective, so I can imagine you could pick the 3 most informative properties and make a lookup table of size `7*7*7`
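(A sketch of what that decode-side lookup could look like for three properties with seven buckets each; all names here are hypothetical, and the six thresholds per property would come from the split values chosen by the encoder:)
```
#include <array>
#include <cstdint>

// 1D bucketization: map one property value into one of 7 ranges using the
// 6 thresholds chosen for that property.
static int Bucket7(int value, const std::array<int, 6>& thresholds) {
  int b = 0;
  while (b < 6 && value > thresholds[b]) ++b;
  return b;  // 0..6
}

struct Leaf {
  uint8_t context;
  uint8_t predictor;
};

// n 1D lookups (here n = 3) followed by one n-D lookup: each property is
// reduced to a bucket index, and the 7*7*7 table maps the bucket triple
// straight to a leaf (context + predictor).
Leaf LookupLeaf(int p0, int p1, int p2,
                const std::array<std::array<int, 6>, 3>& thresholds,
                const std::array<Leaf, 7 * 7 * 7>& table) {
  const int b0 = Bucket7(p0, thresholds[0]);
  const int b1 = Bucket7(p1, thresholds[1]);
  const int b2 = Bucket7(p2, thresholds[2]);
  return table[(b0 * 7 + b1) * 7 + b2];
}
```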
|
|
|
|
veluca
|
2025-05-05 09:10:24
|
yeah that would work, and the LUT would either be small or just fit in a single SIMD register (and effectively be 3 instructions or so)
|
|
2025-05-05 09:11:10
|
(fwiw you don't even need to decide those buckets, you can just let the DP figure out the best 7-way split :P)
|
|
|
A homosapien
|
|
_wb_
Some slight density improvement for lossless, at the cost of some decode slowdown: https://github.com/libjxl/libjxl/pull/4228
(and at faster_decode=2, some slight decode speedup, at the cost of some density)
Feel free to try it out on your favorite image/corpus.
|
|
2025-05-05 09:23:47
|
I'm getting mixed results, and the impact on decoding speed is relatively negligible (within 0.5-2%). It's hard to say if it benefits photographic images more than non-photo
|
|
2025-05-05 09:24:12
|
~~Also, I think I just found another huge regression with lossless.~~
|
|
2025-05-05 09:24:47
|
~~I'll post it in <#803645746661425173> when I'm done double checking my numbers~~
|
|
2025-05-05 09:32:29
|
Nevermind, got my numbers mixed up
|
|
2025-05-05 09:33:16
|
Was comparing two different images by accident lol 😅
|
|
2025-05-05 09:33:54
|
Welp, back to work addressing the smaller regression(s) with faster decoding 3
|
|
|
_wb_
Some slight density improvement for lossless, at the cost of some decode slowdown: https://github.com/libjxl/libjxl/pull/4228
(and at faster_decode=2, some slight decode speedup, at the cost of some density)
Feel free to try it out on your favorite image/corpus.
|
|
2025-05-05 10:02:06
|
Speaking of which, can I use this multiplier for faster decoding to increase the number of buckets? It's hurting large photos for faster decoding 1-3.
https://github.com/libjxl/libjxl/pull/4201#issuecomment-2849934762
|
|
|
_wb_
|
2025-05-05 10:43:05
|
No, that's a different parameter I think.
|
|
|
A homosapien
|
2025-05-05 10:54:02
|
I'm making some edits in a fork and it turns out making the histogram "less efficient" is increasing density somehow.
|
|
2025-05-05 10:56:13
|
Granted idk what an "efficient histogram" means. I'm just going off what the PR and Chat GPT says and changing variables around
|
|
|
jonnyawsom3
|
|
Mine18
~~what if you removed that regression so jxl's image quality goes back to 0.8, and then you can claim a MASSIVE quality improvement!~~
|
|
2025-05-06 01:39:05
|
Me and Sapien have been discussing that. When we have time, we're going to try changing values to their previous states, to see if we can get the quality back without outright reverting the PR
|
|
|
_wb_
Some slight density improvement for lossless, at the cost of some decode slowdown: https://github.com/libjxl/libjxl/pull/4228
(and at faster_decode=2, some slight decode speedup, at the cost of some density)
Feel free to try it out on your favorite image/corpus.
|
|
2025-05-06 01:44:24
|
The only difference is actually in non-static properties. Level 1 already disables WP entirely, giving a 2x speedup thanks to using the kNoWP tree type
|
|
|
Mine18
|
|
Me and Sapien have been discussing that. When we have time, we're going to try changing values to their previous states, to see if we can get the quality back without outright reverting the PR
|
|
2025-05-06 03:57:56
|
hopefully the solution to the regression gets found sooner or later
|
|
|
jonnyawsom3
|
2025-05-06 07:06:12
|
I'm struggling to understand it due to my lack of C++ experience, but assuming `gi` is Global Image and `sg` is Single Group, isn't this trying to select per-group RCTs by measuring the entire image instead? <https://github.com/libjxl/libjxl/blob/0855a037d7ac65249f0f4700995bbd9decb3b47d/lib/jxl/enc_modular.cc#L1462>
|
|
|
_wb_
|
2025-05-06 07:24:35
|
Nah, at that point in the code, `gi` is a group image 🙂
|
|
2025-05-06 07:27:28
|
https://github.com/libjxl/libjxl/blob/0855a037d7ac65249f0f4700995bbd9decb3b47d/lib/jxl/enc_modular.cc#L1363
|
|
|
jonnyawsom3
|
2025-05-06 07:35:48
|
Ahh, we're trying to figure out why the RCT selection is worse than explicitly setting YCoCg in our tests, since it should be trying each RCT and then using the best result
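(The intent, roughly, is something like the sketch below. This is not the actual enc_modular.cc logic; the cost callback stands in for whatever estimate the encoder uses, and if that estimate ignores things like predictor/tree selection, the "winner" can still lose to a fixed global choice like YCoCg.)
```
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Group = std::vector<std::vector<int32_t>>;  // channels of one group

// Per-group RCT selection as intended: try each candidate transform and keep
// the one with the lowest estimated cost. Assumes at least one candidate.
int PickBestRct(const Group& group, const std::vector<int>& candidate_rcts,
                const std::function<float(const Group&, int)>& estimate_cost) {
  int best_rct = candidate_rcts.front();
  float best_cost = estimate_cost(group, best_rct);
  for (std::size_t i = 1; i < candidate_rcts.size(); ++i) {
    const float cost = estimate_cost(group, candidate_rcts[i]);
    if (cost < best_cost) {
      best_cost = cost;
      best_rct = candidate_rcts[i];
    }
  }
  return best_rct;
}
```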
|
|
2025-05-06 07:41:20
|
Something similar was happening with per-group palette and squeeze for progressive lossless; we think it must be basing decisions on the smallest step
|
|
|
_wb_
Nah, at that point in the code, `gi` is a group image 🙂
|
|
2025-05-06 08:42:18
|
EstimateCost is at the top of the file, so is that using the full image? https://github.com/libjxl/libjxl/blob/0855a037d7ac65249f0f4700995bbd9decb3b47d/lib/jxl/enc_modular.cc#L273
"at that point in the code" has me wanting to double check haha. Regardless, we're testing it now and globally setting a specific RCT is better than allowing local RCTs, so something is definitely wrong with the cost estimation
|
|
|
A homosapien
|
2025-05-06 08:44:10
|
```
cjxl smol.png smol.jxl -d 0
JPEG XL encoder v0.12.0 e87f2f87 [_AVX2_,SSE4,SSE2]
Compressed to 24252.3 kB (7.616 bpp).
5828 x 4371, 5.662 MP/s [5.66, 5.66], , 1 reps, 12 threads.
cjxl smol.png smol.jxl -d 0 -C 6
JPEG XL encoder v0.12.0 e87f2f87 [_AVX2_,SSE4,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 24093.9 kB (7.567 bpp).
5828 x 4371, 7.448 MP/s [7.45, 7.45], , 1 reps, 12 threads.
```
Choosing a global RCT `-C 6` or `-C 10` comes really close to the RCT heuristics, or in this case, even beating it.
|
|
|
Melirius
|
|
_wb_
In general there is probably still some substantial room to improve MA tree learning heuristics. In particular we should implement some post-clustering tree pruning that (recursively) removes splits that go to two leaf nodes with identical predictor and context after clustering (and identical multiplier/offset). Such splits only cause some encode/decode slowdown (since the tree is unnecessarily deep) and some signaling overhead, without giving any compression benefit, so pruning them can only improve things — it just seems a bit tricky to do the code plumbing to do this pruning. <@179701849576833024> or <@1346460706345848868> do you want to give it a shot?
|
|
2025-05-06 08:44:59
|
Yes, good idea
|
|
|
jonnyawsom3
|
|
EstimateCost is at the top of the file, so is that using the full image? https://github.com/libjxl/libjxl/blob/0855a037d7ac65249f0f4700995bbd9decb3b47d/lib/jxl/enc_modular.cc#L273
"at that point in the code" has me wanting to double check haha. Regardless, we're testing it now and globally setting a specific RCT is better than allowing local RCTs, so something is definitely wrong with the cost estimation
|
|
2025-05-06 09:00:47
|
We added some debug printout and it seems like it *is* running per-group, but it *isn't* taking into account predictor selection. Still looking into it though
|
|