|
π°πππ
|
|
Hmm, have you tried decoding with output disabled? (remove `-o enc.png` from Oxide and change djxl to `djxl enc.jxl nul --output_format ppm --bits_per_sample 8`)
|
|
2025-08-01 05:02:59
|
|
|
2025-08-01 05:03:18
|
but I am not sure if it's directly comparable
|
|
|
jonnyawsom3
|
2025-08-01 05:03:21
|
Yeah, that's more like what I expected
|
|
|
π°πππ
|
2025-08-01 05:03:24
|
it seems there is an initialization difference
|
|
2025-08-01 05:03:55
|
how can png encoding be 2x slower then?
|
|
|
jonnyawsom3
|
2025-08-01 05:03:56
|
Then use the MP/s both tools output instead of external metrics
|
|
|
Quackdoc
|
|
π°πππ
how can png encoding be 2x slower then?
|
|
2025-08-01 05:04:20
|
libjxl's png encoder is dogshit slow
|
|
|
jonnyawsom3
|
2025-08-01 05:04:24
|
That's what you're meant to do to ignore output and IO overhead
|
|
|
π°πππ
|
|
Quackdoc
libjxl's png encoder is dogshit slow
|
|
2025-08-01 05:04:36
|
but isn't this important for actual decoding?
|
|
|
jonnyawsom3
|
|
Quackdoc
libjxl's png encoder is dogshit slow
|
|
2025-08-01 05:04:43
|
Oxide is probably zlib-ng, libjxl is zlib
|
|
|
π°πππ
|
2025-08-01 05:04:43
|
for example I need png images
|
|
|
Quackdoc
|
2025-08-01 05:04:46
|
it depends,
|
|
|
|
afed
|
2025-08-01 05:04:47
|
different defaults, different png libs
and depends on which png library libjxl is compiled with
|
|
|
Quackdoc
|
2025-08-01 05:05:00
|
something like imagemagick or ffmpeg have their own png encoders
|
|
|
jonnyawsom3
|
2025-08-01 05:05:26
|
It used to be much worse, but we set libjxl to zlib level 1 to remove most overhead without exploding PNG filesize
|
|
|
Quackdoc
|
2025-08-01 05:06:09
|
I wonder if I could add rust-png to libjxl hmmm
|
|
|
jonnyawsom3
|
|
π°πππ
but isn't this important for actual decoding?
|
|
2025-08-01 05:06:16
|
'actual decoding' is just getting pixel data from the file; saving that data to another format is then encoding with a different format
|
|
|
π°πππ
|
|
jonnyawsom3
'actual decoding' is just getting pixel data from the file; saving that data to another format is then encoding with a different format
|
|
2025-08-01 05:06:41
|
yes, I understand that
|
|
2025-08-01 05:06:47
|
but for example, in order to use CVVDP
|
|
2025-08-01 05:06:55
|
I need a PNG file
|
|
2025-08-01 05:07:15
|
so if I do graphs for example (decoding to PNG many times)
|
|
2025-08-01 05:07:26
|
there can be an hour of decoding difference between jxl-oxide and djxl
|
|
2025-08-01 05:08:09
|
so the bottleneck for djxl here is strictly encoding into png
|
|
|
Quackdoc
|
2025-08-01 05:08:42
|
yeah, it would be better to use magick or something
|
|
|
jonnyawsom3
|
2025-08-01 05:09:06
|
Thinking about it, there's a chance libjxl is still using filters while Oxide isn't. I can check later, but I'd expect it's because of zlib vs zlib-ng
|
|
2025-08-01 05:09:35
|
fpnge would be ideal, but as it's x86-only it's not suitable for libjxl
|
|
|
π°πππ
|
|
jonnyawsom3
|
|
Thinking about it, there's a chance libjxl is still using filters while Oxide isn't. I can check later, but I'd expect it's because of zlib vs zlib-ng
|
|
2025-08-01 05:10:54
|
zlib-ng vs zlib?
|
|
|
|
afed
|
|
fpnge would be ideal, but as x86 only it's not suitable for libjxl
|
|
2025-08-01 05:11:09
|
yeah, but unfortunately it's not a priority even just for x86
|
|
|
Quackdoc
|
2025-08-01 05:11:11
|
image-png uses its own zlib
|
|
|
π°πππ
|
2025-08-01 05:11:20
|
so actual decoding is 3x faster for djxl
|
|
2025-08-01 05:11:41
|
And what's the limitation for jxl-oxide here?
Security reasons I assume
|
|
|
Quackdoc
|
2025-08-01 05:12:46
|
no C api for image-png
|
|
|
π°πππ
And what's the limitation for jxl-oxide here?
Security reasons I assume
|
|
2025-08-01 05:13:13
|
jxl-oxide is not focused on speed/efficiency, best to wait for jxl-rs for that
|
|
|
|
afed
|
2025-08-01 05:16:28
|
libjxl is heavily SIMD optimized (plus some other optimizations), but the rust versions are not, and it is more difficult in rust (though there are some planned improvements for SIMD in nightly rust, as far as I know)
|
|
|
jonnyawsom3
|
2025-08-01 05:18:44
|
Oxide was solely made by Tirr while libjxl has had half a decade of half a dozen google employees making it multithreaded and utilising instruction sets
|
|
2025-08-01 05:19:08
|
(though Eustas has been making a lot more SIMD for the encoder in the past week)
|
|
|
π°πππ
|
|
afed
libjxl is heavily SIMD optimized (plus some other optimizations), but the rust versions are not, and it is more difficult in rust (though there are some planned improvements for SIMD in nightly rust, as far as I know)
|
|
2025-08-01 05:36:07
|
you can utilize SIMD in Rust
|
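A minimal sketch of what that looks like in stable Rust, for illustration only (this is not code from rav1e or jxl-rs; `sum_avx2`/`sum_scalar` are invented names): a scalar fallback plus an AVX2 path selected by runtime feature detection.
```
// Invented example: portable scalar fallback + AVX2 path on x86_64.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

fn sum_scalar(v: &[f32]) -> f32 {
    v.iter().sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(v: &[f32]) -> f32 {
    let chunks = v.chunks_exact(8);
    let rest = chunks.remainder();
    let mut acc = _mm256_setzero_ps();
    for c in chunks {
        // Unaligned load of 8 lanes, vertical add into the accumulator.
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(c.as_ptr()));
    }
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    lanes.iter().sum::<f32>() + rest.iter().sum::<f32>()
}

pub fn sum(v: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    if is_x86_feature_detected!("avx2") {
        // Safe: we just verified AVX2 is available on this CPU.
        return unsafe { sum_avx2(v) };
    }
    sum_scalar(v)
}
```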
|
2025-08-01 05:36:22
|
but of course, the project itself may have limitations
|
|
2025-08-01 05:36:42
|
rav1e <:SmileDoge2:1320855554683441163>
|
|
|
Kupitman
|
|
Quackdoc
|
|
π°πππ
rav1e <:SmileDoge2:1320855554683441163>
|
|
2025-08-01 05:40:30
|
bring back rav1e <:PepeSad:815718285877444619>
|
|
|
Orum
|
|
π°πππ
rav1e <:SmileDoge2:1320855554683441163>
|
|
2025-08-01 05:43:39
|
...and yet it's slow AF
|
|
|
π°πππ
|
2025-08-01 05:44:08
|
SIMD isn't magically fast
|
|
|
Orum
|
2025-08-01 05:44:13
|
exactly
|
|
|
|
afed
|
2025-08-01 05:44:29
|
when asm becomes more than 90%, rav1e will come to life <:kekw:808717074305122316>
|
|
|
π°πππ
|
|
Orum
...and yet it's slow AF
|
|
2025-08-01 05:51:01
|
it was the fastest encoder back then, though
|
|
2025-08-01 05:51:13
|
the reason it's slow is that the development stopped long ago
|
|
2025-08-01 06:02:45
|
the about page is still the same π
|
|
|
Orum
|
2025-08-01 06:07:10
|
yeah I think at the time that was written it was the only AV1 encoder
|
|
|
|
afed
|
2025-08-01 06:29:24
|
libaom was available as a normal encoder even before av1 was finalized
and rav1e didn't have many av1 tools at first (and even now not everything is fully implemented), so maybe that's why it was faster <:kekw:808717074305122316>
also because a lot of asm things were ported from dav1d
|
|
|
Quackdoc
|
2025-08-01 06:45:36
|
rav1e used to be fast and good, [cheems](https://cdn.discordapp.com/emojis/720670067091570719.webp?size=48&name=cheems)
|
|
|
Orum
|
2025-08-01 06:51:43
|
it was never fast, and it's only really good at intra encoding
|
|
|
π°πππ
|
|
Orum
it was never fast, and it's only really good at intra encoding
|
|
2025-08-01 06:55:34
|
relatively
|
|
2025-08-01 06:55:43
|
it was faster than AOM at some point as far as I remember
|
|
2025-08-01 06:57:14
|
It could easily be competitive if funding and development hadn't stopped.
|
|
|
Quackdoc
|
|
Orum
it was never fast, and it's only really good at intra encoding
|
|
2025-08-01 07:01:35
|
no, it was quite good in the past
|
|
2025-08-01 07:02:30
|
it was slow as molasses since it couldn't thread worth shit, but if you did chunked encoding it was genuinely the best
|
|
2025-08-01 07:02:47
|
using a threaded tool it was fast
|
|
|
π°πππ
|
|
Quackdoc
it was slow as molasses since it couldn't thread worth shit, but if you did chunked encoding it was genuinely the best
|
|
2025-08-01 07:04:43
|
hmm, true
I was using it with av1an
|
|
|
jonnyawsom3
|
2025-08-01 07:13:53
|
<#805176455658733570>
|
|
|
Orum
|
|
Quackdoc
using a threaded tool it was fast
|
|
2025-08-01 07:18:55
|
which isn't possible in many circumstances
|
|
|
A homosapien
|
|
jonnyawsom3
Oxide is probably zlib-ng, libjxl is zlib
|
|
2025-08-01 08:01:11
|
Benchmarks indicate that rust's png crates are faster than zlib-ng. So it's probably like 3x faster instead of 2x.
https://www.reddit.com/r/rust/comments/1ha7uyi/memorysafe_png_decoders_now_vastly_outperform_c/
|
|
|
Tirr
|
2025-08-01 08:21:07
|
oxide uses the png crate, which uses miniz-oxide
|
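For reference, encoding raw RGB8 with the png crate looks roughly like this (png 0.17 API; the fast-compression/no-filter settings are a guess at a speed-oriented configuration, not what jxl-oxide actually sets):
```
// Sketch: writing RGB8 pixels with the `png` crate; the deflate work is
// done by miniz_oxide underneath. Settings here are illustrative only.
use std::fs::File;
use std::io::BufWriter;

fn write_png(path: &str, width: u32, height: u32, rgb: &[u8]) -> Result<(), png::EncodingError> {
    let file = BufWriter::new(File::create(path)?);
    let mut enc = png::Encoder::new(file, width, height);
    enc.set_color(png::ColorType::Rgb);
    enc.set_depth(png::BitDepth::Eight);
    // The speed/size trade-off lives in these two calls, analogous to the
    // zlib level / filter discussion around libjxl.
    enc.set_compression(png::Compression::Fast);
    enc.set_filter(png::FilterType::NoFilter);
    let mut writer = enc.write_header()?;
    writer.write_image_data(rgb)?; // expects width * height * 3 bytes
    Ok(())
}
```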
|
|
A homosapien
|
2025-08-01 08:58:16
|
and it's presumably faster than libpng w/ zlib-ng
|
|
2025-08-01 08:59:00
|
But I wonder how easy it would be to compile libjxl with zlib-ng
|
|
2025-08-01 08:59:37
|
We have numbers here showing that reading/writing PNG files became significantly faster
|
|
|
jonnyawsom3
|
2025-08-01 08:59:56
|
Ideally there'd be a dependency overhaul. Swapping leftover libjpeg with jpegli, zlib with zlib-ng, etc
|
|
|
π°πππ
|
|
A homosapien
Benchmarks indicate that rust's png crates are faster than zlib-ng. So it's probably like 3x faster instead of 2x.
https://www.reddit.com/r/rust/comments/1ha7uyi/memorysafe_png_decoders_now_vastly_outperform_c/
|
|
2025-08-01 10:04:03
|
so jxl-oxide would probably also be faster than using `djxl` + another png encoder
|
|
|
A homosapien
|
2025-08-01 10:04:33
|
maybe, it's just speculation
|
|
|
π°πππ
|
2025-08-01 10:04:40
|
I mean externally
|
|
2025-08-01 10:04:46
|
such as `magick` for example
|
|
2025-08-01 10:04:57
|
since we call two different processes, it's also another overhead
|
|
2025-08-01 10:05:28
|
and since the actual decoding difference is milliseconds (especially for normal sized images) the bottleneck is the png encoding
|
|
|
jonnyawsom3
|
2025-08-01 10:09:19
|
fpnge could still work with piping
|
|
|
|
afed
|
2025-08-01 10:09:27
|
djxl + fpnge should be pretty fast if the ssd/hdd isn't a bottleneck
fpnge is the fastest png encoder
|
|
|
π°πππ
|
|
afed
djxl + fpnge should be pretty fast if the ssd/hdd isn't a bottleneck
fpnge is the fastest png encoder
|
|
2025-08-01 10:11:28
|
let's do a benchmark
|
|
|
jonnyawsom3
|
2025-08-01 10:14:51
|
Oh, hmm... I forgot about that
`Usage: fpnge [options] in.png out.png`
|
|
|
A homosapien
|
2025-08-01 10:17:22
|
Modifying fpnge to take in raw bitmap inputs sounds like a fun side project
|
|
|
|
afed
|
|
jonnyawsom3
Oh, hmm... I forgot about that
`Usage: fpnge [options] in.png out.png`
|
|
2025-08-01 10:18:19
|
then yes, I asked to add ppm
but I don't remember if it was added or not
|
|
|
π°πππ
|
2025-08-01 10:24:21
|
interesting size
|
|
|
Quackdoc
|
|
π°πππ
so jxl-oxide would probably also be faster than using `djxl` + another png encoder
|
|
2025-08-01 10:24:56
|
I mean, you could wire up libjxl to rust-png and call that good, that would be the "fastest"
|
|
|
π°πππ
|
|
jonnyawsom3
Oh, hmm... I forgot about that
`Usage: fpnge [options] in.png out.png`
|
|
2025-08-01 10:25:57
|
so it doesn't have ppm, or piped input π
|
|
2025-08-06 11:33:45
|
JXL has a problem with screen content.
JPEGLI and AVIF don't have the same problem (the sizes and the metric scores are completely linear)
|
|
2025-08-06 11:39:44
|
```
cjxl v0.12.0 73beeb54
-e 10 --keep_invisible=0 --progressive_dc=0 --brotli_effort=11 -x strip=all
```
|
|
2025-08-06 11:39:59
|
reference image used
|
|
2025-08-06 11:51:38
|
The behavior is similar with `resampling=1` `ec_resampling=1`
|
|
2025-08-06 11:54:09
|
The scores and the output sizes are completely non-linear
And a much lower size can get a higher score or vice versa.
|
|
|
jonnyawsom3
|
|
π°πππ
The behavior is similar with `resampling=1` `ec_resampling=1`
|
|
2025-08-07 01:45:15
|
What about --resampling 1 and --patches 0?
|
|
2025-08-07 01:45:49
|
I'd bet the size discrepancies are from patches falling in and out of detection as the quality changes
|
|
|
π°πππ
|
2025-08-07 01:57:33
|
same image on AVIFENC/AOM is much more linear both in size and metric scores (especially for the second SCD mode)
Comparison with its new screen content detection mode
|
|
|
Kupitman
|
2025-08-07 05:26:15
|
AV2 released?
|
|
2025-08-07 05:26:37
|
Oh, scd
|
|
|
gb82
|
|
π°πππ
rav1e <:SmileDoge2:1320855554683441163>
|
|
2025-08-11 06:40:10
|
rav1e just borrowed from dav1d
|
|
|
_wb_
|
2025-08-11 07:51:33
|
if someone feels like setting up the github actions, it would be nice to make https://github.com/libjxl/bench so that it runs all the tests in actions and auto-updates when the various decoder implementations get updated
|
|
|
jonnyawsom3
|
2025-08-11 02:08:43
|
Also <@794205442175402004>, do you know if setting this to 0 is disabling the palette? The wording is confusing me, but setting it to 0 is giving better density on all images I try
```-Y PERCENT, --post-compact=PERCENT
Use local (per-group) channel palette if the number of sample values is
smaller than this percentage of the nominal range.```
|
|
2025-08-11 02:12:45
|
It's also what we changed to fix progressive lossless, so I'm wondering if it's broken entirely
|
|
|
_wb_
|
2025-08-11 02:33:55
|
it could be that this no longer works correctly after moving to chunked encoding
|
|
|
jonnyawsom3
|
2025-08-11 02:39:39
|
Chunked is disabled for images under 2048 x 2048 though, so it applies with and without buffering
|
|
2025-08-11 02:49:45
|
It's a 0.1% density change, but it's consistent and given it made progressive lossless 30% larger, it hints something isn't working right
|
|
|
_wb_
|
2025-08-11 06:46:51
|
Oh, I guess in combination with squeeze it is very counterproductive to do channel palette since squeeze introduces many small channels.
|
|
2025-08-11 06:47:41
|
Without squeeze it is probably often not worth its signaling overhead so the heuristics should be made stricter
|
|
|
Snafuh
|
|
_wb_
if someone feels like setting up the github actions, it would be nice to make https://github.com/libjxl/bench so that it runs all the tests in actions and auto-updates when the various decoder implementations get updated
|
|
2025-08-16 10:44:30
|
I thought about the same last week when someone asked about the current status of decoders. I'll take a look. The instructions for running it locally seem to be alright, so getting it into actions should be doable
|
|
2025-08-16 01:26:06
|
Got the bench repo running in an action
https://github.com/Snafuh/bench/actions/runs/17008565099
Seems like all tests exit with 1, but the dump files seem to be updated. I have not looked at the code at all so far, just created the action.
Artifacts:
https://github.com/Snafuh/bench/actions/runs/17008565099/artifacts/3779719050
I think some things could be adjusted in the scripts. It expected the dump files to be already there. So it's a bit hard to judge what was updated and what not.
Also, the website could get more info, especially date of creation and version/commit info
|
|
|
Kupitman
|
2025-08-19 12:28:54
|
e9
849 710 | 0.12
849 764 | 0.11
<:Yes:1368822664382119966>
e10
621 786 | 0.12
621 171 | 0.11
<:tfw:843857104439607327>
|
|
|
Demiurge
|
2025-08-23 09:09:23
|
There is a portable C simd intrinsics lib that can be used to make fpnge portable
|
|
2025-08-23 09:11:16
|
https://simd-everywhere.github.io/blog/2020/06/22/transitioning-to-arm-with-simde.html
|
|
|
|
veluca
|
2025-08-23 09:30:04
|
not a great idea
|
|
2025-08-23 09:30:23
|
if I ever port it, it'll be to highway or the system we'll use for jxl-rs
|
|
|
A homosapien
|
2025-08-23 09:32:02
|
What system for SIMD are you planning to use for jxl-rs?
|
|
|
Demiurge
|
2025-08-23 09:32:44
|
Rust intrinsics are still in an early, very Rusty state
|
|
|
|
veluca
|
2025-08-23 09:33:02
|
nah, at least not on ARM and x86 π
|
|
|
Demiurge
|
2025-08-23 09:33:33
|
I thought they couldn't decide if they actually wanted to have them or not
|
|
|
|
veluca
|
|
A homosapien
What system for SIMD are you planning to use for jxl-rs?
|
|
2025-08-23 09:34:04
|
the final one is a bit of a work in progress (see https://github.com/rust-lang/rust/issues/143352) but for now see https://github.com/libjxl/jxl-rs/tree/main/jxl/src/simd -- I'm writing it with a design that will hopefully be very easy to adapt once we have the appropriate language features
|
|
|
Demiurge
|
2025-08-23 09:38:44
|
Speaking of libhwy, I heard something similar is part of c++26 language now. Idk if clang supports it.
|
|
|
A homosapien
|
|
veluca
the final one is a bit of a work in progress (see https://github.com/rust-lang/rust/issues/143352) but for now see https://github.com/libjxl/jxl-rs/tree/main/jxl/src/simd -- I'm writing it with a design that will hopefully be very easy to adapt once we have the appropriate language features
|
|
2025-08-30 11:55:40
|
Do the compiled binaries use the handwritten SIMD code yet? I did some benchmarks and there's little to no change in performance.
|
|
|
|
veluca
|
2025-08-30 01:16:08
|
what images did you benchmark?
|
|
2025-08-30 01:16:18
|
(and on what CPU family?)
|
|
2025-08-30 01:16:38
|
for now most of the improvements are for vardct images
|
|
|
A homosapien
|
2025-08-30 01:24:01
|
I'm afk at the moment. I'll post my numbers and compile settings when I get home.
|
|
|
jonnyawsom3
|
|
veluca
for now most of the improvements are for vardct images
|
|
2025-08-31 01:15:38
|
I was trying an 8K VarDCT image on a Ryzen Zen 1 CPU; it was giving slower results than an older build
|
|
|
|
veluca
|
2025-08-31 02:32:13
|
Weird
|
|
2025-08-31 02:32:24
|
Can you share the image? (Ideally in an issue)
|
|
|
jonnyawsom3
|
|
veluca
Can you share the image? (Ideally in an issue)
|
|
2025-08-31 02:36:09
|
I'm tempted to ask if you have a compiled x86 binary we could test, to see if we're just not passing the right compile flags or something
|
|
|
|
veluca
|
2025-08-31 02:44:58
|
You just need the --release flag
|
|
|
A homosapien
|
2025-08-31 03:07:20
|
I did use the release flag; however, I also had additional compile flags for speed like `codegen-units=1` and `lto=fat`. Also `-Ctarget-cpu=znver2` might change things.
|
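For reproducibility, that combination would be spelled something like this (the env-var form of the profile overrides is one way to pass them; this is an assumption about the setup, not an official jxl-rs recipe):
```
RUSTFLAGS="-C target-cpu=znver2" \
CARGO_PROFILE_RELEASE_LTO=fat \
CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 \
cargo build --release
```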
|
2025-08-31 03:08:43
|
I'm out being a tourist in Italy so I'm still afk.
|
|
|
|
veluca
|
2025-08-31 03:09:06
|
(at the risk of going off topic, in Italy where? :D)
|
|
|
A homosapien
|
2025-08-31 03:14:10
|
I'm in Sicily right now, in Naxos, on a bus ride to Taormina
|
|
2025-08-31 03:14:28
|
Tomorrow I'm going to see Mount Etna and then off to the mainland (Naples & Pompeii)
|
|
2025-08-31 03:21:29
|
So far I have visited Syracuse, Ragusa, and Palermo
|
|
2025-08-31 03:24:46
|
I know for a fact I'm going to transcode all the photos I've taken into JXL. The problem is they are Ultra HDR. π
|
|
|
Orum
|
2025-08-31 03:26:42
|
why is that an issue?
|
|
|
A homosapien
|
2025-08-31 03:27:42
|
I'm not sure if cjxl preserves the HDR gainmap part of the image
|
|
2025-08-31 03:28:16
|
Also just viewing the HDR version outside of Android is annoying
|
|
|
Orum
|
2025-08-31 03:28:25
|
it *should*, but I haven't tested
|
|
|
|
veluca
|
2025-08-31 03:30:10
|
haven't spent a lot of time in Sicily (was just in Catania for organizing informatics olympiads), but it's a nice place π
|
|
|
jonnyawsom3
|
|
Orum
it *should*, but I haven't tested
|
|
2025-08-31 04:52:54
|
gainmaps were added to the API in 0.11 but the CLI tools weren't updated to use them
|
|
|
A homosapien
|
|
veluca
Can you share the image? (Ideally in an issue)
|
|
2025-08-31 09:18:57
|
Idk if my build env was bad or if I was testing on a lossless image by mistake, but I don't see a regression. I redid everything: I nuked my jxl-rs folder, recompiled, and retested. The SIMD code seems to work; main is slightly faster (+5-10%) compared to pre-SIMD (d4b5df1) jxl-rs, with and without my additional flags.
|
|
2025-09-01 06:35:10
|
<@238552565619359744> you were the first one to tell me there was a regression. I DM'ed you fresh binaries, does it still occur?
|
|
|
Lilli
|
2025-09-01 09:13:29
|
I'm using libjxl (0.10.2) in an embedded device, and it performs rather poorly
For effort 3: 8s
For effort 5: 56s
that sounds like too big of a jump to me
|
|
2025-09-01 09:23:20
|
Is there something I likely did wrong?
|
|
2025-09-01 09:27:38
|
I checked, I'm not swapping
|
|
|
jonnyawsom3
|
2025-09-01 09:35:12
|
Effort 3 lossy is basically just a standard JPEG; effort 5 starts using features of JPEG XL like variable block sizes
|
|
|
Lilli
|
2025-09-01 09:47:31
|
Yes, but why is it 7 times slower? :/
|
|
2025-09-01 09:47:49
|
When on my laptop it's only about 2-3 times slower
|
|
|
Mine18
|
2025-09-01 09:50:11
|
threading, instructions, cache, stuffff
|
|
|
_wb_
|
2025-09-01 11:22:06
|
is this lossy or lossless?
|
|
|
Lilli
|
2025-09-01 11:59:37
|
lossy
|
|
|
_wb_
|
2025-09-01 12:54:34
|
how many threads?
|
|
|
|
veluca
|
2025-09-01 01:17:29
|
I imagine 1 b/c embedded
|
|
|
jonnyawsom3
|
|
A homosapien
<@238552565619359744> you were the first one to tell me there was a regression. I DM'ed you fresh binaries, does it still occur?
|
|
2025-09-01 01:23:47
|
Still getting around a 10% regression on every VarDCT image I try
Pre-SIMD
``` Wall time: 0 days, 00:00:08.770 (8.77 seconds)
User time: 0 days, 00:00:02.203 (2.20 seconds)
Kernel time: 0 days, 00:00:05.921 (5.92 seconds)```
Post-SIMD
``` Wall time: 0 days, 00:00:09.291 (9.29 seconds)
User time: 0 days, 00:00:02.031 (2.03 seconds)
Kernel time: 0 days, 00:00:06.625 (6.62 seconds)```
|
|
|
_wb_
|
2025-09-01 01:28:36
|
interesting, usr time goes down but sys time goes up
|
|
|
Lilli
|
|
_wb_
how many threads?
|
|
2025-09-01 02:04:33
|
2 threads before, I just tried with 8, just about 10s difference using the `JxlResizeableParallelRunner`
8->7.3
55->45
And I'm not sure I can do 8 while in production tbh
|
|
|
_wb_
|
2025-09-01 02:47:03
|
It would be interesting to check why e5 is so much slower than e3 on that device. It's of course expected that there is some gap, but indeed not that large...
|
|
|
Lilli
|
2025-09-01 02:59:36
|
yes, my thoughts exactly; the device is ARM based, it's the compute module (CM4) of a Raspberry Pi 4
|
|
2025-09-01 03:00:05
|
(and it doesn't have 8 threads, just 4, but I tried a bunch of things anyway)
|
|
|
_wb_
|
2025-09-01 03:27:53
|
actually I kind of get the same big gap on my MacBook, which is also ARM based but substantially beefier than a Raspberry Pi π
|
|
2025-09-01 03:27:55
|
```
001.png
Encoding kPixels Bytes BPP E MP/s D MP/s Max norm SSIMULACRA2 PSNR pnorm BPP*pnorm QABPP Bugs
----------------------------------------------------------------------------------------------------------------------------------------
jxl:3 1084 188913 1.3934206 33.951 59.844 1.67419956 85.39267932 43.50 0.66288994 0.923684510472 2.333 0
jxl:4 1084 191619 1.4133800 33.915 58.583 1.67329057 85.76321222 43.53 0.66185216 0.935448636824 2.365 0
jxl:5 1084 184631 1.3618366 4.604 48.690 1.63430530 85.04619567 43.27 0.68055976 0.926811204959 2.226 0
jxl:6 1084 183920 1.3565923 2.685 54.254 1.62873990 84.91382279 43.21 0.68212848 0.925370233425 2.210 0
jxl:7 1084 184129 1.3581339 2.435 53.850 1.83446444 84.93351387 43.20 0.68402386 0.928995971028 2.491 0
Aggregate: 1084 186617 1.3764854 8.090 54.900 1.68738647 85.20988478 43.34 0.67421937 0.928053147031 2.323 0
```
|
|
2025-09-01 03:28:58
|
^ this is homebrew libjxl 0.11
|
|
2025-09-01 03:29:24
|
```
001.png
Encoding kPixels Bytes BPP E MP/s D MP/s Max norm SSIMULACRA2 PSNR pnorm BPP*pnorm QABPP Bugs
----------------------------------------------------------------------------------------------------------------------------------------
jxl:3 1084 188875 1.3931403 37.762 63.803 1.67419956 85.39267932 43.50 0.66288994 0.923498710599 2.332 0
jxl:4 1084 191586 1.4131366 36.777 60.061 1.67329057 85.76321222 43.53 0.66185216 0.935287536907 2.365 0
jxl:5 1084 184496 1.3608409 6.501 50.113 1.63113330 85.33134233 43.47 0.66017514 0.898393306174 2.220 0
jxl:6 1084 184952 1.3642043 3.745 56.992 1.62598521 85.27235686 43.44 0.65957970 0.899801466869 2.218 0
jxl:7 1084 185176 1.3658565 3.326 56.406 1.62600045 85.28563934 43.44 0.66006019 0.901547521492 2.221 0
Aggregate: 1084 186997 1.3792855 10.238 57.294 1.64596675 85.40904601 43.48 0.66091024 0.911583915676 2.270 0
```
^ current git libjxl
|
|
2025-09-01 03:30:58
|
basically e3 and e4 are pretty much the same speed, e5 is 7x slower
|
|
|
Lilli
|
2025-09-02 08:04:41
|
Okay, so it's a matter of differing optimizations then?
|
|
2025-09-02 08:04:55
|
Thanks a lot for checking that out by the way! π
|
|
|
_wb_
|
2025-09-02 08:34:08
|
the gap between e4 and e5 is a bit too large imo; also e3 is not really much faster than e4. Perhaps we should rename e3 to e2, e4 to e3, and make the new e4 do something between current e4 and e5 at a speed roughly halfway between the two.
|
|
|
Lilli
|
2025-09-02 09:32:15
|
Yes, that would be ideal! The jump in quality is also quite noticeable, so an intermediate would also make sense.
|
|
|
|
afed
|
2025-09-02 09:33:33
|
but e2 is useful and also noticeably faster than e3
|
|
2025-09-02 09:38:17
|
perhaps it would be better to just replace e3 with e4 and make a new e4
because the current e1, e2, and e4 have noticeable differences in speed, compression, and usefulness for their use cases
|
|
|
jonnyawsom3
|
|
afed
perhaps it would be better to just replace e3 with e4 and make a new e4
because the current e1, e2, and e4 have noticeable differences in speed, compression, and usefulness for their use cases
|
|
2025-09-02 10:18:31
|
They shouldn't though. e1 and e2 are identical, e3 and e4 only do ANS and Coefficient reordering
<https://github.com/libjxl/libjxl/blob/main/doc/encode_effort.md>
|
|
2025-09-02 10:19:16
|
Then e5 does VarDCT, AQ, Gabor and CFL
|
|
2025-09-02 10:23:30
|
It used to say e4 had simple VarDCT and AQ, but I checked the code and it was wrong, so I updated the docs a while back. Maybe it should have been implemented instead
|
|
|
|
afed
|
2025-09-02 10:30:58
|
ah, I thought it also applied to lossless; if it's only for lossy, then yeah
|
|
|
Lilli
|
|
jonnyawsom3
It used to say e4 had simple VarDCT and AQ, but I checked the code and it was wrong, so I updated the docs a while back. Maybe it should have been implemented instead
|
|
2025-09-02 10:32:51
|
I would like only VarDCT and AQ π
|
|
|
jonnyawsom3
|
2025-09-02 10:33:53
|
You could try manually disabling the Gabor and CFL flags, but I think most of the time is from VarDCT
|
|
|
Lilli
|
2025-09-02 10:56:46
|
I see, maybe that is not worth it then
|
|
2025-09-02 11:01:05
|
I find it strange that when doing lossless I obtain these values, so, setting the distance to zero:
```
Input size: 300MB
|effort|duration (s)|size (MB)|
|------|------------|---------|
| 0 | 86.9 | 62 |
| 1 | 12.3 | 116 |
| 2 | 12.3 | 116 |
| 3 | 14 | 114 |
| 5 | 48.9 | 134 |
```
I suppose that means I'm also not setting a few other things that I should? (these timings are on the CM4)
|
|
|
jonnyawsom3
|
|
Lilli
I find it strange that when doing lossless I obtain these values, so, setting the distance to zero:
```
Input size: 300MB
|effort|duration (s)|size (MB)|
|------|------------|---------|
| 0 | 86.9 | 62 |
| 1 | 12.3 | 116 |
| 2 | 12.3 | 116 |
| 3 | 14 | 114 |
| 5 | 48.9 | 134 |
```
I suppose that means I'm also not setting a few other things that I should? (these timings are on the CM4)
|
|
2025-09-02 11:04:37
|
What bitdepth is your image data? Effort 1 has a specialised fast encoder, but will fall back to effort 2 if above 16bit or something else is incompatible
|
|
|
Lilli
|
2025-09-02 11:04:52
|
16bits yep, float data
|
|
2025-09-02 11:05:29
|
I mean, it's 16 bits uint16, and I also use float sometimes
|
|
2025-09-02 11:05:43
|
these tests are 16bits uint
|
|
|
jonnyawsom3
|
2025-09-02 11:08:38
|
Hmmm, odd
|
|
|
Lilli
|
2025-09-02 11:11:03
|
I did not set `JxlEncoderSetFrameLossless`, only the distance to 0
|
|
2025-09-02 11:11:11
|
So I guess modular isn't activated? I added more results to the table
|
|
|
jonnyawsom3
|
2025-09-02 11:33:35
|
Ah right, I assume you're on v0.11? IIRC we made it error now if you set distance to 0 without actually setting it as lossless in the API
|
|
|
Lilli
|
2025-09-02 11:43:05
|
0.10.2 on the embedded device, 0.11.1 on my laptop
|
|
2025-09-02 11:49:18
|
I just tried on my laptop enabling `JxlEncoderSetFrameLossless` when using distance 0 and the results are identical in terms of filesize (I don't have the results on the embedded)
|
|
2025-09-02 11:56:05
|
I first set `JxlEncoderSetFrameLossless` and then the distance to 0, all in linear sRGB
|
|
|
_wb_
|
2025-09-02 04:03:33
|
effort 0 is not a valid effort value iirc
|
|
2025-09-02 04:05:27
|
effort 4-5 lossless being larger than e3 can happen on some images. Especially for photographic images, e3 is quite good and higher effort doesn't necessarily beat it.
|
|
2025-09-02 04:06:28
|
but you should try the current git head version because things might be different there, some bugs have been fixed and encoder choices improved a bit
|
|
|
Kupitman
|
|
_wb_
effort 4-5 lossless being larger than e3 can happen on some images. Especially for photographic images, e3 is quite good and higher effort doesn't necessarily beat it.
|
|
2025-09-02 09:51:31
|
can you fix that?π π
|
|
|
monad
|
2025-09-03 06:21:43
|
e3 lossless is also inefficient for images characteristically different from photos. For varying content, e4 is much safer
|
|
|
Kupitman
can you fix that?π π
|
|
2025-09-03 06:33:10
|
generally, higher efforts are more dense, despite exceptional individual cases
|
|
2025-09-03 06:42:45
|
117 photo/film
```
       B   bpp (mean)    mins   unique mins   Mpx/s real (mean)   best of
84969245    6.2769264  94.02%        94.02%              4.1837   cjxl_0.11.0_d0e4num_threads0
86605683    6.4035571   5.98%         5.98%              10.032   cjxl_0.11.0_d0e3num_threads0
```
|
|
|
_wb_
|
2025-09-03 07:44:21
|
it is hard to give guarantees that higher effort means better compression on a per-image basis; the only way to do that is to brute-force it (try all lower efforts too), but that would come at a substantial cost
|
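Externally that brute force is only a few lines, at exactly the cost described; a rough sketch, assuming `cjxl` is on PATH and using its standard `-d`/`-e` flags:
```
// Sketch: lossless brute force over efforts with the cjxl CLI; encode at
// each effort and keep whichever output is smallest.
use std::process::Command;

fn smallest_effort(input: &str) -> std::io::Result<Option<(u32, u64)>> {
    let mut best: Option<(u32, u64)> = None;
    for effort in 1..=9u32 {
        let out = format!("{input}.e{effort}.jxl");
        let e = effort.to_string();
        let status = Command::new("cjxl")
            .args(["-d", "0", "-e", &e, input, &out])
            .status()?;
        if !status.success() {
            continue; // skip efforts this build rejects
        }
        let size = std::fs::metadata(&out)?.len();
        if best.map_or(true, |(_, s)| size < s) {
            best = Some((effort, size));
        }
    }
    Ok(best) // (effort, bytes) of the smallest encode
}
```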
|
|
Lilli
|
|
_wb_
effort 0 is not a valid effort value iirc
|
|
2025-09-03 09:37:04
|
yes, but it still does something somehow haha
they are non-perceptual raw 16-bit linear near-dark images, with intensity target 64000 (but I guess intensity target doesn't do anything in lossless)
So you're saying e4 should be better? I'll have to test that then. You're saying it's better regarding compression ratio only, or speed? My metric is time spent vs final size (for a threshold quality, in theory, but we're talking about lossless anyway). So I need the best compromise in terms of compression ratio and time.
|
|
|
jonnyawsom3
|
|
monad
117 photo/film
```
       B   bpp (mean)    mins   unique mins   Mpx/s real (mean)   best of
84969245    6.2769264  94.02%        94.02%              4.1837   cjxl_0.11.0_d0e4num_threads0
86605683    6.4035571   5.98%         5.98%              10.032   cjxl_0.11.0_d0e3num_threads0
```
|
|
2025-09-03 10:49:50
|
By the way, do you have a script or something you use for your corpus tests?
Me and some friends were thinking of finding the optimal parameters for our use case
|
|
|
monad
|
2025-09-03 05:24:44
|
yes, a beautiful monstrosity
|
|
|
jonnyawsom3
|
|
monad
yes, a beautiful monstrosity
|
|
2025-09-04 12:47:08
|
Don't suppose you could share it?
|
|
|
monad
|
2025-09-04 12:51:27
|
sure. it has some idiosyncrasies and arcana. and it's linuxy. but I can write some explanation
|
|
2025-09-04 08:04:31
|
<@238552565619359744> I documented the interface, but made no attempt to clarify the code. Oh, and I forgot to mention it assumes cjxl prints size in bytes, which is non-standard.
|
|
|
Exorcist
|
2025-09-15 06:01:19
|
https://halide.cx/blog/consistency/
|
|
2025-09-15 06:02:38
|
Since the JXL author is also the SSIMULACRA2 author, why does JXL have the worst consistency?
|
|
|
jonnyawsom3
|
2025-09-15 07:17:02
|
Could be because of the PRs that caused a regression in SSIMULACRA2 scores
|
|
|
|
afed
|
2025-09-15 07:34:35
|
and also because it is still a small dataset; on a larger and more diverse one it may be very different
and because jxl has zero tuning for ssimulacra2 (even though it has the same authors; but once the race for metric numbers begins, these metrics and such results can no longer be considered very accurate and truthful)
in some PRs, on the contrary, ssimulacra2 scores even got worse
though jpegli has some tuning for ssimulacra2, mainly I think because there was no time for careful visual tuning (and tuning for metrics is at least better than nothing)
|
|
|
jonnyawsom3
|
2025-09-15 08:17:34
|
Both use butter internally
|
|
|
|
afed
|
2025-09-15 08:23:23
|
yeah, but jpegli had some tuning for ssimulacra2, even though it's not used internally
<https://github.com/libjxl/libjxl/pull/2646>
|
|
2025-09-15 08:25:16
|
btw, there were also some old benchmarks, and probably some from Jon as well, where according to the metrics, the best in terms of "consistency" was the standard libjpeg, and libjxl was the worst or one of the worst, if I'm not mistaken
but this is not the case in practice, libjxl is quite consistent in terms of visual quality, at least in photos
and is also actually the only encoder where the quality settings use not just a quantizer, but one that is closely integrated with butteraugli
so for other encoders, Q means almost nothing, because it is just a quantizer, not some metric or, even more so, a real quality indicator, and the actual quality is highly variable depending on the content with the same Q
with any encoder, it's possible to use encoding based on metrics, but then the encoder has to make many encodings until it reaches a value close to this metric, so the encoding time is multiplied by the number of tries
and also, without deep integration, it only works on the whole image (not just on the needed parts or at least blocks)
|
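That multi-encode loop, as a sketch; `encode_at` and `score` are hypothetical stand-ins for an encoder invocation and a metric such as SSIMULACRA2, and the 0-100 quality scale is assumed monotonic in the metric:
```
// Sketch of external metric-targeted encoding: bisect a quality setting
// until the score lands at or above the target.
fn target_quality<E, S>(encode_at: E, score: S, target: f64) -> Vec<u8>
where
    E: Fn(f32) -> Vec<u8>,
    S: Fn(&[u8]) -> f64,
{
    let (mut lo, mut hi) = (0.0_f32, 100.0_f32); // quality-scale endpoints
    let mut best = encode_at(hi); // assume max quality meets the target
    for _ in 0..8 {
        // Every probe is a full encode: this is why encoding time gets
        // multiplied by the number of tries.
        let q = (lo + hi) / 2.0;
        let candidate = encode_at(q);
        if score(&candidate) >= target {
            best = candidate; // target met, try a cheaper setting
            hi = q;
        } else {
            lo = q; // undershooting, move up
        }
    }
    best
}
```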
|
|
_wb_
|
2025-09-15 01:26:25
|
libjxl is not tuned for or optimized for ssimu2; in fact, ssimu2 was only created after most of the current libjxl encoder was already there.
|
|
2025-09-15 01:30:35
|
it would be interesting in that blog post to see the libaom numbers without tune iq π
|
|
|
gb82
|
2025-09-15 04:35:26
|
yeah I can look into a follow-up
|
|
2025-09-15 04:36:13
|
also I think libjxl having less consistent performance isn't the end of the world; like I mentioned at the end of the blog post, efficiency and speed are still the most important, and libjxl at e7 is really fast and efficient
|
|
|
afed
btw, there were also some old benchmarks, and probably some from Jon as well, where according to the metrics, the best in terms of "consistency" was the standard libjpeg, and libjxl was the worst or one of the worst, if I'm not mistaken
but this is not the case in practice, libjxl is quite consistent in terms of visual quality, at least in photos
and is also actually the only encoder where the quality settings use not just a quantizer, but one that is closely integrated with butteraugli
so for other encoders, Q means almost nothing, because it is just a quantizer, not some metric or, even more so, a real quality indicator, and the actual quality is highly variable depending on the content with the same Q
with any encoder, it's possible to use encoding based on metrics, but then the encoder has to make many encodings until it reaches a value close to this metric, so the encoding time is multiplied by the number of tries
and also, without deep integration, it only works on the whole image (not just on the needed parts or at least blocks)
|
|
2025-09-15 04:39:47
|
> Q means almost nothing, because it is just a quantizer, not some metric or, even more so, a real quality indicator, and the actual quality is highly variable depending on the content with the same Q
if you read the blog post, this is exactly what I tested...
|
|
|
|
afed
|
2025-09-15 05:04:02
|
just briefly
i mean, for actual use, other encoders don't even have anything but a quantizer for quality settings; libjxl is basically the only one that uses metrics as a basis, at least among open source and without additional external tools
also, the dataset is too small or very uniform for such tests; on flat images, charts, and other not-very-photographic images, libjpeg and many other encoders completely fall apart in keeping consistency, like for some images Q60 is very good quality and for some Q90 is not good enough, going by the most modern metrics' scores
even for VMAF there is a much more confident understanding of what quality a given value will give, whereas for quantization it's largely random (unless you pick inflated values in advance and ignore overshoots, in which case most images will look good)
|
|
|
gb82
|
2025-09-15 05:13:48
|
but when you say "the actual quality is highly variable depending on the content with the same Q" you realize this can be true for distance as well, right?
|
|
|
|
afed
|
2025-09-15 05:20:54
|
yeah, I mean butteraugli isn't perfect either, but a simple quantizer is much less consistent in quality, and I noticed this quite well in practice with jpeg (and all the jpeg encoders that have existed since then), whereas libjxl was much more consistent, which I also liked when jxl was introduced
|
|
|
gb82
|
2025-09-15 05:21:25
|
> simple quantizer is much less consistent in quality
but if you *read the blog post* you'll see exactly how consistent each encoder's quality scale really is!
|
|
2025-09-15 05:22:09
|
and yes, we're using SSIMU2 with a photographic dataset, but that's really the best we can do for a proof of concept. and besides, subset1 is reputable & used for a lot of testing anyway
|
|
|
|
afed
|
2025-09-15 05:24:05
|
popularity for datasets is rather a disadvantage, as I've said before, because most encoders are hard tuned for known datasets
|
|
|
gb82
|
2025-09-15 05:25:19
|
I can tell you Iris is not, and I don't believe libaom is either. it seems reductive to handwave all of this because of some theoretical cope for why JXL is secretly the best
|
|
2025-09-15 05:26:10
|
it is very much possible that an encoder's Q-scale can be more consistent than JXL's distance scale. in fact ... all of them are!
|
|
|
juliobbv
|
2025-09-15 05:32:06
|
the other thing is that ssimu2 starts losing target quality properties at scores of 65 and lower
|
|
2025-09-15 05:34:14
|
so having a high stddev for ssimu2 <65 average scores might actually be the desirable thing for actual perceived consistency
|
|
|
|
afed
|
2025-09-15 05:36:07
|
for some modern encoders, let's say it's possible, but as I said about libjpeg, I strongly do not confirm this in practice; at least for a variety of content, libjxl is much, much more consistent
and I've used it on probably as many images as go through a small cdn
though I haven't used low quality, but then libjxl was never really designed and tuned for lower quality, and neither were butteraugli or even ssimulacra2
|
|
|
juliobbv
|
2025-09-15 05:36:27
|
so IMO this is the most informative range of the graph
|
|
|
_wb_
it would be interesting in that blog post to see the libaom numbers without tune iq π
|
|
2025-09-15 05:50:29
|
btw, libaom's tune iq in 3.13 is no longer optimized for just ssimu2, that'd be `tune=ssimulacra2`
|
|
|
gb82
|
|
afed
for some modern encoders, let's say it's possible, but as I said about libjpeg, I strongly do not confirm this in practice; at least for a variety of content, libjxl is much, much more consistent
and I've used it on probably as many images as go through a small cdn
though I haven't used low quality, but then libjxl was never really designed and tuned for lower quality, and neither were butteraugli or even ssimulacra2
|
|
2025-09-15 05:51:18
|
You need some sort of numbers to back up your argument
|
|
|
|
afed
|
2025-09-15 05:51:23
|
also ssimulacra2 probably isn't the best for consistency comparisons, and for libjxl it's more like comparing butteraugli 3-norm and ssimulacra2 than anything else
though MOS comparisons are also difficult for that purpose, especially for overshoots
|
|
|
gb82
|
2025-09-15 05:51:42
|
"JXL is more consistent because it feels like it" is not an argument I can refute
|
|
|
afed
also ssimulacra2 probably isn't the best for consistency comparisons, and for libjxl it's more like comparing butteraugli 3-norm and ssimulacra2 than anything else
though MOS comparisons are also difficult for that purpose, especially for overshoots
|
|
2025-09-15 05:52:23
|
It is a good visual quality metric, it's not like there's anything better I could have used
|
|
2025-09-15 05:52:41
|
Yes, MOS is the best, but none of us can measure that with reasonable accuracy
|
|
2025-09-15 05:53:34
|
Also, numbers for a specific metric can generalize because it means the tools are available for the purposes of improving other metrics or VQ, as long as these tools aren't purposefully designed to overfit for a metric
|
|
|
|
afed
|
|
gb82
"JXL is more consistent because it feels like it" is not an argument I can refute
|
|
2025-09-15 06:01:47
|
yeah I know, but for good benchmarks I don't have the time, and the bad and smaller ones don't make sense
also there were similar benchmarks from Jon and many others, even long before jxl existed
and in those, jxl is also worse on consistency metrics and libjpeg is beating all other encoders (if I understand the graphs correctly)
<https://jon-cld.s3.amazonaws.com/test/index.html>
|
|
|
monad
|
2025-09-16 02:20:00
|
Does Iris actually exist, or is it non-free? I am also wondering how it performs on non-photo where typical WebP implodes.
|
|
|
jonnyawsom3
|
2025-09-16 07:58:01
|
Continuing from https://discord.com/channels/794206087879852103/804324493420920833/1417419139655663686
Ignore the weird formatting, I shoved my console output into GPT to save time
|
|
|
|
ignaloidas
|
2025-09-16 10:53:45
|
FWIW I don't think you're really measuring consistency by measuring how perceptual quality metrics change across some encoding quality setting - given an edge case with an all-solid color picture, to keep it "consistent" by such metric basically any encoder would have to degrade the quality of the image somehow. Some images will compress better than others, and I don't see any realistic need to enforce that the degradation of the quality is the same across a wide range of images.
|
|
2025-09-16 10:56:56
|
What you'd really want is some setting where you essentially say "I don't want my images to dip below this quality" - which now of course brings the problem that you're explicitly optimizing for some specific metric, which can be gamed
|
|
2025-09-16 10:58:45
|
But I think just looking at stdev can end up penalizing overperforming on easier images
|
|
|
|
afed
|
2025-09-16 11:05:23
|
not dropping below a certain quality is most important, but overshooting quality higher than necessary also matters, because it's a waste of traffic, bandwidth, page loads, costs, etc, especially for large services
although this is not so important for personal use, where some extra quality is not a bad thing
|
|
|
|
ignaloidas
|
2025-09-16 11:09:46
|
Sure, but you don't always need that many extra bits to hit a high quality. I guess the ideal encoder would have 2 settings - bpp_max and quality_min
|
|
2025-09-16 11:10:24
|
and then maybe some prioritization between the two targets (and encode speed)
|
|
|
|
afed
|
2025-09-16 11:28:05
|
libjxl already has close to the perfect setting for quality
but the thing is that there are no perfect metrics yet, especially after very dense use of metrics (although I quite like the way butteraugli is used in libjxl, and I'm not sure if replacing it with something else would be better for the same purpose, though tuning and encoder improvements are still needed)
I can say that there are no absolutely good ones; there are some that are worse, some that are better for certain content, and they can serve as a helper and for some process automation
but any metrics, even the most advanced and best ones, often miss many things, and many people are overly focused on increasing metric scores, though there is no other way to show any improvement or difference without personal subjectivity
|
|
|
jonnyawsom3
|
|
afed
not dropping below a certain quality is most important, but overshooting quality higher than necessary also matters, because it's a waste of traffic, bandwidth, page loads, costs, etc, especially for large services
although this is not so important for personal use, where some extra quality is not a bad thing
|
|
2025-09-16 11:33:22
|
When tweaking resampling for cjxl, I was having issues with over and undershooting. I ended up erring on the side of caution, and going for slightly higher bpp to make sure images hit a minimum level of quality
|
|
|
_wb_
|
|
ignaloidas
But I think just looking at stdev can end up penalizing overperforming on easier images
|
|
2025-09-16 11:46:00
|
Yes, I agree looking at stdev is not quite right. I think what matters in practice is the spread between p1 worst and p50 (median) quality. The median quality is what you expect to get (e.g. if you encode a few images with a setting, chances are you'll see something close to median behavior), while the p1 worst-case (or some other percentile) is what can happen and how bad it can get.
|
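That spread is cheap to compute from per-image scores; a sketch using the nearest-rank percentile convention (the convention is a choice made here, not something from the blog post):
```
// Sketch: summarize a corpus of per-image quality scores by the median
// (typical case) and the 1st percentile (worst case), not mean/stddev.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    // Nearest-rank method; `sorted` must be ascending and non-empty.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.saturating_sub(1).min(sorted.len() - 1)]
}

fn p1_p50_spread(mut scores: Vec<f64>) -> (f64, f64) {
    scores.sort_by(|a, b| a.partial_cmp(b).unwrap());
    (percentile(&scores, 1.0), percentile(&scores, 50.0))
}
```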
|
2025-09-16 11:47:09
|
if some images are better than the median, that's usually OK. But if 1% of images look like crap, it's a problem.
|
|
|
|
afed
|
2025-09-16 12:05:05
|
yep, and for libjpeg I have pretty often noticed that some images look really bad (ringing artifacts, banding) at the same Q where most other images look good
so I even had to use some additional metrics-based encoding to avoid this
but with libjxl I hardly ever encountered anything like that; it is much safer to use the same quality settings (at least if they are not very low)
but such and similar benchmarks show libjpeg as almost the best in quality consistency, which is not entirely true, at least if there is enough content variety
|
|
|
Mine18
|
|
monad
Does Iris actually exist, or is it non-free? I am also wondering how it performs on non-photo where typical WebP implodes.
|
|
2025-09-16 12:11:13
|
you would have to ask <@237665944942411777> about that
|
|
|
Trix
|
2025-09-16 12:12:29
|
I think you mean <@703028154431832094>, he's the Iris dev, I don't have access to it
|
|
|
jonnyawsom3
|
2025-09-16 12:13:31
|
He said before that it's closed source and he was looking into whether to allow public encoding or not
|
|
|
username
|
2025-09-16 12:18:08
|
https://discord.com/channels/794206087879852103/805176455658733570/1392294193690316923
> for now I'm looking into licensing, but I'd strongly consider making this open-source because I love open source. I just need to be able to support myself while working on this
|
|
|
monad
|
2025-09-16 12:24:52
|
I hope it makes money. I will have to forget about it and stick with JPEG-LI.
|
|
|
Exorcist
|
|
afed
yep, and for libjpeg I have pretty often noticed that some images look really bad (ringing artifacts, banding) at the same Q where most other images look good
so I even had to use some additional metrics-based encoding to avoid this
but with libjxl I hardly ever encountered anything like that; it is much safer to use the same quality settings (at least if they are not very low)
but such and similar benchmarks show libjpeg as almost the best in quality consistency, which is not entirely true, at least if there is enough content variety
|
|
2025-09-16 12:25:03
|
use victorvde/jpeg2png when you see JPEG artifacts
|
|
|
|
afed
|
2025-09-16 12:29:26
|
yeah, but this filtering is also quite strong and destructive in my opinion; I generally try to avoid filtering where possible
and I'm also not asking how to improve images after compression, but how to optimally compress and keep images in jpeg (and other) formats
when I already have high-quality and lossless sources
|
|
|
username
|
|
Exorcist
use victorvde/jpeg2png when you see JPEG artifacts
|
|
2025-09-16 12:47:08
|
you should try this fork of jpeg2png as it gives better results: https://discord.com/channels/794206087879852103/794206087879852107/1372897818607616011
|
|
|
Exorcist
|
2025-09-16 12:49:12
|
The very common side-by-side comparisons show JPEG as the worst (so much blocking & banding)
Actually, JPEG is not so bad
|
|
|
gb82
|
|
_wb_
Yes, I agree looking at stdev is not quite right. I think what matters in practice is the spread between p1 worst and p50 (median) quality. The median quality is what you expect to get (e.g. if you encode a few images with a setting, chances are you'll see something close to median behavior), while the p1 worst-case (or some other percentile) is what can happen and how bad it can get.
|
|
2025-09-16 12:56:34
|
I disagree that overshooting doesn't matter; this basically ignores overshooting
|
|
2025-09-16 12:57:29
|
> A measure of how spread out data values are around the mean
This is a relevant definition for a TQ loop, which is why it is worth measuring imo. I tried to approach this post with the perspective of someone setting up a TQ loop, not an individual user using an encoder
|
|
|
|
afed
|
|
Exorcist
The very common side-by-side comparisons show JPEG as the worst (so much blocking & banding)
Actually, JPEG is not so bad
|
|
2025-09-16 12:59:32
|
yeah, actually if you use filtering, it's easier to remove some artifacts than to restore missing details (which is mostly impossible), which is done by codecs that tend to prefer more smoothing to avoid any artifacts
|
|
|
|
ignaloidas
|
|
gb82
I disagree that overshooting doesn't matter; this basically ignores overshooting
|
|
2025-09-16 01:08:37
|
so what should an encoder do when an image compresses really well, make it worse or something? Especially with artificial images, there will be a bunch of images that compress unusually well without dropping much in quality - there is no reason to penalize the encoder for managing to keep the quality high *as long as the size doesn't grow as well*
|
|
|
_wb_
|
|
gb82
I disagree that overshooting doesn't matter; this basically ignores overshooting
|
|
2025-09-16 01:36:11
|
Overshooting does matter. But the issue is not that the quality is too high β the issue is bytes may be wasted. If you can do lossless in fewer bytes than "consistent quality" lossy, it's always a win.
Basically what counts for quality is the worst case. And what counts for file size is the total (or the average).
|
|
|
|
ignaloidas
|
|
_wb_
Overshooting does matter. But the issue is not that the quality is too high β the issue is bytes may be wasted. If you can do lossless in fewer bytes than "consistent quality" lossy, it's always a win.
Basically what counts for quality is the worst case. And what counts for file size is the total (or the average).
|
|
2025-09-16 01:41:33
|
I guess to quantify how bad overshooting is you could check how the average bpp of overshooting images differs from the overall average/median bpp
|
|
|
_wb_
|
2025-09-16 01:45:53
|
I suppose. If you look at average bpp over a corpus, you're including overshooting. I think average bpp / p5 quality is the plot that makes the most sense, if you have to aggregate bitrate-distortion plots of a corpus of images.
|
|
|
gb82
|
|
ignaloidas
so what should an encoder do when an image compresses really well, make it worse or something? Especially with artificial images, there will be a bunch of images that compress unusually well without dropping much in quality - there is no reason to penalize the encoder for managing to keep the quality high *as long as the size doesn't grow as well*
|
|
2025-09-16 02:33:33
|
the dataset I chose intentionally controls for this: daala subset1 4:4:4 is pretty uniform in terms of image complexity. I agree with your point otherwise; including non-photographic image content would make this test useless, for example.
|
|
|
_wb_
Overshooting does matter. But the issue is not that the quality is too high β the issue is bytes may be wasted. If you can do lossless in fewer bytes than "consistent quality" lossy, it's always a win.
Basically what counts for quality is the worst case. And what counts for file size is the total (or the average).
|
|
2025-09-16 02:34:11
|
but in this case, looking at the range I chose on the dataset I picked, there is *no chance* of lossless being better here
|
|
2025-09-16 02:34:35
|
if I was testing the effectiveness of my target quality loop, I'd use exactly the metric I proposed; I don't think 1% lows mean anything in that context
|
|
|
A homosapien
|
|
gb82
the dataset I chose intentionally controls for this: daala subset1 4:4:4 is pretty uniform in terms of image complexity. I agree with your point otherwise; including non-photographic image content would make this test useless, for example.
|
|
2025-09-16 02:38:26
|
The Daala image set linked on the website is not 4:4:4, it's nearest neighbor upscaled 4:2:0.
Unless that's the wrong [link.](https://github.com/WyohKnott/image-formats-comparison/tree/gh-pages/comparisonfiles/subset1/Original)
|
|
|
|
ignaloidas
|
|
gb82
the dataset I chose intentionally controls for this: daala subset1 4:4:4 is pretty uniform in terms of image complexity. I agree with your point otherwise; including non-photographic image content would make this test useless, for example.
|
|
2025-09-16 02:39:11
|
I mean, even with photographic images of similar complexity, the subjects matter. Right now it's not used, but JXL does have a way to encode splines, which can help some images a bunch, but not all of them, and if spline encoding was added this would lead to some images overshooting in quality without it mattering in some subset of images.
|
|
|
gb82
|
|
ignaloidas
I mean, even with photographic images of similar complexity, the subjects matter. Right now it's not used, but JXL does have a way to encode splines, which can help some images a bunch, but not all of them, and if spline encoding was added this would lead to some images overshooting in quality without it mattering in some subset of images.
|
|
2025-09-16 02:40:18
|
not necessarily, if using splines was controlled for internally
|
|
2025-09-16 02:40:51
|
it is flawed reasoning to consider that in order to be more efficient, an encoder must be less consistent. it can very much be both with smart internal heuristics
|
|
|
|
ignaloidas
|
2025-09-16 02:41:48
|
You could control for it internally, but I don't think dropping quality in one place if you can get free wins on it in another is a good choice
|
|
|
gb82
|
2025-09-16 02:42:20
|
that's not how that works
|
|
|
|
ignaloidas
|
2025-09-16 02:44:33
|
why not? If you're using splines, you'll get higher quality, so if you want to keep it from overshooting on quality, you'll drop the quality somewhere else
|
|
|
gb82
|
2025-09-16 02:44:43
|
> how closely an image encoder's user-configurable quality index matches a perceptual quality index.
this is the entire point of having a quality slider
|
|
|
|
ignaloidas
|
|
gb82
> how closely an image encoder's user-configurable quality index matches a perceptual quality index.
this is the entire point of having a quality slider
|
|
2025-09-16 02:45:25
|
I use it not as a "be as close to this as possible", but "be at least this", and I believe most other users do as well
|
|
|
gb82
|
|
ignaloidas
why not? If you're using splines, you'll get higher quality, so if you want to keep it from overshooting on quality, you'll drop the quality somewhere else
|
|
2025-09-16 02:45:56
|
this is like saying "adding bigger blocks makes it cheaper to encode parts of the image, so we must drop the quality in other parts to achieve consistency"
|
|
|
ignaloidas
I use it not as a "be as close to this as possible", but "be at least this", and I believe most other users do as well
|
|
2025-09-16 02:46:19
|
individual users, sure; I already addressed this. this is also not Iris's target audience and never will be
|
|
|
A homosapien
The Daala image set linked on the website is not 4:4:4, it's nearest neighbor upscaled 4:2:0.
Unless that's the wrong [link.](https://github.com/WyohKnott/image-formats-comparison/tree/gh-pages/comparisonfiles/subset1/Original)
|
|
2025-09-16 02:48:46
|
oops; do u have the 444 link somewhere?
|
|
|
|
ignaloidas
|
|
gb82
individual users, sure; I already addressed this. this is also not Iris's target audience and never will be
|
|
2025-09-16 02:53:21
|
I don't know what the target audience is then - I don't think it's smart for CDNs to be chasing quality targets instead of bpp/size targets with acceptable quality. Nor does the page with the comparison specify who the target audience is.
|
|
|
gb82
|
2025-09-16 02:54:34
|
The page doesn't need to; the user just decides whether it's relevant to them. Also, why in the world would a CDN chase a size target? That's silly
|
|
2025-09-16 02:55:00
|
You want minimum quality to maximize user engagement, which ends up giving you the minimum size
|
|
|
|
ignaloidas
|
2025-09-16 03:01:37
|
Overshoot only matters if when compressing the same image with a smaller target you'll actually get a meaningful improvement in compression - which is not a given, especially in images that are easy to compress. Since the testing doesn't test the "chasing" behavior on said images, I don't think it's correct to make assumptions that all of the overshoots are bad, or are entirely undesirable.
|
|
2025-09-16 03:02:32
|
FWIW I myself wouldn't want to use a CDN that goes out of its way to drop the quality for a minimum improvement.
|
|
|
_wb_
|
2025-09-16 03:02:35
|
There's always a quality target, not a size target. Some images are basically mostly some background (solid or simple gradient), while other images are high-entropy all over. Fixed bpp targets would lead to horrible variation in quality.
|
|
|
gb82
|
|
ignaloidas
Overshoot only matters if, when compressing the same image with a smaller target, you'll actually get a meaningful improvement in compression - which is not a given, especially in images that are easy to compress. Since the testing doesn't exercise the "chasing" behavior on said images, I don't think it's correct to assume that all of the overshoots are bad, or entirely undesirable.
|
|
2025-09-16 03:03:03
|
Your argument here is that consistency doesn't matter in pursuit of efficiency, which I completely agree with; I should make that very clear if the blog post hasn't already
|
|
2025-09-16 03:03:38
|
You're totally right to say that if you overshoot with a more efficient encoder, that is still a better encoder
|
|
|
_wb_
|
2025-09-16 03:04:25
|
And the quality target is always a minimum acceptable quality. Doing better is no problem (unless of course it comes at a big cost in filesize while a much smaller file is still acceptable), doing worse is a problem.
|
|
|
Quackdoc
|
2025-09-16 03:05:27
|
for a CDN, IMO consistency is extremely important if the set target quality is too low. Variation in image quality can be a very big issue for things like image galleries or sets
|
|
|
gb82
|
2025-09-16 03:06:01
|
What I've seen is that if you're doing better and it costs 10% more bits, it is probably worth it to save on those 10% if you know that your quality doesn't need to be higher than a given baseline
|
|
2025-09-16 03:06:31
|
<@794205442175402004> what is considered a big cost where you'd wanna scale down?
|
|
|
_wb_
|
2025-09-16 03:06:40
|
There is something asymmetric: undershooting makes an image look bad, overshooting makes it load a bit slower than it should. When in doubt, nearly everyone will prefer overshooting to undershooting, at least when it's their images and reputation that is at stake.
|
|
|
gb82
|
2025-09-16 03:06:52
|
That makes complete sense
|
|
2025-09-16 03:08:07
|
So with a consistent encoder with no TQ loop, I'd guess you'd pick the Q that has its lower bound at the point you're looking for?
|
|
2025-09-16 03:08:51
|
eg 85 +/- 5 is useful for tq80?
|
|
|
_wb_
|
2025-09-16 03:11:52
|
Costs are measured in total terabytes of bandwidth/storage so usually people don't complain about a single image being too large (unless it's something like a 1 MB image where a 50 kb one would look just as good), but they do want the total to be as low as possible, of course.
But they do complain about a single image if it looks like crap. Especially marketeers: they usually don't notice large files (unless it affects SEO metrics), but they will notice artifacts, especially in things like brand logos or hero images of a landing page.
|
|
|
Mine18
|
|
Trix
I think you mean <@703028154431832094>, he's the Iris dev, I don't have access to it
|
|
2025-09-16 03:15:16
|
oops, my bad
i kinda confuse you both as the same person π
|
|
|
|
afed
|
2025-09-16 03:17:38
|
also for overshooting it's not really useless data, it still improves quality
and yeah, it's totally incomparable in importance to undershooting if the image looks worse than required or even really bad
|
|
|
gb82
|
|
_wb_
Costs are measured in total terabytes of bandwidth/storage so usually people don't complain about a single image being too large (unless it's something like a 1 MB image where a 50 kb one would look just as good), but they do want the total to be as low as possible, of course.
But they do complain about a single image if it looks like crap. Especially marketeers: they usually don't notice large files (unless it affects SEO metrics), but they will notice artifacts, especially in things like brand logos or hero images of a landing page.
|
|
2025-09-16 03:18:35
|
Gotcha, makes sense. I have talked to some companies that appear to care a lot more about size though, considering they run their own CDNs that are large enough for size to matter. I guess in Cloudinary's case, where customers pay per byte on some plans, it is on the customer to notice, as far as I understand
|
|
|
Mine18
oops, my bad
i kinda confuse you both as the same person π
|
|
2025-09-16 03:19:07
|
It is an honor to be confused with Trix <:BlobYay:806132268186861619>
|
|
|
|
afed
|
|
afed
also for overshooting it's not really useless data, it still improves quality
and yeah, it's totally incomparable in importance to undershooting if the image looks worse than required or even really bad
|
|
2025-09-16 03:29:03
|
and for TQ, libjxl basically already uses TQ for quality/distance internally at slower efforts (and in an even more advanced way, not just for the whole image), while other encoders don't, at least not in this way and not with such complex metrics
also a separate TQ is needed for those who don't like butteraugli results, but that's basically a double TQ with double the needed encoding time (although the internal butteraugli is somewhat simplified and also uses some heuristics as far as I know, but still, at slower efforts it's close to the real butteraugli)
|
|
|
spider-mario
|
2025-09-16 03:38:29
|
what is TQ?
|
|
|
|
afed
|
2025-09-16 03:42:07
|
target quality
|
|
|
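(For context: an external TQ loop is just a search over the encoder's quality/distance knob, re-encoding until a perceptual metric clears the target. A minimal sketch with hypothetical helper names, not any encoder's actual internals:)
```cpp
#include <functional>

// Hypothetical stand-in: encode the image at a given Butteraugli distance,
// decode it back, and return a perceptual metric score (higher = better).
using EncodeAndScore = std::function<double(double distance)>;

// Find the largest distance (i.e. smallest file) whose score still clears
// `target_score`, assuming the metric is roughly monotonic in distance.
double FindDistanceForTarget(const EncodeAndScore& encode_and_score,
                             double target_score) {
  double ok = 0.1;    // high quality: assumed to clear the target
  double bad = 25.0;  // low quality: assumed to miss it
  for (int i = 0; i < 8; ++i) {  // every iteration is a full re-encode
    const double mid = 0.5 * (ok + bad);
    if (encode_and_score(mid) >= target_score) {
      ok = mid;   // still acceptable, push compression harder
    } else {
      bad = mid;  // overshot into visible loss, back off
    }
  }
  return ok;
}
```
Every iteration is a full re-encode plus a metric pass, which is why bolting TQ on externally roughly doubles (or worse) the encoding time compared to internal heuristics.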
_wb_
|
|
gb82
Gotcha, makes sense. I have talked to some companies that appear to care a lot more about size though, considering they run their own CDNs that are large enough for size to matter. I guess in Cloudinary's case, where customers pay per byte on some plans, it is on the customer to notice, as far as I understand
|
|
2025-09-16 04:23:44
|
Sure, size does matter, and it does happen regularly that a customer asks for a size-target approach ("I have this byte budget"). But so far, every time customers said they had a per-image byte budget, it turned out that they didn't actually want to target a particular size; they wanted the lowest possible overall size (in terms of overall traffic, not per-image) while getting a particular minimum acceptable quality (which varies per customer: for some the minimum acceptable quality is quite low, while for others it is very high).
|
|
|
gb82
|
2025-09-16 04:35:45
|
that's kind of what I'm saying, so that makes sense. when I say "care a lot about size" I just mean they care that they aren't wasting bytes on images that look "too good"
|
|
|
jonnyawsom3
|
2025-09-17 08:46:58
|
<@179701849576833024> I don't suppose you have an old build of fjxl/fast_lossless anywhere? I was reading some old benchmarks and they show singlethreaded speeds 4-12x faster than what I'm getting with `cjxl -d 0 -e 1 --num_threads 0`. Trying to figure out if there was a major regression somewhere or if I'm reading the wrong data
|
|
|
|
veluca
|
2025-09-17 08:47:55
|
This is the wrong month to ask me π€£
|
|
|
jonnyawsom3
|
2025-09-17 08:49:12
|
Yeah, I can imagine it's pretty hectic π
|
|
2025-09-17 09:08:30
|
We'll do a bit more investigating ourselves
|
|
|
_wb_
|
|
<@179701849576833024> I don't suppose you have an old build of fjxl/fast_lossless anywhere? I was reading some old benchmarks and they show singlethreaded speeds 4-12x faster than what I'm getting with `cjxl -d 0 -e 1 --num_threads 0`. Trying to figure out if there was a major regression somewhere or if I'm reading the wrong data
|
|
2025-09-17 09:24:14
|
e1 can fall back to e2 if the input is not in one of the pixel formats supported by e1, maybe that's what's going on? Try using ppm input...
|
|
|
jonnyawsom3
|
|
_wb_
e1 can fall back to e2 if the input is not in one of the pixel formats supported by e1, maybe that's what's going on? Try using ppm input...
|
|
2025-09-17 09:29:00
|
It's standard RGB 24bit PNG. Same result with PPM. Looking at old results from years ago, 1 thread hits around 200-300MP/s, with a clang optimized main build I'm only hitting 50MP/s
|
|
2025-09-17 09:29:56
|
Effort 2 is a further 5x slower, so it's not falling back
|
|
|
_wb_
|
2025-09-17 09:33:03
|
that's strange. Can you bisect to find the commit that caused this?
|
|
|
jonnyawsom3
|
2025-09-17 09:34:12
|
Okay yeah, there's a major regression
```cjxl8 -d 0 -e 1 --num_threads 0 Test.ppm nul
JPEG XL encoder v0.8.4 [AVX2,SSE4,SSSE3,Unknown]
Read 3840x2160 image, 24883217 bytes, 4142.6 MP/s
Encoding [Modular, lossless, effort: 1],
Compressed to 7817776 bytes (7.540 bpp).
3840 x 2160, 141.04 MP/s [141.04, 141.04], 1 reps, 0 threads.```
6x slower since v0.9
```cjxl9 -d 0 -e 1 --num_threads 0 Test.ppm nul
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 7818.6 kB (7.541 bpp).
3840 x 2160, 22.910 MP/s [22.91, 22.91], 1 reps, 0 threads.```
The optimized Clang build is over 2x faster than v0.9, but still 3x slower than v0.8
```cjxl -d 0 -e 1 --num_threads 0 Test.ppm nul
JPEG XL encoder v0.12.0 029cec42 [_AVX2_] {Clang 20.1.8}
Encoding [Modular, lossless, effort: 1]
Compressed to 7818.6 kB (7.541 bpp).
3840 x 2160, 54.364 MP/s, 0 threads.```
|
|
|
_wb_
|
2025-09-17 09:36:14
|
I don't think there have been any recent changes to the e1 code itself so this is quite strange. Do you get the same thing when doing `--num_reps=30`?
|
|
|
jonnyawsom3
|
2025-09-17 09:38:59
|
v0.8
```cjxl8 -v -d 0 -e 1 --num_threads 0 --num_reps 100 Test.ppm nul
JPEG XL encoder v0.8.4 [AVX2,SSE4,SSSE3,Unknown]
Read 3840x2160 image, 24883217 bytes, 4074.7 MP/s
Encoding [Modular, lossless, effort: 1],
Compressed to 7817776 bytes (7.540 bpp).
3840 x 2160, geomean: 157.25 MP/s [132.82, 175.94], 100 reps, 0 threads.```
v0.9
```cjxl9 -v -d 0 -e 1 --num_threads 0 --num_reps 100 Test.ppm nul
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Read 3840x2160 image, 24883217 bytes, 4071.5 MP/s
Encoding [Modular, lossless, effort: 1]
Compressed to 7818.6 kB (7.541 bpp).
3840 x 2160, geomean: 23.319 MP/s [21.60, 24.43], 100 reps, 0 threads.```
Optimized main
```cjxl -v -d 0 -e 1 --num_threads 0 --num_reps 100 Test.ppm nul
JPEG XL encoder v0.12.0 029cec42 [_AVX2_] {Clang 20.1.8}
Read 3840x2160 image, 24883217 bytes, 5587.6 MP/s
Encoding [Modular, lossless, effort: 1]
Compressed to 7818.6 kB (7.541 bpp).
3840 x 2160, geomean: 54.500 MP/s [50.697, 58.616], 100 reps, 0 threads.```
|
|
|
_wb_
I don't think there have been any recent changes to the e1 code itself so this is quite strange. Do you get the same thing when doing `--num_reps=30`?
|
|
2025-09-17 09:40:47
|
No, but it has been nearly 2 years since v0.9, so it wouldn't have been a recent change
|
|
2025-09-17 09:41:30
|
Glad I found it while we're wrapping up v0.12, means we can hopefully find and fix the problem, making it even more of a golden release
|
|
|
monad
|
2025-09-17 10:54:47
|
haven't seen any evidence of such behavior in my environment
|
|
|
spider-mario
|
|
v0.8
```cjxl8 -v -d 0 -e 1 --num_threads 0 --num_reps 100 Test.ppm nul
JPEG XL encoder v0.8.4 [AVX2,SSE4,SSSE3,Unknown]
Read 3840x2160 image, 24883217 bytes, 4074.7 MP/s
Encoding [Modular, lossless, effort: 1],
Compressed to 7817776 bytes (7.540 bpp).
3840 x 2160, geomean: 157.25 MP/s [132.82, 175.94], 100 reps, 0 threads.```
v0.9
```cjxl9 -v -d 0 -e 1 --num_threads 0 --num_reps 100 Test.ppm nul
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Read 3840x2160 image, 24883217 bytes, 4071.5 MP/s
Encoding [Modular, lossless, effort: 1]
Compressed to 7818.6 kB (7.541 bpp).
3840 x 2160, geomean: 23.319 MP/s [21.60, 24.43], 100 reps, 0 threads.```
Optimized main
```cjxl -v -d 0 -e 1 --num_threads 0 --num_reps 100 Test.ppm nul
JPEG XL encoder v0.12.0 029cec42 [_AVX2_] {Clang 20.1.8}
Read 3840x2160 image, 24883217 bytes, 5587.6 MP/s
Encoding [Modular, lossless, effort: 1]
Compressed to 7818.6 kB (7.541 bpp).
3840 x 2160, geomean: 54.500 MP/s [50.697, 58.616], 100 reps, 0 threads.```
|
|
2025-09-17 11:10:42
|
on my corp machine, both 0.8.0 and 0.9.0 are about 85MP/s
|
|
2025-09-17 11:10:49
|
so I canβt really bisect
|
|
2025-09-17 11:11:47
|
thatβs with clang 19
|
|
2025-09-17 11:12:55
|
ah, main is 50
|
|
|
monad
|
2025-09-17 11:13:12
|
moved to more recent versions and I see a difference between 0.10 and 0.11
|
|
2025-09-17 11:24:42
|
```JPEG XL encoder v0.10.0 bf2b7655 [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 64122 bytes (0.247 bpp).
1920 x 1080, geomean: 1332.567 MP/s [591.44, 1560.84], 5 reps, 0 threads.```
```JPEG XL encoder v0.11.0 4df1e9ec [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 23224 bytes (0.090 bpp).
1920 x 1080, geomean: 88.376 MP/s [49.61, 88.52], 5 reps, 0 threads.```
|
|
|
spider-mario
|
2025-09-17 11:28:52
|
my bisection leads me to https://github.com/libjxl/libjxl/pull/3658
|
|
2025-09-17 11:29:00
|
maybe a bug in the CPU feature detection?
|
|
|
jonnyawsom3
|
2025-09-17 11:29:57
|
We figured out it's 2 separate regressions. One is compiling libjxl with MSVC, the other is with standalone fast lossless
MSVC is also giving different output to Clang, apparently
```cjxlGithub -v -d 0 -e 1 --num_reps 50 --num_threads 0 Test.ppm Github.jxl
JPEG XL encoder v0.9.0 6768ea8 [AVX2,SSE4,SSSE3,SSE2]
Read 3840x2160 image, 24883217 bytes, 3964.6 MP/s
Encoding [Modular, lossless, effort: 1]
Compressed to 7818.6 kB (7.541 bpp).
3840 x 2160, geomean: 22.902 MP/s [21.32, 23.44], 50 reps, 0 threads.
Wall time: 0 days, 00:00:18.147 (18.15 seconds)
User time: 0 days, 00:00:00.359 (0.36 seconds)
Kernel time: 0 days, 00:00:17.781 (17.78 seconds)
cjxlClang -v -d 0 -e 1 --num_reps 50 --num_threads 0 Test.ppm Clang.jxl
JPEG XL encoder v0.9.1 b8ceae3a [AVX2,SSE4,SSE2]
Read 3840x2160 image, 24883217 bytes, 4035.2 MP/s
Encoding [Modular, lossless, effort: 1]
Compressed to 7817.8 kB (7.540 bpp).
3840 x 2160, geomean: 140.517 MP/s [106.11, 148.57], 50 reps, 0 threads.
Wall time: 0 days, 00:00:03.001 (3.00 seconds)
User time: 0 days, 00:00:00.375 (0.38 seconds)
Kernel time: 0 days, 00:00:02.625 (2.62 seconds)```
|
|
2025-09-17 11:30:51
|
Clang output is 0.001 bpp smaller and 6x faster
|
|
|
A homosapien
|
2025-09-17 12:05:42
|
The standalone fjxl binary seems to thread better than libjxl
|
|
2025-09-17 12:39:02
|
Comparing libjxl and fjxl-standalone, compiled with clang on Windows
```
v0.8.4
  Threads   cjxl            fast-lossless
  1         249.32 MP/s     295.357 MP/s
  6         694.60 MP/s     1027.186 MP/s
  12        773.32 MP/s     1248.640 MP/s

v0.10.4
  Threads   cjxl            fast-lossless
  1         235.001 MP/s    274.598 MP/s
  6         606.156 MP/s    888.753 MP/s
  12        670.190 MP/s    1025.519 MP/s

v0.11.1
  Threads   cjxl            fast-lossless
  1         65.821 MP/s     69.284 MP/s
  6         263.546 MP/s    304.011 MP/s
  12        354.191 MP/s    451.722 MP/s
```
|
|
|
spider-mario
my bisection leads me to https://github.com/libjxl/libjxl/pull/3658
|
|
2025-09-17 12:55:21
|
Can confirm, libjxl's e1 speeds have been halved since this commit.
Wow, this is crazy: the MSVC builds from GitHub average around 175 MP/s while these old builds get around 670+.
|
|
|
jonnyawsom3
|
2025-09-17 01:19:07
|
https://github.com/libjxl/libjxl/issues/4447
|
|
2025-09-17 01:27:30
|
On a vaguely related note, remember that only main is truly lossless for fast lossless right now
|
|
|
monad
|
2025-09-17 01:35:47
|
"truly lossless"
|
|
|
jonnyawsom3
|
|
monad
"truly lossless"
|
|
2025-09-17 01:37:52
|
https://github.com/libjxl/libjxl/issues/4026
|
|
|
TheBigBadBoy - πΈπ
|
|
Fox Wizard
jhead ``-du``
|
|
2025-09-17 01:39:33
|
guess what
finally found what those "6 bytes" you managed to remove thanks to `jhead -du` were
It is an additional marker, `0xFFCC` (the SOF12 slot, assigned as DAC, "Define Arithmetic Coding") [β ](https://cdn.discordapp.com/emojis/867794291652558888.webp?size=48&name=dogelol)
So you are not supposed to remove that
|
|
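(Side note: a JPEG file is a sequence of segments, each 0xFF plus a marker byte, most followed by a 2-byte big-endian length, so a few lines of code will surface a stray DAC segment. A rough sketch, not from the chat:)
```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Walk the segments of a JPEG byte stream and print each marker.
// Sketch only: no bounds hardening, stops at SOS where entropy data begins.
void ListMarkers(const std::vector<uint8_t>& jpg) {
  size_t i = 2;  // skip SOI (FF D8)
  while (i + 2 <= jpg.size() && jpg[i] == 0xFF) {
    const uint8_t m = jpg[i + 1];
    if (m == 0xFF) { ++i; continue; }  // fill byte, skip
    if (m == 0x01 || (m >= 0xD0 && m <= 0xD9)) {
      std::printf("FF%02X (standalone)\n", m);  // no length field
      i += 2;
      continue;
    }
    if (i + 4 > jpg.size()) break;
    const int len = (jpg[i + 2] << 8) | jpg[i + 3];  // length includes itself
    std::printf("FF%02X, %d bytes%s\n", m, len,
                m == 0xCC ? "  <-- DAC (Define Arithmetic Coding)" : "");
    if (m == 0xDA) break;  // SOS: compressed scan follows
    i += 2 + len;
  }
}
```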
2025-09-17 01:40:15
|
-# and I just won 1 pizza π
|
|
|
Fox Wizard
|
2025-09-17 01:41:46
|
How did you remember that lmao
|
|
|
TheBigBadBoy - πΈπ
|
2025-09-17 01:43:37
|
I have a folder with various files that "idk how he beat me" <:KekDog:805390049033191445>
|
|
|
Fox Wizard
|
2025-09-17 01:43:56
|
Hmmm, interesting
|
|
|
TheBigBadBoy - πΈπ
|
2025-09-17 01:44:10
|
and since I'm working on my file optimizer, I decided to take a look at that file
|
|
|
Fox Wizard
|
2025-09-17 01:44:30
|
Although that probably won't ever happen again since I stopped caring about shaving off every last byte lol
|
|
|
TheBigBadBoy - πΈπ
|
2025-09-17 01:45:34
|
https://tenor.com/view/wtf-gif-11403402916845629606
|
|
2025-09-17 01:46:26
|
[β ](https://cdn.discordapp.com/emojis/1368821202969432177.webp?size=48&name=SupremelyDispleased)
|
|
|
Fox Wizard
|
2025-09-17 01:46:30
|
I now only do random stuff like that when it's necessary
|
|
2025-09-17 01:46:39
|
Because too lazy to think™
|
|
|
TheBigBadBoy - πΈπ
|
2025-09-17 01:46:57
|
that, and also because it takes soooo much time <:KekDog:805390049033191445>
|
|
|
Fox Wizard
|
2025-09-17 01:47:16
|
Probably true
|
|
2025-09-17 01:47:25
|
And I need motivation which is very hard to get
|
|
2025-09-17 01:48:06
|
But when I'm motivated I sometimes end up doing things I never expected I could even come close to accomplishing <:KekDog:884736660376535040>
|
|
|
TheBigBadBoy - πΈπ
|
2025-09-17 01:48:23
|
that's one of the main reasons my music library does not grow
(that, and the time it takes me to manually credit everyone, including violins...)
|
|
|
Fox Wizard
|
|
TheBigBadBoy - πΈπ
|
|
Fox Wizard
But when I'm motivated I sometimes end up doing things I never expected I could even come close to accomplishing <:KekDog:884736660376535040>
|
|
2025-09-17 01:48:47
|
well anyway, if you ever do this kind of stuff again do not hesitate to hit me up π
|
|
|
Fox Wizard
|
2025-09-17 01:49:45
|
My music library consists mostly of slow-encoded FLACs that now feel like a waste of time, since FLAC 1.5.0 was more efficient almost every time with much faster params <a:KMS:821038589096886314>
|
|
|
TheBigBadBoy - πΈπ
|
2025-09-17 01:50:14
|
`-j` is really nice indeed x)
|
|
|
Fox Wizard
|
2025-09-17 01:50:25
|
My interests and the things I end up doing are always mega random and rarely last long
|
|
2025-09-17 01:50:48
|
And I don't know why, but I do the most interesting things when I'm not really sober lmao
|
|
2025-09-17 01:52:37
|
Like recently. Stuff happened which caused me to not be able to sleep for a long time, so I had to force myself asleep with medication. And then I had to force myself out of a drowsy barely mentally existing state after not being able to sleep anymore for 3 hours on heavy meds
|
|
2025-09-17 01:55:24
|
Long story short, someone put info that's very important to me in a protected zip file. The first layer was a Caesar cipher with a key. The second layer was a key that was a base64 encoding of a hex encoding of German text that had to be translated into English
|
|
2025-09-17 01:57:34
|
And then that key had to be used to unlock text encrypted with CBC, PKCS5 padding and a 128-bit IV lmao
|
|
|
TheBigBadBoy - πΈπ
|
2025-09-17 01:59:16
|
Bruh <:KekDog:805390049033191445>
Hopefully you're doing well now <:FrogSupport:805394101528035328>
|
|
|
Fox Wizard
|
2025-09-17 01:59:21
|
And some other stuff got involved too. Long story short, I don't even remember doing that since I was so... let's say zoooooted and no idea how to do that again
|
|
2025-09-17 01:59:50
|
But I got it done <:KekDog:884736660376535040>
|
|
|
TheBigBadBoy - πΈπ
|
2025-09-17 02:00:29
|
Maybe it's time to finally get pizzas together [β ](https://cdn.discordapp.com/emojis/992960743169347644.webp?size=48&name=PirateCat)
|
|
|
Fox Wizard
|
2025-09-17 02:02:05
|
Maybe someday
|
|
2025-09-17 02:02:32
|
Currently fully antisocial, because of reasons, and barely feeling emotions because of other reasons <:KekDog:884736660376535040>
|
|
|
spider-mario
|
|
Fox Wizard
My music library consists mostly of slow-encoded FLACs that now feel like a waste of time, since FLAC 1.5.0 was more efficient almost every time with much faster params <a:KMS:821038589096886314>
|
|
2025-09-17 02:59:54
|
have you tried FLACCL?
|
|
2025-09-17 02:59:58
|
GPU-accelerated flac encoding
|
|
|
Fox Wizard
|
2025-09-17 03:00:38
|
I vaguely remember doing that a long time ago, but it was less efficient I think
|
|
|
TheBigBadBoy - πΈπ
|
2025-09-17 03:10:27
|
it's especially good for long files, bc of its init time
|
|
|
spider-mario
|
|
spider-mario
my bisection leads me to https://github.com/libjxl/libjxl/pull/3658
|
|
2025-09-17 05:24:12
|
seems to have caused an even larger regression on my personal machine, from 250MP/s to 60MP/s
(I bisected again from scratch and landed on that exact commit again)
|
|
|
jonnyawsom3
|
2025-09-17 05:37:27
|
Depending on the version, compiler and image, I was seeing around a 6x slowdown. Clang brought that 6x back, but the bisecting revealed a further 2x lost relative to standalone. Of course, 'just use Clang' isn't really a solution :P
|
|
|
spider-mario
|
2025-09-17 05:37:55
|
mine is with clang
|
|
2025-09-17 05:39:05
|
```console
$ clang++ --version
clang version 20.1.8
Target: x86_64-w64-windows-gnu
Thread model: posix
InstalledDir: C:/msys64/mingw64/bin
```
I'd just like to interject for a moment. What you're referring to as Windows, is in fact, GNU/Windows, or as I've recently taken to calling it, GNU plus Windows.
|
|
2025-09-17 05:40:26
|
I havenβt checked yet but I suspect https://github.com/libjxl/libjxl/blob/1c3d187019537700e26a426de7b8be58e4f8262a/lib/jxl/enc_fast_lossless.cc#L185-L188 might be triggering?
|
|
2025-09-17 05:42:43
|
ooh, bits 5-6-7 are for AVX-512 (https://en.wikipedia.org/wiki/Control_register#XCR0_and_XSS)
|
|
2025-09-17 05:42:54
|
so if we don't detect AVX-512, we clear everything and don't even detect AVX2?
|
|
2025-09-17 05:43:00
|
<@811568887577444363> am I reading this right?
|
|
|
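(For reference: XCR0 is the mask of register state the OS has enabled, and per the Intel SDM bit 1 is SSE/XMM, bit 2 is AVX/YMM, and bits 5-7 are the AVX-512 opmask/ZMM state. An illustrative sketch of a more selective check, not libjxl's actual code:)
```cpp
#include <cstdint>

// XCR0 state-component bits (Intel SDM): 1 = SSE/XMM, 2 = AVX/YMM,
// 5 = AVX-512 opmask, 6 = ZMM0-15 upper halves, 7 = ZMM16-31.
// Illustrative helpers only.
constexpr uint64_t kXmmYmm = (1u << 1) | (1u << 2);  // 0x06
constexpr uint64_t kAvx512State =
    kXmmYmm | (1u << 5) | (1u << 6) | (1u << 7);     // 0xE6

bool OsSavesAvx2State(uint64_t xcr0) {
  return (xcr0 & kXmmYmm) == kXmmYmm;
}
bool OsSavesAvx512State(uint64_t xcr0) {
  return (xcr0 & kAvx512State) == kAvx512State;
}
```
The pre-fix code required all five bits and zeroed every feature flag when any were missing, which is how CPUs without AVX-512 lost AVX2 as well.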
jonnyawsom3
|
2025-09-17 05:43:27
|
I left some details in the [issue](<https://github.com/libjxl/libjxl/issues/4447>) but the github releases are around 2-3x slower. I'm probably getting some numbers mixed up, I had 7 versions of cjxl in my console and a wall of results to pick from
|
|
|
spider-mario
I havenβt checked yet but I suspect https://github.com/libjxl/libjxl/blob/1c3d187019537700e26a426de7b8be58e4f8262a/lib/jxl/enc_fast_lossless.cc#L185-L188 might be triggering?
|
|
2025-09-17 05:43:33
|
Maybe? But even without AVX or SSE, a 6x slowdown seems extreme (12x in some cases)
|
|
|
spider-mario
|
2025-09-17 05:45:15
|
it seems plausible to me
|
|
2025-09-17 05:45:20
|
an AVX2 vector is 8 floats (or int32s)
|
|
|
jonnyawsom3
|
2025-09-17 05:47:12
|
Actually right, I had the usual few-percent gains from AVX2 on random binaries in my head. Fast lossless has a lot of handwritten optimizations, so it could be pulling it off
|
|
2025-09-17 05:48:44
|
I'm honestly surprised fast lossless even has AVX512, seeing as it was disabled by default in libjxl due to little gains
|
|
|
spider-mario
|
2025-09-17 05:56:10
|
this makes it fast again for me:
```diff
diff --git a/lib/jxl/enc_fast_lossless.cc b/lib/jxl/enc_fast_lossless.cc
index e8ea04913..c87ae229b 100644
--- a/lib/jxl/enc_fast_lossless.cc
+++ b/lib/jxl/enc_fast_lossless.cc
@@ -184,7 +184,8 @@ uint32_t DetectCpuFeatures() {
const uint32_t xcr0 = ReadXCR0();
if (!check_bit(xcr0, 1) || !check_bit(xcr0, 2) || !check_bit(xcr0, 5) ||
!check_bit(xcr0, 6) || !check_bit(xcr0, 7)) {
- flags = 0; // TODO(eustas): be more selective?
+ // No AVX-512; disable everything but AVX2 if present
+ flags &= CpuFeatureBit(CpuFeature::kAVX2);
}
}
```
|
|
|
jonnyawsom3
|
2025-09-17 05:57:18
|
Don't suppose you could send the binary for me to check too?
|
|
|
spider-mario
|
2025-09-17 06:00:04
|
hopefully, this works
|
|
|
jonnyawsom3
|
2025-09-17 06:13:35
|
Yeah, that got singlethreaded back to normal, maybe 10% slower than v0.8
Multithreaded still has a way to go
v0.8
```Compressed to 9027058 bytes including container (8.707 bpp).
3840 x 2160, geomean: 328.88 MP/s [260.63, 367.54], 50 reps, 8 threads.```
Fixed Main
```Compressed to 9027.1 kB including container (8.707 bpp).
3840 x 2160, geomean: 289.702 MP/s [243.688, 319.687], 50 reps, 8 threads.```
Standalone fast lossless
```485.644 MP/s 8.701 bits/pixel```
Still both faster and slightly denser
|
|
|
A homosapien
|
|
gb82
oops; do u have the 444 link somewhere?
|
|
2025-09-18 02:16:03
|
https://media.xiph.org/video/derf/subset1-y4m.tar.gz
I have good news and bad news. The bad news is that the daala image set was always distributed as limited-range 4:2:0 in a y4m, which explains the blocky chroma in the corpus.
The good news is that all the images are publicly sourced, so it's easy to find the real original 4:4:4 hi-res jpgs. They also completely ignored the icc profiles of the original jpgs.
|
|
2025-09-18 02:17:15
|
I've run into this issue before with other xiph images https://discord.com/channels/794206087879852103/794206170445119489/1364305480759119984
|
|
|
username
|
2025-09-18 02:19:07
|
the readme has the sources listed it seems
|
|
|
A homosapien
|
2025-09-18 02:21:29
|
Maybe I should remake the set with proper color conversion/downscaling<:Thonk:805904896879493180>
|
|
2025-09-18 02:27:19
|
Wait, the daala set is even worse than I thought. It's not just desaturated from ignoring the icc profiles, there is also a slight color shift. I think somebody used ffmpeg to convert it to png, which assumes the bt.601 matrix instead of bt.709.
|
|
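(For what it's worth, ffmpeg can be told which matrix to assume instead of letting it guess; something along the lines of `ffmpeg -i input.y4m -vf scale=in_range=limited:in_color_matrix=bt709 output.png` should do it. That command is an untested sketch: `in_color_matrix` and `in_range` are libswscale scale-filter options, and the right values depend on how the y4m was actually produced.)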
2025-09-18 02:31:18
|
[This set](https://github.com/WyohKnott/image-formats-comparison/tree/gh-pages/comparisonfiles/subset1/Original) is flawed in multiple ways.
|
|
|
jonnyawsom3
|
2025-09-18 08:01:29
|
So, I was messing around with a black 4096*4096 image to see how fast I could get fast lossless to go... Interesting results
Fixed main cjxl
```JPEG XL encoder v0.12.0 b662606ed [_AVX2_,SSE4,SSE2] {Clang 20.1.8}
Encoding [Modular, lossless, effort: 1]
Compressed to 18281 bytes (0.009 bpp).
4096 x 4096, geomean: 100.359 MP/s [76.933, 118.257], 50 reps, 8 threads.
PageFaultCount: 4322443
PeakWorkingSetSize: 163.4 MiB
QuotaPeakPagedPoolUsage: 52.43 KiB
QuotaPeakNonPagedPoolUsage: 9.422 KiB
PeakPagefileUsage: 163.7 MiB
Creation time 2025/09/18 08:55:34.010
Exit time 2025/09/18 08:55:42.485
Wall time: 0 days, 00:00:08.475 (8.48 seconds)
User time: 0 days, 00:00:16.640 (16.64 seconds)
Kernel time: 0 days, 00:00:20.843 (20.84 seconds)```
v0.10 standalone fast lossless
```wintime -- fjxl Black.png nul 2 50 8
582.256 MP/s
0.003 bits/pixel
PageFaultCount: 92840
PeakWorkingSetSize: 76.41 MiB
QuotaPeakPagedPoolUsage: 31.75 KiB
QuotaPeakNonPagedPoolUsage: 8.086 KiB
PeakPagefileUsage: 201.4 MiB
Creation time 2025/09/18 08:56:05.187
Exit time 2025/09/18 08:56:06.715
Wall time: 0 days, 00:00:01.528 (1.53 seconds)
User time: 0 days, 00:00:00.421 (0.42 seconds)
Kernel time: 0 days, 00:00:03.187 (3.19 seconds)```
v0.9
```JPEG XL encoder v0.9.1 b8ceae3a [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 1576 bytes (0.001 bpp).
4096 x 4096, geomean: 3576.196 MP/s [2910.24, 3812.05], 50 reps, 8 threads.
PageFaultCount: 42017
PeakWorkingSetSize: 36.23 MiB
QuotaPeakPagedPoolUsage: 44.27 KiB
QuotaPeakNonPagedPoolUsage: 7.031 KiB
PeakPagefileUsage: 51.04 MiB
Creation time 2025/09/18 08:57:08.045
Exit time 2025/09/18 08:57:08.329
Wall time: 0 days, 00:00:00.284 (0.28 seconds)
User time: 0 days, 00:00:00.125 (0.12 seconds)
Kernel time: 0 days, 00:00:00.750 (0.75 seconds)```
|
|
2025-09-18 08:02:22
|
I know it's a pure black image, but *something* else big changed to go from 3.6GP/s to 100MP/s and 0.001 bpp to 0.009 bpp
|
|
|
spider-mario
|
2025-09-18 08:10:03
|
1-bit, 8-bit or 16-bit image?
|
|
|
jonnyawsom3
|
2025-09-18 08:12:00
|
Huh, cool... 5.6GP/s
```JPEG XL encoder v0.9.1 b8ceae3a [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 1576 bytes (0.001 bpp).
4096 x 4096, median: 5653.272 MP/s [2906.15, 6052.60] (stdev 892.577), 5000 reps, 16 threads.```
|
|
|
spider-mario
1-bit, 8-bit or 16-bit image?
|
|
2025-09-18 08:12:10
|
1-bit Greyscale
|
|
|
spider-mario
|
2025-09-18 08:34:49
|
apparently, one of the reps reached 11GP/s
> 4096 x 4096, geomean: 6588.863 MP/s [3905.04, 11231.99], 100 reps, 8 threads.
|
|
2025-09-18 08:34:57
|
(v0.9.1, now bisecting)
|
|
2025-09-18 08:35:37
|
main (after avx2 fix):
> 4096 x 4096, geomean: 140.945 MP/s [111.058, 159.413], 100 reps, 8 threads.
|
|
2025-09-18 08:35:38
|
yeah.
|
|
|
jonnyawsom3
|
2025-09-18 08:48:52
|
Isn't that 11GP/s?
|
|
|
spider-mario
|
2025-09-18 08:49:10
|
sorry, yes
|
|
2025-09-18 08:49:29
|
probably wasn't fully awake yet
|
|
|
jonnyawsom3
|
2025-09-18 08:49:34
|
Feels weird talking about such high resolutions regardless haha
|
|
2025-09-18 08:50:52
|
For me, increasing reps kept increasing speed due to cache, so you could try increasing it to 5K reps and see if you get closer to that 11GP/s on average
|
|
|
spider-mario
|
2025-09-18 08:51:41
|
oh, interesting effect
|
|
|
Orum
|
|
1-bit Greyscale
|
|
2025-09-18 08:52:06
|
isn't 1-bit by definition B&W, not grayscale?
|
|
|
jonnyawsom3
|
2025-09-18 08:52:28
|
Yeah, but pipelines interpret it as greyscale unless they have a dedicated bitmap pipeline
|
|
|
Orum
|
2025-09-18 08:52:47
|
yeah, I noticed cjxl will not accept pbm <:FeelsSadMan:808221433243107338>
|
|
|
jonnyawsom3
|
2025-09-18 08:55:39
|
Reminds me, I wanted to explore adding custom squeeze levels, so 1-bit images could have 1 squeeze level applied to help compression <https://github.com/libjxl/libjxl/issues/3775#issuecomment-2317324336>
|
|
|
spider-mario
|
2025-09-18 09:39:51
|
<@238552565619359744> itβs https://github.com/libjxl/libjxl/pull/3661
|
|
2025-09-18 09:47:05
|
(later updated in https://github.com/libjxl/libjxl/pull/3733)
|
|
|
jonnyawsom3
|
2025-09-18 10:13:48
|
Apparently that was the one time I didn't try effort 2, but yeah. It's falling back to effort 2 now because of an unsupported bit depth
|
|
2025-09-18 07:30:28
|
Hmm, <@604964375924834314> I just downloaded and tried your latest PR <https://github.com/libjxl/libjxl/pull/4449> from Github actions, but I saw no improvement
Pre-fix
```JPEG XL encoder v0.12.0 ef6f677 [_AVX2_,SSE2] {MSVC 19.44.35215.0}
Encoding [Modular, lossless, effort: 1]
Compressed to 7824.6 kB including container (7.547 bpp).
3840 x 2160, geomean: 21.510 MP/s [19.247, 22.122], 50 reps, 0 threads.```
Post-fix
```JPEG XL encoder v0.12.0 13dfd6f [_AVX2_,SSE2] {MSVC 19.44.35215.0}
Encoding [Modular, lossless, effort: 1]
Compressed to 7824.6 kB including container (7.547 bpp).
3840 x 2160, geomean: 21.602 MP/s [20.023, 22.387], 50 reps, 0 threads.```
Clang v0.9
```JPEG XL encoder v0.9.1 b8ceae3a [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 7823.9 kB including container (7.546 bpp).
3840 x 2160, geomean: 127.952 MP/s [115.64, 134.51], 50 reps, 0 threads.```
Clang pre-fix
```JPEG XL encoder v0.12.0 029cec42 [_AVX2_] {Clang 20.1.8}
Encoding [Modular, lossless, effort: 1]
Compressed to 7824.6 kB including container (7.547 bpp).
3840 x 2160, geomean: 49.596 MP/s [45.826, 51.540], 50 reps, 0 threads.```
|
|
2025-09-18 07:48:26
|
Maybe this was a regression purely with Clang? And MSVC has just always been this bad...
|
|
2025-09-18 07:53:00
|
Hmm, yeah v0.10 and v0.11 with MSVC are the same speed, 20MP/s while Clang was hitting 50MP/s and now hits 126MP/s with the fix
|
|
2025-09-18 07:55:02
|
And the output is different...
|
|
2025-09-18 07:55:34
|
Fixed Clang
```JPEG XL encoder v0.12.0 b662606ed [_AVX2_,SSE4,SSE2] {Clang 20.1.8}
Encoding [Modular, lossless, effort: 1]
Compressed to 7823.9 kB including container (7.546 bpp).
3840 x 2160, geomean: 125.763 MP/s [105.741, 131.516], 20 reps, 0 threads.```
Fixed MSVC
```JPEG XL encoder v0.12.0 13dfd6f [_AVX2_,SSE2] {MSVC 19.44.35215.0}
Encoding [Modular, lossless, effort: 1]
Compressed to 7824.6 kB including container (7.547 bpp).
3840 x 2160, geomean: 21.456 MP/s [20.881, 22.045], 20 reps, 0 threads.```
|
|
2025-09-18 07:56:03
|
6x faster and slightly denser, so something is wonky between them
|
|
|
spider-mario
I havenβt checked yet but I suspect https://github.com/libjxl/libjxl/blob/1c3d187019537700e26a426de7b8be58e4f8262a/lib/jxl/enc_fast_lossless.cc#L185-L188 might be triggering?
|
|
2025-09-19 01:15:02
|
I know almost nothing about compilers so I'm not sure, but isn't this disabling AVX2 when MSVC is being used?
<https://github.com/libjxl/libjxl/blob/13dfd6ffee8a342a16b90fb66a024fee27835dbb/lib/jxl/enc_fast_lossless.cc#L59>
|
|
2025-09-19 01:27:02
|
A google search later and apparently MSVC just sucks, regardless of what compile options or intrinsics you give it
|
|
|
username
|
|
A google search later and apparently MSVC just sucks, regardless of what compile options or intrinsics you give it
|
|
2025-09-19 01:35:06
|
|
|
|
jonnyawsom3
|
2025-09-19 01:41:42
|
π
|
|
2025-09-19 01:41:42
|
https://github.com/libjxl/libjxl/pull/3368
|
|
|
Orum
|
2025-09-19 02:05:09
|
why was that never merged?
|
|
2025-09-19 02:05:19
|
just awaiting review?
|
|
|
jonnyawsom3
|
2025-09-19 02:17:38
|
It was waiting on this https://github.com/libjxl/libjxl/pull/3388
|
|
2025-09-19 02:18:59
|
Though the Github actions run clang builds already, so I'm not sure what's holding it up
|
|
|
spider-mario
|
|
username
|
|
2025-09-19 07:01:20
|
gone are the times when MSVC, while way behind in language support, was at least competitive performance-wise
|
|
|
|
veluca
|
|
Though the Github actions run clang builds already, so I'm not sure what's holding it up
|
|
2025-09-19 09:41:15
|
life
|
|
2025-09-19 09:41:44
|
or me having forgotten completely about it and being busy with... a million things, including jxl-rs
|
|
|
jonnyawsom3
|
2025-09-19 09:50:42
|
Understandably so
|
|
|
Kupitman
|
|
A google search later and apparently MSVC just sucks, regardless of what compile options or intrinsics you give it
|
|
2025-09-20 11:32:13
|
Wtf is msvc, use gcc
|
|
|
A homosapien
|
2025-09-20 12:39:04
|
Clang is faster than gcc for libjxl
|
|
|
Orum
|
2025-09-20 01:33:33
|
gcc is dead, long live clang
|
|
|
Kupitman
|
2025-09-20 02:45:27
|
GCC >>>
|
|
2025-09-20 02:46:37
|
https://tenor.com/view/richard-stallman-stallman-rms-emacs-gnu-gif-23943451
|
|
2025-09-23 12:04:25
|
https://media.giphy.com/media/4DB6MagAp0F7EkOH6U/giphy.gif
|
|
|
jonnyawsom3
|
2025-09-23 02:55:18
|
I completely forgot, my CPU is so old that AVX2 ops take 2 cycles. It's surprising fast lossless is only 50% slower than Veluca's old 5800X results I was comparing against, seeing as there's a 50% higher clock speed and 4 years of architectural improvements between them
|
|
|
Orum
|
2025-10-02 01:56:30
|
JXL doing nicely here, 61% of the original TIFF (which is compressed as well):
```
87738044 Webb_Reveals_Cosmic_Cliffs,_Glittering_Landscape_of_Star_Birth-10.jxl
103789782 Webb_Reveals_Cosmic_Cliffs,_Glittering_Landscape_of_Star_Birth-7.webp
143648188 Webb_Reveals_Cosmic_Cliffs,_Glittering_Landscape_of_Star_Birth.tiff
```
|
|
2025-10-02 01:56:56
|
if only the government would use it <:FeelsSadMan:808221433243107338>
|
|
|
AccessViolation_
|
2025-10-07 05:03:57
|
potentially interesting benchmark image:
https://upload.wikimedia.org/wikipedia/commons/e/ea/Mandelbox_mit_farbigem_Nebel_und_Licht_20241111_%28color%29.png
|
|
|
Mine18
|
|
AccessViolation_
potentially interesting benchmark image:
https://upload.wikimedia.org/wikipedia/commons/e/ea/Mandelbox_mit_farbigem_Nebel_und_Licht_20241111_%28color%29.png
|
|
2025-10-07 07:34:55
|
surprisingly, AVIF on the lowest quality level looks really good
|
|
2025-10-07 07:38:40
|
here's what i got
|
|
|
AccessViolation_
|
2025-10-07 07:58:20
|
jxl does pretty alright here too!
|
|
2025-10-07 08:03:51
|
I got a weird thing at `-q 10` where it didn't seem to apply [adaptive LF smoothing](<https://arxiv.org/pdf/2506.05987#subsection.6.2>), but then it *did* apply it at `-q 5` and `-q 1`, so surprisingly, those lower quality settings end up looking better, since a large part of this image is its gradients
|
|
2025-10-07 08:10:23
|
the lowest quality JXL (just `cjxl source.png -d 25 d25.jxl`) still looks pretty good. I think the gradients look better than the AVIF, but AVIF is better at turning the structure into smooth blocks with nice sharp edges
|
|
2025-10-07 08:11:08
|
-
others for reference:
(specifically in case someone wants to look into why adaptive LF smoothing didn't seem to trigger at `-q 10`; it looks *really* bad)
|
|
|
jonnyawsom3
|
|
AccessViolation_
I got a weird thing at `-q 10` where it didn't seem to apply [adaptive LF smoothing](<https://arxiv.org/pdf/2506.05987#subsection.6.2>), but then it *did* apply it at `-q 5` and `-q 1`, so surprisingly, those lower quality settings end up looking better, since a large part of this image is its gradients
|
|
2025-10-07 08:14:46
|
What distance is that?
|
|
|
AccessViolation_
|
|
What distance is that?
|
|
2025-10-07 08:16:09
|
`Encoding [VarDCT, d15.267, effort: 7]`
|
|
|
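(That d15.267 falls out of cjxl's quality-to-distance mapping, which libjxl exposes as `JxlEncoderDistanceFromQuality`. A sketch of the mapping as I understand it; verify against your libjxl version:)
```cpp
// cjxl's -q to -d mapping as I understand it (libjxl exposes it as
// JxlEncoderDistanceFromQuality); distance 0 means mathematically lossless.
float DistanceFromQuality(float quality) {
  if (quality >= 100.0f) return 0.0f;
  if (quality >= 30.0f) return 0.1f + (100.0f - quality) * 0.09f;  // linear
  // Below q30 the curve steepens quadratically:
  return 53.0f / 3000.0f * quality * quality -
         23.0f / 20.0f * quality + 25.0f;
}
// q=10: 53/3000*100 - 23/20*10 + 25 = 1.7667 - 11.5 + 25 = 15.2667 -> "d15.267"
```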
jonnyawsom3
|
2025-10-07 08:17:51
|
Interesting, I can't think of any thresholds around there
|
|
2025-10-07 08:29:51
|
Well, at least I made the right call enabling resampling at distance 10. Looks sharper and 20% smaller using a lower internal resolution... Does hint that the encoder could try harder though
|
|
|
AccessViolation_
|
2025-10-07 08:45:15
|
especially on a massive image like this, which is often going to be viewed at 10%-20% scale anyway, it's a smart move
|
|
2025-10-07 08:46:39
|
but even when zoomed in like that the resampling helps nicely π
|
|
|
jonnyawsom3
|
2025-10-07 09:11:59
|
I was going to explore using 4x resampling too on top of the automatic 2x, but it would need to check the resolution as it's just too blurry for anything below 4K
|
|
|
juliobbv
|
|
AccessViolation_
the lowest quality JXL (just `cjxl source.png -d 25 d25.jxl`) still looks pretty good. I think the gradients look better than the AVIF, but AVIF is better at turning the structure into smooth blocks with nice sharp edges
|
|
2025-10-07 09:45:01
|
I was curious and tried the image with my mystery AVIF encoder (file-size matched to d25.jxl) π
the second encode is the absolute lowest quality available (57 KB)
|
|
2025-10-07 09:47:08
|
"mystery encoder": SVT-AV1 tune IQ, preset -1
|
|
2025-10-07 09:47:25
|
this is a nice benchmark image
|
|
|
AccessViolation_
|
2025-10-07 09:49:52
|
very impressive
|
|
2025-10-07 09:51:23
|
this is the source, btw
https://commons.wikimedia.org/wiki/File:Mandelbox_mit_farbigem_Nebel_und_Licht_20241111_(color).png
|
|
|
juliobbv
|
2025-10-07 09:51:24
|
yeah, I love JXL's handling of gradients though
|
|
|
AccessViolation_
|
2025-10-07 09:55:20
|
yeah, adaptive LF smoothing works really well
|
|
|
jonnyawsom3
|
2025-10-07 09:59:20
|
And the gradient predictor :P
|
|
|
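(The gradient predictor in question is modular mode's ClampedGradient, which predicts each sample from its west, north and north-west neighbours; smooth ramps predict almost perfectly, so residuals stay near zero. A minimal sketch of the predictor as specified:)
```cpp
#include <algorithm>
#include <cstdint>

// JPEG XL modular "ClampedGradient" predictor (the MED/LOCO-I predictor):
// predict N + W - NW, clamped to the range spanned by N and W.
int32_t ClampedGradient(int32_t n, int32_t w, int32_t nw) {
  return std::clamp(n + w - nw, std::min(n, w), std::max(n, w));
}
```
In VarDCT the LF image is itself coded with modular, so the same predictor helps on the gradients there too.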
A homosapien
|
|
juliobbv
"mystery encoder": SVT-AV1 tune IQ, preset -1
|
|
2025-10-07 10:01:20
|
Wait, tune IQ? Not tune=4 "still image"?
|
|
|
juliobbv
|
|
A homosapien
Wait, tune IQ? Not tune=4 "still image"?
|
|
2025-10-07 10:02:15
|
yep, mainline SVT-AV1 now has its own tune IQ too
|
|
2025-10-07 10:02:46
|
so it got assigned tune=3 because that was the next one available lol
|
|
|
A homosapien
|
2025-10-07 10:03:11
|
how does it compare to the psy forks?
|
|
|
juliobbv
|
2025-10-07 10:03:31
|
it's mostly the same as the psy forks
|
|
2025-10-07 10:04:11
|
the only exception is that I had to switch it back to use SSD instead of SSIM RDO as the latter wasn't compatible with `--avif`
|
|
2025-10-07 10:04:38
|
but all of the other tweaks were transferred verbatim from -psy
|
|
|
A homosapien
|
2025-10-07 10:04:54
|
nice to see big features getting merged into mainline
|
|
|
juliobbv
|
2025-10-07 10:05:12
|
yeah, this one was long overdue too
|
|
2025-10-07 10:05:30
|
libaom's is actually a derivative, and that one got merged first
|
|
2025-10-07 10:08:25
|
and it's also a relief for maintenance purposes, because the code delta between mainline and the psy forks is becoming much smaller, so rebasing the forks onto new mainline versions is now faster
|
|
|
A homosapien
|
|
juliobbv
the only exception is that I had to switch it back to use SSD instead of SSIM RDO as the latter wasn't compatible with `--avif`
|
|
2025-10-07 11:35:44
|
Where is that located in the code? I would like to compile a fresh build myself.
|
|
|
juliobbv
|
|
A homosapien
Where is that located in the code? I would like to compile a fresh build myself.
|
|
2025-10-07 11:38:04
|
it's in a few places
|
|
2025-10-07 11:38:22
|
basically, everything that references the `TUNE_SSIM` constant
|
|
|
A homosapien
|
2025-10-07 11:39:52
|
And I would just change it to `SSD`?
|
|
|
juliobbv
|
|
A homosapien
And I would just change it to `SSD`?
|
|
2025-10-07 11:41:17
|
you'd need to add `TUNE_IQ` to be alongside `TUNE_SSIM`
|
|
2025-10-07 11:41:43
|
most of the time the constants are used in checks in if clauses
|
|
2025-10-07 11:44:30
|
for example, in `src_ops_process.c` you'd add the `TUNE_IQ` check:
```
if (scs->static_config.tune == TUNE_SSIM || scs->static_config.tune == TUNE_IQ) {
aom_av1_set_mb_ssim_rdmult_scaling(pcs);
}
```
|
|
|
juliobbv
|
|
AccessViolation_
-
others for reference:
(specifically in case someone wants to look into why adaptive LF smoothing didn't seem to trigger at `-q 10`; it looks *really* bad)
|
|
2025-10-08 01:05:07
|
oh wow, you weren't joking on adaptive LF
|
|
2025-10-08 01:05:29
|
it really improves the quality of gradients
|
|
2025-10-08 01:06:41
|
there's so much banding in the no adaptive LF image that it makes it look like it was decoded with like 6 bits lol
|
|
2025-10-08 01:07:08
|
|
|
|
jonnyawsom3
|
2025-10-08 04:37:16
|
Actually, it's worse than that, because it should be dithered for VarDCT
|
|
|
AccessViolation_
|
|
juliobbv
oh wow, you weren't joking on adaptive LF
|
|
2025-10-08 04:41:00
|
yeah I'm surprised adaptive LF smoothing actually works this well
|
|
|
jonnyawsom3
|
|
AccessViolation_
yeah I'm surprised adaptive LF smoothing actually works this well
|
|
2025-10-08 07:37:55
|
Well, it was built for it
|
|
|
AccessViolation_
|
2025-10-08 07:40:23
|
yeah I know but for some reason I didn't expect it to work well on varblocks this large
|
|
|
A homosapien
|
|
juliobbv
the only exception is that I had to switch it back to use SSD instead of SSIM RDO as the latter wasn't compatible with `--avif`
|
|
2025-10-09 12:38:15
|
Well I just compiled SVT-AV1 from main and `tune=IQ` seems to work fine with `-a avif=1`; when I make the changes you suggested, avifenc crashes lol.
|
|
|
juliobbv
|
|
A homosapien
Well I just compiled SVT-AV1 from main and `tune=IQ` seems to work fine with `-a avif=1`; when I make the changes you suggested, avifenc crashes lol.
|
|
2025-10-09 12:38:51
|
yep, that's why those changes were left out π
|
|
2025-10-09 12:39:37
|
you can still use it without avif mode though
|
|
2025-10-09 12:40:08
|
you'll lose a few bytes due to the full AV1 header being used instead of the reduced still picture one
|
|
|
A homosapien
|
2025-10-09 12:40:30
|
I just like using avif mode for the reduced memory usage
|
|
|
juliobbv
|
2025-10-09 12:40:37
|
yeah, that's fair
|
|
|
A homosapien
|
2025-10-09 12:40:39
|
SVT is quite memory hungry
|
|
2025-10-09 12:49:26
|
Wow, avif mode reduces memory consumption quite a lot. Using the 8000x8000 Mandelbox image as a test.
```
wintime -- avifenc -y 420 --sharpyuv --cicp 1/2/1 -d 10 -c svt -a tune=3 -a avif=x -a lp=4 -q 10 --tilecolslog2 0 --tilerowslog2 0 mandel.png mandel.avif
-a avif=1 = PeakWorkingSetSize: 2.216 GiB
-a avif=0 = PeakWorkingSetSize: 15.1 GiB
```
|
|