JPEG XL

benchmarks

π•°π–’π–—π–Š
Hmm, have you tried decoding with output disabled? (remove `-o enc.png` from Oxide and change djxl to `djxl enc.jxl nul --output_format ppm --bits_per_sample 8`)
2025-08-01 05:02:59
2025-08-01 05:03:18
but I am not sure if it's directly comparable
jonnyawsom3
2025-08-01 05:03:21
Yeah, that's more like what I expected
π•°π–’π–—π–Š
2025-08-01 05:03:24
it seems there is an initialization difference
2025-08-01 05:03:55
how can png encoding be 2x slower then?
jonnyawsom3
2025-08-01 05:03:56
Then use the MP/s both tools output instead of external metrics
Quackdoc
π•°π–’π–—π–Š how can png encoding be 2x slower then?
2025-08-01 05:04:20
libjxl's png encoder is dogshit slow
jonnyawsom3
2025-08-01 05:04:24
That's what you're meant to do to ignore output and IO overhead
π•°π–’π–—π–Š
Quackdoc libjxl's png encoder is dogshit slow
2025-08-01 05:04:36
but isn't this important for actual decoding?
jonnyawsom3
Quackdoc libjxl's png encoder is dogshit slow
2025-08-01 05:04:43
Oxide is probably zlib-ng, libjxl is zlib
π•°π–’π–—π–Š
2025-08-01 05:04:43
for example I need png images
Quackdoc
2025-08-01 05:04:46
it depends,
afed
2025-08-01 05:04:47
different defaults, different png libs, and it depends on which png library libjxl is compiled with
Quackdoc
2025-08-01 05:05:00
something like imagemagick or ffmpeg have their own png encoders
jonnyawsom3
2025-08-01 05:05:26
It used to be much worse, but we set libjxl to zlib level 1 to remove most overhead without exploding PNG filesize
Quackdoc
2025-08-01 05:06:09
I wonder if I could add rust-png to libjxl hmmm
jonnyawsom3
π•°π–’π–—π–Š but isn't this important for actual decoding?
2025-08-01 05:06:16
'actual decoding' is just getting pixel data from the file, saving that data to another format is then encoding with a different format
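To make the distinction concrete, here is a minimal sketch of that "just getting pixel data" step using libjxl's C decode API (error handling and multi-frame/float cases elided; an outline, not a drop-in tool). Nothing below touches PNG: once the output buffer is filled, decoding is done, and writing a PNG from it would be a separate encode.
```
#include <jxl/decode.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode a whole .jxl file in memory to interleaved 8-bit RGB.
   Returns NULL on failure; caller frees the buffer. */
uint8_t* decode_to_pixels(const uint8_t* jxl, size_t jxl_size,
                          size_t* w, size_t* h) {
  JxlDecoder* dec = JxlDecoderCreate(NULL);
  JxlDecoderSubscribeEvents(dec, JXL_DEC_BASIC_INFO | JXL_DEC_FULL_IMAGE);
  JxlDecoderSetInput(dec, jxl, jxl_size);
  JxlPixelFormat fmt = {3, JXL_TYPE_UINT8, JXL_NATIVE_ENDIAN, 0};
  uint8_t* pixels = NULL;
  for (;;) {
    JxlDecoderStatus st = JxlDecoderProcessInput(dec);
    if (st == JXL_DEC_BASIC_INFO) {
      JxlBasicInfo info;
      JxlDecoderGetBasicInfo(dec, &info);
      *w = info.xsize;
      *h = info.ysize;
    } else if (st == JXL_DEC_NEED_IMAGE_OUT_BUFFER) {
      size_t size;
      JxlDecoderImageOutBufferSize(dec, &fmt, &size);
      pixels = malloc(size);
      JxlDecoderSetImageOutBuffer(dec, &fmt, pixels, size);
    } else if (st == JXL_DEC_FULL_IMAGE || st == JXL_DEC_SUCCESS) {
      break;  /* pixel data is ready; a PNG would be a separate encode */
    } else {  /* JXL_DEC_ERROR, or truncated input */
      free(pixels);
      pixels = NULL;
      break;
    }
  }
  JxlDecoderDestroy(dec);
  return pixels;
}
```
Timing just this function is roughly what the MP/s figures the tools print correspond to; timing `djxl in.jxl out.png` additionally measures the PNG encoder.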
π•°π–’π–—π–Š
'actual decoding' is just getting pixel data from the file, saving that data to another format is then encoding with a different format
2025-08-01 05:06:41
yes, I understand that
2025-08-01 05:06:47
but for example, in order to use CVVDP
2025-08-01 05:06:55
I need a PNG file
2025-08-01 05:07:15
so if I do graphs for example (decoding to PNG many times)
2025-08-01 05:07:26
there can be an hour of decoding difference between jxl-oxide and djxl
2025-08-01 05:08:09
so the bottleneck for djxl here is strictly encoding into png
Quackdoc
2025-08-01 05:08:42
yeah, it would be better to use magick or something
jonnyawsom3
2025-08-01 05:09:06
Thinking about it, there's a chance libjxl is still using filters while Oxide isn't. I can check later, but I'd expect it's because of zlib vs zlib-ng
2025-08-01 05:09:35
fpnge would be ideal, but as x86 only it's not suitable for libjxl
π•°π–’π–—π–Š
2025-08-01 05:09:44
Quackdoc
Thinking about it, there's a chance libjxl is still using filters while Oxide isn't. I can check later, but I'd expect it's because of zlib vs zlib-ng
2025-08-01 05:10:54
zlib-ng vs zlib?
π•°π–’π–—π–Š
π•°π–’π–—π–Š
2025-08-01 05:11:08
afed
fpnge would be ideal, but as x86 only it's not suitable for libjxl
2025-08-01 05:11:09
yeah, but unfortunately it's not a priority even just for x86
Quackdoc
2025-08-01 05:11:11
image-png uses their own zlib
π•°π–’π–—π–Š
2025-08-01 05:11:20
so actual decoding is 3x faster for djxl
2025-08-01 05:11:41
And what's the limitation for jxl-oxide here? Security reasons I assume
Quackdoc
2025-08-01 05:12:46
no C api for image-png <:cheems:720670067091570719>
π•°π–’π–—π–Š And what's the limitation for jxl-oxide here? Security reasons I assume
2025-08-01 05:13:13
jxl-oxide is not focused on speed/efficiency, best to wait for jxl-rs for that
afed
2025-08-01 05:16:28
libjxl is heavily SIMD-optimized (and has some other optimizations too), but the Rust versions are not, and it's more difficult in Rust (though there are some planned SIMD improvements in nightly Rust, as far as I know)
jonnyawsom3
2025-08-01 05:18:44
Oxide was solely made by Tirr, while libjxl has had half a decade of half a dozen Google employees making it multithreaded and utilising instruction sets
2025-08-01 05:19:08
(though Eustas has been making a lot more SIMD for the encoder in the past week)
π•°π–’π–—π–Š
afed libjxl is heavily SIMD-optimized (and has some other optimizations too), but the Rust versions are not, and it's more difficult in Rust (though there are some planned SIMD improvements in nightly Rust, as far as I know)
2025-08-01 05:36:07
you can utilize SIMD in Rust
2025-08-01 05:36:22
but of course, the project itself may have limitations
2025-08-01 05:36:42
rav1e <:SmileDoge2:1320855554683441163>
Kupitman
2025-08-01 05:40:04
Lol
Quackdoc
π•°π–’π–—π–Š rav1e <:SmileDoge2:1320855554683441163>
2025-08-01 05:40:30
bring back rav1e <:PepeSad:815718285877444619>
Orum
π•°π–’π–—π–Š rav1e <:SmileDoge2:1320855554683441163>
2025-08-01 05:43:39
...and yet it's slow AF
π•°π–’π–—π–Š
2025-08-01 05:44:08
SIMD isn't magically fast
Orum
2025-08-01 05:44:13
exactly
afed
2025-08-01 05:44:29
when asm becomes more than 90%, rav1e will come to life <:kekw:808717074305122316>
π•°π–’π–—π–Š
Orum ...and yet it's slow AF
2025-08-01 05:51:01
it was the fastest encoder back then, though
2025-08-01 05:51:13
the reason it's slow is that the development stopped long ago
2025-08-01 06:02:45
the about page is still the same 😁
Orum
2025-08-01 06:07:10
yeah I think at the time that was written it was the only AV1 encoder
afed
2025-08-01 06:29:24
libaom was available as a normal encoder even before AV1 was finalized, and rav1e didn't have many AV1 tools at first (even now not everything is fully implemented), so maybe that's why it was faster <:kekw:808717074305122316> also because a lot of asm was ported from dav1d
Quackdoc
2025-08-01 06:45:36
rav1e used to be fast and good, <:cheems:720670067091570719>
Orum
2025-08-01 06:51:43
it was never fast, and it's only really good at intra encoding
π•°π–’π–—π–Š
Orum it was never fast, and it's only really good at intra encoding
2025-08-01 06:55:34
relatively
2025-08-01 06:55:43
it was faster than AOM at some point as far as I remember
2025-08-01 06:57:14
It could easily be competitive if funding and development hadn't stopped.
Quackdoc
Orum it was never fast, and it's only really good at intra encoding
2025-08-01 07:01:35
no, it was quite good in the past
2025-08-01 07:02:30
it was slow as molasses since it couldn't thread worth shit, but if you did chunked encoding it was genuinely the best
2025-08-01 07:02:47
using a threaded tool it was fast
π•°π–’π–—π–Š
Quackdoc it was slow as molasses since it couldn't thread worth shit, but if you did chunked encoding it was genuinely the best
2025-08-01 07:04:43
hmm, true, I was using it with av1an
jonnyawsom3
2025-08-01 07:13:53
<#805176455658733570>
Orum
Quackdoc using a threaded tool it was fast
2025-08-01 07:18:55
which isn't possible in many circumstances
A homosapien
Oxide is probably zlib-ng, libjxl is zlib
2025-08-01 08:01:11
Benchmarks indicate that rust's png crates are faster than zlib-ng. So it's probably like 3x faster instead of 2x. https://www.reddit.com/r/rust/comments/1ha7uyi/memorysafe_png_decoders_now_vastly_outperform_c/
Tirr
2025-08-01 08:21:07
oxide uses png crate which uses miniz-oxide
A homosapien
2025-08-01 08:58:16
and it's presumably faster than libpng w/ zlib-ng
2025-08-01 08:59:00
But I wonder how easy it would be to compile libjxl with zlib-ng
2025-08-01 08:59:37
We have numbers here showing that reading/writing PNG files became significantly faster
jonnyawsom3
2025-08-01 08:59:56
Ideally there'd be a dependency overhaul. Swapping leftover libjpeg with jpegli, zlib with zlib-ng, etc.
π•°π–’π–—π–Š
A homosapien Benchmarks indicate that rust's png crates are faster than zlib-ng. So it's probably like 3x faster instead of 2x. https://www.reddit.com/r/rust/comments/1ha7uyi/memorysafe_png_decoders_now_vastly_outperform_c/
2025-08-01 10:04:03
so jxl-oxide would also be probably faster than using `djxl` + another png encoder
A homosapien
2025-08-01 10:04:33
maybe, it's just speculation
π•°π–’π–—π–Š
2025-08-01 10:04:40
I mean externally
2025-08-01 10:04:46
such as `magick` for example
2025-08-01 10:04:57
since we call two different processes, it's also another overhead
2025-08-01 10:05:28
and since the actual decoding difference is milliseconds (especially for normal sized images) the bottleneck is the png encoding
jonnyawsom3
2025-08-01 10:09:19
fpnge could still work with piping
afed
2025-08-01 10:09:27
djxl + fpnge should be pretty fast if the ssd/hdd isn't a bottleneck; fpnge is the fastest png encoder
π•°π–’π–—π–Š
afed djxl + fpnge should be pretty fast if the ssd/hdd isn't a bottleneck; fpnge is the fastest png encoder
2025-08-01 10:11:28
let's do a benchmark
jonnyawsom3
2025-08-01 10:14:51
Oh, hmm... I forgot about that `Usage: fpnge [options] in.png out.png`
A homosapien
2025-08-01 10:17:22
Modifying fpnge to take in raw bitmap inputs sounds like a fun side project
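For reference, the side project would mostly be glue code. A sketch, assuming fpnge's C interface looks like its header (FPNGEOutputAllocSize plus FPNGEEncode taking a row stride; treat the exact signatures as an assumption):
```
#include <stdlib.h>
#include "fpnge.h"  /* assumed header from the fpnge repo */

/* rgb: interleaved 8-bit RGB rows straight from a decoder, w*3 bytes per row.
   Writes a complete PNG into *png_out; returns its size in bytes. */
size_t raw_to_png(const unsigned char* rgb, size_t w, size_t h,
                  unsigned char** png_out) {
  /* worst-case output size for 1 byte/channel, 3 channels (assumed API) */
  size_t cap = FPNGEOutputAllocSize(1, 3, w, h);
  *png_out = malloc(cap);
  /* NULL options = default compression level (assumed API) */
  return FPNGEEncode(1, 3, rgb, w, /*row_stride=*/w * 3, h, *png_out, NULL);
}
```
The missing piece is only the input side: reading djxl's PPM/raw output (or accepting it over a pipe) instead of an existing PNG.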
afed
Oh, hmm... I forgot about that `Usage: fpnge [options] in.png out.png`
2025-08-01 10:18:19
then yes, I asked to add ppm but I don't remember if it was added or not
π•°π–’π–—π–Š
2025-08-01 10:24:21
interesting size
Quackdoc
π•°π–’π–—π–Š so jxl-oxide would also be probably faster than using `djxl` + another png encoder
2025-08-01 10:24:56
I mean, you could wire up libjxl to rust-png and call that good, that would be the "fastest"
π•°π–’π–—π–Š
Oh, hmm... I forgot about that `Usage: fpnge [options] in.png out.png`
2025-08-01 10:25:57
so it doesn't have ppm, or piped input 😁
2025-08-06 11:33:45
JXL has a problem with screen content. JPEGLI and AVIF don't have the same problem (the sizes and the metric scores are completely linear)
2025-08-06 11:39:44
``` cjxl v0.12.0 73beeb54 -e 10 --keep_invisible=0 --progressive_dc=0 --brotli_effort=11 -x strip=all ```
2025-08-06 11:39:59
reference image used
2025-08-06 11:51:38
The behavior is similar with `resampling=1` `ec_resampling=1`
2025-08-06 11:54:09
The scores and the output sizes are completely non-linear, and a much lower size can get a higher score or vice versa.
jonnyawsom3
π•°π–’π–—π–Š The behavior is similar with `resampling=1` `ec_resampling=1`
2025-08-07 01:45:15
What about --resampling 1 and --patches 0?
2025-08-07 01:45:49
I'd bet the size discrepancies are from patches falling in and out of detection as the quality changes
π•°π–’π–—π–Š
2025-08-07 01:57:33
the same image on AVIFENC/AOM is much more linear both in size and metric scores (especially for the second SCD mode). Comparison with its new screen content detection mode:
Kupitman
2025-08-07 05:26:15
AV2 released?
2025-08-07 05:26:37
Oh, scd
gb82
π•°π–’π–—π–Š rav1e <:SmileDoge2:1320855554683441163>
2025-08-11 06:40:10
rav1e just borrowed from dav1d
_wb_
2025-08-11 07:51:33
if someone feels like setting up the github actions, it would be nice to make https://github.com/libjxl/bench so that it runs all the tests in actions and auto-updates when the various decoder implementations get updated
jonnyawsom3
2025-08-11 02:08:43
Also <@794205442175402004>, do you know if setting this to 0 is disabling the palette? The wording is confusing me, but setting it to 0 is giving better density on all images I try
```
-Y PERCENT, --post-compact=PERCENT
    Use local (per-group) channel palette if the number of sample values is
    smaller than this percentage of the nominal range.
```
2025-08-11 02:12:45
It's also what we changed to fix progressive lossless, so I'm wondering if it's broken entirely
_wb_
2025-08-11 02:33:55
it could be that this no longer works correctly after moving to chunked encoding
jonnyawsom3
2025-08-11 02:39:39
Chunked is disabled for images under 2048 x 2048 though, so it applies with and without buffering
2025-08-11 02:49:45
It's a 0.1% density change, but it's consistent and given it made progressive lossless 30% larger, it hints something isn't working right
_wb_
2025-08-11 06:46:51
Oh, I guess in combination with squeeze it is very counterproductive to do channel palette since squeeze introduces many small channels.
2025-08-11 06:47:41
Without squeeze it is probably often not worth its signaling overhead so the heuristics should be made stricter
Snafuh
_wb_ if someone feels like setting up the github actions, it would be nice to make https://github.com/libjxl/bench so that it runs all the tests in actions and auto-updates when the various decoder implementations get updated
2025-08-16 10:44:30
I thought about the same last week when someone asked about the current status of decoders. I'll take a look. The instructions for running it locally seem to be alright, so getting it into actions should be doable
2025-08-16 01:26:06
Got the bench repo running in an action https://github.com/Snafuh/bench/actions/runs/17008565099 Seems like all tests exit with 1, but the dump files seem to be updated. I have not looked at the code at all so far, just created the action. Artifacts: https://github.com/Snafuh/bench/actions/runs/17008565099/artifacts/3779719050 I think some things could be adjusted in the scripts. It expected the dump files to already be there, so it's a bit hard to judge what was updated and what wasn't. Also the website could get more info, especially date of creation and version/commit info
Kupitman
2025-08-19 12:28:54
e9: 849 710 (0.12) | 849 764 (0.11) <:Yes:1368822664382119966>
e10: 621 786 (0.12) | 621 171 (0.11) <:tfw:843857104439607327>
Demiurge
2025-08-23 09:09:23
There is a portable C simd intrinsics lib that can be used to make fpnge portable
2025-08-23 09:11:16
https://simd-everywhere.github.io/blog/2020/06/22/transitioning-to-arm-with-simde.html
veluca
2025-08-23 09:30:04
not a great idea
2025-08-23 09:30:23
if I ever port it, it'll be to highway or the system we'll use for jxl-rs
A homosapien
2025-08-23 09:32:02
What system for SIMD are you planning to use for jxl-rs?
Demiurge
2025-08-23 09:32:44
Rust intrinsics are still in an early, very Rusty state
veluca
2025-08-23 09:33:02
nah, at least not on ARM and x86 🙂
Demiurge
2025-08-23 09:33:33
I thought they couldn't decide if they actually wanted to have them or not
veluca
A homosapien What system for SIMD are you planning to use for jxl-rs?
2025-08-23 09:34:04
the final one is a bit of a work in progress (see https://github.com/rust-lang/rust/issues/143352) but for now see https://github.com/libjxl/jxl-rs/tree/main/jxl/src/simd -- I'm writing it with a design that will hopefully be very easy to adapt once we have the appropriate language features
Demiurge
2025-08-23 09:38:44
Speaking of libhwy, I heard something similar is part of the C++26 language now. Idk if clang supports it.
A homosapien
veluca the final one is a bit of a work in progress (see https://github.com/rust-lang/rust/issues/143352) but for now see https://github.com/libjxl/jxl-rs/tree/main/jxl/src/simd -- I'm writing it with a design that will hopefully be very easy to adapt once we have the appropriate language features
2025-08-30 11:55:40
Do the compiled binaries use the handwritten SIMD code yet? I did some benchmarks and there's little to no change in performance.
veluca
2025-08-30 01:16:08
what images did you benchmark?
2025-08-30 01:16:18
(and on what CPU family?)
2025-08-30 01:16:38
for now most of the improvements are for vardct images
A homosapien
2025-08-30 01:24:01
I'm afk at the moment. I'll post my numbers and compile settings when I get home.
jonnyawsom3
veluca for now most of the improvements are for vardct images
2025-08-31 01:15:38
I was trying an 8K VarDCT image on a Ryzen Zen 1 CPU; it was giving slower results than an older build
veluca
2025-08-31 02:32:13
Weird
2025-08-31 02:32:24
Can you share the image? (Ideally in an issue)
jonnyawsom3
veluca Can you share the image? (Ideally in an issue)
2025-08-31 02:36:09
I'm tempted to ask if you have a compiled x86 binary we could test, to see if we're just not passing the right compile flags or something
veluca
2025-08-31 02:44:58
You just need the --release flag
A homosapien
2025-08-31 03:07:20
I did use the release flag, however, I also had additional compile flags for speed like `codegen-units=1` and `lto=fat`. Also `-Ctarget-cpu=znver2` might change things.
2025-08-31 03:08:43
I'm out being a tourist in Italy so I'm still afk.
veluca
2025-08-31 03:09:06
(at the risk of going off topic, in Italy where? :D)
A homosapien
2025-08-31 03:14:10
I'm in Sicily right now, in Naxos, on a bus ride to Taormina
2025-08-31 03:14:28
Tomorrow I'm going to see Mount Etna and then it's off to the mainland (Naples & Pompeii)
2025-08-31 03:21:29
So far I have visited Syracuse, Ragusa, and Palermo
2025-08-31 03:24:46
I know for a fact I'm going to transcode all the photos I've taken into JXL. The problem is they are Ultra HDR. 😔
Orum
2025-08-31 03:26:42
why is that an issue?
A homosapien
2025-08-31 03:27:42
I'm not sure if cjxl preserves the HDR gainmap part of the image
2025-08-31 03:28:16
Also just viewing the HDR version outside of Android is annoying
Orum
2025-08-31 03:28:25
it *should*, but I haven't tested
veluca
2025-08-31 03:30:10
haven't spent a lot of time in Sicily (was just in Catania for organizing informatics olympiads), but it's a nice place 😄
jonnyawsom3
Orum it *should*, but I haven't tested
2025-08-31 04:52:54
gainmaps were added to the API in 0.11 but the CLI tools weren't updated to use them
A homosapien
veluca Can you share the image? (Ideally in an issue)
2025-08-31 09:18:57
Idk if my build env was bad or if I was testing on a lossless image by mistake, but I don't see a regression. I redid everything: nuked my jxl-rs folder, recompiled, and retested. The SIMD code seems to work; main is slightly faster (+5-10%) compared to pre-SIMD (d4b5df1) jxl-rs, with and without my additional flags.
2025-09-01 06:35:10
<@238552565619359744> you were the first one to tell me there was regression. I DM'ed you fresh binaries, does it still occur?
Lilli
2025-09-01 09:13:29
I'm using libjxl (0.10.2) on an embedded device, and it performs rather poorly. For effort 3: 8s. For effort 5: 56s. That sounds like too big of a jump to me
2025-09-01 09:23:20
Is there something I likely did wrong?
2025-09-01 09:27:38
I checked, I'm not swapping
jonnyawsom3
2025-09-01 09:35:12
Effort 3 lossy is basically just a standard JPEG, effort 5 starts using features of JPEG XL like Variable block sizes
Lilli
2025-09-01 09:47:31
Yes, but why is it 7 times slower? :/
2025-09-01 09:47:49
When on my laptop it's only about 2-3times slower
Mine18
2025-09-01 09:50:11
threading, instructions, cache, stuffff
_wb_
2025-09-01 11:22:06
is this lossy or lossless?
Lilli
2025-09-01 11:59:37
lossy
_wb_
2025-09-01 12:54:34
how many threads?
veluca
2025-09-01 01:17:29
I imagine 1 b/c embedded
jonnyawsom3
A homosapien <@238552565619359744> you were the first one to tell me there was regression. I DM'ed you fresh binaries, does it still occur?
2025-09-01 01:23:47
Still getting around a 10% regression on every VarDCT image I try
Pre-SIMD
```
Wall time:   0 days, 00:00:08.770 (8.77 seconds)
User time:   0 days, 00:00:02.203 (2.20 seconds)
Kernel time: 0 days, 00:00:05.921 (5.92 seconds)
```
Post-SIMD
```
Wall time:   0 days, 00:00:09.291 (9.29 seconds)
User time:   0 days, 00:00:02.031 (2.03 seconds)
Kernel time: 0 days, 00:00:06.625 (6.62 seconds)
```
_wb_
2025-09-01 01:28:36
interesting, usr time goes down but sys time goes up
Lilli
_wb_ how many threads?
2025-09-01 02:04:33
2 threads before; I just tried with 8 and it's about a 10s difference using the `JxlResizableParallelRunner`: 8→7.3, 55→45. And I'm not sure I can do 8 while in production tbh
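(Side note: the API spells it `JxlResizableParallelRunner`.) A sketch of the wiring under discussion, with the thread count capped the way one might in production on a 4-core CM4; the names are from libjxl's public headers, but the cap value is just an example:
```
#include <jxl/encode.h>
#include <jxl/resizable_parallel_runner.h>
#include <stdint.h>

/* Attach a resizable runner to an encoder, capping threads for production. */
void* attach_runner(JxlEncoder* enc, uint64_t xsize, uint64_t ysize) {
  void* runner = JxlResizableParallelRunnerCreate(NULL);
  uint32_t n = JxlResizableParallelRunnerSuggestThreads(xsize, ysize);
  if (n > 4) n = 4;  /* e.g. at most the CM4's 4 cores */
  JxlResizableParallelRunnerSetThreads(runner, n);
  JxlEncoderSetParallelRunner(enc, JxlResizableParallelRunner, runner);
  return runner;  /* call JxlResizableParallelRunnerDestroy() after encoding */
}
```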
_wb_
2025-09-01 02:47:03
It would be interesting to check why e5 is so much slower than e3 on that device. It's of course expected that there is some gap, but indeed not that large...
Lilli
2025-09-01 02:59:36
yes, my thoughts exactly, the device is ARM based, it's the compute module (CM4) of a raspberry pi 4
2025-09-01 03:00:05
(and it doesn't have 8 threads, just 4, but I tried a bunch of things anyway)
_wb_
2025-09-01 03:27:53
actually I kind of get the same big gap on my macbook, which is also ARM based but substantially beefier than a raspberry pi 🙂
2025-09-01 03:27:55
```
001.png
Encoding    kPixels   Bytes  BPP        E MP/s  D MP/s  Max norm    SSIMULACRA2  PSNR   pnorm       BPP*pnorm       QABPP  Bugs
----------------------------------------------------------------------------------------------------------------------------------------
jxl:3          1084  188913  1.3934206  33.951  59.844  1.67419956  85.39267932  43.50  0.66288994  0.923684510472  2.333     0
jxl:4          1084  191619  1.4133800  33.915  58.583  1.67329057  85.76321222  43.53  0.66185216  0.935448636824  2.365     0
jxl:5          1084  184631  1.3618366   4.604  48.690  1.63430530  85.04619567  43.27  0.68055976  0.926811204959  2.226     0
jxl:6          1084  183920  1.3565923   2.685  54.254  1.62873990  84.91382279  43.21  0.68212848  0.925370233425  2.210     0
jxl:7          1084  184129  1.3581339   2.435  53.850  1.83446444  84.93351387  43.20  0.68402386  0.928995971028  2.491     0
Aggregate:     1084  186617  1.3764854   8.090  54.900  1.68738647  85.20988478  43.34  0.67421937  0.928053147031  2.323     0
```
2025-09-01 03:28:58
^ this is homebrew libjxl 0.11
2025-09-01 03:29:24
```
001.png
Encoding    kPixels   Bytes  BPP        E MP/s  D MP/s  Max norm    SSIMULACRA2  PSNR   pnorm       BPP*pnorm       QABPP  Bugs
----------------------------------------------------------------------------------------------------------------------------------------
jxl:3          1084  188875  1.3931403  37.762  63.803  1.67419956  85.39267932  43.50  0.66288994  0.923498710599  2.332     0
jxl:4          1084  191586  1.4131366  36.777  60.061  1.67329057  85.76321222  43.53  0.66185216  0.935287536907  2.365     0
jxl:5          1084  184496  1.3608409   6.501  50.113  1.63113330  85.33134233  43.47  0.66017514  0.898393306174  2.220     0
jxl:6          1084  184952  1.3642043   3.745  56.992  1.62598521  85.27235686  43.44  0.65957970  0.899801466869  2.218     0
jxl:7          1084  185176  1.3658565   3.326  56.406  1.62600045  85.28563934  43.44  0.66006019  0.901547521492  2.221     0
Aggregate:     1084  186997  1.3792855  10.238  57.294  1.64596675  85.40904601  43.48  0.66091024  0.911583915676  2.270     0
```
^ current git libjxl
2025-09-01 03:30:58
basically e3 and e4 are pretty much the same speed, e5 is 7x slower
Lilli
2025-09-02 08:04:41
Okay, so it's a matter of differing optimizations then?
2025-09-02 08:04:55
Thanks a lot for checking that out by the way! 🙂
_wb_
2025-09-02 08:34:08
the gap between e4 and e5 is a bit too large imo; also e3 is not really much faster than e4. Perhaps we should rename e3 to e2, e4 to e3, and make the new e4 do something between current e4 and e5 at a speed roughly halfway between the two.
Lilli
2025-09-02 09:32:15
Yes, that would be ideal! The jump in quality is also quite noticeable, so an intermediate would also make sense.
afed
2025-09-02 09:33:33
but e2 is useful and also noticeably faster than e3
2025-09-02 09:38:17
perhaps it would be better to just replace e3 with e4 and make a new e4, because the current e1, e2, and e4 have noticeable differences in speed, compression, and usefulness for their use cases
jonnyawsom3
afed perhaps it would be better to just replace e3 with e4 and make a new e4, because the current e1, e2, and e4 have noticeable differences in speed, compression, and usefulness for their use cases
2025-09-02 10:18:31
They shouldn't though. e1 and e2 are identical, e3 and e4 only do ANS and Coefficient reordering <https://github.com/libjxl/libjxl/blob/main/doc/encode_effort.md>
2025-09-02 10:19:16
Then e5 does VarDCT, AQ, Gabor and CFL
2025-09-02 10:23:30
It used to say e4 had simple VarDCT and AQ, but I checked the code and it was wrong, so I updated the docs a while back. Maybe it should have been implemented instead
afed
2025-09-02 10:30:58
ah, I thought it also applied to lossless; if it's only for lossy, then yeah
Lilli
It used to say e4 had simple VarDCT and AQ, but I checked the code and it was wrong, so I updated the docs a while back. Maybe it should have been implemented instead
2025-09-02 10:32:51
I would like only VarDCT and AQ 😄
jonnyawsom3
2025-09-02 10:33:53
You could try manually disabling the Gabor and CFL flags, but I think most of the time is from VarDCT
Lilli
2025-09-02 10:56:46
I see, maybe that is not worth it then
2025-09-02 11:01:05
I find it strange that when doing lossless I obtain these values, so, setting the distance to zero:
```
Input size: 300MB
|effort|duration (s)|size (MB)|
|------|------------|---------|
|   0  |    86.9    |    62   |
|   1  |    12.3    |   116   |
|   2  |    12.3    |   116   |
|   3  |    14      |   114   |
|   5  |    48.9    |   134   |
```
I suppose that means I'm also not setting a few other things that I should? (these timings are on the CM4)
jonnyawsom3
Lilli I find it strange that when doing lossless I obtain these values, so, setting the distance to zero: ``` Input size: 300MB |effort|duration (s)|size (MB)| |------|------------|---------| | 0 | 86.9 | 62 | | 1 | 12.3 | 116 | | 2 | 12.3 | 116 | | 3 | 14 | 114 | | 5 | 48.9 | 134 | ``` I suppose that means I'm also not setting a few other things that I should? (these timings are on the CM4)
2025-09-02 11:04:37
What bitdepth is your image data? Effort 1 has a specialised fast encoder, but will fall back to effort 2 if above 16bit or something else is incompatible
Lilli
2025-09-02 11:04:52
16bits yep, float data
2025-09-02 11:05:29
I mean, it's 16 bits uint16, and I also use float sometimes
2025-09-02 11:05:43
these tests are 16bits uint
jonnyawsom3
2025-09-02 11:08:38
Hmmm, odd
Lilli
2025-09-02 11:11:03
I did not set `JxlEncoderSetFrameLossless`, only the distance to 0
2025-09-02 11:11:11
So I guess modular isn't activated? I added more results to the table
jonnyawsom3
2025-09-02 11:33:35
Ah right, I assume you're on v0.11? IIRC we made it error now if you set distance to 0 without actually setting it as lossless in the API
Lilli
2025-09-02 11:43:05
0.10.2 on the embedded device, 0.11.1 on my laptop
2025-09-02 11:49:18
I just tried on my laptop enabling `JxlEncoderSetFrameLossless` when using distance 0 and the results are identical in terms of filesize (I don't have the results on the embedded)
2025-09-02 11:56:05
I first set `JxlEncoderSetFrameLossless` and then the distance to 0, all in linear sRGB
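That matches the order the API expects. A minimal sketch of the lossless setup being described (libjxl C API; basic info, color encoding, and the actual frame data are elided):
```
#include <jxl/encode.h>

/* Configure an encoder frame for true lossless at effort 3.
   Note: the basic info passed to JxlEncoderSetBasicInfo should also have
   uses_original_profile = JXL_TRUE for lossless. */
void configure_lossless(JxlEncoder* enc) {
  JxlEncoderFrameSettings* fs = JxlEncoderFrameSettingsCreate(enc, NULL);
  JxlEncoderSetFrameLossless(fs, JXL_TRUE);  /* selects true lossless */
  JxlEncoderSetFrameDistance(fs, 0.0f);      /* keep distance consistent */
  JxlEncoderFrameSettingsSetOption(fs, JXL_ENC_FRAME_SETTING_EFFORT, 3);
}
```
Newer releases error out if the lossless flag and the distance disagree, which is why setting both, in this order, is the safe pattern.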
_wb_
2025-09-02 04:03:33
effort 0 is not a valid effort value iirc
2025-09-02 04:05:27
effort 4-5 lossless being larger than e3 can happen on some images. Especially for photographic images, e3 is quite good and higher effort doesn't necessarily beat it.
2025-09-02 04:06:28
but you should try the current git head version because things might be different there, some bugs have been fixed and encoder choices improved a bit
Kupitman
_wb_ effort 4-5 lossless being larger than e3 can happen on some images. Especially for photographic images, e3 is quite good and higher effort doesn't necessarily beat it.
2025-09-02 09:51:31
can you fix that? 👉👈
monad
2025-09-03 06:21:43
e3 lossless is also inefficient for images characteristically different than photos. for varying content, e4 is much safer
Kupitman can you fix that? 👉👈
2025-09-03 06:33:10
generally, higher efforts are more dense, despite exceptional individual cases
2025-09-03 06:42:45
117 photo/film
```
B         bpp (mean)  mins    unique mins  Mpx/s real (mean)  best of
84969245  6.2769264   94.02%  94.02%       4.1837             cjxl_0.11.0_d0e4num_threads0
86605683  6.4035571   5.98%   5.98%        10.032             cjxl_0.11.0_d0e3num_threads0
```
_wb_
2025-09-03 07:44:21
it is hard to give guarantees that higher effort means better compression on a per-image basis, the only way to do that is to brute force it (try all lower efforts too) but that would come at a substantial cost
Lilli
_wb_ effort 0 is not a valid effort value iirc
2025-09-03 09:37:04
yes, but it still does something somehow haha. They are non-perceptual raw 16-bit linear near-dark images, with intensity target 64000 (but I guess intensity target doesn't do anything in lossless). So you're saying e4 should be better? I'll have to test that then. Are you saying it's better regarding compression ratio only, or speed? My metric is time spent vs final size (for a threshold quality, in theory, but we're talking about lossless anyway). So I need the best compromise in terms of compression ratio and time.
jonnyawsom3
monad 117 photo/film ``` B bpp (mean) mins unique mins Mpx/s real (mean) best of 84969245 6.2769264 94.02% 94.02% 4.1837 cjxl_0.11.0_d0e4num_threads0 86605683 6.4035571 5.98% 5.98% 10.032 cjxl_0.11.0_d0e3num_threads0```
2025-09-03 10:49:50
By the way, do you have a script or something you use for your corpus tests? Me and some friends were thinking of finding the optimal parameters for our use case
monad
2025-09-03 05:24:44
yes, a beautiful monstrosity
jonnyawsom3
monad yes, a beautiful monstrosity
2025-09-04 12:47:08
Don't suppose you could share it?
monad
2025-09-04 12:51:27
sure. it has some idiosyncrasies and arcana. and it's linuxy. but I can write some explanation
2025-09-04 08:04:31
<@238552565619359744> I documented the interface, but made no attempt to clarify the code. Oh, and I forgot to mention it assumes cjxl prints size in bytes, which is non-standard.
Exorcist
2025-09-15 06:01:19
https://halide.cx/blog/consistency/
2025-09-15 06:02:38
Since the JXL author is also the SSIMULACRA2 author, why does JXL have the worst consistency?
jonnyawsom3
2025-09-15 07:17:02
Could be because of the PRs that caused a regression in SSIMULACRA2 scores
afed
2025-09-15 07:34:35
and also because it's still a small dataset; on a larger and more diverse one the results may be very different. And jxl has zero tuning for ssimulacra2 (even though they share an author); once the race for metric numbers begins, these metrics and such results can no longer be considered very accurate and truthful. In some PRs ssimulacra2 scores even got worse. jpegli, though, has some tuning for ssimulacra2, mainly I think because there was no time for careful visual tuning (and tuning for metrics is at least better than nothing)
jonnyawsom3
2025-09-15 08:17:34
Both use butter internally
afed
2025-09-15 08:23:23
yeah, but jpegli had some tuning for ssimulacra2, even though it's not used internally <https://github.com/libjxl/libjxl/pull/2646>
2025-09-15 08:25:16
btw, there were also some old benchmarks, probably including some from Jon, where according to the metrics the best in terms of "consistency" was standard libjpeg and libjxl was the worst or one of the worst, if I'm not mistaken. But this is not the case in practice: libjxl is quite consistent in terms of visual quality, at least in photos, and is also actually the only encoder where the quality settings use not just a quantizer, but one that is closely integrated with butteraugli. So for other encoders, Q means almost nothing, because it is just a quantizer, not some metric or, even more so, a real quality indicator, and the actual quality is highly variable depending on the content with the same Q. With any encoder it's possible to use encoding based on metrics, but then the encoder has to make many encodings until it reaches a value close to this metric, so the encoding time is multiplied by the number of tries, and also, without deep integration, it only works on the whole image (not just on the needed parts or at least blocks)
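A sketch of the external metric-targeting loop described here, with `encode_at()` and `metric_score()` as hypothetical stand-ins (not real libjxl or metric APIs); it assumes the score falls monotonically as distance grows, which real content only approximates:
```
typedef struct Encoded Encoded;               /* hypothetical compressed result */
extern Encoded* encode_at(float distance);    /* hypothetical: one full encode */
extern float metric_score(const Encoded* e);  /* hypothetical: e.g. SSIMULACRA2 */

/* Bisect the distance setting until the score lands near `target`.
   Each iteration costs one full encode plus one metric pass, which is
   exactly the time multiplication complained about above. */
float find_distance(float target) {
  float lo = 0.0f, hi = 25.0f;  /* Butteraugli-style distance range */
  float best = lo;              /* distance 0 (lossless-ish) always meets target */
  for (int i = 0; i < 7; ++i) {
    float mid = 0.5f * (lo + hi);
    if (metric_score(encode_at(mid)) >= target) {
      best = mid;  /* good enough: try a larger distance (smaller file) */
      lo = mid;
    } else {
      hi = mid;    /* too lossy: back off */
    }
  }
  return best;
}
```
And, as noted, this whole loop scores the image as a whole; per-block adaptation needs integration inside the encoder itself.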
_wb_
2025-09-15 01:26:25
libjxl is not tuned for or optimizing for ssimu2, in fact ssimu2 was only created after most of the current libjxl encoder was already there.
2025-09-15 01:30:35
it would be interesting in that blog post to see the libaom numbers without tune iq 🙂
gb82
2025-09-15 04:35:26
yeah I can look into a follow-up
2025-09-15 04:36:13
also I think libjxl having less consistent performance isn't the end of the world – like I mentioned at the end of the blog post, efficiency and speed are still the most important, and libjxl at e7 is really fast and efficient
afed btw, there were also some old benchmarks, probably including some from Jon, where according to the metrics the best in terms of "consistency" was standard libjpeg and libjxl was the worst or one of the worst, if I'm not mistaken. But this is not the case in practice: libjxl is quite consistent in terms of visual quality, at least in photos, and is also actually the only encoder where the quality settings use not just a quantizer, but one that is closely integrated with butteraugli. So for other encoders, Q means almost nothing, because it is just a quantizer, not some metric or, even more so, a real quality indicator, and the actual quality is highly variable depending on the content with the same Q. With any encoder it's possible to use encoding based on metrics, but then the encoder has to make many encodings until it reaches a value close to this metric, so the encoding time is multiplied by the number of tries, and also, without deep integration, it only works on the whole image (not just on the needed parts or at least blocks)
2025-09-15 04:39:47
> Q means almost nothing, because it is just a quantizer, not some metric or, even more so, a real quality indicator, and the actual quality is highly variable depending on the content with the same Q
if you read the blog post, this is exactly what I tested...
afed
2025-09-15 05:04:02
just briefly: I mean for actual use, other encoders don't even have anything but a quantizer for quality settings; libjxl is basically the only one that uses a metric as the basis, at least among open source and without additional external tools. And the dataset is too small or too uniform for such tests: on flat, chart-like, not very photographic images, libjpeg and many other encoders completely fall apart at keeping consistency. For some images Q60 is very good quality, for some Q90 is not good enough, whereas for the most modern metric scores, even for VMAF, there is a much more confident understanding of what a given value will mean for quality; with quantization it's largely random (unless you pick inflated values in advance and ignore overshoots, so that most images will look good)
gb82
2025-09-15 05:13:48
but when you say "the actual quality is highly variable depending on the content with the same Q" you realize this can be true for distance as well, right?
afed
2025-09-15 05:20:54
yeah, I mean butteraugli isn't perfect either, but simple quantizer is much less consistent in quality, and I noticed this quite well in practice with jpeg (and all the jpeg encoders that have existed since then), whereas libjxl was much more consistent, which I also liked when jxl was introduced
gb82
2025-09-15 05:21:25
> simple quantizer is much less consistent in quality but if you *read the blog post* you'll see exactly how consistent each encoder's quality scale really is!
2025-09-15 05:22:09
and yes, we're using SSIMU2 with a photographic dataset, but that's really the best we can do for a proof of concept. and besides, subset1 is reputable & used for a lot of testing anyway
afed
2025-09-15 05:24:05
popularity for datasets is rather a disadvantage, as I've said before, because most encoders are hard tuned for known datasets
gb82
2025-09-15 05:25:19
I can tell you Iris is not, and I don't believe libaom is either. it seems reductive to handwave all of this because of some theoretical cope for why JXL is secretly the best
2025-09-15 05:26:10
it is very much possible that an encoder's Q-scale can be more consistent than JXL's distance scale. in fact ... all of them are!
juliobbv
2025-09-15 05:32:06
the other thing is that ssimu2 starts losing target quality properties at scores of 65 and lower
2025-09-15 05:34:14
so having a high stddev for ssimu2 <65 average scores might actually be the desirable thing for actual perceived consistency
afed
2025-09-15 05:36:07
for some modern encoders, let's say it's possible, but as I said about libjpeg, I strongly do not confirm this in practice, at least for a variety of content; libjxl is much, much more consistent, and I've used it on probably so many images, comparable to the amount that goes through a small cdn. Though I haven't used low quality, but then libjxl is not really designed and tuned for lower quality from the beginning, and neither is butteraugli, or even ssimulacra2
juliobbv
2025-09-15 05:36:27
so IMO this is the most informative range of the graph
_wb_ it would be interesting in that blog post to see the libaom numbers without tune iq 🙂
2025-09-15 05:50:29
btw, libaom's tune iq in 3.13 is no longer optimized for just ssimu2, that'd be `tune=ssimulacra2`
gb82
afed for some modern encoders, let's say it's possible, but as I said about libjpeg, I strongly do not confirm this in practice, at least for a variety of content; libjxl is much, much more consistent, and I've used it on probably so many images, comparable to the amount that goes through a small cdn. Though I haven't used low quality, but then libjxl is not really designed and tuned for lower quality from the beginning, and neither is butteraugli, or even ssimulacra2
2025-09-15 05:51:18
You need some sort of numbers to back up your argument
afed
2025-09-15 05:51:23
also ssimulacra2 probably isn't the best for consistency comparisons; for libjxl it's more like comparing butteraugli 3-norm against ssimulacra2 than anything else. Though MOS comparisons are also difficult for that purpose, especially for overshoots
gb82
2025-09-15 05:51:42
“JXL is more consistent because it feels like it” is not an argument I can refute
afed also ssimulacra2 probably isn't the best for consistency comparisons; for libjxl it's more like comparing butteraugli 3-norm against ssimulacra2 than anything else. Though MOS comparisons are also difficult for that purpose, especially for overshoots
2025-09-15 05:52:23
It is a good visual quality metric, it's not like there's anything better I could have used
2025-09-15 05:52:41
Yes, MOS is the best, but none of us can measure that with reasonable accuracy
2025-09-15 05:53:34
Also, numbers for a specific metric can generalize because it means the tools are available for the purposes of improving other metrics or VQ, as long as these tools aren't purposefully designed to overfit for a metric
afed
gb82 “JXL is more consistent because it feels like it” is not an argument I can refute
2025-09-15 06:01:47
yeah I know, but for good benchmarks I don't have the time, and the bad and smaller ones don't make sense. Also, there were similar benchmarks even from Jon and many others, even long before jxl existed. And in this one, jxl is also worse on consistency per the metrics and libjpeg is beating all other encoders (if I understand the graphs correctly) <https://jon-cld.s3.amazonaws.com/test/index.html>
monad
2025-09-16 02:20:00
Does Iris actually exist, or is it non-free? I am also wondering how it performs on non-photo where typical WebP implodes.
jonnyawsom3
2025-09-16 07:58:01
Continuing from https://discord.com/channels/794206087879852103/804324493420920833/1417419139655663686 Ignore the weird formatting, I shoved my console output into GPT to save time
ignaloidas
2025-09-16 10:53:45
FWIW I don't think you're really measuring consistency by measuring how perceptual quality metrics change across some encoding quality setting - given an edge case with an all-solid color picture, to keep it "consistent" by such metric basically any encoder would have to degrade the quality of the image somehow. Some images will compress better than others, and I don't see any realistic need to enforce that the degradation of the quality is the same across a wide range of images.
2025-09-16 10:56:56
What you'd really want is some setting where you essentially say "I don't want my images to dip below this quality" - which now of course brings the problem that you're explicitly optimizing for some specific metric, which can be gamed
2025-09-16 10:58:45
But I think just looking at stdev can just end up penalizing overperforming on easier images
afed
2025-09-16 11:05:23
not dropping below a certain quality is most important, but overshooting quality higher than necessary also matters, because it's a waste of traffic, bandwidth, page loads, costs, etc., especially for large services. Although this is not so important for personal use, where some extra quality is not a bad thing
ignaloidas
2025-09-16 11:09:46
Sure, but you don't always need that much extra bits to hit a high quality. I guess the ideal encoder would have 2 settings - bpp_max and quality_min
2025-09-16 11:10:24
and then maybe some prioritization between the two targets (and encode speed)
afed
2025-09-16 11:28:05
libjxl already has close to the perfect setting for quality, but the thing is that there are no perfect metrics yet, especially after very heavy use of metrics (although I quite like the way butteraugli is used in libjxl, and I'm not sure replacing it with something else would be better for the same purpose, though tuning and encoder improvements are still needed). I can say that there are no absolutely good metrics; some are worse, some are better for certain content. They can serve as a helper and for some process automation, but any metrics, even the most advanced and best ones, often miss many things, and many people are overly focused on increasing metric scores, though there is no other way to show any improvement or difference without personal subjectivity
jonnyawsom3
afed not dropping below a certain quality is most important, but overshooting quality higher than necessary also matters, because it's a waste of traffic, bandwidth, page loads, costs, etc., especially for large services. Although this is not so important for personal use, where some extra quality is not a bad thing
2025-09-16 11:33:22
When tweaking resampling for cjxl, I was having issues with over and undershooting. I ended up erring on the side of caution, and going for slightly higher bpp to make sure images hit a minimum level of quality
_wb_
ignaloidas But I think just looking at stdev can just end up penalizing overperforming on easier images
2025-09-16 11:46:00
Yes, I agree looking at stdev is not quite right. I think what matters in practice is the spread between p1 worst and p50 (median) quality. The median quality is what you expect to get (e.g. if you encode a few images with a setting, chances are you'll see something close to median behavior), while the p1 worst-case (or some other percentile) is what can happen and how bad it can get.
2025-09-16 11:47:09
if some images are better than the median, that's usually OK. But if 1% of images look like crap, it's a problem.
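A minimal sketch of that aggregate, assuming one quality score per image: sort the scores and read off p1 and p50 instead of computing a standard deviation.
```
#include <stdlib.h>

static int cmp_float(const void* a, const void* b) {
  float x = *(const float*)a, y = *(const float*)b;
  return (x > y) - (x < y);
}

/* scores: one per image, n > 0; sorted in place. */
void quality_spread(float* scores, size_t n, float* p1, float* p50) {
  qsort(scores, n, sizeof(float), cmp_float);
  *p1 = scores[(size_t)(0.01 * (double)(n - 1))];  /* ~worst 1% case */
  *p50 = scores[n / 2];                            /* typical (median) case */
}
```
Unlike stddev, this pairing doesn't penalize an encoder for overperforming on easy images; only the low tail counts against it.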
afed
2025-09-16 12:05:05
yep, and with libjpeg I have pretty often noticed that some images look really bad (ringing artifacts, banding) at the same Q where most other images look good, so I even had to use some additional metric-based encoding to avoid this. But with libjxl I hardly ever encountered anything like that; it is much safer to use the same quality settings (at least if they are not very low). Yet these and similar benchmarks show libjpeg as almost the best in quality consistency, which is not entirely true, at least when there is enough content variety
Mine18
monad Does Iris actually exist, or is it non-free? I am also wondering how it performs on non-photo where typical WebP implodes.
2025-09-16 12:11:13
you would have to ask <@237665944942411777> about that
Trix
2025-09-16 12:12:29
I think you mean <@703028154431832094>, he's the Iris dev, I don't have access to it
jonnyawsom3
2025-09-16 12:13:31
He said before that it's closed source and he was looking into whether to allow public encoding or not
username
2025-09-16 12:18:08
https://discord.com/channels/794206087879852103/805176455658733570/1392294193690316923 > for now I'm looking into licensing, but I'd strongly consider making this open-source because I love open source. I just need to be able to support myself while working on this
monad
2025-09-16 12:24:52
I hope it makes money. I will have to forget about it and stick with JPEG-LI.
Exorcist
afed yep, and with libjpeg I have pretty often noticed that some images look really bad (ringing artifacts, banding) at the same Q where most other images look good, so I even had to use some additional metric-based encoding to avoid this. But with libjxl I hardly ever encountered anything like that; it is much safer to use the same quality settings (at least if they are not very low). Yet these and similar benchmarks show libjpeg as almost the best in quality consistency, which is not entirely true, at least when there is enough content variety
2025-09-16 12:25:03
use victorvde/jpeg2png when you see JPEG artifacts
afed
2025-09-16 12:29:26
yeah, but this filtering is also quite strong and destructive in my opinion; I generally try to avoid filtering where possible. And I'm also not interested in how to improve images after compression, but in how to optimally compress into jpeg (and other formats) when I already have high-quality and lossless sources
username
Exorcist use victorvde/jpeg2png when you see JPEG artifacts
2025-09-16 12:47:08
you should try this fork of jpeg2png as it gives better results: https://discord.com/channels/794206087879852103/794206087879852107/1372897818607616011
Exorcist
2025-09-16 12:49:12
The very common side-by-side comparisons show JPEG as the worst (so much blocking & banding). Actually, JPEG is not so bad
gb82
_wb_ Yes, I agree looking at stdev is not quite right. I think what matters in practice is the spread between p1 worst and p50 (median) quality. The median quality is what you expect to get (e.g. if you encode a few images with a setting, chances are you'll see something close to median behavior), while the p1 worst-case (or some other percentile) is what can happen and how bad it can get.
2025-09-16 12:56:34
I disagree that overshooting doesn't matter; this basically ignores overshooting
2025-09-16 12:57:29
> A measure of how spread out data values are around the mean
This is a relevant definition for a TQ loop, which is why it is worth measuring imo. I tried to approach this post with the perspective of someone setting up a TQ loop, not an individual user using an encoder
afed
Exorcist The very common side-by-side comparisons show JPEG as the worst (so much blocking & banding). Actually, JPEG is not so bad
2025-09-16 12:59:32
yeah; actually, if you use filtering, it's easier to remove some artifacts than to restore missing details (which is mostly impossible), and that's what codecs do when they prefer more smoothing to avoid any artifacts
ignaloidas
gb82 I disagree that overshooting doesn't matter; this basically ignores overshooting
2025-09-16 01:08:37
so what should an encoder do when an image compresses really well, make it worse or something? Especially with artificial images, there will be a bunch of images that compress unusually well without dropping much in quality - there is no reason to penalize the encoder for managing to keep the quality high *as long as the size doesn't grow as well*
_wb_
gb82 I disagree that overshooting doesn't matter; this basically ignores overshooting
2025-09-16 01:36:11
Overshooting does matter. But the issue is not that the quality is too high; the issue is that bytes may be wasted. If you can do lossless in fewer bytes than "consistent quality" lossy, it's always a win. Basically what counts for quality is the worst case. And what counts for file size is the total (or the average).
ignaloidas
_wb_ Overshooting does matter. But the issue is not that the quality is too high; the issue is that bytes may be wasted. If you can do lossless in fewer bytes than "consistent quality" lossy, it's always a win. Basically what counts for quality is the worst case. And what counts for file size is the total (or the average).
2025-09-16 01:41:33
I guess to quantify how bad overshooting is you could check how the average bpp of overshooting images differs from the overall average/median bpp
_wb_
2025-09-16 01:45:53
I suppose. If you look at average bpp over a corpus, you're including overshooting. I think average bpp / p5 quality is the plot that makes the most sense, if you have to aggregate bitrate-distortion plots of a corpus of images.
gb82
ignaloidas so what should an encoder do when an image compresses really well, make it worse or something? Especially with artificial images, there will be a bunch of images that compress unusually well without dropping much in quality - there is no reason to penalize the encoder for managing to keep the quality high *as long as the size doesn't grow as well*
2025-09-16 02:33:33
the dataset I chose intentionally controls for this – daala subset1 4:4:4 is pretty uniform in terms of image complexity. I agree with your point otherwise – including non-photographic image content would make this test useless, for example.
_wb_ Overshooting does matter. But the issue is not that the quality is too high; the issue is that bytes may be wasted. If you can do lossless in fewer bytes than "consistent quality" lossy, it's always a win. Basically what counts for quality is the worst case. And what counts for file size is the total (or the average).
2025-09-16 02:34:11
but in this case, looking at the range I chose on the dataset I picked, there is *no chance* of lossless being better here
2025-09-16 02:34:35
if I was testing the effectiveness of my target quality loop, I'd use exactly the metric I proposed; I don't think 1% lows mean anything in that context
A homosapien
gb82 the dataset I chose intentionally controls for this – daala subset1 4:4:4 is pretty uniform in terms of image complexity. I agree with your point otherwise – including non-photographic image content would make this test useless, for example.
2025-09-16 02:38:26
The Daala image set linked on the website is not 4:4:4, it's nearest neighbor upscaled 4:2:0. Unless that's the wrong [link.](https://github.com/WyohKnott/image-formats-comparison/tree/gh-pages/comparisonfiles/subset1/Original)
ignaloidas
gb82 the dataset I chose intentionally controls for this – daala subset1 4:4:4 is pretty uniform in terms of image complexity. I agree with your point otherwise – including non-photographic image content would make this test useless, for example.
2025-09-16 02:39:11
I mean, even with photographic images of similar complexity, the subjects matter. Right now it's not used, but JXL does have a way to encode splines, which can help some images a bunch, but not all of them, and if spline encoding was added this would lead to some images overshooting in quality without it mattering in some subset of images.
gb82
ignaloidas I mean, even with photographic images of similar complexity, the subjects matter. Right now it's not used, but JXL does have a way to encode splines, which can help some images a bunch, but not all of them, and if spline encoding was added this would lead to some images overshooting in quality without it mattering in some subset of images.
2025-09-16 02:40:18
not necessarily, if using splines was controlled for internally
2025-09-16 02:40:51
it is flawed reasoning to consider that in order to be more efficient, an encoder must be less consistent. it can very much be both with smart internal heuristics
ignaloidas
2025-09-16 02:41:48
You could control for it internally, but I don't think dropping quality in one place if you can get free wins on it in another is a good choice
gb82
2025-09-16 02:42:20
that's not how that works
ignaloidas
2025-09-16 02:44:33
why not? If you're using splines, you'll get higher quality, so if you want to keep it from overshooting on quality, you'll drop the quality somewhere else
gb82
2025-09-16 02:44:43
> how closely an image encoder's user-configurable quality index matches a perceptual quality index. this is the entire point of having a quality slider
ignaloidas
gb82 > how closely an image encoder's user-configurable quality index matches a perceptual quality index. this is the entire point of having a quality slider
2025-09-16 02:45:25
I use it not as a "be as close to this as possible", but "be at least this", and I believe most other users do as well
gb82
ignaloidas why not? If you're using splines, you'll get higher quality, so if you want to keep it from overshooting on quality, you'll drop the quality somewhere else
2025-09-16 02:45:56
this is like saying "adding bigger blocks makes it cheaper to encode parts of the image, so we must drop the quality in other parts to achieve consistency"
ignaloidas I use it not as a "be as close to this as possible", but "be at least this", and I believe most other users do as well
2025-09-16 02:46:19
individual users, sure; I already addressed this. this is also not Iris's target audience and never will be
A homosapien The Daala image set linked on the website is not 4:4:4, it's nearest neighbor upscaled 4:2:0. Unless that's the wrong [link.](https://github.com/WyohKnott/image-formats-comparison/tree/gh-pages/comparisonfiles/subset1/Original)
2025-09-16 02:48:46
oops; do u have the 444 link somewhere?
ignaloidas
gb82 individual users, sure; I already addressed this. this is also not Iris's target audience and never will be
2025-09-16 02:53:21
I don't know what the target audience is then - I don't think it's smart for CDNs to be chasing quality targets instead of bpp/size targets with acceptable quality. Nor does the page with the comparison specify who the target audience is.
gb82
2025-09-16 02:54:34
The page doesn't need to, the user just decides whether it's relevant to them. Also, why in the world would a CDN chase a size target? That's silly
2025-09-16 02:55:00
You want minimum quality to maximize user engagement, which ends up giving you the minimum size
ignaloidas
2025-09-16 03:01:37
Overshoot only matters if when compressing the same image with a smaller target you'll actually get a meaningful improvement in compression - which is not a given, especially in images that are easy to compress. Since the testing doesn't test the "chasing" behavior on said images, I don't think it's correct to make assumptions that all of the overshoots are bad, or are entirely undesirable.
2025-09-16 03:02:32
FWIW I myself wouldn't want to use a CDN that goes out of its way to drop the quality for a minimal improvement.
_wb_
2025-09-16 03:02:35
There's always a quality target, not a size target. Some images are basically mostly some background (solid or simple gradient), while other images are high-entropy all over. Fixed bpp targets would lead to horrible variation in quality.
gb82
ignaloidas Overshoot only matters if when compressing the same image with a smaller target you'll actually get a meaningful improvement in compression - which is not a given, especially in images that are easy to compress. Since the testing doesn't test the "chasing" behavior on said images, I don't think it's correct to make assumptions that all of the overshoots are bad, or are entirely undesirable.
2025-09-16 03:03:03
Your argument here is that consistency doesn't matter in pursuit of efficiency, which I completely agree with; I should make that very clear if the blog post hasn't already
2025-09-16 03:03:38
You're totally right to say that if you overshoot with a more efficient encoder, that is still a better encoder
_wb_
2025-09-16 03:04:25
And the quality target is always a minimum acceptable quality. Doing better is no problem (unless of course it comes at a big cost in filesize while a much smaller file is still acceptable), doing worse is a problem.
Quackdoc
2025-09-16 03:05:27
for a CDN, IMO consistency is extremely important; if the set target quality is too low, variation in image quality can be a very big issue for things like image galleries or sets
gb82
2025-09-16 03:06:01
What I've seen is that if you're doing better and it costs 10% more bits, it is probably worth it to save on those 10% if you know that your quality doesn't need to be higher than a given baseline
2025-09-16 03:06:31
<@794205442175402004> what is considered a big cost where you'd wanna scale down?
_wb_
2025-09-16 03:06:40
There is something asymmetric: undershooting makes an image look bad, overshooting makes it load a bit slower than it should. When in doubt, nearly everyone will prefer overshooting to undershooting, at least when it's their images and reputation that is at stake.
gb82
2025-09-16 03:06:52
That makes complete sense
2025-09-16 03:08:07
So with a consistent encoder with no TQ loop, I'd guess you'd pick the Q that has its lower bound at the point you're looking for?
2025-09-16 03:08:51
eg 85 +/- 5 is useful for tq80?
_wb_
2025-09-16 03:11:52
Costs are measured in total terabytes of bandwidth/storage so usually people don't complain about a single image being too large (unless it's something like a 1 MB image where a 50 kb one would look just as good), but they do want the total to be as low as possible, of course. But they do complain about a single image if it looks like crap. Especially marketeers: they usually don't notice large files (unless it affects SEO metrics), but they will notice artifacts, especially in things like brand logos or hero images of a landing page.
Mine18
Trix I think you mean <@703028154431832094>, he's the Iris dev, I don't have access to it
2025-09-16 03:15:16
oops, my bad i kinda confuse you both as the same person πŸ˜…
afed
2025-09-16 03:17:38
also, for overshooting it's not really useless data, it still improves quality. And yeah, it's totally incomparable in importance to undershooting, where the image looks worse than required or even really bad
gb82
_wb_ Costs are measured in total terabytes of bandwidth/storage so usually people don't complain about a single image being too large (unless it's something like a 1 MB image where a 50 kb one would look just as good), but they do want the total to be as low as possible, of course. But they do complain about a single image if it looks like crap. Especially marketeers: they usually don't notice large files (unless it affects SEO metrics), but they will notice artifacts, especially in things like brand logos or hero images of a landing page.
2025-09-16 03:18:35
Gotcha, makes sense – I have talked to some companies that appear to care a lot more about size though, considering they run their own CDNs that are large enough to matter for size. I guess in Cloudinary’s case where customers pay per byte on some plans, it is on the customer to notice as far as I understand
Mine18 oops, my bad i kinda confuse you both as the same person πŸ˜…
2025-09-16 03:19:07
It is an honor to be confused with Trix <:BlobYay:806132268186861619>
afed
afed also for overshooting it's not really useless data, it still improves quality and yeah, it's totally incomparable in importance to undershooting if the image looks worse than required or even really bad
2025-09-16 03:29:03
and for TQ, libjxl basically already uses TQ for quality/distance internally at slower efforts (and in a more advanced way, not for the whole image), when other encoders don't, at least not in this way and not with such complex metrics. Also, a separate TQ loop is needed for those who don't like butteraugli results, but that's basically a double TQ with double the encoding time needed (although the internal butteraugli is somewhat simplified and also uses some heuristics as far as I know; still, at slower efforts it's close to the real butteraugli)
spider-mario
2025-09-16 03:38:29
what is TQ?
afed
2025-09-16 03:42:07
target quality
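A rough sketch of the "separate TQ" loop described above: an external binary search over cjxl's `--distance` until a metric hits the target. `EncodeAndMeasure()` is a hypothetical stand-in for "run cjxl at this distance, decode, and measure butteraugli against the original"; libjxl's internal per-image loop is more sophisticated than this.
```cpp
#include <string>

// Hypothetical helper: encode `input` at the given distance, decode, and
// return the measured butteraugli distance against the original.
double EncodeAndMeasure(const std::string& input, double distance);

// Find the largest encoder distance whose measured butteraugli score still
// stays at or below `target` (assumes the measured score grows roughly
// monotonically with the requested distance).
double FindDistanceForTarget(const std::string& input, double target) {
  double lo = 0.1;   // visually near-lossless end of the range
  double hi = 25.0;  // very low quality end of the range
  for (int i = 0; i < 8; ++i) {  // 8 full encodes plus 8 metric passes
    const double mid = (lo + hi) / 2.0;
    if (EncodeAndMeasure(input, mid) <= target) {
      lo = mid;  // still within budget: allow more distortion
    } else {
      hi = mid;  // overshot the target: back off
    }
  }
  return lo;  // largest distance known to stay within the target
}
```
Every iteration is a full encode plus a full metric pass, which is why running this on top of libjxl's own internal loop is the "double TQ with double the encoding time" mentioned above.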
_wb_
gb82 Gotcha, makes sense – I have talked to some companies that appear to care a lot more about size though, considering they run their own CDNs that are large enough to matter for size. I guess in Cloudinary’s case where customers pay per byte on some plans, it is on the customer to notice as far as I understand
2025-09-16 04:23:44
Sure, size does matter, and it does happen regularly that a customer asks for a size-target approach ("I have this byte budget"). But so far, every time customers said they had a per-image byte budget, it turned out to not actually be the case: they didn't want to target a particular size, they wanted the lowest possible overall size (in terms of overall traffic, not per-image) while keeping a particular minimum acceptable quality (which varies per customer; for some the minimum acceptable quality is quite low, while for others it is very high).
gb82
2025-09-16 04:35:45
that's kind of what I'm saying, so that makes sense. When I say "care a lot about size" I just mean they care that they aren't wasting bytes on images that look "too good"
jonnyawsom3
2025-09-17 08:46:58
<@179701849576833024> I don't suppose you have an old build of fjxl/fast_lossless anywhere? I was reading some old benchmarks and they show singlethreaded speeds 4-12x faster than what I'm getting with `cjxl -d 0 -e 1 --num_threads 0`. Trying to figure out if there was a major regression somewhere or if I'm reading the wrong data
veluca
2025-09-17 08:47:55
This is the wrong month to ask me 🀣
jonnyawsom3
2025-09-17 08:49:12
Yeah, I can imagine it's pretty hectic πŸ˜…
2025-09-17 09:08:30
We'll do a bit more investigating ourselves
_wb_
<@179701849576833024> I don't suppose you have an old build of fjxl/fast_lossless anywhere? I was reading some old benchmarks and they show singlethreaded speeds 4-12x faster than what I'm getting with `cjxl -d 0 -e 1 --num_threads 0`. Trying to figure out if there was a major regression somewhere or if I'm reading the wrong data
2025-09-17 09:24:14
e1 can fall back to e2 if the input is not in one of the pixel formats supported by e1, maybe that's what's going on? Try using ppm input...
jonnyawsom3
_wb_ e1 can fall back to e2 if the input is not in one of the pixel formats supported by e1, maybe that's what's going on? Try using ppm input...
2025-09-17 09:29:00
It's standard RGB 24bit PNG. Same result with PPM. Looking at old results from years ago, 1 thread hits around 200-300MP/s, with a clang optimized main build I'm only hitting 50MP/s
2025-09-17 09:29:56
Effort 2 is a further 5x slower, so it's not falling back
_wb_
2025-09-17 09:33:03
that's strange. Can you bisect to find the commit that caused this?
jonnyawsom3
2025-09-17 09:34:12
Okay yeah, there's a major regression
```
cjxl8 -d 0 -e 1 --num_threads 0 Test.ppm nul
JPEG XL encoder v0.8.4 [AVX2,SSE4,SSSE3,Unknown]
Read 3840x2160 image, 24883217 bytes, 4142.6 MP/s
Encoding [Modular, lossless, effort: 1], Compressed to 7817776 bytes (7.540 bpp).
3840 x 2160, 141.04 MP/s [141.04, 141.04], 1 reps, 0 threads.
```
6x slower since v0.9
```
cjxl9 -d 0 -e 1 --num_threads 0 Test.ppm nul
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 7818.6 kB (7.541 bpp).
3840 x 2160, 22.910 MP/s [22.91, 22.91], 1 reps, 0 threads.
```
The optimized Clang build is over 2x faster than v0.9, but still 3x slower than v0.8
```
cjxl -d 0 -e 1 --num_threads 0 Test.ppm nul
JPEG XL encoder v0.12.0 029cec42 [_AVX2_] {Clang 20.1.8}
Encoding [Modular, lossless, effort: 1]
Compressed to 7818.6 kB (7.541 bpp).
3840 x 2160, 54.364 MP/s, 0 threads.
```
_wb_
2025-09-17 09:36:14
I don't think there have been any recent changes to the e1 code itself so this is quite strange. Do you get the same thing when doing `--num_reps=30`?
jonnyawsom3
2025-09-17 09:38:59
v0.8
```
cjxl8 -v -d 0 -e 1 --num_threads 0 --num_reps 100 Test.ppm nul
JPEG XL encoder v0.8.4 [AVX2,SSE4,SSSE3,Unknown]
Read 3840x2160 image, 24883217 bytes, 4074.7 MP/s
Encoding [Modular, lossless, effort: 1], Compressed to 7817776 bytes (7.540 bpp).
3840 x 2160, geomean: 157.25 MP/s [132.82, 175.94], 100 reps, 0 threads.
```
v0.9
```
cjxl9 -v -d 0 -e 1 --num_threads 0 --num_reps 100 Test.ppm nul
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Read 3840x2160 image, 24883217 bytes, 4071.5 MP/s
Encoding [Modular, lossless, effort: 1]
Compressed to 7818.6 kB (7.541 bpp).
3840 x 2160, geomean: 23.319 MP/s [21.60, 24.43], 100 reps, 0 threads.
```
Optimized main
```
cjxl -v -d 0 -e 1 --num_threads 0 --num_reps 100 Test.ppm nul
JPEG XL encoder v0.12.0 029cec42 [_AVX2_] {Clang 20.1.8}
Read 3840x2160 image, 24883217 bytes, 5587.6 MP/s
Encoding [Modular, lossless, effort: 1]
Compressed to 7818.6 kB (7.541 bpp).
3840 x 2160, geomean: 54.500 MP/s [50.697, 58.616], 100 reps, 0 threads.
```
_wb_ I don't think there have been any recent changes to the e1 code itself so this is quite strange. Do you get the same thing when doing `--num_reps=30`?
2025-09-17 09:40:47
No, but it has been nearly 2 years since v0.9, so it wouldn't have been a recent change
2025-09-17 09:41:30
Glad I found it while we're wrapping up v0.12, means we can hopefully find and fix the problem, making it even more of a golden release
monad
2025-09-17 10:54:47
haven't seen any evidence of such behavior in my environment
spider-mario
v0.8 ```cjxl8 -v -d 0 -e 1 --num_threads 0 --num_reps 100 Test.ppm nul JPEG XL encoder v0.8.4 [AVX2,SSE4,SSSE3,Unknown] Read 3840x2160 image, 24883217 bytes, 4074.7 MP/s Encoding [Modular, lossless, effort: 1], Compressed to 7817776 bytes (7.540 bpp). 3840 x 2160, geomean: 157.25 MP/s [132.82, 175.94], 100 reps, 0 threads.``` v0.9 ```cjxl9 -v -d 0 -e 1 --num_threads 0 --num_reps 100 Test.ppm nul JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2] Read 3840x2160 image, 24883217 bytes, 4071.5 MP/s Encoding [Modular, lossless, effort: 1] Compressed to 7818.6 kB (7.541 bpp). 3840 x 2160, geomean: 23.319 MP/s [21.60, 24.43], 100 reps, 0 threads.``` Optimized main ```cjxl -v -d 0 -e 1 --num_threads 0 --num_reps 100 Test.ppm nul JPEG XL encoder v0.12.0 029cec42 [_AVX2_] {Clang 20.1.8} Read 3840x2160 image, 24883217 bytes, 5587.6 MP/s Encoding [Modular, lossless, effort: 1] Compressed to 7818.6 kB (7.541 bpp). 3840 x 2160, geomean: 54.500 MP/s [50.697, 58.616], 100 reps, 0 threads.```
2025-09-17 11:10:42
on my corp machine, both 0.8.0 and 0.9.0 are about 85MP/s
2025-09-17 11:10:49
so I can’t really bisect
2025-09-17 11:11:47
that’s with clang 19
2025-09-17 11:12:55
ah, main is 50
monad
2025-09-17 11:13:12
moved to more recent versions and I see a difference between 0.10 and 0.11
2025-09-17 11:24:42
```
JPEG XL encoder v0.10.0 bf2b7655 [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 64122 bytes (0.247 bpp).
1920 x 1080, geomean: 1332.567 MP/s [591.44, 1560.84], 5 reps, 0 threads.
```
```
JPEG XL encoder v0.11.0 4df1e9ec [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 23224 bytes (0.090 bpp).
1920 x 1080, geomean: 88.376 MP/s [49.61, 88.52], , 5 reps, 0 threads.
```
spider-mario
2025-09-17 11:28:52
my bisection leads me to https://github.com/libjxl/libjxl/pull/3658
2025-09-17 11:29:00
maybe a bug in the CPU feature detection?
jonnyawsom3
2025-09-17 11:29:57
We figured out it's 2 separate regressions: one is compiling libjxl with MSVC, the other is with standalone fast lossless. MSVC is also giving different output to Clang, apparently
```
cjxlGithub -v -d 0 -e 1 --num_reps 50 --num_threads 0 Test.ppm Github.jxl
JPEG XL encoder v0.9.0 6768ea8 [AVX2,SSE4,SSSE3,SSE2]
Read 3840x2160 image, 24883217 bytes, 3964.6 MP/s
Encoding [Modular, lossless, effort: 1]
Compressed to 7818.6 kB (7.541 bpp).
3840 x 2160, geomean: 22.902 MP/s [21.32, 23.44], 50 reps, 0 threads.
Wall time:   0 days, 00:00:18.147 (18.15 seconds)
User time:   0 days, 00:00:00.359 (0.36 seconds)
Kernel time: 0 days, 00:00:17.781 (17.78 seconds)

cjxlClang -v -d 0 -e 1 --num_reps 50 --num_threads 0 Test.ppm Clang.jxl
JPEG XL encoder v0.9.1 b8ceae3a [AVX2,SSE4,SSE2]
Read 3840x2160 image, 24883217 bytes, 4035.2 MP/s
Encoding [Modular, lossless, effort: 1]
Compressed to 7817.8 kB (7.540 bpp).
3840 x 2160, geomean: 140.517 MP/s [106.11, 148.57], 50 reps, 0 threads.
Wall time:   0 days, 00:00:03.001 (3.00 seconds)
User time:   0 days, 00:00:00.375 (0.38 seconds)
Kernel time: 0 days, 00:00:02.625 (2.62 seconds)
```
2025-09-17 11:30:51
Clang output is 0.001 bpp smaller and 6x faster
A homosapien
2025-09-17 12:05:42
The standalone fjxl binary seems to thread better than libjxl
2025-09-17 12:39:02
Comparing libjxl and fjxl-standalone, compiled with clang on Windows
```
v0.8.4
╔═════════╀═════════════╀═══════════════╗
β•‘ Threads β”‚ cjxl        β”‚ fast-lossless β•‘
╠═════════β•ͺ═════════════β•ͺ═══════════════╣
β•‘ 1       β”‚ 249.32 MP/s β”‚ 295.357 MP/s  β•‘
β•Ÿβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β•’
β•‘ 6       β”‚ 694.60 MP/s β”‚ 1027.186 MP/s β•‘
β•Ÿβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β•’
β•‘ 12      β”‚ 773.32 MP/s β”‚ 1248.640 MP/s β•‘
β•šβ•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

v0.10.4
╔═════════╀══════════════╀═══════════════╗
β•‘ Threads β”‚ cjxl         β”‚ fast-lossless β•‘
╠═════════β•ͺ══════════════β•ͺ═══════════════╣
β•‘ 1       β”‚ 235.001 MP/s β”‚ 274.598 MP/s  β•‘
β•Ÿβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β•’
β•‘ 6       β”‚ 606.156 MP/s β”‚ 888.753 MP/s  β•‘
β•Ÿβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β•’
β•‘ 12      β”‚ 670.190 MP/s β”‚ 1025.519 MP/s β•‘
β•šβ•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

v0.11.1
╔═════════╀══════════════╀═══════════════╗
β•‘ Threads β”‚ cjxl         β”‚ fast-lossless β•‘
╠═════════β•ͺ══════════════β•ͺ═══════════════╣
β•‘ 1       β”‚ 65.821 MP/s  β”‚ 69.284 MP/s   β•‘
β•Ÿβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β•’
β•‘ 6       β”‚ 263.546 MP/s β”‚ 304.011 MP/s  β•‘
β•Ÿβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β•’
β•‘ 12      β”‚ 354.191 MP/s β”‚ 451.722 MP/s  β•‘
β•šβ•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•§β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•
```
spider-mario my bisection leads me to https://github.com/libjxl/libjxl/pull/3658
2025-09-17 12:55:21
Can confirm, libjxl's e1 speeds have been halved since this commit. Wow, this is crazy: the MSVC builds from GitHub average around 175 MP/s while these old builds get around 670+.
jonnyawsom3
2025-09-17 01:19:07
https://github.com/libjxl/libjxl/issues/4447
2025-09-17 01:27:30
On a vaguely related note, remember that only main is truly lossless for fast lossless right now
monad
2025-09-17 01:35:47
"truly lossless"
jonnyawsom3
monad "truly lossless"
2025-09-17 01:37:52
https://github.com/libjxl/libjxl/issues/4026
TheBigBadBoy - π™Έπš›
Fox Wizard jhead ``-du``
2025-09-17 01:39:33
guess what: I finally found what those "6 bytes" were that you managed to remove thanks to `jhead -du` It's an additional marker, `0xFFCC` (SOF12 or DAC), which explicitly "Define[s] Arithmetic Coding" [β €](https://cdn.discordapp.com/emojis/867794291652558888.webp?size=48&name=dogelol) So you are not supposed to remove that
2025-09-17 01:40:15
-# and I just won 1 pizza πŸ•
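For context, JPEG header segments are just a marker (0xFF plus a code byte) followed by a big-endian length that includes its own two bytes, so spotting a stray DAC is a short loop. A simplified sketch (assumes a well-formed file and stops at SOS, where the entropy-coded data begins):
```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

void ListMarkers(const uint8_t* data, size_t size) {
  size_t pos = 2;  // skip SOI (0xFFD8), which has no length field
  while (pos + 4 <= size && data[pos] == 0xFF) {
    const uint8_t code = data[pos + 1];
    if (code == 0xFF) { ++pos; continue; }  // fill byte, not a marker
    if (code == 0xDA) {                     // SOS: scan data starts here
      std::printf("SOS at offset %zu, stopping\n", pos);
      return;
    }
    // Segment length is big-endian and includes the two length bytes.
    const uint16_t len =
        static_cast<uint16_t>((data[pos + 2] << 8) | data[pos + 3]);
    std::printf("marker 0xFF%02X at offset %zu, %u bytes%s\n", code, pos, len,
                code == 0xCC ? "  <-- DAC (Define Arithmetic Coding)" : "");
    pos += 2 + len;  // 2 marker bytes + segment payload
  }
}
```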
Fox Wizard
2025-09-17 01:41:46
How did you remember that lmao
TheBigBadBoy - π™Έπš›
2025-09-17 01:43:37
I have a folder with various files that "idk how he beat me" <:KekDog:805390049033191445>
Fox Wizard
2025-09-17 01:43:56
Hmmm, interesting
TheBigBadBoy - π™Έπš›
2025-09-17 01:44:10
and since I'm working on my file optimizer, I decided to take a look at that file
Fox Wizard
2025-09-17 01:44:30
Although that probably won't ever happen again since I stopped caring about shaving off every last byte lol
TheBigBadBoy - π™Έπš›
2025-09-17 01:45:34
https://tenor.com/view/wtf-gif-11403402916845629606
2025-09-17 01:46:26
[β €](https://cdn.discordapp.com/emojis/1368821202969432177.webp?size=48&name=SupremelyDispleased)
Fox Wizard
2025-09-17 01:46:30
I now only do random stuff like that when it's necessary
2025-09-17 01:46:39
Because too lazy to thinkβ„’
TheBigBadBoy - π™Έπš›
2025-09-17 01:46:57
that, and also because it takes soooo much time <:KekDog:805390049033191445>
Fox Wizard
2025-09-17 01:47:16
Probably true
2025-09-17 01:47:25
And I need motivation which is very hard to get
2025-09-17 01:48:06
But when I'm motivated I sometimes end up doing things I never expected I could even come close to accomplishing <:KekDog:884736660376535040>
TheBigBadBoy - π™Έπš›
2025-09-17 01:48:23
that's one of the main reasons my music library does not grow (that, and the time it takes me to manually credit everyone, including violins...)
Fox Wizard
2025-09-17 01:48:41
Mood
TheBigBadBoy - π™Έπš›
Fox Wizard But when I'm motivated I sometimes end up doing things I never expected I could even come close to accomplishing <:KekDog:884736660376535040>
2025-09-17 01:48:47
well anyway, if you ever do this kind of stuff again do not hesitate to hit me up πŸ˜‰
Fox Wizard
2025-09-17 01:49:45
My music library consists mostly of slow encoded FLACs that now feels like a waste of time since FLAC 1.5.0 was more efficient almost every time with much faster params <a:KMS:821038589096886314>
TheBigBadBoy - π™Έπš›
2025-09-17 01:50:14
`-j` is really nice indeed x)
Fox Wizard
2025-09-17 01:50:25
My interests and the things I end up doing are always mega random and rarely last long
2025-09-17 01:50:48
And I don't know why, but I do the most interesting things when I'm not really sober lmao
2025-09-17 01:52:37
Like recently. Stuff happened which caused me to not be able to sleep for a long time, so I had to force myself asleep with medication. And then I had to force myself out of a drowsy barely mentally existing state after not being able to sleep anymore for 3 hours on heavy meds
2025-09-17 01:55:24
Long story short, someone put info that's very important to me in a protected zip file. First layer was a Caesar Cipher with key. Second layer was a key that was a base64 encode of a HEX encode of German text that had to get translated into English
2025-09-17 01:57:34
And then that key had to be used to unlock CBC PKCS5 128 BITS IV encrypted text lmao
TheBigBadBoy - π™Έπš›
2025-09-17 01:59:16
Bruh <:KekDog:805390049033191445> Hopefully you're doing well now <:FrogSupport:805394101528035328>
Fox Wizard
2025-09-17 01:59:21
And some other stuff got involved too. Long story short, I don't even remember doing that since I was so... let's say zoooooted and no idea how to do that again
2025-09-17 01:59:50
But I got it done <:KekDog:884736660376535040>
TheBigBadBoy - π™Έπš›
2025-09-17 02:00:29
Maybe it's time to finally get pizzas together [β €](https://cdn.discordapp.com/emojis/992960743169347644.webp?size=48&name=PirateCat)
Fox Wizard
2025-09-17 02:02:05
Maybe someday
2025-09-17 02:02:32
Currently fully anti social, because of reasons and barely feeling emotions because of other reasons <:KekDog:884736660376535040>
spider-mario
Fox Wizard My music library consists mostly of slow encoded FLACs that now feels like a waste of time since FLAC 1.5.0 was more efficient almost every time with much faster params <a:KMS:821038589096886314>
2025-09-17 02:59:54
have you tried FLACCL?
2025-09-17 02:59:58
GPU-accelerated flac encoding
Fox Wizard
2025-09-17 03:00:38
I vaguely remember doing that a long time ago, but it was less efficient I think
TheBigBadBoy - π™Έπš›
2025-09-17 03:10:27
it's especially good for long files, bc of its init time
spider-mario
spider-mario my bisection leads me to https://github.com/libjxl/libjxl/pull/3658
2025-09-17 05:24:12
seems to have caused an even larger regression on my personal machine, from 250MP/s to 60MP/s (I bisected again from scratch and landed on that exact commit again)
jonnyawsom3
2025-09-17 05:37:27
Depending on the version, compiler and image, I was seeing around a 6x slowdown. Clang brought back that 6x, but the bisecting revealed a further 2x that was lost from standalone. Of course, 'just use Clang' isn't really a solution :P
spider-mario
2025-09-17 05:37:55
mine is with clang
2025-09-17 05:39:05
```console
$ clang++ --version
clang version 20.1.8
Target: x86_64-w64-windows-gnu
Thread model: posix
InstalledDir: C:/msys64/mingw64/bin
```
I’d just like to interject for a moment. What you're referring to as Windows, is in fact, GNU/Windows, or as I've recently taken to calling it, GNU plus Windows.
2025-09-17 05:40:26
I haven’t checked yet but I suspect https://github.com/libjxl/libjxl/blob/1c3d187019537700e26a426de7b8be58e4f8262a/lib/jxl/enc_fast_lossless.cc#L185-L188 might be triggering?
2025-09-17 05:42:43
ooh, bits 5-6-7 are for AVX-512 (https://en.wikipedia.org/wiki/Control_register#XCR0_and_XSS)
2025-09-17 05:42:54
so if we don’t detect AVX-512, we clear everything and don’t even detect AVX2?
2025-09-17 05:43:00
<@811568887577444363> am I reading this right?
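That is indeed how XCR0 reads: it reports which register state the OS saves and restores on context switch. A minimal sketch of the two checks involved (not the libjxl code itself), assuming the `_xgetbv` intrinsic from `<immintrin.h>` (needs `-mxsave` on GCC/Clang):
```cpp
#include <cstdint>
#include <immintrin.h>

bool OsSavesAvx2State() {
  const uint64_t xcr0 = _xgetbv(0);
  return (xcr0 & (1u << 1)) &&  // bit 1: XMM (SSE) state
         (xcr0 & (1u << 2));    // bit 2: YMM (AVX/AVX2) state
}

bool OsSavesAvx512State() {
  const uint64_t xcr0 = _xgetbv(0);
  const uint64_t kZmmBits = (1u << 5) | (1u << 6) | (1u << 7);
  return (xcr0 & kZmmBits) == kZmmBits;  // opmask, ZMM_Hi256, Hi16_ZMM
}
// The suspected bug: when OsSavesAvx512State() fails, the code zeroes *all*
// feature flags, so AVX2 gets dropped even though OsSavesAvx2State() is true.
```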
jonnyawsom3
2025-09-17 05:43:27
I left some details in the [issue](<https://github.com/libjxl/libjxl/issues/4447>) but the github releases are around 2-3x slower. I'm probably getting some numbers mixed up, I had 7 versions of cjxl in my console and a wall of results to pick from
spider-mario I haven’t checked yet but I suspect https://github.com/libjxl/libjxl/blob/1c3d187019537700e26a426de7b8be58e4f8262a/lib/jxl/enc_fast_lossless.cc#L185-L188 might be triggering?
2025-09-17 05:43:33
Maybe? But even without AVX or SSE, a 6x slowdown seems extreme (12x in some cases)
spider-mario
2025-09-17 05:45:15
it seems plausible to me
2025-09-17 05:45:20
an AVX2 vector is 8 floats (or int32s)
jonnyawsom3
2025-09-17 05:47:12
Actually right, I had in my head the usual few-percent gains that AVX2 gives random binaries. Fast lossless has a lot of handwritten optimizations, so it could be pulling it off
2025-09-17 05:48:44
I'm honestly surprised fast lossless even has AVX512, seeing as it was disabled by default in libjxl due to little gains
spider-mario
2025-09-17 05:56:10
this makes it fast again for me:
```diff
diff --git a/lib/jxl/enc_fast_lossless.cc b/lib/jxl/enc_fast_lossless.cc
index e8ea04913..c87ae229b 100644
--- a/lib/jxl/enc_fast_lossless.cc
+++ b/lib/jxl/enc_fast_lossless.cc
@@ -184,7 +184,8 @@ uint32_t DetectCpuFeatures() {
     const uint32_t xcr0 = ReadXCR0();
     if (!check_bit(xcr0, 1) || !check_bit(xcr0, 2) || !check_bit(xcr0, 5) ||
         !check_bit(xcr0, 6) || !check_bit(xcr0, 7)) {
-      flags = 0;  // TODO(eustas): be more selective?
+      // No AVX-512; disable everything but AVX2 if present
+      flags &= CpuFeatureBit(CpuFeature::kAVX2);
     }
   }
```
jonnyawsom3
2025-09-17 05:57:18
Don't suppose you could send the binary for me to check too?
spider-mario
2025-09-17 06:00:04
hopefully, this works
jonnyawsom3
2025-09-17 06:13:35
Yeah, that got singlethreaded back to normal, maybe 10% slower than v0.8. Multithreaded still has a way to go.
v0.8
```
Compressed to 9027058 bytes including container (8.707 bpp).
3840 x 2160, geomean: 328.88 MP/s [260.63, 367.54], 50 reps, 8 threads.
```
Fixed Main
```
Compressed to 9027.1 kB including container (8.707 bpp).
3840 x 2160, geomean: 289.702 MP/s [243.688, 319.687], 50 reps, 8 threads.
```
Standalone fast lossless
```
485.644 MP/s
8.701 bits/pixel
```
Still both faster and slightly denser
A homosapien
gb82 oops; do u have the 444 link somewhere?
2025-09-18 02:16:03
https://media.xiph.org/video/derf/subset1-y4m.tar.gz
I have good news and bad news. The bad news is that the daala image set was always distributed as limited-range 420 in a y4m, which explains the blocky chroma in the corpus. The good news is that all the images are publicly sourced, so it's easy to find the real original 444 hi-res jpgs. They also completely ignored the ICC profiles of the original jpgs.
2025-09-18 02:17:15
I've run into this issue before with other xiph images https://discord.com/channels/794206087879852103/794206170445119489/1364305480759119984
username
2025-09-18 02:19:07
the readme has the sources listed it seems
A homosapien
2025-09-18 02:21:29
Maybe I should remake the set with proper color conversion/downscaling<:Thonk:805904896879493180>
2025-09-18 02:27:19
Wait, the daala set is even worse than I thought. It's not just desaturated due to ignoring the ICC profiles, there is also a slight color shift. I think somebody used ffmpeg to convert it to png, which assumes the bt.601 color matrix instead of bt.709.
2025-09-18 02:31:18
[This set](https://github.com/WyohKnott/image-formats-comparison/tree/gh-pages/comparisonfiles/subset1/Original) is flawed in multiple ways.
jonnyawsom3
2025-09-18 08:01:29
So, I was messing around with a black 4096*4096 image to see how fast I could get fast lossless to go... Interesting results
Fixed main cjxl
```
JPEG XL encoder v0.12.0 b662606ed [_AVX2_,SSE4,SSE2] {Clang 20.1.8}
Encoding [Modular, lossless, effort: 1]
Compressed to 18281 bytes (0.009 bpp).
4096 x 4096, geomean: 100.359 MP/s [76.933, 118.257], 50 reps, 8 threads.
PageFaultCount: 4322443
PeakWorkingSetSize: 163.4 MiB
QuotaPeakPagedPoolUsage: 52.43 KiB
QuotaPeakNonPagedPoolUsage: 9.422 KiB
PeakPagefileUsage: 163.7 MiB
Creation time 2025/09/18 08:55:34.010
Exit time 2025/09/18 08:55:42.485
Wall time:   0 days, 00:00:08.475 (8.48 seconds)
User time:   0 days, 00:00:16.640 (16.64 seconds)
Kernel time: 0 days, 00:00:20.843 (20.84 seconds)
```
v0.10 standalone fast lossless
```
wintime -- fjxl Black.png nul 2 50 8
582.256 MP/s
0.003 bits/pixel
PageFaultCount: 92840
PeakWorkingSetSize: 76.41 MiB
QuotaPeakPagedPoolUsage: 31.75 KiB
QuotaPeakNonPagedPoolUsage: 8.086 KiB
PeakPagefileUsage: 201.4 MiB
Creation time 2025/09/18 08:56:05.187
Exit time 2025/09/18 08:56:06.715
Wall time:   0 days, 00:00:01.528 (1.53 seconds)
User time:   0 days, 00:00:00.421 (0.42 seconds)
Kernel time: 0 days, 00:00:03.187 (3.19 seconds)
```
v0.9
```
JPEG XL encoder v0.9.1 b8ceae3a [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 1576 bytes (0.001 bpp).
4096 x 4096, geomean: 3576.196 MP/s [2910.24, 3812.05], 50 reps, 8 threads.
PageFaultCount: 42017
PeakWorkingSetSize: 36.23 MiB
QuotaPeakPagedPoolUsage: 44.27 KiB
QuotaPeakNonPagedPoolUsage: 7.031 KiB
PeakPagefileUsage: 51.04 MiB
Creation time 2025/09/18 08:57:08.045
Exit time 2025/09/18 08:57:08.329
Wall time:   0 days, 00:00:00.284 (0.28 seconds)
User time:   0 days, 00:00:00.125 (0.12 seconds)
Kernel time: 0 days, 00:00:00.750 (0.75 seconds)
```
2025-09-18 08:02:22
I know it's a pure black image, but *something* else big changed to go from 3.6GP/s to 100MP/s and 0.001 bpp to 0.009 bpp
spider-mario
2025-09-18 08:10:03
1-bit, 8-bit or 16-bit image?
jonnyawsom3
2025-09-18 08:12:00
Huh, cool... 5.6GP/s
```
JPEG XL encoder v0.9.1 b8ceae3a [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 1576 bytes (0.001 bpp).
4096 x 4096, median: 5653.272 MP/s [2906.15, 6052.60] (stdev 892.577), 5000 reps, 16 threads.
```
spider-mario 1-bit, 8-bit or 16-bit image?
2025-09-18 08:12:10
1-bit Greyscale
spider-mario
2025-09-18 08:34:49
apparently, one of the reps reached 11GP/s
> 4096 x 4096, geomean: 6588.863 MP/s [3905.04, 11231.99], 100 reps, 8 threads.
2025-09-18 08:34:57
(v0.9.1 – now bisecting)
2025-09-18 08:35:37
main (after avx2 fix):
> 4096 x 4096, geomean: 140.945 MP/s [111.058, 159.413], 100 reps, 8 threads.
2025-09-18 08:35:38
yeah.
jonnyawsom3
2025-09-18 08:48:52
Isn't that 11GP/s?
spider-mario
2025-09-18 08:49:10
sorry, yes
2025-09-18 08:49:29
probably wasn’t well awake yet
jonnyawsom3
2025-09-18 08:49:34
Feels weird talking about such high resolutions regardless haha
2025-09-18 08:50:52
For me, increasing reps kept increasing speed due to cache, so you could try increasing it to 5K reps and see if you get closer to that 11GP/s on average
spider-mario
2025-09-18 08:51:41
oh, interesting effect
Orum
1-bit Greyscale
2025-09-18 08:52:06
isn't 1-bit by definition B&W, not grayscale?
jonnyawsom3
2025-09-18 08:52:28
Yeah, but pipelines interpret it as greyscale unless they have a dedicated bitmap pipeline
Orum
2025-09-18 08:52:47
yeah, I noticed cjxl will not accept pbm <:FeelsSadMan:808221433243107338>
jonnyawsom3
2025-09-18 08:55:39
Reminds me, I wanted to explore adding custom squeeze levels, so 1-bit images could have 1 squeeze level applied to help compression <https://github.com/libjxl/libjxl/issues/3775#issuecomment-2317324336>
spider-mario
2025-09-18 09:39:51
<@238552565619359744> it’s https://github.com/libjxl/libjxl/pull/3661
2025-09-18 09:47:05
(later updated in https://github.com/libjxl/libjxl/pull/3733)
jonnyawsom3
2025-09-18 10:13:48
Apparently that was the one time I didn't try effort 2, but yeah. It's falling back to effort 2 now because 1-bit is an unsupported bit depth
2025-09-18 07:30:28
Hmm, <@604964375924834314> I just downloaded and tried your latest PR <https://github.com/libjxl/libjxl/pull/4449> from Github actions, but I saw no improvement
Pre-fix
```
JPEG XL encoder v0.12.0 ef6f677 [_AVX2_,SSE2] {MSVC 19.44.35215.0}
Encoding [Modular, lossless, effort: 1]
Compressed to 7824.6 kB including container (7.547 bpp).
3840 x 2160, geomean: 21.510 MP/s [19.247, 22.122], 50 reps, 0 threads.
```
Post-fix
```
JPEG XL encoder v0.12.0 13dfd6f [_AVX2_,SSE2] {MSVC 19.44.35215.0}
Encoding [Modular, lossless, effort: 1]
Compressed to 7824.6 kB including container (7.547 bpp).
3840 x 2160, geomean: 21.602 MP/s [20.023, 22.387], 50 reps, 0 threads.
```
Clang v0.9
```
JPEG XL encoder v0.9.1 b8ceae3a [AVX2,SSE4,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 7823.9 kB including container (7.546 bpp).
3840 x 2160, geomean: 127.952 MP/s [115.64, 134.51], 50 reps, 0 threads.
```
Clang pre-fix
```
JPEG XL encoder v0.12.0 029cec42 [_AVX2_] {Clang 20.1.8}
Encoding [Modular, lossless, effort: 1]
Compressed to 7824.6 kB (7.547 bpp).
3840 x 2160, geomean: 49.596 MP/s [45.826, 51.540], 50 reps, 0 threads.
```
2025-09-18 07:48:26
Maybe this was a regression purely with Clang? And MSVC has just always been this bad...
2025-09-18 07:53:00
Hmm, yeah, v0.10 and v0.11 with MSVC are the same speed, 20MP/s, while Clang was hitting 50MP/s and now hits 126MP/s with the fix
2025-09-18 07:55:02
And the output is different...
2025-09-18 07:55:34
Fixed Clang
```
JPEG XL encoder v0.12.0 b662606ed [_AVX2_,SSE4,SSE2] {Clang 20.1.8}
Encoding [Modular, lossless, effort: 1]
Compressed to 7823.9 kB including container (7.546 bpp).
3840 x 2160, geomean: 125.763 MP/s [105.741, 131.516], 20 reps, 0 threads.
```
Fixed MSVC
```
JPEG XL encoder v0.12.0 13dfd6f [_AVX2_,SSE2] {MSVC 19.44.35215.0}
Encoding [Modular, lossless, effort: 1]
Compressed to 7824.6 kB including container (7.547 bpp).
3840 x 2160, geomean: 21.456 MP/s [20.881, 22.045], 20 reps, 0 threads.
```
2025-09-18 07:56:03
6x faster and slightly denser, so something is wonky between them
spider-mario I haven’t checked yet but I suspect https://github.com/libjxl/libjxl/blob/1c3d187019537700e26a426de7b8be58e4f8262a/lib/jxl/enc_fast_lossless.cc#L185-L188 might be triggering?
2025-09-19 01:15:02
I know almost nothing about compilers so I'm not sure, but isn't this disabling AVX2 when MSVC is being used? <https://github.com/libjxl/libjxl/blob/13dfd6ffee8a342a16b90fb66a024fee27835dbb/lib/jxl/enc_fast_lossless.cc#L59>
2025-09-19 01:27:02
A google search later and apparently MSVC just sucks, regardless of what compile options or intrinsics you give it
username
A google search later and apparently MSVC just sucks, regardless of what compile options or intrinsics you give it
2025-09-19 01:35:06
jonnyawsom3
2025-09-19 01:41:42
πŸ˜”
2025-09-19 01:41:42
https://github.com/libjxl/libjxl/pull/3368
Orum
2025-09-19 02:05:09
why was that never merged?
2025-09-19 02:05:19
just awaiting review?
jonnyawsom3
2025-09-19 02:17:38
It was waiting on this https://github.com/libjxl/libjxl/pull/3388
2025-09-19 02:18:59
Though the Github actions run clang builds already, so I'm not sure what's holding it up
spider-mario
username
2025-09-19 07:01:20
gone are the times when MSVC, while way behind in language support, was at least competitive performance-wise
veluca
Though the Github actions run clang builds already, so I'm not sure what's holding it up
2025-09-19 09:41:15
life
2025-09-19 09:41:44
or me having forgotten completely about it and being busy with... a million things, including jxl-rs
jonnyawsom3
2025-09-19 09:50:42
Understandably so
Kupitman
A google search later and apparently MSVC just sucks, regardless of what compile options or intrinsics you give it
2025-09-20 11:32:13
Wtf is msvc, use gcc
A homosapien
2025-09-20 12:39:04
Clang faster than gcc for libjxl
Orum
2025-09-20 01:33:33
gcc is dead, long live clang
Kupitman
2025-09-20 02:45:27
GCC >>>
2025-09-20 02:46:37
https://tenor.com/view/richard-stallman-stallman-rms-emacs-gnu-gif-23943451
2025-09-23 12:04:25
https://media.giphy.com/media/4DB6MagAp0F7EkOH6U/giphy.gif
jonnyawsom3
2025-09-23 02:55:18
I completely forgot, my CPU is so old that AVX2 ops take 2 cycles. It's surprising fast lossless is only 50% slower than Veluca's old 5800X results I was comparing against, seeing as that chip has a 50% higher clock speed and 4 years of architectural improvements over mine
Orum
2025-10-02 01:56:30
JXL doing nicely here, 61% of the original TIFF (which is compressed as well):
```
 87738044 Webb_Reveals_Cosmic_Cliffs,_Glittering_Landscape_of_Star_Birth-10.jxl
103789782 Webb_Reveals_Cosmic_Cliffs,_Glittering_Landscape_of_Star_Birth-7.webp
143648188 Webb_Reveals_Cosmic_Cliffs,_Glittering_Landscape_of_Star_Birth.tiff
```
2025-10-02 01:56:56
if only the government would use it <:FeelsSadMan:808221433243107338>
AccessViolation_
2025-10-07 05:03:57
potentially interesting benchmark image: https://upload.wikimedia.org/wikipedia/commons/e/ea/Mandelbox_mit_farbigem_Nebel_und_Licht_20241111_%28color%29.png
Mine18
AccessViolation_ potentially interesting benchmark image: https://upload.wikimedia.org/wikipedia/commons/e/ea/Mandelbox_mit_farbigem_Nebel_und_Licht_20241111_%28color%29.png
2025-10-07 07:34:55
surprisingly, AVIF on the lowest quality level looks really good
2025-10-07 07:38:40
here's what i got
AccessViolation_
2025-10-07 07:58:20
jxl does pretty alright here too!
2025-10-07 08:03:51
I got a weird thing at `-q 10` where it didn't seem to apply [adaptive LF smoothing](<https://arxiv.org/pdf/2506.05987#subsection.6.2>), but then it *did* apply it at `-q 5` and `-q 1`, so surprisingly, those lower quality settings end up looking better, since a large part of this image is its gradients
2025-10-07 08:10:23
the lowest quality JXL (just `cjxl source.png -d 25 d25.jxl`) still looks pretty good. I think the gradients look better than the AVIF, but AVIF is better at turning the structure into smooth blocks with nice sharp edges
2025-10-07 08:11:08
- others for reference (specifically in case someone wants to look into why adaptive LF smoothing didn't seem to trigger at `-q 10`, it looks *really* bad)
jonnyawsom3
AccessViolation_ I got a weird thing at `-q 10` where it didn't seem to apply [adaptive LF smoothing](<https://arxiv.org/pdf/2506.05987#subsection.6.2>), but then it *did* apply it at `-q 5` and `-q 1`, so surprisingly, those lower quality settings end up looking better, since a large part of this image is its gradients
2025-10-07 08:14:46
What distance is that?
AccessViolation_
What distance is that?
2025-10-07 08:16:09
`Encoding [VarDCT, d15.267, effort: 7]`
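That d15.267 comes from how cjxl maps `-q` to `--distance`. A sketch of the mapping, reconstructed from the cjxl source (the exact curve may vary between versions):
```cpp
double QualityToDistance(double quality) {
  if (quality >= 100.0) return 0.0;  // q100 = lossless
  if (quality >= 30.0) {
    return 0.1 + (100.0 - quality) * 0.09;  // linear high-quality range
  }
  // Below q30 the curve steepens quadratically:
  return 53.0 / 3000.0 * quality * quality - 23.0 / 20.0 * quality + 25.0;
}
// QualityToDistance(10) = 53.0/3000.0*100 - 11.5 + 25 = 15.2667 -> "d15.267"
```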
jonnyawsom3
2025-10-07 08:17:51
Interesting, I can't think of any thresholds around there
2025-10-07 08:29:51
Well, at least I made the right call enabling resampling at distance 10. Looks sharper and 20% smaller using a lower internal resolution... Does hint that the encoder could try harder though
AccessViolation_
2025-10-07 08:45:15
especially on a massive image like this, which is often going to be looked at at 10%-20% scale anyway, it's a smart move
2025-10-07 08:46:39
but even when zoomed in like that the resampling helps nicely πŸ‘€
jonnyawsom3
2025-10-07 09:11:59
I was going to explore using 4x resampling too on top of the automatic 2x, but it would need to check the resolution as it's just too blurry for anything below 4K
juliobbv
AccessViolation_ the lowest quality JXL (just `cjxl source.png -d 25 d25.jxl`) still looks pretty good. I think the gradients look better than the AVIF, but AVIF is better at turning the structure into smooth blocks with nice sharp edges
2025-10-07 09:45:01
I was curious and tried the image with my mystery AVIF encoder (file-size matched to d25.jxl) πŸ‘€ the second encode is the absolute lowest quality available (57 KB)
2025-10-07 09:47:08
"mystery encoder": SVT-AV1 tune IQ, preset -1
2025-10-07 09:47:25
this is a nice benchmark image
AccessViolation_
2025-10-07 09:49:52
very impressive
2025-10-07 09:51:23
this is the source, btw https://commons.wikimedia.org/wiki/File:Mandelbox_mit_farbigem_Nebel_und_Licht_20241111_(color).png
juliobbv
2025-10-07 09:51:24
yeah, I really love JXL's handling of gradients though
AccessViolation_
2025-10-07 09:55:20
yeah, adaptive LF smoothing works really well
jonnyawsom3
2025-10-07 09:59:20
And the gradient predictor :P
A homosapien
juliobbv "mystery encoder": SVT-AV1 tune IQ, preset -1
2025-10-07 10:01:20
Wait, tune IQ? Not tune=4 "still image"?
juliobbv
A homosapien Wait, tune IQ? Not tune=4 "still image"?
2025-10-07 10:02:15
yep, mainline SVT-AV1 now has its own tune IQ too
2025-10-07 10:02:46
so it got assigned tune=3 because that was the next one available lol
A homosapien
2025-10-07 10:03:11
how does it compare to the psy forks?
juliobbv
2025-10-07 10:03:31
it's mostly the same as the psy forks
2025-10-07 10:04:11
the only exception is that I had to switch it back to use SSD instead of SSIM RDO as the latter wasn't compatible with `--avif`
2025-10-07 10:04:38
but all of the other tweaks were transferred verbatim from -psy
A homosapien
2025-10-07 10:04:54
nice to see big features getting merged into mainline
juliobbv
2025-10-07 10:05:12
yeah, this one was long overdue too
2025-10-07 10:05:30
libaom's is actually a derivative, and that one got merged first
2025-10-07 10:08:25
and it's also a relief for maintenance purposes, because the code delta between mainline and the psy forks is becoming much smaller, so rebasing the forks onto new mainline versions is now faster
A homosapien
juliobbv the only exception is that I had to switch it back to use SSD instead of SSIM RDO as the latter wasn't compatible with `--avif`
2025-10-07 11:35:44
Where is that located in the code? I would like to compile a fresh build myself.
juliobbv
A homosapien Where is that located in the code? I would like to compile a fresh build myself.
2025-10-07 11:38:04
it's in a few places
2025-10-07 11:38:22
basically, everything that references the `TUNE_SSIM` constant
A homosapien
2025-10-07 11:39:52
And I would just change it to `SSD`?
juliobbv
A homosapien And I would just change it to `SSD`?
2025-10-07 11:41:17
you'd need to add `TUNE_IQ` to be alongside `TUNE_SSIM`
2025-10-07 11:41:43
most of the time the constants are used in checks in if clauses
2025-10-07 11:44:30
for example, in `src_ops_process.c` you'd add the `TUNE_IQ` check:
```
if (scs->static_config.tune == TUNE_SSIM || scs->static_config.tune == TUNE_IQ) {
    aom_av1_set_mb_ssim_rdmult_scaling(pcs);
}
```
AccessViolation_ - others for reference: (specifically for if someone wants to look into why adaptive LF smoothing didn't seem to trigger at `-q 10`, it looks *really* bad)
2025-10-08 01:05:07
oh wow, you weren't joking on adaptive LF
2025-10-08 01:05:29
it really improves the quality of gradients
2025-10-08 01:06:41
there's so much banding in the no adaptive LF image that it makes it look like it was decoded with like 6 bits lol
2025-10-08 01:07:08
jonnyawsom3
2025-10-08 04:37:16
Actually, it's worse than that, because it should be dithered for VarDCT
AccessViolation_
juliobbv oh wow, you weren't joking on adaptive LF
2025-10-08 04:41:00
yeah I'm surprised adaptive LF smoothing actually works this well
jonnyawsom3
AccessViolation_ yeah I'm surprised adaptive LF smoothing actually works this well
2025-10-08 07:37:55
Well, it was built for it
AccessViolation_
2025-10-08 07:40:23
yeah I know but for some reason I didn't expect it to work well on varblocks this large
A homosapien
juliobbv the only exception is that I had to switch it back to use SSD instead of SSIM RDO as the latter wasn't compatible with `--avif`
2025-10-09 12:38:15
Well I just compiled SVT-AV1 from main and `tune=IQ` seems to work fine with `-a avif=1`; when I make the changes you suggested, avifenc crashes lol.
juliobbv
A homosapien Well I just compiled SVT-AV1 from main and `tune=IQ` seems to work fine with `-a avif=1`, when I make the changes you suggested avifenc crashes lol.
2025-10-09 12:38:51
yep, that's why those changes were left out πŸ˜›
2025-10-09 12:39:37
you can still use it without avif mode though
2025-10-09 12:40:08
you'll lose a few bytes due to the full AV1 header being used instead of the reduced still picture one
A homosapien
2025-10-09 12:40:30
I just like using avif mode for the reduced memory usage
juliobbv
2025-10-09 12:40:37
yeah, that's fair
A homosapien
2025-10-09 12:40:39
SVT is quite memory hungry
2025-10-09 12:49:26
Wow, avif mode reduces memory consumption quite a lot. Using the 8000x8000 Mandel image as a test.
```
wintime -- avifenc -y 420 --sharpyuv --cicp 1/2/1 -d 10 -c svt -a tune=3 -a avif=x -a lp=4 -q 10 --tilecolslog2 0 --tilerowslog2 0 mandel.png mandel.avif

-a avif=1 = PeakWorkingSetSize: 2.216 GiB
-a avif=0 = PeakWorkingSetSize: 15.1 GiB
```