JPEG XL


libjxl

jonnyawsom3
2025-05-09 02:07:40
https://discord.com/channels/794206087879852103/824000991891554375/1370391774434168913 I don't *think* that's how it works, but that is an interesting way to do it. Recursively iterate on the MA tree until all residuals are below a certain value, instead of only testing a tree on a percentage of all pixels
_wb_
2025-05-09 02:28:48
The `-I` percentage is the percentage of samples used to do learning. The samples use quantized property values to save memory (property values are int32_t but they get quantized to at most 256 buckets so they fit into uint8), and to decide how to quantize the property values, first 10% of the `-I` percentage is sampled just to figure out how to quantize, before the actual sample is taken.
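A minimal sketch of the two-phase sampling described above: a small pre-sample chooses up to 256 quantization buckets, then int32_t property values are mapped to uint8_t bucket indices. The function names and the quantile-based bucket choice here are illustrative only, not the actual libjxl code.
```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Pick at most 256 bucket boundaries from a pre-sample of property values.
std::vector<int32_t> ChooseBuckets(std::vector<int32_t> presample) {
  if (presample.empty()) return {0};
  std::sort(presample.begin(), presample.end());
  presample.erase(std::unique(presample.begin(), presample.end()), presample.end());
  const size_t num_buckets = std::min<size_t>(presample.size(), 256);
  std::vector<int32_t> boundaries(num_buckets);
  for (size_t i = 0; i < num_buckets; ++i) {
    boundaries[i] = presample[i * presample.size() / num_buckets];
  }
  return boundaries;  // sorted; bucket k starts at boundaries[k]
}

// Map a property value to its bucket index, which fits in a uint8_t.
uint8_t QuantizeProperty(int32_t value, const std::vector<int32_t>& boundaries) {
  auto it = std::upper_bound(boundaries.begin(), boundaries.end(), value);
  const size_t idx = (it == boundaries.begin()) ? 0 : (it - boundaries.begin() - 1);
  return static_cast<uint8_t>(idx);
}
```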
Olav
2025-05-12 05:21:09
Will jxl-rs also do JPEG decoding, like libjxl djxl?
jonnyawsom3
2025-05-12 05:34:08
Oh, you mean JPEG reconstruction?
AccessViolation_
2025-05-12 05:51:32
I assume they mean decoding from any type of JXL to JPEG, like `djxl`
_wb_
2025-05-12 06:34:44
If it's not JPEG reconstruction, then it's just decoding to pixels and encoding to JPEG. Not really in the scope of a jxl decoder library...
Olav
2025-05-12 06:44:59
I meant JPEG to pixels. Just thought it'd be a good argument for jxl-rs adoption in browsers if they could replace their current JPEG decoder with it.
TheBigBadBoy - π™Έπš›
2025-05-12 09:08:57
well jxl-rs will be adopted by Firefox even without having "JPEG decoder" <:FeelsAmazingMan:808826295768449054>
CrushedAsian255
Olav I meant JPEG to pixels. Just thought it'd be a good argument for jxl-rs adoption in browsers if they could replace their current JPEG decoder with it.
2025-05-12 11:13:21
as in it can read both JXL and JPEG files?
2025-05-12 11:13:25
not jpeg reconstruction?
Olav
CrushedAsian255 not jpeg reconstruction?
2025-05-13 01:17:37
Yes, as in read both JPEG and JPEG XL.
jonnyawsom3
2025-05-13 09:44:24
<@794205442175402004> you forgot to add this to the merge queue after you approved it. No rush though https://github.com/libjxl/libjxl/pull/4244
monad
2025-05-13 10:32:43
surely it was intentional
_wb_
2025-05-13 10:50:23
Right. On the queue it goes.
jonnyawsom3
2025-05-13 10:50:31
Thanks
2025-05-15 05:55:54
Well, that doesn't look good. The release builds are all failing https://github.com/libjxl/libjxl/actions/workflows/release.yaml
2025-05-15 06:00:32
Think it was caused by this https://github.com/libjxl/libjxl/pull/4220
_wb_
2025-05-15 06:52:44
Yeah I think I am referencing something that doesn't exist or something like that, see also https://github.com/libjxl/libjxl/pull/4220#issuecomment-2854376395
2025-05-15 06:53:39
<@811568887577444363> if you get a chance please fix my mess-up 🙂
jonnyawsom3
2025-05-16 03:02:51
<@795684063032901642> and if you get a chance, could you take another glance at this? Been ready for quite a while, but we can only iterate on it once people start using it https://github.com/google/jpegli/pull/130
2025-05-16 03:03:23
Then the repos also need to be synced; there are some commits on the Google one that aren't in libjxl
Demiurge
2025-05-18 09:14:51
It would definitely be easier if it was broken up into separate PRs like one for the app14 fix, one for the chroma subsampling changes and fixes, and one for the other tweaks.
2025-05-18 09:16:32
Speaking as someone who isn't really involved in the project, but knows it's a bad idea to have everything lumped together as an all-in-one like that...
Mine18
2025-05-20 06:35:16
<@238552565619359744> https://github.com/libjxl/libjxl/pull/4258 is it only preset 7 that's slower? or preset 7 and higher are slower?
jonnyawsom3
Mine18 <@238552565619359744> https://github.com/libjxl/libjxl/pull/4258 is it only preset 7 that's slower? or preset 7 and higher are slower?
2025-05-20 01:53:15
Now that you mention it... Yeah, for some reason it's only effort 7 that's massively slower. All the others are around 5% slower (25% for effort 8 and 9), instead of the 70% for effort 7 on the new image I just tried. Maybe something to do with patches and chunked encoding?
Mine18
2025-05-20 01:58:23
what about the fastest presets, <=4? These would be very impactful for any casual user or realtime use cases for JXL
jonnyawsom3
2025-05-20 02:00:53
Yeah, it's patches triggering because progressive_dc disables buffering
2025-05-20 02:01:10
And patches are only enabled at effort 7 or higher, causing the sudden speed hit
2025-05-20 02:01:49
Effort 8 and 9 already disable buffering when distance is above 0.5, so they remain a lot closer to before
Mine18 what about the fastest presets, <=4? These would be very impactful for any casual user or realtime use cases for JXL
2025-05-20 02:04:41
Effort 1 and 2 are actually around 8% faster, 3 remains the same, 4 is 10% slower, 5 and 6 are 5% slower
2025-05-20 02:06:50
Also bear in mind this is only when progressive encoding is used, so probably not for the casual user or realtime
Mine18
2025-05-20 02:08:47
is progressive encoding like chunked encoding? what does it have to do with progressive decoding?
jonnyawsom3
Mine18 is progressive encoding like chunked encoding? what does it have to do with progressive decoding?
2025-05-20 02:34:47
JPEG XL is inherently progressive by loading per-group, or with the 1:8 DC for lossy, but you can split the AC in half to get a 'half-quality' result at around 50% loaded. You can also turn the DC into a 1:64 image instead, with its own 1:8 DC, which allows decoding only a few pixels and every doubling of resolution in-between. Old behaviour only did progressive AC, so at around 50% you'd have a full image without black gaps. The PR adds progressive DC, so you have a full image much sooner, at a lower resolution. For example, the first progressive pass of a 4K image:
```
Regular Lossy:  3840 x 2160 --> 480 x 270
Progressive DC: 3840 x 2160 --> 60 x 33.75 --> 7.5 x 4.2
```
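For reference, the resolutions in that example are just the image size divided by 8, 64 and 512 (real pixel dimensions would be rounded up); a throwaway snippet reproducing the numbers:
```cpp
#include <cstdio>

int main() {
  const double w = 3840, h = 2160;
  std::printf("1:8 DC         -> %g x %g\n", w / 8, h / 8);      // 480 x 270
  std::printf("1:64 DC        -> %g x %g\n", w / 64, h / 64);    // 60 x 33.75
  std::printf("its own 1:8 DC -> %g x %g\n", w / 512, h / 512);  // 7.5 x ~4.2
  return 0;
}
```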
Yeah, it's patches triggering because progressive_dc disables buffering
2025-05-20 02:41:24
Hmm.. Maybe not. Disabling patches brings it down from 72% slower to 54%, but that's still double what effort 8 and 9 lost
2025-05-20 02:50:34
I've been testing on the wrong version... Lovely, running tests again
Mine18
Also bear in mind this is only when progressive encoding is used, so probably not for the casual user or realtime
2025-05-20 03:03:16
so does this mean that when only using the new progressive decoding, encoding performance isn't as bad?
jonnyawsom3
2025-05-20 03:04:55
Around 25% slower for all effort levels using a fresh build, but you can still override the DC with `--progressive_dc 0` to avoid the slowdown and return to old behaviour (but centre-out)
Mine18
2025-05-20 03:07:42
Good to know. I think these changes are overall worth it, as center-out progressive decoding is a huge strength for web images and this should make JXL more appealing to everyone
jonnyawsom3
2025-05-20 03:21:15
```
Current -p
Compressed to 518.4 kB (0.500 bpp).
3840 x 2160, 2.285 MP/s [2.28, 2.28], , 1 reps, 16 threads.

Current PR -p
Compressed to 529.6 kB (0.511 bpp).
3840 x 2160, 1.661 MP/s [1.66, 1.66], , 1 reps, 16 threads.

PR -p --patches 0
Compressed to 529.6 kB (0.511 bpp).
3840 x 2160, 3.184 MP/s [3.18, 3.18], , 1 reps, 16 threads.
```
Hmm, so disabling patches makes it even faster than current, but could have some hefty density impacts for certain images
Mine18
2025-05-20 03:44:09
density? what do you mean?
jonnyawsom3
2025-05-20 03:45:54
Larger filesize if an image has a lot of patch candidates and I disable them for progressive
Mine18
2025-05-20 03:46:43
oh, for certain images. I was too focused on the results you showed
jonnyawsom3
_wb_ Yeah I think I am referencing something that doesn't exist or something like that, see also https://github.com/libjxl/libjxl/pull/4220#issuecomment-2854376395
2025-05-20 04:20:39
Also means the nightly builds are 2 weeks old https://artifacts.lucaversari.it/libjxl/libjxl/latest/
2025-05-30 05:29:09
Hit 3K commits the other day
CrushedAsian255
Hit 3K commits the other day
2025-05-30 09:25:12
This community is very commit-ted
Kleis Auke
2025-06-03 12:40:57
Just wondering, is `JPEGXL_SO_MINOR_VERSION` only incremented when the existing ABI/API changes? If not, could this be reconsidered? libvips runs into issues with this on "stable" distros, where the convention is to _not_ update libraries when the ABI/API has changed. See PR <https://github.com/libvips/libvips/pull/4550> for context.
_wb_
2025-06-03 01:32:48
we follow https://semver.org/ as far as I know
2025-06-03 01:40:18
I guess there have always been small API changes at every new released version in the past; I think if we do a 0.12 now then probably we can keep JPEGXL_SO_MINOR_VERSION at 11 since I don't think there has been any ABI/API change.
Kleis Auke
_wb_ I guess there have always been small API changes at every new released version in the past; I think if we do a 0.12 now then probably we can keep JPEGXL_SO_MINOR_VERSION at 11 since I don't think there has been any ABI/API change.
2025-06-03 02:12:45
That would be great!
2025-06-03 02:13:06
FWIW, the SONAME bump wasn't really necessary between v0.9.0 and v0.11.1 too, according to `abi-compliance-checker`. <https://kleisauke.nl/compat_reports/libjxl/0.9.0_to_0.11.1/compat_report.html> <https://kleisauke.nl/compat_reports/libjxl_cms/0.9.0_to_0.11.1/compat_report.html> <https://kleisauke.nl/compat_reports/libjxl_threads/0.9.0_to_0.11.1/compat_report.html>
2025-06-03 02:13:24
(IIUC, you can safely ignore the "parameter _XX_ became passed in _YY_ register instead of _ZZ_" errors in those reports)
Kleis Auke FWIW, the SONAME bump wasn't really necessary between v0.9.0 and v0.11.1 too, according to `abi-compliance-checker`. <https://kleisauke.nl/compat_reports/libjxl/0.9.0_to_0.11.1/compat_report.html> <https://kleisauke.nl/compat_reports/libjxl_cms/0.9.0_to_0.11.1/compat_report.html> <https://kleisauke.nl/compat_reports/libjxl_threads/0.9.0_to_0.11.1/compat_report.html>
2025-06-03 03:02:38
Ah, never mind, I forgot to check `libjxl_extras_codec.so`, which had some changes between v0.9.x and v0.10.x ([according to Debian](<https://salsa.debian.org/debian-phototools-team/libjxl/-/commit/6cdb58efec2ecfea9e9b4f5e38fe5d1b9a4760c3>)) and was converted to a static library in v0.11.0 (see e.g. [this commit](<https://salsa.debian.org/debian-phototools-team/libjxl/-/commit/a160a95b5f408b041ce7912f7d1d4e1fc7786328>)).
jonnyawsom3
2025-06-03 03:20:26
Finally got round to splitting this into separate PRs. [APP14 Marker](<https://github.com/google/jpegli/pull/135>), [444 Defaults](<https://github.com/google/jpegli/pull/136>) and [Quality based settings](<https://github.com/google/jpegli/pull/137>) <https://github.com/google/jpegli/pull/130#issuecomment-2935880782>
2025-06-03 03:23:46
The first two should be ready to merge, the quality based settings may require some more work due to test failures
Demiurge
2025-06-03 11:58:06
Yes, hooray!
2025-06-03 11:59:24
Now someone needs to tweak the cjxl tool to give a nonzero exit status when encountering unrecognized PNG chunks.
2025-06-04 12:00:32
Since that seems to be the latest new controversy these days with cjxl not giving a warning when ignoring/throwing out metadata
jonnyawsom3
2025-06-07 01:43:53
I thought I'd ask, could I get similar permissions for the jpegli repo as libjxl? Then we can run the tests without requiring a core dev to manually approve it every commit. It would help a lot with finding bugs in PRs and being able to fix them.
veluca
2025-06-07 01:48:38
fine with me but I don't have those permissions myself xD
afed
2025-06-07 02:24:20
yeah, jpegli needs some active maintainers. There are a couple of useful PRs, including static compilation fixes and using it as a lib for Apple and Windows
2025-06-07 02:31:54
mabs also removed jpegli <:FeelsSadMan:808221433243107338> yeah, it's just about libjxl, but jpegli wasn't re-added as a separate distro either https://github.com/m-ab-s/media-autobuild_suite/pull/2917
Melirius
2025-06-10 11:22:09
There is an annoying problem in skcms, not sure what will be the right way to go https://github.com/libjxl/libjxl/issues/4280
2025-06-10 11:25:57
SKCMS maintainers remove old commits, so almost none of the old libjxl versions can be built "from scratch", as they point to non-existent commits in that repo
spider-mario
2025-06-10 11:57:21
as far as I can tell, they don’t?
2025-06-10 11:57:37
I’m able to checkout libjxl v0.1 and the skcms commit it references from a fresh clone
CrushedAsian255
2025-06-10 01:08:26
can't you just pull from github, or does that not include old history?
2025-06-10 01:08:33
oh wait skcms
2025-06-10 01:09:06
oh wait git submodules
Melirius
2025-06-10 02:14:50
Presumably this was the source of the problem, I cannot reproduce it on my freshly pulled repo, sorry to bother
jonnyawsom3
2025-06-13 02:49:00
Seems like there were a few bugs with effort 1 not being lossless https://github.com/libjxl/libjxl/issues/4287
2025-06-13 05:08:47
And this one is only on main https://github.com/libjxl/libjxl/issues/4026
Demiurge
2025-06-14 03:50:06
Stuff like that should probably be backported
CrushedAsian255
Demiurge Stuff like that should probably be backported
2025-06-14 07:30:22
are there any real reasons to support older versions? If an app is able to update from, let's say, 0.9 to 0.9.1 or something, they should be able to update to 0.12.0 or the latest version
Demiurge
2025-06-14 07:31:50
No, not really. The only "real" reason is distros that will only take minor patch upgrades, like Debian.
CrushedAsian255
2025-06-14 08:16:04
technically aren't all libjxl versions minor?
2025-06-14 08:16:08
as they're 0.x.y
A homosapien
2025-06-14 10:18:55
libjxl follows semantic versioning https://semver.org/
_wb_
Seems like there were a few bugs with effort 1 not being lossless https://github.com/libjxl/libjxl/issues/4287
2025-06-16 01:18:23
that one was pretty subtle, this was the bugfix: https://github.com/libjxl/libjxl/pull/4291/files
nol
2025-06-16 02:03:48
Are there any plans to make a v0.12.0 release soon-ish? If not, is there a particular version/commit that is considered stable? I see that for some commits the CI doesn't pass, but there is also a different number of checks run on the commits
_wb_
2025-06-16 02:13:00
I hope we will have a 0.12 soonish
HCrikki
2025-06-16 02:19:24
please consider shipping a precompiled **.DLL** (non-EXE) version of jpegli somewhere that can be dropped in place of mozjpeg (handy for both devs bundling unmodified dlls in their apps and users in need of swapping in jpegli for themselves).
afed
2025-06-16 02:28:04
first, some PRs need to be merged, like <https://github.com/google/jpegli/pull/116> <https://github.com/google/jpegli/pull/112>
jonnyawsom3
_wb_ I hope we will have a 0.12 soonish
2025-06-16 02:36:25
Ideally, we can get some jpegli PRs merged and then pushed to libjxl too. They fix a lot of odd behaviour and blockers to adoption (Apple XYB support, Empty DHT marker crashes, XYB JXL transcoding failures)
2025-06-16 02:40:20
Oh, also the changelog should really be updated as part of PRs, instead of a single update before a release. Makes it much easier to manage and know what's important to note
_wb_
2025-06-18 03:02:49
lots of open issues at the libjxl repo; if someone can point me to the issues that are actual bugs that need fixing, that would be useful
2025-06-18 03:03:06
(we need to figure out a better way to triage issues and get them closed)
jonnyawsom3
2025-06-18 03:57:13
I closed between a few dozen and a hundred a few months ago, but many are pending developer input or I can't test myself to see if they can be closed
2025-06-18 04:14:02
Lots of them are "This image compresses worse than PNG/WebP", which I could probably close as duplicate, but that would take some time
2025-06-18 04:14:56
Though, it might be useful to keep them open for future benchmarking/testing
_wb_
2025-06-18 04:46:01
You can reference the ones you close from the one that is left open to keep the test cases reachable
jonnyawsom3
2025-06-18 04:47:14
Actually, I forgot Github should automatically link them when you close as a duplicate, so it'll just take time to go through all the issues again
2025-06-18 11:56:38
So, it would seem jpegli *isn't* a drop-in replacement, due to the missing 12-bit JPEG API: `undefined symbol: jpeg12_write_raw_data` <https://github.com/RawTherapee/RawTherapee/issues/7125#issuecomment-2746670684>
> To create a JPEG file with 9 to 12 bits per sample, use `jpeg12_write_scanlines()` or `jpeg12_write_raw_data()` instead of `jpeg_write_scanlines()` or `jpeg_write_raw_data()`.
<https://github.com/libjpeg-turbo/libjpeg-turbo/blob/81feffa632bcd928d4cd1c35e5bb6c1eb02ac199/doc/libjpeg.txt#L1092>
2025-06-18 11:57:59
Also... Maybe we should add a `jpegli` channel underneath this one, to help keep things separate
Demiurge
2025-06-19 11:12:23
I mentioned this before
2025-06-19 11:12:31
The high bit depth api is missing
2025-06-19 11:15:26
Please, please merge libjxl and jpegli back into one repo, and just improve the build system to make it trivially simple for packagers to build/install jpegli and libjxl as separate build/install targets.
2025-06-19 11:17:45
This will allow the most efficient consolidation and re-use of code and build objects, without the massive confusion of maintaining duplicate shared code in a separate repo...
2025-06-19 11:18:47
Plus, it will attract more awareness and attention to the existence of jxl, from people curious about jpegli
jonnyawsom3
2025-06-23 06:49:06
<https://github.com/libjxl/libjxl/pull/4298>
> Wait, how are there valid bitstreams where `!use_prefix_code_ && num_to_copy_ != 0` ?
>
> I would expect an LZ77 run to terminate at the end of an entropy-coded stream, and never continue into the next entropy-coded stream, especially not if that one has LZ77 disabled.
>
> Maybe the spec needs to be clarified on this point. I'd prefer this to be invalid, and have the invariant that num_to_copy == 0 at the end of an entropy-coded stream. At the very least, this should be true at the end of every bitstream section because otherwise parallel decoding is impossible.

> aren't use_prefix_code_ and lz77.enabled independent?

> Oh right, my bad. I must have seen too much Deflate, causing me to conflate lz77 and prefix coding :)

<@794205442175402004> Does your last comment invalidate your previous concern about multithreaded decoding? I don't know enough about LZ77 xP
_wb_
2025-06-23 06:50:50
No, but I'm assuming bitstream sections are always ending with num_to_copy == 0. At least the libjxl encoder should never produce runs that cross sections.
jonnyawsom3
2025-06-23 06:56:29
So it *is* valid, but generally shouldn't be done. Then when it *is* done, libjxl will now handle it gracefully on fewer threads instead of failing, with other files having no change in behaviour?
_wb_
2025-06-23 07:01:47
no, I think it is not valid to have lz77 runs that cross sections, but maybe the spec has to be more clear about it
2025-06-23 07:02:46
what that pull request fixes is lz77 runs that cross entropy-coded streams within the same section
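A sketch of the invariant being discussed, under the assumption that LZ77 state may carry across entropy-coded streams within one section (what the PR handles) but should be exhausted by the end of the section; the types and names here are made up for illustration, this is not spec text or libjxl code.
```cpp
#include <cstddef>
#include <functional>
#include <vector>

struct Lz77State {
  size_t num_to_copy = 0;  // remaining length of an ongoing LZ77 run
};

struct Stream {};  // placeholder for one entropy-coded stream

// Hypothetical section decode: the LZ77 state is shared across the
// entropy-coded streams of a section, but a leftover run at the section
// boundary is treated as invalid input (a decoder could also just ignore it).
bool DecodeSection(const std::vector<Stream>& streams,
                   const std::function<bool(const Stream&, Lz77State*)>& decode_stream) {
  Lz77State state;
  for (const Stream& s : streams) {
    if (!decode_stream(s, &state)) return false;  // a run may continue into the next stream
  }
  return state.num_to_copy == 0;  // proposed end-of-section requirement
}
```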
jonnyawsom3
2025-06-23 07:24:25
Ahh right, I see now. I didn't realise you said section instead of stream
2025-06-23 07:25:44
Skimming the spec, I can't see anything explicitly mentioning sections, only this at the start of C.1:
> The codestream contains multiple independently entropy-coded streams.
Tirr
2025-06-23 07:38:28
jxl-oxide discards leftover lz77 runs
_wb_
2025-06-23 07:40:53
I think it should be considered invalid input if there are leftovers, so it can discard but it can also refuse to decode.
2025-06-23 07:41:18
Unless we have a good reason to allow it.
jonnyawsom3
2025-06-23 07:42:48
My only thought would be for large images with lots of repeating elements, but then you'd hit distance limits anyway...
_wb_
2025-06-23 07:57:59
In any case, sections should not start with an ongoing lz77 run, that would break parallel decode and ROI decode. Only question is whether unfinished lz77 runs at the end of a section should be allowed (and ignored) or not. <@179701849576833024> thoughts?
veluca
2025-06-23 08:01:33
Totally agree on the first, I am not sure I have strong opinions on the second tbh
jonnyawsom3
2025-06-23 08:22:12
C.3.3 says:
> NOTE It is not necessarily the case that num_to_copy is zero at the end of the stream.
_wb_
2025-06-23 08:39:31
In case there are multiple entropy streams in a section, libjxl does create runs that cross the streams so the spec is right that it allows that. At the end of a section though, num_to_copy will always be zero in libjxl encodes, right <@179701849576833024> ? Or does fast lossless or something else produce runs that are too long?
veluca
2025-06-23 08:40:14
I don't _think_ so
_wb_
2025-06-23 08:41:56
If it is always zero at the end of a section, then I would propose to make that a requirement so it gives another way to detect corrupted streams. Decoders can still ignore it and decode anyway if they like (you can do whatever you want with invalid input), but they would also be allowed to return an error.
veluca
2025-06-23 08:46:37
tbh I don't see the harm in allowing it to be non-zero
2025-06-23 08:46:43
might make life easier for encoders
Tirr
2025-06-23 08:53:01
jxl-oxide creates fresh lz77 state per stream, a bit surprising that it didn't have any problem
veluca
2025-06-23 08:59:35
stream or section?
2025-06-23 08:59:46
or modular-sub-bitstream, actually
2025-06-23 09:00:12
(I would be surprised if we made vardct and modular share an ANS decoder)
Tirr
2025-06-23 09:07:08
I mean entropy-coded stream
2025-06-23 09:08:24
so leftover lz77 runs are just discarded and don't affect subsequent entropy-coded streams
2025-06-23 09:09:32
maybe I'm just confused and this is the intended way to handle those situations, not sure
jonnyawsom3
Tirr jxl-oxide creates fresh lz77 state per stream, a bit surprising that it didn't have any problem
2025-06-23 09:18:37
From the sounds of it, it's only become a problem recently, likely due to my PR re-enabling LZ77 for lossless by default. Would be handy if we could find an example file though
_wb_
2025-06-23 09:56:49
Well https://github.com/libjxl/libjxl/pull/4298 did fix failing roundtrips so that should give us an example file where libjxl encodes something that jxl-oxide (or libjxl 0.11) does not decode.
2025-06-23 10:01:50
btw `// TODO(veluca): No optimization for Huffman mode yet.` β€” isn't this trivial for Huffman mode?
veluca
2025-06-23 10:03:45
ah very likely
_wb_
2025-06-23 10:06:56
ah wait that function is called at the beginning of every modular channel, so probably the lz77 run does not actually cross entropy coded streams, but it does cross channels
2025-06-23 10:07:30
like when you have a grayscale image represented in RGBA, which after YCoCg becomes a Y channel followed by 3 channels that are all-zeroes
2025-06-23 10:08:22
then those all-zeroes could end up getting encoded as one large lz77 run
2025-06-23 10:09:40
(instead of the jxl-art way of encoding them as a singleton-histogram, which is what that fast path was made for)
2025-06-23 10:10:50
This is the only place where that function is being used: https://github.com/libjxl/libjxl/blob/main/lib/jxl/modular/encoding/encoding.cc#L203
veluca
_wb_ ah wait that function is called at the beginning of every modular channel, so probably the lz77 run does not actually cross entropy coded streams, but it does cross channels
2025-06-23 10:12:01
yes, I would believe that
_wb_
2025-07-04 12:20:21
https://linuxsecurity.com/advisories/debian/debian-dsa-5958-1-jpeg-xl-bjcnnxypjiqt
2025-07-04 12:20:55
> For the stable distribution (bookworm), these problems have been fixed in version 0.7.0-10+deb12u1.
> We recommend that you upgrade your jpeg-xl packages.
jonnyawsom3
2025-07-04 03:08:15
Huh, apparently builds are getting double zipped in the Github actions
Tirr
2025-07-05 09:58:40
I'm writing a Rust-based libjxl frontend, and it seems that using a Rayon thread pool is faster than the default libjxl thread pool? Though it's just one-shot encoding
```console
$ time cjxl -d 0 -e 3 input.png output.jxl
JPEG XL encoder v0.12.0 [_NEON_]
Encoding [Modular, lossless, effort: 3]
Compressed to 22998.5 kB (7.768 bpp).
5787 x 4093, 43.982 MP/s, 8 threads.
cjxl -d 0 -e 3 input.png output.jxl  3.62s user 0.18s system 438% cpu 0.868 total
$ time ./target/release/jexcel
Encoder setup took 2.59 ms
Decoding input took 150.41 ms
Encoding and output took 526.74 ms
./target/release/jexcel  3.48s user 0.21s system 527% cpu 0.699 total
```
(both use libjxl `81247d5c`, `-d 0 -e 3`)
2025-07-05 10:00:34
and 526.74 ms corresponds to 44.968 MP/s
2025-07-05 11:47:56
on d0 e1 rayon is much faster (1000 MP/s vs. 800 MP/s) but it doesn't seem to scale well
_wb_
2025-07-05 12:16:16
I wonder what they do differently
Tirr
2025-07-05 12:17:37
does libjxl thread pool do work stealing too?
spider-mario
2025-07-05 12:17:56
I don’t believe it does
Tirr
2025-07-05 12:18:38
I guess the difference is from some property of the work-stealing queue
spider-mario
2025-07-05 12:22:52
https://github.com/libjxl/libjxl/blob/81247d5cf29e472700d65ebef72d91edcde6a396/lib/threads/thread_parallel_runner_internal.cc
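For context, the hook a frontend uses to supply its own thread pool is the `JxlParallelRunner` callback; a Rust frontend would presumably forward these calls to a Rayon pool. A minimal sketch, assuming the callback signatures from `<jxl/parallel_runner.h>` (the body below just runs serially to stay short):
```cpp
#include <jxl/encode.h>
#include <jxl/parallel_runner.h>

// A trivial runner: a real one would hand the [start_range, end_range) items
// to worker threads (e.g. a work-stealing pool) instead of this serial loop.
static JxlParallelRetCode SerialRunner(void* runner_opaque, void* jpegxl_opaque,
                                       JxlParallelRunInit init,
                                       JxlParallelRunFunction func,
                                       uint32_t start_range, uint32_t end_range) {
  (void)runner_opaque;
  JxlParallelRetCode ret = init(jpegxl_opaque, /*num_threads=*/1);
  if (ret != 0) return ret;
  for (uint32_t i = start_range; i < end_range; ++i) {
    func(jpegxl_opaque, i, /*thread_id=*/0);
  }
  return 0;
}

// Usage: JxlEncoderSetParallelRunner(enc, SerialRunner, /*runner_opaque=*/nullptr);
```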
jonnyawsom3
Tirr on d0 e1 rayon is much faster (1000 MP/s vs. 800 MP/s) but it doesn't seem to scale well
2025-07-05 02:53:26
For d0 e1, Clang is also substantially faster (150%+) than gcc. I figured out it's to do with the multithreading, so I wouldn't be surprised if something similar is happening https://github.com/libjxl/libjxl/issues/2268#issuecomment-2999629502
Tirr
2025-07-05 02:53:52
both are built using clang though
jonnyawsom3
2025-07-05 02:54:59
Yeah, I mean that if Clang is already doing something better than gcc, then maybe Rayon is doing even more. Though you already figured that out
Tirr
Tirr on d0 e1 rayon is much faster (1000 MP/s vs. 800 MP/s) but it doesn't seem to scale well
2025-07-06 06:51:45
never mind, I was setting `use_original_profile = 1` when doing lossy 😅 maybe it was color management overhead
2025-07-06 06:52:43
now rayon threadpool perf is comparable to libjxl threadpool
2025-07-06 07:24:27
anyway the code is here: <https://github.com/tirr-c/jexcel>, no prebuilt binaries though
2025-07-06 07:24:50
and it's encoding only currently
2025-07-06 08:35:56
in my testing, rayon threadpool is comparable to or faster than libjxl one, depending on image / encoder params
Lilli
2025-07-24 02:41:36
Hello hello, it's been a while. I was wondering if it's at all possible to set the `JxlPixelFormat` to FLOAT16, while my input data is in float32, and the data is far from the float16 limit
2025-07-24 02:42:36
I'm saying that because so far, it outputs float32 when checking the decompressed jxl
2025-07-24 02:57:27
I managed to get it to compress in uint16, which I suppose I could do but it wasn't the goal
jonnyawsom3
Lilli I'm saying that because so far, it outputs float32 when checking the decompressed jxl
2025-07-24 03:17:46
Are you doing lossy or lossless? Lossy is always float32 internally with bit depth just being metadata
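A minimal sketch of what "bit depth is just metadata" looks like in practice, assuming the standard libjxl encode API: the buffer stays float32 (`JXL_TYPE_FLOAT`) while `JxlBasicInfo` declares float16 samples. The surrounding setup (encoder creation, frame settings, color encoding) is omitted.
```cpp
#include <jxl/encode.h>
#include <cstddef>
#include <cstdint>

void AddFloat32FrameDeclaredAsFloat16(JxlEncoder* enc,
                                      JxlEncoderFrameSettings* settings,
                                      const float* pixels, size_t num_bytes,
                                      uint32_t xsize, uint32_t ysize) {
  JxlBasicInfo info;
  JxlEncoderInitBasicInfo(&info);
  info.xsize = xsize;
  info.ysize = ysize;
  info.bits_per_sample = 16;          // declare float16 precision...
  info.exponent_bits_per_sample = 5;  // ...(half float = 16 bits, 5 exponent bits)
  JxlEncoderSetBasicInfo(enc, &info);

  // The input buffer itself can remain float32.
  JxlPixelFormat format = {3, JXL_TYPE_FLOAT, JXL_NATIVE_ENDIAN, 0};
  JxlEncoderAddImageFrame(settings, &format, pixels, num_bytes);
}
```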
Lilli
2025-07-24 03:29:55
Lossy. Oh okay, then that's fine I suppose
2025-07-24 03:30:41
When compressing with uint16, it's also float32 inside?
spider-mario
2025-07-24 03:37:59
yes
Lilli
2025-07-24 03:38:11
Okayyy thanks !
2025-07-29 08:37:01
So, with a bit of testing, I noticed that the file size is very different if I give uint16 compared to a float. Saying it's float16 while giving float32 works, but it doesn't make the file size equal to the uint16 file size (nor in between); it's just like the float32.
2025-07-29 08:44:00
Is there a way to make it behave like I imagine it should? Is it a matter of actually sending float16 data? Because there's no easy way to do that in C++
_wb_
2025-07-29 08:56:40
you are doing lossy, right? do the decoded images look the same?
2025-07-29 08:57:14
when passing data as uint (8 or 16), iirc it will implicitly assume it uses the sRGB transfer function
2025-07-29 08:57:33
while when passing it as float, the implicit assumption is that it's linear
2025-07-29 09:01:31
if the input is interpreted incorrectly, then the results of lossy compression will be wrong: the jxl image will not render correctly, and if you fix that by decoding to ppm/pfm and interpreting it correctly, the results will still be poor since the lossy compression will be done in a perceptually wrong way
2025-07-29 09:04:27
to avoid implicit assumptions, it's best to explicitly pass the colorspace info to the encoder
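A short sketch of the "explicitly pass the colorspace" advice, assuming the standard libjxl encode API; this goes after `JxlEncoderSetBasicInfo` and before adding the frame.
```cpp
#include <jxl/encode.h>

void DeclareLinearSRGB(JxlEncoder* enc) {
  JxlColorEncoding color;
  JxlColorEncodingSetToLinearSRGB(&color, /*is_gray=*/JXL_FALSE);
  // Overrides the implicit sRGB-for-uint / linear-for-float assumption.
  JxlEncoderSetColorEncoding(enc, &color);
}
```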
Lilli
2025-07-29 09:06:00
Yes lossy, and I always explicitly pass the colorspace as JxlColorEncodingSetToLinearSRGB
2025-07-29 09:12:30
What I have is a float32 array that I'd like to compress with a precision of float16, but the output file size for float16 is the same as for float32. So maybe I'm doing it wrong :/
2025-07-29 09:59:15
Also, must the values in the float image be [0,1] ?
_wb_
2025-07-29 10:08:40
the nominal range is [0,1] but you can go outside it
2025-07-29 10:11:17
for lossy, the internal precision for processing is always float32 and the actual precision depends on the DCT quantization, which depends on the quality/distance setting and cannot be expressed in terms of precision of RGB samples since it's in the frequency domain and for XYB components
Lilli
2025-07-29 01:06:12
Okay, that's interesting, I suppose that's why it gives different results. I'm giving raw data ( [0,65000]) and it compresses completely differently in float compared to u16, which sounds like it's on purpose according to what you say, but it also means that the distance target I give has a totally different impact. When I normalize the input, I obtain exactly the same as u16 compression. (if I stretch my u16 input to 0-65535)
_wb_
2025-07-29 01:45:53
Float data in [0,65000] would be super bright since nearly everything is brighter than the nominal white. When decoding that to u16, nearly everything will be clamped to white.
2025-07-29 01:48:31
So you should normalize it to [0,1], and probably also do white balancing. Lossy is perceptual, so the input needs to correspond with how it gets rendered.
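A tiny sketch of the normalization being suggested, assuming the raw data is float32 with a known sensor maximum (white balancing would be a separate step):
```cpp
#include <cstddef>

// Scale raw sample values into the nominal [0,1] range before lossy encoding.
void NormalizeToUnitRange(float* samples, size_t count, float sensor_max) {
  const float scale = 1.0f / sensor_max;  // e.g. sensor_max = 65000.0f
  for (size_t i = 0; i < count; ++i) samples[i] *= scale;
}
```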
Lilli
2025-07-29 02:00:53
Yes, jxl really has this perceptual element embedded in it, but my images are raw linear, coming from a camera, so I'm forced to use distance = 0.025 for the output to look like something. The image is basically mostly 0-2000 with some smaller high-dynamic structures (stars...). I played around with the coefficient for target luminosity, I don't recall the name, but it produced inconsistent results across my testing dataset
2025-07-29 02:01:43
They'll definitely not be rendered (not before having gamma and co applied), they're just for storage (because there's actually nothing beating jxl right now on my dataset)
jonnyawsom3
2025-07-29 02:06:44
Yeah, right now there's no way to tell the encoder "Hey, this is for editing, don't crush the blacks". There is intensity target, but that's a workaround with its own problems
Lilli
2025-07-29 03:06:19
What does this intensity target really do? I've tried it with a few different parameters; I guess it somehow shifts the relative importance of the regions. Also another question, maybe less related: as far as I understand, a distance of 1 is supposed to be "perceptually equivalent", but if I, say, zoom x2, then should the distance be 0.5, 0.25, 0.75, 0.1 .. ?
_wb_
2025-07-29 03:33:03
zooming in or increasing display brightness makes artifacts easier to see but by how much exactly is hard to tell and may depend a bit on the image content itself
2025-07-29 03:36:19
intensity target is how many nits the signal value 1.0 (the nominal max value) is rendered at
Lilli
2025-07-29 03:56:03
Oh I see! So I'd need quite a big value for my content, I assume. I'm not sure I'll get around to testing that, but it's good to know if I need more control over the compression ratio
2025-07-29 03:56:42
because at some point, reducing the distance doesn't increase quality anymore, it sort of "caps out" even though artifacts are still present
jonnyawsom3
2025-07-29 04:14:34
Yeah, generally for floats increasing intensity target to 10000 or 20000 is the way to go, since then it still 'sees' the perceptual image, but doesn't see the brightness difference
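A one-call sketch of that workaround, assuming the `intensity_target` field of `JxlBasicInfo` (the nits at which signal value 1.0 is rendered, per the discussion above); the 10000 figure is the suggestion above, not a libjxl default.
```cpp
#include <jxl/encode.h>

void RaiseIntensityTarget(JxlEncoder* enc, JxlBasicInfo* info) {
  info->intensity_target = 10000.0f;  // treat signal value 1.0 as a very bright white
  JxlEncoderSetBasicInfo(enc, info);
}
```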
_wb_
2025-07-29 06:39:31
As a rule of thumb: if you make your input a PNG file then the way it looks in Chrome is the way cjxl sees it, i.e. what the perceptual compression is trying to work with. In your case you cannot do that since your input is not something that is rendered directly but it's a raw that will still be heavily postprocessed. Libjxl isn't really designed for that. I would recommend either doing lossless raw, or cook the raw enough to make it a visual image that makes sense as-is before applying lossy compression to it.
π•°π–’π–—π–Š
_wb_ As a rule of thumb: if you make your input a PNG file then the way it looks in Chrome is the way cjxl sees it, i.e. what the perceptual compression is trying to work with. In your case you cannot do that since your input is not something that is rendered directly but it's a raw that will still be heavily postprocessed. Libjxl isn't really designed for that. I would recommend either doing lossless raw, or cook the raw enough to make it a visual image that makes sense as-is before applying lossy compression to it.
2025-07-29 08:10:40
> if you make your input a PNG
Is there a similar reason why we can't achieve perfect metric scores (butteraugli, ssimulacra2, cvvdp) with lossless JPEG reconstruction? I always see scores such as ~85 ssimulacra2. If I create an intermediate PNG file, then the true/modular lossless JXL works as expected
spider-mario
2025-07-29 08:20:22
is that 85 with the same decoder for source and reconstructed JPEG? if you pass the original jpeg + the recompressed jxl, AFAIK butteraugli/ssimulacra2 will use libjpeg-turbo to decode the former but libjxl for the latter, which gives different (better) results (but still compliant)
2025-07-29 08:22:25
since the original JPEG can be reconstructed bit-exactly from the jxl, there is obviously no inherent loss, but the JPEG standard gives decoders some leeway regarding the exact decoded pixels
π•°π–’π–—π–Š
2025-07-29 08:22:43
2025-07-29 08:23:08
2025-07-29 08:23:50
`djxl` then doing the tests, still the same
spider-mario
2025-07-29 08:23:57
yeah – to get a better score, you should either decode `ref.jpg` the same way that libjxl does to a PNG and pass that (I believe djpegli should get you there or close), or decode `enc.jxl` to a JPEG and pass the result
π•°π–’π–—π–Š
2025-07-29 08:24:29
spider-mario
2025-07-29 08:24:42
decoding `enc.jxl` to a PNG instead of a JPEG will have the exact same β€œproblem” as passing `enc.jxl` directly
π•°π–’π–—π–Š
2025-07-29 08:24:46
can `djxl` decode jpg?
spider-mario
2025-07-29 08:24:56
iirc no
π•°π–’π–—π–Š
2025-07-29 08:25:11
then how can we decode both of them the same way?
spider-mario
2025-07-29 08:25:26
djpegli should give the same or at least similar results
π•°π–’π–—π–Š
2025-07-29 08:25:41
let me check
spider-mario
2025-07-29 08:26:03
but really, to test lossless JPEG reconstruction, arguably the best way is to check that you can reconstruct the original file
π•°π–’π–—π–Š
2025-07-29 08:26:06
spider-mario
2025-07-29 08:26:14
if you can then you know that you can, in principle, arrive at the same pixels
π•°π–’π–—π–Š
2025-07-29 08:27:17
I also tried building libjxl dynamically (the current build is static) and using the provided `libjpeg.so` system-wide
2025-07-29 08:27:22
the result is also the same
_wb_
2025-07-29 08:34:29
ssimulacra2 is very sensitive to even small errors. If you use djpegli to decode to a 16-bit PNG, the score should be higher, but still not 100 since there will be tiny differences compared to the float32 decode
2025-07-29 08:35:00
Can djpegli decode to PFM? Maybe if you do that, you can get a perfect match
2025-07-29 08:40:22
JPEG does not have a fully specified decode (there are large tolerances for conforming decoders) so you cannot really check losslessness by looking at decoded pixels. The DCT coefficients are defined exactly, but not the RGB reconstruction: it is defined mathematically but the precision required by the spec is quite low and many popular JPEG decoders like libjpeg-turbo use a fast and low memory but quite low precision implementation (quantizing every intermediate step to 8-bit, etc).
π•°π–’π–—π–Š
_wb_ Can djpegli decode to PFM? Maybe if you do that, you can get a perfect match
2025-07-29 08:44:36
Yes, it can, and the score increases
2025-07-29 08:44:52
but never seen over 90
2025-07-29 08:45:12
If it can't be decoded as-is, how would viewers decode it the same?
_wb_
2025-07-29 08:46:01
They don't, different viewers show slightly different pixels for the same jpeg file.
π•°π–’π–—π–Š
2025-07-29 08:46:55
So, it's a kind of a limitation of JPEG?
_wb_
2025-07-29 08:47:23
I would expect djpegli to be closer to the djxl decode of jxl-recompressed-jpegs, but maybe there's some difference.
π•°π–’π–—π–Š
π•°π–’π–—π–Š So, it's a kind of a limitation of JPEG?
2025-07-29 08:48:10
I first assumed that, since JPG is already lossy, JXL would handle artifacts in a different way
2025-07-29 08:48:12
so we get different scores
2025-07-29 08:48:16
but it's probably not the case here
_wb_
2025-07-29 08:48:38
You could try turning off Chroma from Luma when recompressing jpegs, that introduces some difference that is too small to matter for reconstructing the original jpeg file but is still there
2025-07-29 08:50:02
Even with libjpeg-turbo alone, you can decode a jpeg in different ways depending on the options you give it, and you will get ppm files that have some differences
2025-07-29 08:53:27
ssimulacra2 is very picky about small differences, even converting an 8-bit image to 16-bit will cause it to see some error even though mathematically those two images would be identical (8-bit numbers correspond exactly to 16-bit numbers since 255 is a divisor of 65535). It still sees a difference because they don't convert to the exact same float32 values (due to precision limits in the conversion arithmetic), so you get a score below 100.
π•°π–’π–—π–Š
2025-07-29 08:54:00
Got it
2025-07-29 08:54:08
Then modular lossless is extremely precise with it
2025-07-29 08:54:20
so I always get 100
2025-07-29 08:55:36
Here, the jpeg decoding is not a limitation
spider-mario
2025-07-29 09:06:03
that’s because like `ssimulacra2`, `cjxl` uses libjpeg-turbo to get the pixel values from the input jpeg
2025-07-29 09:06:28
if an update to `ssimulacra2` made it use jpegli instead, that jxl file made from libjpeg-turbo pixels would stop getting a score of 100
2025-07-29 09:07:17
it’s not inherently any more correct than the jxl you get with lossless jpeg recompression
2025-07-29 09:07:21
just more libjpeg-turbo-esque
π•°π–’π–—π–Š
spider-mario just more libjpeg-turbo-esque
2025-07-29 09:11:30
I actually tried to replace `libjpeg-turbo` with jpegli while statically compiling `libjxl`
2025-07-29 09:11:32
but I got many errors
2025-07-29 09:11:50
I am assuming the jpegli jpeg library is missing some functions
jonnyawsom3
2025-07-31 08:42:29
<@794205442175402004> I assume these quant tables are 'backwards' compared to classic JPEG? With LF at the start and HF at the end? <https://github.com/libjxl/libjxl/blob/c50010fb13160e4d2745ee27c6458aa63c51ebce/lib/jxl/enc_modular.cc#L95>
_wb_
2025-07-31 08:50:29
yes, it's HF to LF there: the array index corresponds to the hshift+vshift where 0 is the highest frequency; the LF of JPEG/VarDCT would correspond to index 6 (hshift==vshift==3, since it corresponds to 8x8 downscaling)
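A throwaway snippet spelling out the indexing described above (index = hshift + vshift, 0 being the highest frequency, and the 1:8 LF of JPEG/VarDCT landing at index 6):
```cpp
#include <cstdio>

int main() {
  for (int hshift = 0; hshift <= 3; ++hshift) {
    for (int vshift = 0; vshift <= 3; ++vshift) {
      std::printf("hshift=%d vshift=%d -> quant index %d (downscale %dx%d)\n",
                  hshift, vshift, hshift + vshift, 1 << hshift, 1 << vshift);
    }
  }
  return 0;
}
```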
jonnyawsom3
2025-07-31 08:56:54
Wait, HF to LF? Don't you want less quantization for the high frequency? B starts all the way at 2048
_wb_
2025-07-31 08:26:59
higher quant factor means more aggressive quantization, lower quality
2025-07-31 08:28:26
in the context of squeeze, if the first two quant factors are +infinity for X and B, that basically corresponds to 4:2:0 chroma subsampling.
2025-07-31 08:31:53
note that these quant factors are scaled according to the distance setting
jonnyawsom3
2025-07-31 09:19:15
Huh, interesting. That would explain the chroma bleeding I noticed
2025-07-31 09:42:54
It took me a while to get DCT out of my head, but I get it now. Each quant entry is for a squeeze level, I must've only skimmed your first message
A homosapien
2025-08-03 06:24:26
There might be a few bugs when using photon noise combined with lossy progressive. Photon noise seems to be applied to the LF and the HF. This is causing a blurry LF noise texture to be present. This is present in all major versions of libjxl I've tested so far. Another issue is that photon noise is much much stronger when used with progressive, likely meaning it's being applied multiple times over.
CrushedAsian255
A homosapien There might be a few bugs when using photon noise combined with lossy progressive. Photon noise seems to be applied to the LF and the HF. This is causing a blurry LF noise texture to be present. This is present in all major versions of libjxl I've tested so far. Another issue is that photon noise is much much stronger when used with progressive, likely meaning it's being applied multiple times over.
2025-08-03 07:43:54
maybe it's being applied once to each progressive scan
A homosapien
2025-08-03 09:12:03
That's what I think as well
2025-08-03 09:39:23
I made an issue https://github.com/libjxl/libjxl/issues/4368
Traneptora
π•°π–’π–—π–Š Then modular lossless is extremely precise with it
2025-08-03 10:43:09
it's less that modular lossless is "extremely precise" and more that modular, fundamentally, is a sophisticated way to store an array of integers, not of floats
2025-08-03 10:43:53
so there's no precision issue because the pixel arrays are identical integer arrays
jonnyawsom3
A homosapien There might be a few bugs when using photon noise combined with lossy progressive. Photon noise seems to be applied to the LF and the HF. This is causing a blurry LF noise texture to be present. This is present in all major versions of libjxl I've tested so far. Another issue is that photon noise is much much stronger when used with progressive, likely meaning it's being applied multiple times over.
2025-08-03 10:52:15
Photon noise is flagged in the frame header, so it's likely being set for every frame instead of just the final one
> A major condition to trigger this bug is that the image has to be larger than 2048x2048, implying chunked encoding must be at play here.
That's more interesting though, since the frame header can't have a value other than 0 or 1, so either the ISO setting is somehow getting added internally per MA tree, or it's a decoder bug
spider-mario
Traneptora so there's no precision issue because the pixel arrays are identical integer arrays
2025-08-03 11:28:15
in this instance, it arguably has less to do with VarDCT being conceptually floats and more to do with it being, well, DCT coefficients and not pixels
2025-08-03 11:28:38
VarDCT stores exactly the original JPEG’s DCT coefficients; Modular stores exactly the libjpeg-turbo-decoded pixels
2025-08-03 11:29:18
the proper comparison to detect whether there is any loss compared to the original JPEG file is arguably the former
jonnyawsom3
2025-08-03 11:29:43
There's an extremely long GitHub issue complaining that storing the DCT is less accurate than the pixels, but testing showed that the DCT is (unsurprisingly) more accurate to the original
π•°π–’π–—π–Š
Traneptora so there's no precision issue because the pixel arrays are identical integer arrays
2025-08-03 11:30:22
So it's:
JPEG path: Original pixels -> JPEG DCT coefficients -> Decoded pixels (with decoder variations)
Modular path: Original integer pixels -> Stored integer pixels -> Identical integer pixels
2025-08-03 11:30:47
There's no mathematical conversion or loss because you're storing the exact same integers you started with.
JaitinPrakash
π•°π–’π–—π–Š So it's: JPEG path: Original pixels -> JPEG DCT coefficients -> Decoded pixels (with decoder variations) Modular path: Original integer pixels -> Stored integer pixels -> Identical integer pixels
2025-08-03 11:51:09
The reconstruction route is JPEG DCT → VarDCT → jxl decoder, while the modular route is JPEG DCT → pixels → Modular → jxl decoder. If ssimulacra2 also uses JPEG DCT → pixels with the same libjpeg as modular, then despite reconstruction technically being a more exact lossless, it introduces an extra variance in the form of libjxl and libjpeg decoding the DCT differently, causing that discrepancy. Or at least, that's what I think is going on here. So reconstruction is more lossless than modular lossless from an extremely exact perspective, but because ssimulacra2 reads the DCT differently between the source and repacked jpeg, the modular lossless, which used the same DCT → pixels step as ssimulacra2, is more similar to the jpeg.
π•°π–’π–—π–Š
2025-08-04 12:59:57
Not just ssimulacra2; it's similar for butteraugli (pretty much expected since it's libjxl) but also for external metrics such as CVVDP and others. The decoding difference applies to all cases here, then
_wb_
2025-08-04 07:36:51
ssimulacra2 does take jpeg as input but it's not really well-defined what that means since ssimulacra2 compares pixel values and those are not fully specified for a jpeg image. The current jpeg decoder it uses is libjpeg-turbo but if we change that (or use different decoder options) the pixels will be different and the ssimulacra2 scores will change too.
2025-08-04 07:38:47
if you want to compare two jpegs with a pixel-based metric, you should use the same decoder implementation for both, otherwise the decoder difference is included in the metric score and you'll end up with nonsense conclusions like an image being different from itself.
2025-08-04 07:44:19
It would help if we'd make it so libjxl can also decode jpegs directly; the main ingredients to do that is already in the codebase anyway, you just have to load the jpeg as if you're going to recompress it, but then skip the entropy coding and decoding and just feed it directly to the render pipeline. Though it may be a bit more complicated in practice since probably the encode-side structs and decode-side structs don't match up that nicely.
jonnyawsom3
_wb_ It would help if we'd make it so libjxl can also decode jpegs directly; the main ingredients to do that is already in the codebase anyway, you just have to load the jpeg as if you're going to recompress it, but then skip the entropy coding and decoding and just feed it directly to the render pipeline. Though it may be a bit more complicated in practice since probably the encode-side structs and decode-side structs don't match up that nicely.
2025-08-04 08:15:15
Removing dependencies on external JPEG libraries and just using jpegli would also be nice. Requiring other JPEG libraries to build a JPEG library is... odd
_wb_
2025-08-04 10:04:41
libjxl itself doesn't depend on an external jpeg library, but yes, the tools on the libjxl repo do use it. It's not a strict dependency iirc; you could build a cjxl/djxl/ssimulacra2/etc that can only deal with jxl and ppm/pam.
jonnyawsom3
2025-08-04 10:08:33
I meant for compiling jpegli too, since it's intertwined in the repo (even with the Google repo split, it has duplicate libjxl files so it can't be a submodule)
π•°π–’π–—π–Š
2025-08-04 10:09:04
We also can't build with the jpegli static library instead of libjpeg-turbo. I tried to modify the build process and create a phantom pkgconfig by renaming the compiled jpegli library to `libjpeg.a`, but the compilation won't succeed.
2025-08-04 10:09:20
AFAIK, the jpegli-provided library should cover everything, no?
2025-08-04 10:10:00
> When building the project, two binaries, tools/cjpegli and tools/djpegli will be built, as well as a lib/jpegli/libjpeg.so.62.3.0 shared library that can be used as a drop-in replacement
jonnyawsom3
2025-08-04 10:11:35
It's also missing the 12-bit API, so it's not actually a drop-in replacement for some applications
CrushedAsian255
_wb_ libjxl itself doesn't depend on an external jpeg library, but yes, the tools on the libjxl repo do use it. It's not a strict dependency iirc; you could build a cjxl/djxl/ssimulacra2/etc that can only deal with jxl and ppm/pam.
2025-08-05 01:31:57
to decode a JPEG with only libjxl, can't you use the libjxl JPEG->JXL transcoding, and then decode the resultant JXL?
π•°π–’π–—π–Š
CrushedAsian255 to decode a JPEG with only libjxl, can't you use the libjxl JPEG->JXL transcoding, and then decode the resultant JXL?
2025-08-05 01:59:19
you can just encode it in this case? <:galaxybrain:821831336372338729>
2025-08-05 01:59:50
but technically you decode a JXL then, not the JPEG
_wb_
CrushedAsian255 to decode a JPEG with only libjxl, can't you use the libjxl JPEG->JXL transcoding, and then decode the resultant JXL?
2025-08-05 06:54:19
Yes you can. There would be room for optimization though, since doing it that way does unnecessary entropy coding + decoding.
CrushedAsian255
2025-08-05 06:55:15
I feel it would be beneficial to have libjxl also handle JPEGs, and have a libjpeg-compatible API, so it can be a drop-in replacement for libjpeg
jonnyawsom3
2025-08-05 07:16:18
An idea that's come up a few times, but even jpegli isn't fully drop-in, so trying to make libjxl so would be rough
CrushedAsian255
_wb_ Yes you can. There would be room for optimization though, since doing it that way does unnecessary entropy coding + decoding.
2025-08-05 10:43:38
Proof of Work 😜
jonnyawsom3
2025-08-06 01:06:57
https://github.com/libjxl/libjxl/issues/4376
2025-08-06 01:07:16
I left a comment, as I think there's still a lot left to do
2025-08-06 01:10:10
A lot of pending jpegli PRs, some bugs recently found and some deeper issues that have bodged fixes in place... But it's certainly about time we got a new version out
afed
2025-08-06 01:11:43
yeah, that would be nice
HCrikki
2025-08-06 02:28:14
please, a **.DLL** of jpegli that a user or lazy dev can use to swap out their existing jpeg decoder/encoder library without needing to bother with source code
Quackdoc
2025-08-08 03:55:31
compiling libjxl with `-DCMAKE_INTERPROCEDURAL_OPTIMIZATION=TRUE` seems to really balloon the size for some reason
2025-08-08 03:58:43
~~meson when?~~
2025-08-08 04:09:55
actually this dude's build script is just fubar, no idea how he is building
2025-08-08 09:36:28
so highway is the cause of both major size regressions in libjxl, at least it's localized to one dependency. Would it be helpful if the workflows had one to specifically check file sizes of various compile envs?
jonnyawsom3
2025-08-08 10:33:08
<@811568887577444363> the above message is in regards to this issue https://github.com/libjxl/libjxl/issues/3887 (I think)
Quackdoc
2025-08-08 11:12:50
https://cdn.discordapp.com/emojis/721359241113370664?size=64
Lilli
2025-08-08 03:02:44
Hi there it's me again! I was testing my implementation regarding memory usage. I implemented the chunking approach for writing to a file in chunks of 1MB of input data, i.e. something like this:
```
// START_ALLOC is 1<<20, 1MiB
std::vector<uint8_t> lCompressed(START_ALLOC);
uint8_t* lNext_out = lCompressed.data();
size_t lAvail_out = lCompressed.size();
while ( true ) {
    JxlEncoderStatus lResult = JxlEncoderProcessOutput(lEnc.mEnc.get(), &lNext_out, &lAvail_out);
    size_t lWritten = lNext_out - lCompressed.data();
    if ( lWritten > 0 ) { // write data to file on stream
        if ( fwrite(lCompressed.data(), 1, lWritten, lEnc.mFp) != lWritten ) {
            lError = "Failed to write to file: " + pFilename + "\n";
            break;
        }
        // Always reset buffer for next chunk
        lNext_out = lCompressed.data();
        lAvail_out = lCompressed.size();
    }
    if ( lResult == JXL_ENC_NEED_MORE_OUTPUT ) {
        // we must not resize, just continue
        continue;
    }
    ...
```
Of course I used the `JxlEncoderAddChunkedFrame`. So this is how I write into my file without needing the entire output buffer in memory, and this works. But it still takes a big amount of memory! I encode in lossy and tried different settings of distance and effort to no avail. The peak memory usage for a file of about 300MB is 832MiB! (including the initial 300MB of input data) It seems a copy of the initial data is performed at some point: `286.0 MiB: (anonymous namespace)::JxlEncoderAddImageFrameInternal(JxlEncoderFrameSettingsStruct const*, unsigned long, unsigned long, bool, jxl::JxlEncoderChunkedFrameAdapter&&) (encode.cc:2366)` This line seems to allocate 286MiB, which is the size of the input data. I used Massif for the profiling:
2025-08-08 03:09:34
I can give more code of course, but I don't want to unnecessarily pollute this channel 🙂 I of course only pass pointers around... I'm not sure how to specify that the chunked frame should only use the data as-is, and not copy it
Demiurge
2025-08-10 09:08:58
Hey <@238552565619359744> I saw the work you recently did and I'm pretty impressed. I hope your patches get merged.
2025-08-10 09:09:23
But I'm sure they will.
jonnyawsom3
2025-08-10 09:26:38
I've not done anything recently, just made notes of ideas to try
Quackdoc
2025-08-10 09:30:42
imma bisect highway and see if I can't figure out what commit of that made it balloon so much
2025-08-10 10:00:59
yeah, bisected it down to the commit and commented on the issue thread; the codegen / dynamic dispatch commit is what did it
Lilli
2025-08-11 06:47:29
Would someone know who to ask about the memory consumption I noted above?
jonnyawsom3
Lilli Would someone know who to ask about the memory consumption I noted above ?
2025-08-11 06:59:15
There are two ideas I have. The first is adding `JxlEncoderFrameSettingsSetOption(frame_settings, JXL_ENC_FRAME_SETTING_PATCHES, 0);` to disable patches. The other is trying effort 1 lossless, as that has minimal memory usage for encoding. Then you can see if it's an encoder problem or a buffer handling issue
Lilli
2025-08-11 06:59:36
I didn't mention, I'm using lossy !
jonnyawsom3
2025-08-11 07:02:50
I know, patches works on lossy and the lossless is just to test memory overhead
Lilli
There's two ideas I have. The first is adding `JxlEncoderFrameSettingsSetOption(frame_settings, JXL_ENC_FRAME_SETTING_PATCHES, 0);` to disable Patches The other is trying effort 1 lossless, as that has minimal memory usage for encoding. Then you can see if it's an encoder problem or a buffer handling issue
2025-08-11 07:11:55
Ok, I tried it. The command to disable patches doesn't seem to change anything in terms of memory. The effort 1 also did not change anything regarding memory either; the peak memory usage is the same 832MiB
2025-08-11 07:15:22
It seems the graph showing the distribution of memory is a little different, but the total is the same. Notably the "unknown inlined fun" now is much lower, but the total is still the same
2025-08-11 07:16:47
I can give the massif files, so that one could check what's going on
2025-08-11 07:52:29
I did it again making sure I was lossless vs lossy, and lossless effort 1 uses even more memory at 852MiB, lossy effort 1 is 832 with or without patches
2025-08-11 07:55:59
The peak memory usage is highlighted and the detail is on the other image. Data for lossless effort 1:
The orange part corresponds to the 286MiB allocated by the `JxlEncoderAddChunkedFrame`
The cyan part is the `jxl::AlignedMemory`, 181MiB
The blue part is apparently a `std::vector<jxl::Token>`, 96MiB
_wb_
2025-08-11 08:08:49
that looks like most memory is allocated not by libjxl but by your tiff reader, no?
Lilli
2025-08-11 08:17:07
You are correct, the Tiff reader allocated the 286MiB that is then the input to the Jxl encoder. Then the Jxl encoder's "AddChunkedFrame" allocates the same amount again (which is my problem). I give the pointer to my data inside the "chunking state" opaque structure inside the `JxlChunkedFrameInputSource`. I want to understand how to avoid an entire copy of my data 🤷‍♂️
_wb_
2025-08-11 08:31:56
how large is the image and how large are the chunks?
Lilli
2025-08-11 08:33:56
The image is 4291x11649. What do you mean, how large are the chunks? I use JXL_ENC_FRAME_SETTING_BUFFERING=2
jonnyawsom3
2025-08-11 09:07:58
<https://github.com/libjxl/libjxl/blob/095b8b2a088483b2f95e33638778db935ea9d43f/lib/jxl/enc_frame.cc#L1772>
```
// TODO(veluca): handle different values of `buffering`.
if (frame_data.xsize <= 2048 && frame_data.ysize <= 2048) {
  return false;
```
Currently `JXL_ENC_FRAME_SETTING_BUFFERING=2` is using the same behaviour as `JXL_ENC_FRAME_SETTING_BUFFERING=1`, though I think the results you're getting are still correct. Lossy effort 7 uses around 400 MB, with the image itself being 143 MB and your TIFF loader using 286 MB, which adds up to over 800 MB. The strange part to me is that effort 1 lossless used more... That should use around 170 MB instead of 400 MB like effort 7 lossy
2025-08-11 09:09:21
Though, my results are based on an 8K image, so only 30 MP instead of 50 MP
Lilli
2025-08-11 09:14:36
Oh wow, okay! Thank you for finding that out! For lossless I just turned the distance to 0, but if there's another flag to raise, I may have failed to do it properly. In any case, that makes it very hard for me to use then; I have 2GB of RAM on an embedded device ...
jonnyawsom3
2025-08-11 09:16:44
This may be of interest, if lower effort levels of libjxl aren't enough https://github.com/Traneptora/hydrium
Lilli
2025-08-11 09:17:47
Okay, that may be! Thanks a lot for all this information!
2025-08-11 09:37:16
Still, I don't really get the point of making a streaming API if the whole data is copied into memory first... I might've missed something. But this hydrium thing looks like it's at quite an early stage; hopefully it gives similar compression results
jonnyawsom3
Lilli Still, I don't really get the point of making a streaming API if the whole data is copied first in memory... I might've missed something. But this hydrium thing looks quite early stages, hopefully it gives similar compression results
2025-08-11 09:57:50
```
wintime -- cjxl --streaming_input --streaming_output Test.ppm nul
JPEG XL encoder v0.12.0 7deb57d7 [_AVX2_,SSE4,SSE2]
Encoding [VarDCT, d1.000, effort: 7]
Compressed to 4857.7 kB (1.171 bpp). 7680 x 4320, 7.532 MP/s [7.53, 7.53], , 1 reps, 16 threads.
PageFaultCount: 909106
PeakWorkingSetSize: 367.8 MiB
QuotaPeakPagedPoolUsage: 234.1 KiB
QuotaPeakNonPagedPoolUsage: 21.38 KiB
PeakPagefileUsage: 398.5 MiB

wintime -- cjxl Test.ppm nul
JPEG XL encoder v0.12.0 7deb57d7 [_AVX2_,SSE4,SSE2]
Encoding [VarDCT, d1.000, effort: 7]
Compressed to 4857.7 kB (1.171 bpp). 7680 x 4320, 7.462 MP/s [7.46, 7.46], , 1 reps, 16 threads.
PageFaultCount: 960855
PeakWorkingSetSize: 567.9 MiB
QuotaPeakPagedPoolUsage: 44.44 KiB
QuotaPeakNonPagedPoolUsage: 21.77 KiB
PeakPagefileUsage: 688 MiB
```
Streaming certainly does make a difference, so maybe <@794205442175402004> has an idea what's wrong with your implementation. Hydrium is equivalent to libjxl effort 3
Lilli
2025-08-11 10:22:31
Oh I see! I suppose the relevant code is this part. I think I'm just filling the structures according to the documentation:
```cpp
struct ChunkingState {
  const uint8_t* mImage;  // Pointer to the image data
  uint32_t mWidth;
  uint32_t mOffset;
  uint32_t mChannels;
  uint32_t mBytesPerPixel;
  JxlDataType mInputDataType;
};

// in my export jxl function
ChunkingState state = {reinterpret_cast<const uint8_t*>(pBuffer), pSizeX, 0, pChNb,
                       pChNb * lInputBitsPerSample / 8, lType};

JxlChunkedFrameInputSource lChunked = {};
lChunked.opaque = &state;
lChunked.get_color_channels_pixel_format = get_color_channels_pixel_format;
lChunked.get_color_channel_data_at = get_color_channel_data_at;
lChunked.release_buffer = release_buffer;

JxlEncoderFrameSettings* lFrameSettings = JxlEncoderFrameSettingsCreate(lEnc.get(), nullptr);
//... some more options, effort, distance
JxlEncoderFrameSettingsSetOption(lFrameSettings, JXL_ENC_FRAME_SETTING_BUFFERING, 2);

JxlEncoderStatus lIsChunkedSet = JxlEncoderAddChunkedFrame(lFrameSettings, true, lChunked);
```
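(For context, a minimal sketch of callbacks that hand the encoder pointers straight into the existing buffer, so no copy happens on the caller's side; the callback signatures follow my reading of `JxlChunkedFrameInputSource` in `<jxl/encode.h>` and should be double-checked against the header. `ChunkingState` mirrors the struct above.)
```cpp
#include <jxl/encode.h>
#include <cstdint>

struct ChunkingState {
  const uint8_t* mImage;   // pointer to the TIFF reader's buffer
  uint32_t mWidth;
  uint32_t mChannels;
  uint32_t mBytesPerPixel;
  JxlDataType mInputDataType;
};

// Describe the interleaved pixel format of the color channels.
static void get_color_channels_pixel_format(void* opaque, JxlPixelFormat* pixel_format) {
  auto* state = static_cast<ChunkingState*>(opaque);
  pixel_format->num_channels = state->mChannels;
  pixel_format->data_type = state->mInputDataType;
  pixel_format->endianness = JXL_NATIVE_ENDIAN;
  pixel_format->align = 0;
}

// Return a pointer directly into the existing buffer for the requested region;
// row_offset is the byte stride between rows. No copy is made here, so any
// duplicate of the image seen in the profile has to come from inside libjxl.
static const void* get_color_channel_data_at(void* opaque, size_t xpos, size_t ypos,
                                             size_t xsize, size_t ysize, size_t* row_offset) {
  (void)xsize;
  (void)ysize;
  auto* state = static_cast<ChunkingState*>(opaque);
  const size_t stride = static_cast<size_t>(state->mWidth) * state->mBytesPerPixel;
  *row_offset = stride;
  return state->mImage + ypos * stride + xpos * state->mBytesPerPixel;
}

static void release_buffer(void* opaque, const void* buf) {
  // Nothing to free: the buffer stays owned by the TIFF reader.
  (void)opaque;
  (void)buf;
}
```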
_wb_
2025-08-11 10:28:35
for fast-lossless e1 the pixel format has to be uint8 or uint16 iirc, not a float one
Lilli
2025-08-11 10:30:12
I see, in my case I want e4 with a distance of 0.025
2025-08-11 10:30:40
So, not fast and not lossless, just float
_wb_
2025-08-11 10:30:49
ah ok
2025-08-11 10:32:15
that's a very low distance, are you sure you're feeding the input in a way that makes perceptual sense, i.e. not something that renders as a near-black image?
Lilli
2025-08-11 10:34:11
yes it does render as a near black image, and we've tried to transform it into a "perceptually meaningful image", which requires then another transform to undo it when decompressing. But the whole point of my process is to compress the raw sensor image so that the processing can be done elsewhere
2025-08-11 10:35:31
But I imagine that won't be the issue regarding memory usage would it?
_wb_
2025-08-11 11:19:47
No it shouldn't.
2025-08-11 11:26:29
But I would avoid encoding near-black images. Even if you cannot do real channel balancing, you maybe could just scale things by some constant so it's not super dark. If you can get it so d1 looks reasonable and you can use d0.2 or so, you will probably get better results than using d0.025 on an image where d1 looks very bad...
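(A minimal sketch of the constant-scaling idea, assuming float samples and that the inverse gain is applied after decoding; the gain value is only an example.)
```cpp
#include <cstddef>

// Multiply all samples by a constant gain before encoding so the image is not
// near-black, and record the gain (e.g. in metadata) so the consumer can
// divide it back out after decoding. The value must be chosen so the scaled
// samples stay in the expected range; 16.0f here is purely illustrative.
static void ApplyGain(float* samples, size_t count, float gain) {
  for (size_t i = 0; i < count; ++i) samples[i] *= gain;
}
```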
Lilli
2025-08-11 11:31:19
I hear you, this is an issue I've discussed on this channel before, also playing with the intensity target. I have to work with the constraints we currently face. Currently, the compression is perfectly within our requirements, and JXL performs much better than the other encoders we tried, even with our near-black image. It's just that if libjxl copies my buffer internally, or whatever it does, I just can't use it. My goal is to keep the input image, so there's 300MB of data that I don't want to free, and then a footprint of 150MB for the compression would be very acceptable, just not 450MB...
2025-08-11 11:32:47
And hydrium sounds great, but I'd rather make small changes to my codebase instead of importing a new library that I'd need to wire from scratch.
2025-08-11 11:35:37
You mentioned earlier "what size are your chunks" but I don't see how to specify chunk size, or even give separate chunks, what did you mean?
jonnyawsom3
2025-08-11 11:45:22
I didn't realise you were using such a low distance, IIRC hydrium is fixed at distance 1
_wb_
2025-08-11 11:55:33
I don't remember how the chunked encode api works but the idea is that you give large images in tiles to the encoder. <@179701849576833024> wrote most of that iirc so maybe he remembers
Lilli
2025-08-11 12:05:53
I made a few modifications which seem to make a difference. It seems that in the default configuration streaming gets disabled, even if it's requested with BUFFERING=2
veluca
_wb_ I don't remember how the chunked encode api works but the idea is that you give large images in tiles to the encoder. <@179701849576833024> wrote most of that iirc so maybe he remembers
2025-08-11 12:17:01
I barely remember my name these days πŸ˜‰
jonnyawsom3
2025-08-11 12:25:06
There's two checks that need to pass for full chunked/streamed encoding <https://github.com/libjxl/libjxl/blob/095b8b2a088483b2f95e33638778db935ea9d43f/lib/jxl/enc_frame.cc#L1772> and <https://github.com/libjxl/libjxl/blob/095b8b2a088483b2f95e33638778db935ea9d43f/lib/jxl/enc_frame.cc#L1644> (I think)
Lilli
2025-08-11 02:20:32
I have explicitly set most FrameSettings options so that it passes each test in `CanDoStreamingEncoding()`... but I'm not really sure it passes them all. I see there is a sort of box concept; is that what the streaming is supposed to use? I'm really struggling with the documentation πŸ™
TheBigBadBoy - π™Έπš›
2025-08-11 02:36:22
what is the problem with 800MB of used RAM ? do you want to encode JXL files on an embedded system or something ?
Quackdoc
2025-08-11 02:36:57
800mb would still be a lot on something like a low end android phone
jonnyawsom3
Lilli Oh wow, okay! Thank you for finding that out! For lossless I just turned the distance to 0, but if there's another flag to raise, I may have failed to do it properly In any case, that makes it very hard for me to use then, I have 2GB of RAM on embedded device ...
2025-08-11 02:37:42
<@693503208726986763> > In any case, that makes it very hard for me to use then, I have 2GB of RAM on embedded device ...
TheBigBadBoy - π™Έπš›
2025-08-11 02:39:20
yeah see ? they can even encode 2 images in parallel <:KekDog:805390049033191445> ||/s||
Lilli
2025-08-12 07:33:11
I feel that one issue was that I didn't use the `JxlEncoderOutputProcessor`, which I now do, along with `JxlEncoderFlushInput`. And now the memory usage ballooned to 3GB! Fantastic
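(For reference, a sketch of a streaming output processor that flushes into a file through a small reusable buffer; the struct fields and `JxlEncoderSetOutputProcessor` are taken from my reading of `<jxl/encode.h>`, so verify the exact names and whether a null `seek` is allowed for your settings.)
```cpp
#include <jxl/encode.h>
#include <cstdint>
#include <cstdio>
#include <vector>

// Stream encoder output to a FILE* through a fixed-size scratch buffer instead
// of letting the whole codestream accumulate in memory.
struct FileOutput {
  std::vector<uint8_t> scratch = std::vector<uint8_t>(1 << 20);
  FILE* f = nullptr;
};

static void* GetBuffer(void* opaque, size_t* size) {
  auto* out = static_cast<FileOutput*>(opaque);
  *size = out->scratch.size();
  return out->scratch.data();
}

static void ReleaseBuffer(void* opaque, size_t written_bytes) {
  auto* out = static_cast<FileOutput*>(opaque);
  fwrite(out->scratch.data(), 1, written_bytes, out->f);
}

static void SetFinalizedPosition(void* opaque, uint64_t position) {
  (void)opaque;
  (void)position;  // nothing to do for purely sequential file output
}

// Usage sketch (error handling omitted):
//   FileOutput out; out.f = fopen("out.jxl", "wb");
//   JxlEncoderOutputProcessor processor = {};
//   processor.opaque = &out;
//   processor.get_buffer = GetBuffer;
//   processor.release_buffer = ReleaseBuffer;
//   processor.seek = nullptr;  // only needed for seekable outputs
//   processor.set_finalized_position = SetFinalizedPosition;
//   JxlEncoderSetOutputProcessor(enc, processor);
```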
2025-08-12 07:36:56
This is cjxl with my input data as a png file (which is indeed 286MiB on my machine too) with `--streaming_output`. I find it curious that it has twice the 286MiB in memory... I would go as far as to say, suspicious 🧐
2025-08-12 07:37:17
2025-08-12 07:38:19
One is in the APNG image decode, and one is in `JxlEncoderAddImageFrame`
2025-08-12 07:45:26
it doesn't call addChunkedFrame... hmmm
jonnyawsom3
Lilli This is cjxl with my input data as a png file (which is indeed 286MiB on my machine too) with `--streaming_output` I find it curious that it has twice the 286MiB in memory... I would go as far as to say, suspicious 🧐
2025-08-12 08:42:43
Try converting to PFM temporarily with --streaming_input
Lilli
2025-08-12 08:50:03
Oh wow okay
2025-08-12 08:50:25
2025-08-12 08:50:44
It did take only 300MiB, which I think is still a little too much, but it would be manageable. The most surprising thing is that there's no full internal copy of the data anymore
2025-08-12 08:52:07
so I'd need to figure out how to set up the encoder to do exactly whatever this is doing
Traneptora
2025-08-14 12:13:21
Is there a way for a libjxl decoder client to tell libjxl that there's no more data remaining?
2025-08-14 12:14:01
If you subscribe to the `BOX` and `BOX_COMPLETE` events, it always expects more boxes, so it returns `NEED_MORE_INPUT` even when there's no more boxes remaining
2025-08-14 12:14:15
and never returns `DEC_SUCCESS`
2025-08-14 12:17:10
Normally, a client that has no more data to offer to libjxl that receives `JXL_DEC_NEED_MORE_INPUT` will assume the file is truncated
2025-08-14 12:17:30
but that isn't the case if you subscribe to those events
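(A minimal sketch of the situation being described, assuming the whole file is already in memory; `JxlDecoderCloseInput` is the call I would expect to signal that no more bytes are coming, and whether that is enough to eventually get `JXL_DEC_SUCCESS` with box events subscribed is exactly the question raised here.)
```cpp
#include <jxl/decode.h>
#include <cstdint>
#include <cstdio>

// Walk the boxes of a complete in-memory file with BOX/BOX_COMPLETE subscribed.
static bool WalkBoxes(const uint8_t* data, size_t size) {
  JxlDecoder* dec = JxlDecoderCreate(nullptr);
  JxlDecoderSubscribeEvents(dec, JXL_DEC_BOX | JXL_DEC_BOX_COMPLETE);
  JxlDecoderSetInput(dec, data, size);
  JxlDecoderCloseInput(dec);  // no further input will be provided
  bool ok = true;
  for (;;) {
    JxlDecoderStatus status = JxlDecoderProcessInput(dec);
    if (status == JXL_DEC_SUCCESS) break;
    if (status == JXL_DEC_BOX) {
      JxlBoxType type;
      JxlDecoderGetBoxType(dec, type, JXL_FALSE);
      printf("box: %.4s\n", type);
      continue;  // not setting a box buffer skips the box contents
    }
    if (status == JXL_DEC_BOX_COMPLETE) continue;
    // JXL_DEC_NEED_MORE_INPUT here normally means a truncated file, which is
    // why getting it for a complete file is the problem described above.
    ok = false;
    break;
  }
  JxlDecoderDestroy(dec);
  return ok;
}
```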
Lilli
2025-08-14 01:58:31
BTW I managed in the end. I'm not exactly sure why it works now, but it does; memory usage went from 800+ to 550, we'll work with that. Thank you for your time and efforts, it's greatly appreciated πŸŽ‰
jonnyawsom3
2025-08-14 02:24:14
Sorry we couldn't give more straightforward answers, I mostly know the CLI and parameters, while the devs who know the API are busy with jxl-rs πŸ˜…
Traneptora
2025-08-14 03:35:05
currently a bit busy but I'm trying some more memory savings with hydrium
2025-08-14 03:35:10
using dist clustering
jonnyawsom3
2025-08-14 04:48:15
Don't suppose you could add a basic quality option? I imagine it wouldn't have a large impact to memory or performance scaling the quant tables and such
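(Hydrium's internals aren't shown here; as a generic illustration of what "scaling the quant tables" can mean, this is the classic IJG quality-to-scale mapping from libjpeg, not hydrium's or libjxl's actual code.)
```cpp
#include <algorithm>
#include <cstdint>

// Classic IJG-style mapping from a 1..100 quality setting to a percentage
// scale applied to a 64-entry base quantization table.
static void ScaleQuantTable(const uint16_t* base, uint16_t* out, int quality) {
  quality = std::clamp(quality, 1, 100);
  const int scale = quality < 50 ? 5000 / quality : 200 - 2 * quality;
  for (int i = 0; i < 64; ++i) {
    const int q = (base[i] * scale + 50) / 100;
    out[i] = static_cast<uint16_t>(std::clamp(q, 1, 255));
  }
}
```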
Traneptora
Don't suppose you could add a basic quality option? I imagine it wouldn't have a large impact to memory or performance scaling the quant tables and such
2025-08-14 07:22:45
Wouldn't be too hard but I consider it lower priority
2025-08-14 07:23:26
considering adding a tetrahedral 3D LUT too. I don't think it is faster on modern x86, but it should be without FMA
jonnyawsom3
Traneptora If you subscribe to the `BOX` and `BOX_COMPLETE` events, it always expects more boxes, so it returns `NEED_MORE_INPUT` even when there's no more boxes remaining
2025-08-15 02:15:42
Probably needs to track the listed size in the Image Header and how much has already been read
Traneptora
Probably needs to track the listed size in the Image Header and how much has already been read
2025-08-16 03:10:52
Problem is boxes (e.g. Exif) can occur after the last jxlp box
2025-08-16 03:14:17
so codestream size doesn't matter
jonnyawsom3
2025-08-16 03:17:45
Right, I forgot metadata is handled in `18181-2`, I was looking at `18181-1` thinking the image header covered it
Traneptora
2025-08-25 05:14:26
so I'm getting an interesting issue
2025-08-25 05:15:14
I have a PNG which is RGB (as they tend to be) with an XYB cjpegli profile (the file was a jpegli --xyb jpeg, decoded to PNG, with the iccp preserved)
2025-08-25 05:15:36
I encoded this PNG to a JXL with VarDCT lossy, and it's XYB encoded
2025-08-25 05:15:58
so I now have a JXL file which is xyb_encoded, and also has an attached ICC profile. a strange ICC profile, specifically, which is the cjpegli xyb profile
2025-08-25 05:16:13
Here's the JXL file
2025-08-25 05:17:02
using libjxl to request the pixel data in sRGB works as expected
2025-08-25 05:17:27
but using libjxl to request the pixel data in the space of the attached profile, using libjxl's cms engine, aborts
2025-08-25 05:32:36
I do the following
2025-08-25 05:36:26
```c
// JXL_DEC_COLOR_ENCODING event handler
const JxlCmsInterface *cms = JxlGetDefaultCms();
JxlDecoderSetCms(decoder, *cms);
size_t icc_len;
JxlDecoderGetICCProfileSize(decoder, JXL_COLOR_PROFILE_TARGET_ORIGINAL, &icc_len);
uint8_t *icc_data = malloc(icc_len);
JxlDecoderGetColorAsICCProfile(decoder, JXL_COLOR_PROFILE_TARGET_ORIGINAL, icc_data, icc_len);
JxlDecoderSetOutputColorProfile(decoder, NULL, icc_data, icc_len);
// rest of decoder loop
```
There are two issues
2025-08-25 05:38:18
1. It aborts at `stage_from_linear.cc` on line 178. This is a bug.
```cpp
} else {
  // This is a programming error.
  JXL_DEBUG_ABORT("Invalid target encoding");
  return nullptr;
}
```
I tracked it down to `dec_cache.cc` line 324, which can be patched into the following.
```diff
diff --git a/lib/jxl/dec_cache.cc b/lib/jxl/dec_cache.cc
index 7b6d54d2..cce6cede 100644
--- a/lib/jxl/dec_cache.cc
+++ b/lib/jxl/dec_cache.cc
@@ -321,8 +321,10 @@ Status PassesDecoderState::PreparePipeline(const FrameHeader& frame_header,
   const size_t channels_dst = output_encoding_info.color_encoding.Channels();
   bool mixing_color_and_grey = (channels_dst != channels_src);
-  if ((output_encoding_info.color_encoding_is_original) ||
-      (!output_encoding_info.cms_set) || mixing_color_and_grey) {
+  if ((output_encoding_info.color_encoding_is_original &&
+       !(output_encoding_info.color_encoding.WantICC() &&
+         output_encoding_info.xyb_encoded)) || (!output_encoding_info.cms_set)
+      || mixing_color_and_grey) {
     // in those cases we only need a linear stage in other cases we attempt
     // to obtain a cms stage: the cases are
     // - output_encoding_info.color_encoding_is_original: no cms stage
```
2025-08-25 05:38:39
Once I fix this bug, there's another issue:
2025-08-25 05:58:16
```
./lib/jxl/cms/jxl_cms.cc:793: LCMS error 13: Couldn't link the profiles
./lib/jxl/cms/jxl_cms.cc:1306: JXL_ERROR: Failed to create transform
./lib/jxl/color_encoding_internal.h:337: JXL_RETURN_IF_ERROR code=1: cms_data_ != nullptr
./lib/jxl/render_pipeline/stage_cms.cc:133: JXL_RETURN_IF_ERROR code=1: color_space_transform->Init( c_src_, output_encoding_info_.color_encoding, output_encoding_info_.desired_intensity_target, xsize_, num_threads)
./lib/jxl/render_pipeline/render_pipeline.cc:131: JXL_RETURN_IF_ERROR code=1: stage->PrepareForThreads(num)
./lib/jxl/dec_frame.h:275: JXL_RETURN_IF_ERROR code=1: dec_state_->render_pipeline->PrepareForThreads( storage_size, use_group_ids)
./lib/jxl/dec_frame.cc:700: JXL_RETURN_IF_ERROR code=1: PrepareStorage(num_threads, decoded_passes_per_ac_group_.size())
./lib/jxl/base/data_parallel.h:76: JXL_FAILURE: [DecodeGroup] failed
./lib/jxl/dec_frame.cc:725: JXL_RETURN_IF_ERROR code=1: RunOnPool(pool_, 0, ac_group_sec.size(), prepare_storage, process_group, "DecodeGroup")
./lib/jxl/decode.cc:1141: frame processing failed
```
2025-08-25 05:58:25
The big one is lcms couldn't link the profiles
2025-08-25 05:58:42
however, setting the profiles succeeded
2025-08-25 06:06:19
I am wondering if it is possible for libjxl to try to link the profiles when you set the output profile
2025-08-25 06:06:37
so if it failed you'd know immediately
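(One way to get that early failure, sketched with plain lcms2 rather than libjxl's internals: try to build the transform at the moment the output profile is set and report the error right away. The lcms2 calls are standard API; treating RGB float as the in/out layout is an assumption for the sketch.)
```cpp
#include <lcms2.h>
#include <cstddef>

// Returns true only if src -> dst can actually be linked, so a destination
// profile with no usable PCS->device direction is rejected when it is set
// rather than later during frame decoding.
static bool CanLinkProfiles(const void* src_icc, size_t src_size,
                            const void* dst_icc, size_t dst_size) {
  cmsHPROFILE src = cmsOpenProfileFromMem(src_icc, static_cast<cmsUInt32Number>(src_size));
  cmsHPROFILE dst = cmsOpenProfileFromMem(dst_icc, static_cast<cmsUInt32Number>(dst_size));
  bool ok = false;
  if (src && dst) {
    cmsHTRANSFORM t = cmsCreateTransform(src, TYPE_RGB_FLT, dst, TYPE_RGB_FLT,
                                         INTENT_RELATIVE_COLORIMETRIC, 0);
    ok = (t != nullptr);
    if (t) cmsDeleteTransform(t);
  }
  if (src) cmsCloseProfile(src);
  if (dst) cmsCloseProfile(dst);
  return ok;
}
```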
jonnyawsom3
2025-08-25 06:07:25
Might be related, maybe not. The jpegli ICC only has the decoding information to go to RGB, the color management could be tripping up because it has no conversion to use
Traneptora
2025-08-25 06:07:49
well it's RGB -> RGB right? in theory
2025-08-25 06:08:05
I'm just looking for a profile that won't be scraped into an enum
2025-08-25 06:08:07
and that was one
spider-mario
2025-08-25 07:21:25
the jpegli ICC has XYB (β€œRGB”) to PCS (profile connection space – I don’t remember if we made that XYZD50 or CIELAB), but not PCS->XYB
2025-08-25 07:21:30
so it can’t be used as a destination profile
2025-08-25 07:22:15
an β€œRGB->RGB” transform is really RGB->PCS->RGB
2025-08-25 07:24:07
you can try this instead
nol
2025-08-26 07:55:53
In some PRs (https://github.com/libjxl/libjxl/pull/2657 or https://github.com/libjxl/libjxl/pull/4236), the "jyrki31" collection of images is used for benchmarking. Is this publicly accessible and if so, can someone help me locate it?
CrushedAsian255
nol In some PRs (https://github.com/libjxl/libjxl/pull/2657 or https://github.com/libjxl/libjxl/pull/4236), the "jyrki31" collection of images is used for benchmarking. Is this publicly accessible and if so, can someone help me locate it?
2025-08-27 04:53:03
It's probably referring to <@532010383041363969>
Traneptora
spider-mario you can try this instead
2025-08-27 12:25:36
the problem I encountered was cjxl helpfully parsed the ICC and turned it into an enum space which I was trying to avoid for my testing, so I just added a bunch of code to hydrium. now it has a public api function `hyd_set_suggested_icc_profile(HYDEncoder *encoder, const uint8_t *icc_data, size_t icc_size)` which is nice
2025-08-27 12:26:03
so I spun it up and tested it out, and now the file won't decode with djxl
2025-08-27 12:26:18
I believe the file is valid because the other two decoders (jxlatte, jxl-oxide) both decode the file
2025-08-27 12:26:32
it only refuses to decode if hydrium isn't set to output one frame
2025-08-27 12:27:17
and now we have
2025-08-27 12:27:17
https://github.com/libjxl/libjxl/issues/4419
spider-mario
Traneptora the problem I encountered was cjxl helpfully parsed the ICC and turned it into an enum space which I was trying to avoid for my testing, so I just added a bunch of code to hydrium. now it has a public api function `hyd_set_suggested_icc_profile(HYDEncoder *encoder, const uint8_t *icc_data, size_t icc_size)` which is nice
2025-08-27 02:06:05
for what it’s worth, it shouldn’t enum-ize the profile I attached – and that one should work as destination profile
Traneptora
spider-mario for what it’s worth, it shouldn’t enum-ize the profile I attached – and that one should work as destination profile
2025-08-27 03:22:47
I can try that one then, since it appears just using a PQ profile on an XYB image has issues
spider-mario so it can’t be used as a destination profile
2025-08-27 03:27:21
problem here is `JxlDecoderSetOutputColorProfile` doesn't return an error in this case.
2025-08-27 03:27:54
The documentation says it should fail
2025-08-27 03:27:55
> If a color management system (CMS) has been set with JxlDecoderSetCms, and the CMS supports output to the desired color encoding or ICC profile, then it will provide the output in that color encoding or ICC profile. If the desired color encoding or the ICC is not supported, then an error will be returned.
2025-08-27 03:28:06
but what happens is that it returns successfully, and the failure only shows up later, during the actual conversion
spider-mario
2025-08-27 03:46:35
out of curiosity, is that also the case with libjxl built with skcms instead of lcms2?
2025-08-27 03:46:43
I think I remember an explicit check for that there
Traneptora
spider-mario out of curiosity, is that also the case with libjxl built with skcms instead of lcms2?
2025-08-29 05:22:49
I don't know, I can check
A homosapien
nol In some PRs (https://github.com/libjxl/libjxl/pull/2657 or https://github.com/libjxl/libjxl/pull/4236), the "jyrki31" collection of images is used for benchmarking. Is this publicly accessible and if so, can someone help me locate it?
2025-09-02 07:37:16
Found it https://storage.googleapis.com/artifacts.jpegxl.appspot.com/corpora/jyrki-full.tar
nol
2025-09-03 05:25:28
Awesome, thank you very much!
jonnyawsom3
2025-09-04 12:49:32
Continuing from https://discord.com/channels/794206087879852103/803574970180829194/1413134912567119962
2025-09-04 12:49:41
veluca
2025-09-04 12:55:00
I looked a bit at the libjxl code and it looks like memory usage might be higher if using noise
2025-09-04 12:55:10
(I mean, a _lot_ higher)
2025-09-04 12:55:18
not sure if that's what's going on here
jonnyawsom3
veluca I looked a bit at the libjxl code and it looks like memory usage might be higher if using noise
2025-09-04 12:55:56
I think you were onto something looking at the render pipeline
```
wintime -- djxl --num_threads 1 Alpha.jxl nul --output_format ppm
JPEG XL decoder v0.12.0 6efa0f5a [_AVX2_] {Clang 20.1.8}
Decoded to pixels. 7680 x 4320, 19.797 MP/s, 1 threads.
PageFaultCount: 382918
PeakWorkingSetSize: 1.064 GiB
QuotaPeakPagedPoolUsage: 36.4 KiB
QuotaPeakNonPagedPoolUsage: 14.6 KiB
PeakPagefileUsage: 1.21 GiB
Creation time 2025/09/04 13:55:07.422
Exit time 2025/09/04 13:55:09.180
Wall time: 0 days, 00:00:01.757 (1.76 seconds)
User time: 0 days, 00:00:00.546 (0.55 seconds)
Kernel time: 0 days, 00:00:01.203 (1.20 seconds)

wintime -- djxl --num_threads 1 Opaque.jxl nul --output_format ppm
JPEG XL decoder v0.12.0 6efa0f5a [_AVX2_] {Clang 20.1.8}
Decoded to pixels. 7680 x 4320, 35.496 MP/s, 1 threads.
PageFaultCount: 77411
PeakWorkingSetSize: 198.2 MiB
QuotaPeakPagedPoolUsage: 36.4 KiB
QuotaPeakNonPagedPoolUsage: 6.898 KiB
PeakPagefileUsage: 195.6 MiB
Creation time 2025/09/04 13:55:17.141
Exit time 2025/09/04 13:55:18.142
Wall time: 0 days, 00:00:01.001 (1.00 seconds)
User time: 0 days, 00:00:00.109 (0.11 seconds)
Kernel time: 0 days, 00:00:00.890 (0.89 seconds)
```
2025-09-04 12:56:05
Alpha adds 5x more memory usage
veluca
2025-09-04 12:56:39
so the good news is that all of this _should_ be fixable in jxl-rs
2025-09-04 12:56:50
the bad news is that I need to write that code πŸ˜›
jonnyawsom3
2025-09-04 12:58:24
I have to say, I'm surprised memory usage doesn't change with thread count much, if at all. I would've assumed the data is buffered until a thread starts working on it
veluca
2025-09-04 12:58:48
it should without alpha
jonnyawsom3
2025-09-04 12:59:59
By less than 1MB, with Alpha by 4MB
```
wintime -- djxl --num_threads 8 Alpha.jxl nul --output_format ppm
JPEG XL decoder v0.12.0 6efa0f5a [_AVX2_] {Clang 20.1.8}
Decoded to pixels. 7680 x 4320, 65.846 MP/s, 8 threads.
PageFaultCount: 383747
PeakWorkingSetSize: 1.068 GiB
QuotaPeakPagedPoolUsage: 36.4 KiB
QuotaPeakNonPagedPoolUsage: 15.53 KiB
PeakPagefileUsage: 1.213 GiB
Creation time 2025/09/04 13:57:20.063
Exit time 2025/09/04 13:57:20.654
Wall time: 0 days, 00:00:00.590 (0.59 seconds)
User time: 0 days, 00:00:01.203 (1.20 seconds)
Kernel time: 0 days, 00:00:01.859 (1.86 seconds)

wintime -- djxl --num_threads 8 Opaque.jxl nul --output_format ppm
JPEG XL decoder v0.12.0 6efa0f5a [_AVX2_] {Clang 20.1.8}
Decoded to pixels. 7680 x 4320, 176.738 MP/s, 8 threads.
PageFaultCount: 80059
PeakWorkingSetSize: 199.4 MiB
QuotaPeakPagedPoolUsage: 36.4 KiB
QuotaPeakNonPagedPoolUsage: 7.961 KiB
PeakPagefileUsage: 200.4 MiB
Creation time 2025/09/04 13:57:26.400
Exit time 2025/09/04 13:57:26.654
Wall time: 0 days, 00:00:00.253 (0.25 seconds)
User time: 0 days, 00:00:00.281 (0.28 seconds)
Kernel time: 0 days, 00:00:01.156 (1.16 seconds)
```
veluca
2025-09-04 01:02:55
weird, but I won't complain
jonnyawsom3
2025-09-04 01:02:58
This also gives another reason to strip empty alpha on encode, or at least detect it and minimize decode time (g3 LZ77 only should be fast, but testing needed)
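(A minimal sketch of the "detect it on encode" part, assuming 8-bit interleaved RGBA input; the function name is made up for illustration.)
```cpp
#include <cstddef>
#include <cstdint>

// Returns true when the alpha channel is fully opaque, in which case the
// caller could pass RGB instead of RGBA (or the encoder could drop the
// channel) and avoid the extra decode-side cost discussed above.
static bool AlphaIsFullyOpaque(const uint8_t* rgba, size_t num_pixels) {
  for (size_t i = 0; i < num_pixels; ++i) {
    if (rgba[i * 4 + 3] != 255) return false;
  }
  return true;
}
```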
veluca (I mean, a _lot_ higher)
2025-09-04 01:08:11
Also seeing a 30% increase with noise, but nowhere near the 5x from alpha blending
```
wintime -- djxl --num_threads 1 Noise.jxl nul --output_format ppm
JPEG XL decoder v0.12.0 6efa0f5a [_AVX2_] {Clang 20.1.8}
Decoded to pixels. 7680 x 4320, 25.457 MP/s, 1 threads.
PageFaultCount: 94708
PeakWorkingSetSize: 256.5 MiB
QuotaPeakPagedPoolUsage: 36.4 KiB
QuotaPeakNonPagedPoolUsage: 7.695 KiB
PeakPagefileUsage: 259.1 MiB
Creation time 2025/09/04 14:06:36.344
Exit time 2025/09/04 14:06:37.711
Wall time: 0 days, 00:00:01.366 (1.37 seconds)
User time: 0 days, 00:00:00.093 (0.09 seconds)
Kernel time: 0 days, 00:00:01.281 (1.28 seconds)
```
veluca
2025-09-04 01:08:28
weird
jonnyawsom3
2025-09-04 01:09:18
Combining both does get a little hairy though
```
wintime -- djxl --num_threads 1 AlphaNoise.jxl nul --output_format ppm
JPEG XL decoder v0.12.0 6efa0f5a [_AVX2_] {Clang 20.1.8}
Decoded to pixels. 7680 x 4320, 14.498 MP/s, 1 threads.
PageFaultCount: 538869
PeakWorkingSetSize: 1.658 GiB
QuotaPeakPagedPoolUsage: 36.4 KiB
QuotaPeakNonPagedPoolUsage: 20.18 KiB
PeakPagefileUsage: 1.819 GiB
Creation time 2025/09/04 14:08:57.798
Exit time 2025/09/04 14:09:00.167
Wall time: 0 days, 00:00:02.368 (2.37 seconds)
User time: 0 days, 00:00:00.671 (0.67 seconds)
Kernel time: 0 days, 00:00:01.687 (1.69 seconds)
```
2025-09-04 01:10:17
Around 18x more memory than the image itself
2025-09-04 01:23:02
Oxide has its own quirks; we've had certain files use 4GB of memory for a 4K image IIRC
Quackdoc
2025-09-05 01:06:56
<@179701849576833024> I fell asleep :D, I printed out `num_buffers` and twice it prints `480`
```
./tools/djxl --num_threads 1 mona-jxl.jxl --disable_output
JPEG XL decoder v0.12.0 9f29783e [_AVX2_,SSE4,SSE2] {GNU 15.2.1}
Number of buffers: 480
Number of buffers: 480
Decoded to pixels. 7432 x 3877, 18.364 MP/s, 1 threads.
```
veluca
2025-09-05 01:07:19
that explains it
2025-09-05 01:07:21
noise?
Quackdoc
2025-09-05 01:09:45
not entirely sure, it's been a while since I made the image, any way I can easily check?
2025-09-05 01:10:10
```
./tools/jxlinfo -v mona-jxl.jxl
JPEG XL image, 7432x3877, lossy, 8-bit RGB+Alpha
Number of color channels: 3
Number of extra channels: 1
Extra channel 0:
type: Alpha
bits per sample: 8
alpha premultiplied: 0 (Non-premultiplied)
Have preview: 0
Have animation: 0
Intrinsic dimensions: 7432x3877
Orientation: 1 (Normal)
Color space: RGB
White point: D65
Primaries: sRGB
Transfer function: sRGB
Rendering intent: Relative

jxl-oxide info -v mona-jxl.jxl
2025-09-05T01:10:01.624293Z DEBUG jxl_render: Setting default output color encoding default_color_encoding=ColorEncodingWithProfile { encoding: Enum(EnumColourEncoding { colour_space: Rgb, white_point: D65, primaries: Srgb, tf: Srgb, rendering_intent: Relative }), icc_profile: (0 byte(s)), is_cmyk: false }
JPEG XL image (BareCodestream)
Image dimension: 7432x3877
Bit depth: 8 bits
XYB encoded, suggested display color encoding:
Colorspace: RGB
White point: D65
Primaries: sRGB
Transfer function: sRGB
Extra channel info:
#0 Alpha
Frame #0 (keyframe)
VarDCT (lossy)
Frame type: Regular
7432x3877; (0, 0
```
veluca
2025-09-05 01:12:44
ah, wait, it has alpha
2025-09-05 01:12:58
if that's lossy alpha, that should explain it
2025-09-05 01:13:18
and yes jxl-rs will do better πŸ˜›
Quackdoc
2025-09-05 02:52:51
yay :D
jonnyawsom3
2025-09-05 03:06:46
Wasn't jxl-rs also using half instead of float internally? That should knock out a lot of overhead for 8bit images
veluca
2025-09-05 07:07:37
Maybe
Quackdoc
2025-09-05 07:08:38
right now thumbnailing some of my folders that have a good number of these images is rough and will oom my PC lol
AccessViolation_
2025-09-05 09:30:24
does anyone else have issues encoding this image? I get `Getting pixel data failed`. a different PNG worked. I'm on 0.11.0
RaveSteel
AccessViolation_ does anyone else have issues encoding this image? I get `Getting pixel data failed`. a different PNG worked. I'm on 0.11.0
2025-09-05 09:33:42
Source for the image?
AccessViolation_
RaveSteel Source for the image?
2025-09-05 09:34:30
https://redlib.privadency.com/r/Silksong/comments/1n82yik/one_last_fanart_before_its_real_rework_of_a/
2025-09-05 09:37:21
the full images from discord's and reddit's cdn are bit identical so it shouldn't matter where you get it from
jonnyawsom3
AccessViolation_ the full images from discord's and reddit's cdn are bit identical so it shouldn't matter where you get it from
2025-09-05 09:38:43
God bless the EXIF fix
2025-09-05 09:39:23
Trailing data I'd assume?
RaveSteel
AccessViolation_ the full images from discord's and reddit's cdn are bit identical so it shouldn't matter where you get it from
2025-09-05 09:39:30
I was just asking to find the artist xd
AccessViolation_
2025-09-05 09:39:49
I figured :p
2025-09-05 09:39:52
it's nice art
Trailing data I'd assume?
2025-09-05 09:40:55
hmm you think?
2025-09-05 09:44:18
could this be the trailing data in question
2025-09-05 09:46:16
yep seems like anything beyond that point isn't classified as anything part of the PNG format
RaveSteel
2025-09-05 10:28:51
I wonder where that trailing data comes from
jonnyawsom3
2025-09-06 02:23:07
https://github.com/libjxl/jxl-rs/releases/tag/v0.1.0
veluca
2025-09-06 06:13:23
don't read too much into it, I just wanted to grab the crates.io crate name before it was too late πŸ˜›
Quackdoc
2025-09-06 08:13:53
did they stop allowing name squatting?
veluca
2025-09-06 08:55:54
yup
2025-09-06 08:56:14
I had to send a request to take down the previous namesquat already
jonnyawsom3
2025-09-06 09:02:09
Hey, everything is progress
HCrikki
2025-09-06 01:56:16
what's the conformance level for jxl-rs 0.1.0? could someone keep https://libjxl.github.io/bench/ updated?
Meow
2025-09-07 03:36:41
Homebrew pushes the newer version! `jpeg-xl 0.11.1_2 -> 0.11.1_3`
AccessViolation_
veluca don't read too much into it, I just wanted to grab the crates.io crate name before it was too late πŸ˜›
2025-09-08 08:57:03
you can't stop me! HEAR YE JXL RS IS COMINGG
Meow
2025-09-08 09:03:07
It already came
2025-09-09 07:40:31
https://github.com/libjxl/jxl-rs/releases/tag/v0.1.1
Amiralgaby πŸ•Š
AccessViolation_ does anyone else have issues encoding this image? I get `Getting pixel data failed`. a different PNG worked. I'm on 0.11.0
2025-09-12 11:48:27
I had the same error. The cause is a not-implemented case being reported as a pixel data error, instead of a "not implemented" error or warning
jonnyawsom3
2025-09-15 01:08:04
<@263300458888691714> back when you improved patch detection, was that by tweaking values in the existing heuristic, or did you make something entirely new for it? Wondering if we could try and replicate it
_wb_
2025-09-15 01:32:54
the current heuristic starts from the assumption of a solid background, but that's just covering a specific type of patches (letter-like stuff)
2025-09-15 01:37:45
for something like arbitrary repetitive stuff (say an oldschool game screenshot where most stuff is composed out of repeating sprites), the current heuristic doesn't work at all
2025-09-15 01:39:32
some intra block copy algorithm from a video codec should be added to detect such kinds of repetition.
monad
<@263300458888691714> back when you improved patch detection, was that by tweaking values in the exisiting heuristic, or did you make something entirely new for it? Wondering if we could try and replicate it
2025-09-16 01:29:30
The first results were the consequence of relaxing existing constraints (particularly around the background requirement), then of mixing some regular grid scanning based on detected patch size, then of implementing a dedicated algorithm assuming a grid. My strategy was naive and its results only intended to inspire. Besides efficiently finding patch candidates, one should judge which features would practically benefit density. A singular bright pixel may be the most important feature to extract, while cutting out large sections of a texture or deduplicating photo content may be harmful. Even the existing algorithm is dubious or damaging outside strict text content.
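(A naive illustration of the grid-scanning idea described above, not libjxl's heuristic: hash every aligned tile of a grayscale plane and count repeats; tiles that repeat often become patch candidates, and judging whether extracting them actually helps density is the separate, harder step.)
```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Count how often identical T x T tiles repeat across an 8-bit plane.
static std::unordered_map<uint64_t, int> CountRepeatedTiles(
    const uint8_t* img, size_t width, size_t height, size_t T) {
  std::unordered_map<uint64_t, int> counts;
  for (size_t y = 0; y + T <= height; y += T) {
    for (size_t x = 0; x + T <= width; x += T) {
      uint64_t h = 1469598103934665603ull;  // FNV-1a hash of the tile
      for (size_t dy = 0; dy < T; ++dy) {
        for (size_t dx = 0; dx < T; ++dx) {
          h ^= img[(y + dy) * width + (x + dx)];
          h *= 1099511628211ull;
        }
      }
      ++counts[h];
    }
  }
  return counts;
}
```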
Demiurge
2025-09-16 05:17:25
I kinda dislike how the jxl tools use so many threads by default. Especially with SMT cores, it uses way more threads than actual physical cores on your CPU, by default. Which is wasteful.
2025-09-16 05:18:16
And even if it were smart enough to count physical cores instead of logical cores, the scaling is still very sub-linear.
jonnyawsom3
Demiurge And even if it were smart enough to count physical cores instead of logical cores, the scaling is still very sub-linear.
2025-09-16 05:42:18
1 Thread
```
3840 x 2160, geomean: 0.744 MP/s [0.731, 0.751], 5 reps, 0 threads.
Wall time: 0 days, 00:00:55.997 (56.00 seconds)
User time: 0 days, 00:00:00.937 (0.94 seconds)
Kernel time: 0 days, 00:00:54.984 (54.98 seconds)
```
2 Threads
```
3840 x 2160, geomean: 1.456 MP/s [1.434, 1.457], 5 reps, 2 threads.
Wall time: 0 days, 00:00:28.724 (28.72 seconds)
User time: 0 days, 00:00:01.078 (1.08 seconds)
Kernel time: 0 days, 00:00:55.093 (55.09 seconds)
```
4 Threads
```
3840 x 2160, geomean: 2.719 MP/s [2.653, 2.774], 5 reps, 4 threads.
Wall time: 0 days, 00:00:15.472 (15.47 seconds)
User time: 0 days, 00:00:01.218 (1.22 seconds)
Kernel time: 0 days, 00:00:56.984 (56.98 seconds)
```
Seems pretty linear to me
Demiurge
2025-09-16 06:25:24
a:
1 thread: 82.276 MP/s
2 threads: 158.567 MP/s
4 threads: 291.051 MP/s

b:
2 threads: 207.995 MP/s
4 threads: 388.176 MP/s
2025-09-16 06:25:42
(These are with an unusually-massive image)
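(Working through those numbers: 158.567 / 82.276 β‰ˆ 1.93Γ— at 2 threads, about 96% parallel efficiency, and 291.051 / 82.276 β‰ˆ 3.54Γ— at 4 threads, about 88%; so scaling stays close to linear over the physical cores, and the drop-off only appears once SMT threads get involved.)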
diskorduser
2025-09-16 06:26:56
Seems okay-ish to me.
Demiurge
2025-09-16 06:32:55
going from 9 to 10 threads adds about 13 MP/s; going from 8 to 9 adds 40 MP/s. going from 1 to 2 adds 100 MP/s.
2025-09-16 06:33:25
of a difference
2025-09-16 06:33:43
I think "use as many threads as logical cores" is a bad default
2025-09-16 06:34:13
Like even if that's your intention you are better off counting physical cores instead of logical cores.
2025-09-16 06:34:24
Since beyond that you get zero speedup
2025-09-16 06:36:28
And you're just reducing efficiency after that and spamming the CPU scheduler
jonnyawsom3
Demiurge Since beyond that you get zero speedup
2025-09-16 06:52:58
On an 8 core CPU
8 Threads
```
3840 x 2160, geomean: 5.715 MP/s [5.512, 5.769], 5 reps, 8 threads.
Wall time: 0 days, 00:00:07.450 (7.45 seconds)
User time: 0 days, 00:00:01.906 (1.91 seconds)
Kernel time: 0 days, 00:00:50.921 (50.92 seconds)
```
16 Threads
```
3840 x 2160, geomean: 7.473 MP/s [7.122, 7.626], 5 reps, 16 threads.
Wall time: 0 days, 00:00:05.743 (5.74 seconds)
User time: 0 days, 00:00:06.093 (6.09 seconds)
Kernel time: 0 days, 00:01:05.437 (65.44 seconds)
```
30% faster is not zero. Considering SMT is usually around 40-50% the speed of real cores, and my system is in use, that's still pretty linear
Demiurge
2025-09-16 06:59:00
If you disable SMT in the bios, you will probably get even better results...
2025-09-16 06:59:32
On my system, with SMT enabled, and 12 physical cores, I get a small improvement with 13 threads and zero improvement at 14
2025-09-16 07:04:06
I do not think there is any reason to use so many threads as the default setting honestly...
2025-09-16 07:04:51
Matching the number of physical cores would make more logical sense :)
jonnyawsom3
Demiurge On my system, with SMT enabled, and 12 physical cores, I get a small improvement with 13 threads and zero improvement at 14
2025-09-16 07:07:37
What resolution and encode settings?
Demiurge
2025-09-16 07:43:39
I'm decoding, not encoding
2025-09-16 07:44:06
And it's an unusually-large image but I have 128G RAM
jonnyawsom3
2025-09-16 07:50:36
Huh, you're right, djxl does hit a roadblock with a single hyperthread
2025-09-16 07:57:37
Moving to <#803645746661425173>
Demiurge
2025-09-16 08:02:56
This is a physical feature, not a software feature. It's the nature and design of SMT
2025-09-16 08:03:20
You are limited by the number of physical cores.
Kupitman
Demiurge If you disable SMT in the bios, you will probably get even better results...
2025-09-16 09:12:03
no, the OS always tries to use the physical cores first, and gets extra performance from SMT on top
Demiurge
2025-09-16 09:48:54
It's well known that disabling SMT improves performance. SMT is not designed to improve performance. It's designed to sacrifice performance but improve core scheduling in return.
2025-09-16 09:49:43
It's a feature that makes sense for servers and not desktops.
2025-09-16 09:50:52
Some servers even have 8 logical cores per physical core.
2025-09-16 09:51:26
But it's not able to do 8 times the work... It's still limited by the 1 physical core.
2025-09-16 09:51:59
It's just able to split the work more efficiently.
2025-09-16 09:52:16
With less overhead of context switching
ignaloidas
2025-09-16 11:05:00
Not really? SMT is meant to improve core resource utilization, with very few changes to the core itself once you have a proper OoO core (just one extra instruction decoder and another set of registers)
2025-09-16 11:06:34
of course some workloads can utilize *all* of the cores resources, and with them you're not going to see the improvement, but most workloads that aren't "a whole bunch of tight math with little branching" do end up being faster
2025-09-16 11:07:00
any advantage from more instruction decoders leading to less context switching is mostly incidental
A homosapien
Demiurge It's well known that disabling SMT improves performance. SMT is not designed to improve performance. It's designed to sacrifice performance but improve core scheduling in return.
2025-09-16 11:14:53
That's a very bold claim, do you have a source for that?
Demiurge
2025-09-16 11:22:02
I take it for granted based on my own personal experience. My experience led me to read more about it to see if other people have had similar results to my own. And from what I can gather, there seems to be a consensus that disabling SMT makes sense for desktop machines and increases per-core performance.
2025-09-16 11:22:45
which does not seem very outrageous or surprising to me.